1. 23 Sep, 2021 1 commit
    • tcp: Two bugs in rack one of which can lead to a panic. · fd69939e
      Randall Stewart authored
      In extensive testing at NF we found two issues in
      the rack stack.
      
      1) The fast send path generates an incorrect offset when a fast send is initiated at the end of the
         socket buffer and, before the fast send runs, the sb_compress macro adds data to the trailing mbuf.
         This fools the fast send code into thinking the sb offset changed, so it miscalculates an "updated offset".
         It should only do that when the mbuf in question got smaller, i.e. when an ACK was processed. This can lead to
         a panic dereferencing a NULL mbuf if that packet is ever retransmitted. In the best case it leads to invalid data
         being sent to the client, which usually terminates the connection. The fix is to use the same logic that the rsm
         fast path already has, so that we only update the offset when the mbuf shrinks.
      2) The other issue is more bothersome. The timestamp check in rack needs to use the millisecond timestamp when
         comparing the timestamp echo to now. It was using a microsecond timestamp, which gives error-prone results,
         though it causes only small harm: it affects identifying which transmission to use in the RTT calculation
         when the segment was retransmitted.
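
      A minimal C sketch of both fixes follows. It is not the rack code: fake_mbuf,
      send_cursor, cursor_resync(), usec_to_ts_ms() and match_tsecr() are hypothetical
      names. The sketch also assumes the TSval a sender places in a segment is its
      microsecond clock divided by 1000. The first part moves the send offset only when
      the mbuf shrank. The second part compares the echoed timestamp in milliseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf; only its length matters here. */
struct fake_mbuf {
	int m_len;
};

/* Hypothetical fast-send cursor: where in an mbuf the next send starts. */
struct send_cursor {
	struct fake_mbuf *m;
	int orig_m_len;	/* m->m_len when off was last computed */
	int off;	/* offset into m where sending resumes */
};

/*
 * Resynchronize the cursor before sending. Only a shrinking mbuf (bytes
 * dropped from its front after an ACK) moves the offset back; growth at
 * the tail (data appended by socket-buffer compression) leaves it alone.
 */
static void
cursor_resync(struct send_cursor *c)
{
	if (c->m->m_len < c->orig_m_len) {
		int trimmed = c->orig_m_len - c->m->m_len;

		c->off = (c->off > trimmed) ? c->off - trimmed : 0;
	}
	c->orig_m_len = c->m->m_len;
}

/* TCP timestamps tick in milliseconds (assumed: usec clock / 1000). */
static uint32_t
usec_to_ts_ms(uint64_t usec)
{
	return ((uint32_t)(usec / 1000));
}

/*
 * Hypothetical: given the send times (usec) of the original transmission
 * and any retransmissions, return the index whose millisecond timestamp
 * matches the echoed TSecr, or -1 if none does. Comparing tsecr directly
 * against the microsecond values would essentially never match.
 */
static int
match_tsecr(uint32_t tsecr_ms, const uint64_t *sent_us, int nsends)
{
	for (int i = nsends - 1; i >= 0; i--) {
		if (usec_to_ts_ms(sent_us[i]) == tsecr_ms)
			return (i);
	}
	return (-1);
}
```

      For example, an mbuf that grows from 100 to 150 bytes leaves an offset of 40
      unchanged. If the mbuf then shrinks to 130 bytes, the offset moves back to 20.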
      
      Reviewed by: tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D32062
  2. 17 Aug, 2021 1 commit
    • tcp: Add support for DSACK based reordering window to rack. · 5baf32c9
      Randall Stewart authored
      The rack stack, with respect to the rack bits in it, was originally built on
      an early I-D of rack. In fact, at that time the TLP bits were in a separate
      I-D, and the dynamic reordering window based on DSACK events was not present
      in rack. It is now part of the RFC, and we need to update our stack to include
      these features. However, we want a way to control the feature so that, if the
      admin decides, the old behavior can be kept, both system-wide and via socket
      option. The new sysctl and socket option take the following values:
      
      00 (0) - Keep the old way, i.e. the reordering window is 1 and DSACK bytes are not used to add to the reorder window
      01 (1) - Change the reordering window to 1/4 of an RTT, but do not use DSACK bytes to add to the reorder window
      10 (2) - Keep the reordering window as 1, but do use DSACK bytes to add an additional 1/4 RTT delay to the reorder window
      11 (3) - The reordering window is 1/4 of an RTT and DSACK bytes are used to further increase it (RFC behavior)
      
      The sysctl currently defaults to 3, so we get standards-based behavior.
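
      The four settings can be read as two independent bits. The sketch below shows one
      hypothetical way to compute the window from them. reorder_window(), the bit names
      and the dsack_incr counter are illustrative, and the fixed window of "1" is kept in
      the same unit as the RTT. The cap at SRTT comes from RFC 8985, not from this
      commit, so whether rack applies it is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define REO_QUARTER_RTT	0x1	/* bit 0: base window is min RTT/4, not 1 */
#define REO_USE_DSACK	0x2	/* bit 1: each DSACK increment adds RTT/4 */

/*
 * Hypothetical: compute the reordering window (same unit as the RTTs)
 * from the 2-bit mode, the minimum RTT, the SRTT, and the number of
 * DSACK-driven increments seen so far.
 */
static uint32_t
reorder_window(unsigned int mode, uint32_t min_rtt, uint32_t srtt,
    uint32_t dsack_incr)
{
	uint32_t wnd;

	wnd = (mode & REO_QUARTER_RTT) ? min_rtt / 4 : 1;
	if (mode & REO_USE_DSACK)
		wnd += dsack_incr * (min_rtt / 4);
	if (wnd > srtt)
		wnd = srtt;	/* RFC 8985 caps the window at SRTT */
	return (wnd);
}
```

      With mode 3 and no DSACK increments, the window is min_rtt/4. This matches the
      RFC 8985 starting multiplier of 1.
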
      Reviewed by: tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D31506
  3. 07 Jul, 2021 2 commits
    • tcp: fix alternate stack build with LINT-NO{INET,INET6,IP} · b1e806c0
      Andrew Gallatin authored
      When fixing another bug, I noticed that the alternate
      TCP stacks do not build when various combinations of
      ipv4 and ipv6 are disabled.
      
      Reviewed by:	rrs, tuexen
      Differential Revision:	https://reviews.freebsd.org/D31094
      Sponsored by: Netflix
    • tcp: HPTS performance enhancements · d7955cc0
      Randall Stewart authored
      HPTS drives both rack and bbr, and yet there have been many complaints
      about performance. This bit of work restructures hpts to help reduce CPU
      overhead. Instead of relying on the timer/callout to drive hpts, it now uses
      the return to user space from a system call, as well as LRO flushes, to drive
      hpts. The timer becomes a backstop that adjusts dynamically based on how
      "late" we are.
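
      One way to picture a backstop that "dynamically adjusts based on how late we are"
      is shown below. The policy is an assumption for illustration: back off while on
      time, and shrink by the observed lateness otherwise. All names are also
      assumptions. The commit does not specify the actual formula.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical backstop policy. When syscall returns and LRO flushes keep
 * hpts on schedule (late_us == 0), the backstop timer backs off; when a
 * run shows we were late, the next interval shrinks by that lateness.
 */
static uint32_t
backstop_next_interval(uint32_t cur_us, uint32_t late_us,
    uint32_t min_us, uint32_t max_us)
{
	uint32_t next;

	if (late_us == 0)
		next = cur_us * 2;		/* on time: back off */
	else if (late_us < cur_us)
		next = cur_us - late_us;	/* late: fire sooner */
	else
		next = min_us;			/* very late: fire ASAP */

	if (next < min_us)
		next = min_us;
	if (next > max_us)
		next = max_us;
	return (next);
}
```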
      
      Reviewed by: tuexen, glebius
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D31083
  4. 06 Jul, 2021 1 commit
    • tcp: Address goodput and TLP edge cases. · e834f9a4
      Randall Stewart authored
      There are several cases where we decide to make a goodput measurement while we
      are running out of data. In reality we should not make such a measurement if
      there is no chance of having "enough" data. There are also some corner-case TLPs
      that do not register as a TLP as they should. We fix this by moving the doing_tlp
      setup into the actual timeout, which knows it did a TLP. This ensures that the
      sendmap always carries the appropriate flag indicating a TLP is being done, and
      that we count correctly so we send no more than two TLPs.
      
      While addressing goodput, let's also add a "quality" metric that can be viewed via
      blackbox logs, so that a casual observer does not have to figure out how good a
      measurement is. This is needed because we may still make a poorer-quality
      measurement when we are running out of data but still have the minimal amount
      of data needed to make a measurement.
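
      A sketch of the two ideas uses hypothetical names and thresholds. A goodput sample
      is skipped when too little data is available and is otherwise tagged with a quality
      level. The TLP flag and counter are set only inside the timeout routine that
      actually sends the TLP, and the count is capped at two.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TLPS 2	/* the commit's "no more than two TLPs" */

/* Hypothetical quality levels for a goodput sample. */
enum gp_quality {
	GP_QUALITY_NONE,	/* too little data: skip the measurement */
	GP_QUALITY_LOW,		/* minimal data: measure but flag as poorer */
	GP_QUALITY_GOOD		/* enough data for a solid sample */
};

struct tlp_state {
	int tlps_sent;
	int doing_tlp;	/* flag the sendmap entry would carry */
};

/* Both byte thresholds are hypothetical tuning parameters. */
static enum gp_quality
goodput_quality(uint32_t bytes_avail, uint32_t min_bytes, uint32_t good_bytes)
{
	if (bytes_avail < min_bytes)
		return (GP_QUALITY_NONE);
	if (bytes_avail < good_bytes)
		return (GP_QUALITY_LOW);
	return (GP_QUALITY_GOOD);
}

/*
 * The timeout that actually sends the TLP is the one place that sets
 * doing_tlp and bumps the counter, so no TLP goes unregistered.
 */
static int
tlp_timeout(struct tlp_state *s)
{
	if (s->tlps_sent >= MAX_TLPS)
		return (0);	/* cap reached: no further TLP */
	s->doing_tlp = 1;
	s->tlps_sent++;
	return (1);
}
```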
      
      Reviewed by: tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D31076
  5. 25 Jun, 2021 1 commit
    • tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection that is retransmitting too much out of hardware and back to software. · 9e4d9e4c
      Randall Stewart authored
      
      Hardware TLS is now supported by some interface cards, and it works well, except that
      connections that retransmit a lot get into trouble with all the retransmits.
      This prep step makes way for a change that Drew will be making so that we can "kick out" a
      session from hardware TLS.
      
      Reviewed by: mtuexen, gallatin
      Sponsored by: Netflix Inc
      Differential Revision: https://reviews.freebsd.org/D30895
  6. 24 Jun, 2021 1 commit
  7. 11 Jun, 2021 3 commits
    • tcp: remove debug output from RACK · f1536bb5
      Michael Tuexen authored
      Reported by:		iron.udjin@gmail.com, Marek Zarychta
      Reviewed by:		rrs
      PR:			256538
      MFC after:		3 days
      Differential Revision:	https://reviews.freebsd.org/D30723
      Sponsored by:		Netflix, Inc.
    • tcp: Missing mfree in rack and bbr · ba1b3e48
      Randall Stewart authored
      Recently (Nov) we added logic that protects against a peer negotiating timestamps and
      then not including a timestamp. In the input path, this involved a goto to the
      done_with_input label. I suspect the code was cribbed from a similar spot in Rack that
      deals with the SYN. That code had a bug: it should call m_freem(m) before going to the
      label (bbr had this missing m_freem() but rack did not). This then caused the missing
      m_freem() to show up in both BBR and Rack. Also, referencing m->m_pkthdr.lro_nsegs
      later (after processing) is not a good idea, even though it's only for logging. It is
      best to copy that value off before any frees can take place.
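
      The corrected pattern can be sketched in user space. Here free() stands in for
      m_freem(), and fake_mbuf and process_input() are hypothetical. The sketch copies
      lro_nsegs first and frees the mbuf before jumping to done_with_input.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf carrying an LRO segment count. */
struct fake_mbuf {
	int lro_nsegs;
};

static int frees;	/* counts frees so a test can check for leaks */

static void
fake_m_freem(struct fake_mbuf *m)
{
	free(m);
	frees++;
}

/*
 * Hypothetical input routine; returns 1 if the segment was dropped.
 * The LRO count is copied up front so logging never touches a freed
 * mbuf, and the drop path frees the mbuf before jumping to the label.
 */
static int
process_input(struct fake_mbuf *m, int ts_negotiated, int seg_has_ts,
    int *logged_nsegs)
{
	int nsegs = m->lro_nsegs;	/* copy before any free */
	int dropped = 0;

	if (ts_negotiated && !seg_has_ts) {
		fake_m_freem(m);	/* the free that was missing */
		dropped = 1;
		goto done_with_input;
	}
	/* ... normal processing would consume the mbuf here ... */
	fake_m_freem(m);
done_with_input:
	*logged_nsegs = nsegs;	/* log from the saved copy, not from m */
	return (dropped);
}
```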
      
      Reviewed by: mtuexen
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30727
    • tcp: fix compilation of IPv4-only builds · 224cf7b3
      Michael Tuexen authored
      PR:			256538
      Reported by:		iron.udjin@gmail.com
      MFC after:		3 days
      Sponsored by:		Netflix, Inc.
  8. 10 Jun, 2021 1 commit
    • tcp: Mbuf leak while holding a socket buffer lock. · 67e89281
      Randall Stewart authored
      When running the current Rack and BBR at NF together with Richard's recent
      commits, which cause the socket buffer lock to be held over the ip_output()
      call and finally culminate in a call to tcp_handle_wakeup(), we get a lot of
      leaked mbufs. I don't think this leak is actually caused by holding the lock or
      by what Richard has done. Rather, his changes are exposing some other bug that
      has probably been lying dormant for a long time. I will continue to look (using
      his changes) at what is going on to try to find the root cause.
      
      In the meantime, I can't leave the leaks in place for everyone else. So this commit
      reverts all of Richard's changes and moves both Rack and BBR back to just
      doing the old sorwakeup_locked() calls after modifying the so_rcv buffer.
      
      We may want to look at adding Richard's changes back in after I have pinpointed
      the root cause of the mbuf leak and fixed it.
      
      Reviewed by: mtuexen,rscheff
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30704
  9. 04 Jun, 2021 1 commit
    • tcp: A better fix for the previously attempted fix of the ack-war issue with tcp. · 4747500d
      Randall Stewart authored
      It turns out that my previous fix was not correct. It resulted in us failing
      some of the "improved" SYN tests, since we were not in the correct states.
      With more digging, I have found that the root of the problem is that when
      we receive a SYN|FIN, the reassembly code creates a segq entry to hold the
      FIN. In the established state, when the segment is out of order, this would
      be correct, i.e. a zero-length segment with a FIN would need to be accepted.
      But in one of the front states (before ESTABLISHED) we need to strip the FIN,
      so that we correctly handle the ACK but ignore the FIN. This gets us into the
      proper states and avoids the previous ack war.
      
      I back out some of the previous changes and add a new change
      in tcp_reass() that fixes the root cause of the issue. We still
      leave the rack panic fixes in place, however.
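
      A simplified sketch of the decision described above follows. The state enum is a
      subset in RFC 793 order, and reass_filter() is hypothetical, not the tcp_reass()
      code. TH_FIN and TH_SYN use the standard TCP header bit values.

```c
#include <assert.h>
#include <stdint.h>

#define TH_FIN	0x01
#define TH_SYN	0x02

/* Simplified subset of TCP states, in RFC 793 order. */
enum tcp_state_sketch {
	ST_CLOSED, ST_LISTEN, ST_SYN_SENT, ST_SYN_RECEIVED, ST_ESTABLISHED
};

/*
 * Hypothetical reassembly-entry filter: returns the flags to keep and sets
 * *queue when a segq entry is needed. Before ESTABLISHED, the FIN of a
 * SYN|FIN is stripped so the ACK is processed but the FIN ignored. Once
 * established, a zero-length out-of-order FIN is legitimately queued.
 */
static uint8_t
reass_filter(enum tcp_state_sketch st, uint8_t flags, int tlen, int *queue)
{
	if (st < ST_ESTABLISHED && (flags & TH_FIN))
		flags &= (uint8_t)~TH_FIN;
	*queue = (tlen > 0) || (flags & TH_FIN);
	return (flags);
}
```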
      
      Reviewed by: mtuexen
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30627
  10. 27 May, 2021 1 commit
    • tcp: When we have an out-of-order FIN we do want to strip off the FIN bit. · 8c69d988
      Randall Stewart authored
      The last set of commits fixed both a panic (in rack) and an ACK-war (in the freebsd and bbr stacks).
      However, there was a missing case: an out-of-order FIN by itself.
      In that case we don't want to leave the FIN bit set; otherwise we will do the
      wrong thing and ack the FIN incorrectly. Instead, we need to go through the
      tcp_reass() code, so that the FIN will be stripped and all will be well.
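
      Combining this with the rule from the 25 May commit, that a segment with
      tlen == 0 should not enter reassembly, gives a predicate along these lines. It is
      hypothetical and not FreeBSD's code. A pure ACK skips reassembly, while a bare
      out-of-order FIN goes through it so the FIN can be stripped.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical predicate: should a segment go through reassembly rather
 * than the in-order path? A pure ACK (no data, no FIN) never does. Any
 * segment that is not in order, including a bare FIN, does.
 */
static int
needs_reass(uint32_t seq, uint32_t rcv_nxt, int tlen, int fin, int segq_empty)
{
	int in_order = (seq == rcv_nxt) && segq_empty;

	if (tlen == 0 && !fin)
		return (0);	/* nothing to reassemble */
	return (!in_order);
}
```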
      
      Reviewed by: mtuexen,rscheff
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30497
  11. 26 May, 2021 1 commit
  12. 25 May, 2021 1 commit
    • tcp: Fix bugs related to the PUSH bit and rack and an ack war · 13c0e198
      Randall Stewart authored
      Michael's testing with UDP tunneling found an issue with the PUSH bit, which was only partly fixed
      in the last commit. The problem is that the left edge gets transmitted before the adjustments are made
      to the send_map. This means that right-edge bits must be added only if the entire RSM is being
      retransmitted.
      
      syzkaller also continued to find a crash, for which Michael sent me the reproducer. It turns
      out that on the default (freebsd) stack, the reproducer made the stack get into an ack-war with itself.
      After fixing the reference issues in rack, the same ack-war was found in rack (and bbr). Basically,
      we go into the reassembly code and lose the FIN bit. The trick here is that we
      should not go into the reassembly code if tlen == 0, i.e. the peer did not send any data.
      That gets the proper action on the FIN bit, but then you end up in LAST_ACK with no
      timers running. This is because the usrclosed function gets called after the FINs and such have
      already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1), we get
      stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that it properly
      recognizes the condition and drops into FIN_WAIT2, where a timer will allow at least TP_MAXIDLE
      before closing (to give the peer time to retransmit its FIN if the ACK is lost). Setting the fast_finwait2
      timer can speed this up in testing.
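
      The PUSH-bit rule from the first paragraph can be sketched as below. rsm_sketch and
      retransmit_sets_push() are hypothetical. The comparisons are equality checks on
      uint32_t, so unsigned modular arithmetic handles sequence-number wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sendmap entry covering [start, end). */
struct rsm_sketch {
	uint32_t start;
	uint32_t end;
	int had_push;	/* the original transmission carried PSH */
};

/*
 * PSH belongs to the right edge of the entry, so carry it on a
 * retransmission only when that transmission covers the whole entry;
 * a partial resend of the left edge must not set it.
 */
static int
retransmit_sets_push(const struct rsm_sketch *rsm, uint32_t snd_start,
    uint32_t snd_len)
{
	return (rsm->had_push && snd_start == rsm->start &&
	    snd_start + snd_len == rsm->end);
}
```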
      
      Reviewed by: mtuexen,rscheff
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30451
  13. 24 May, 2021 2 commits
  14. 22 May, 2021 1 commit
  15. 21 May, 2021 3 commits
  16. 13 May, 2021 1 commit
    • tcp: Incorrect KASSERT causes a panic in rack · 02cffbc2
      Randall Stewart authored
      Syzkaller found an interesting panic in rack. When a SYN and a FIN are
      sent together, a KASSERT that validates that an mbuf pointer is present
      in the sendmap gets tripped. But a SYN or FIN often will not have an
      mbuf pointer. So the fix is twofold: a) make sure that the SYN and FIN
      split the right way when cloning an RSM (SYN on the left edge, FIN on
      the right), and b) make sure the KASSERT properly accounts for the case
      where we have a SYN or FIN, so we don't panic.
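
      A sketch of both halves of the fix uses a hypothetical rsm_sketch rather than
      rack's real sendmap structure. The split keeps the SYN on the left piece and moves
      the FIN to the right. The validity check, in the spirit of the corrected KASSERT,
      accepts a missing mbuf when the range carries a SYN or FIN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TH_FIN	0x01
#define TH_SYN	0x02

/* Hypothetical sendmap entry; m is NULL when it carries no payload. */
struct rsm_sketch {
	uint32_t start, end;	/* sequence range [start, end) */
	uint8_t flags;		/* TH_SYN / TH_FIN carried by this range */
	void *m;
};

/* Invariant: an entry without an mbuf is only valid if it has SYN/FIN. */
static int
rsm_valid(const struct rsm_sketch *r)
{
	return (r->m != NULL || (r->flags & (TH_SYN | TH_FIN)) != 0);
}

/*
 * Split *left at seq into [start, seq) and [seq, end). The SYN occupies
 * the left edge and the FIN the right edge, so each flag goes with the
 * piece that contains its sequence number.
 */
static void
rsm_split(struct rsm_sketch *left, struct rsm_sketch *right, uint32_t seq)
{
	*right = *left;
	right->start = seq;
	left->end = seq;
	left->flags &= (uint8_t)~TH_FIN;
	right->flags &= (uint8_t)~TH_SYN;
}
```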
      
      Reviewed by: mtuexen
      Sponsored by: Netflix Inc.
      Differential Revision:	https://reviews.freebsd.org/D30241
  17. 12 May, 2021 1 commit
  18. 11 May, 2021 1 commit
    • tcp: In rack, we must only convert restored rtt when the hostcache does restore them. · 4b86a24a
      Randall Stewart authored
      After the previous commit, Rack is very careful to translate any srtt/rttvar
      value in the hostcache into its proper format. However, there is a snafu:
      the HC only actually restores the srtt when tp->srtt is 0. So we need to
      convert the srtt only when it was actually restored. We do this by making
      sure it was zero before the call to cc_conn_init and is non-zero
      afterwards.
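
      The check can be sketched as follows. The units are assumptions for illustration
      only: a hostcache-style srtt in ticks scaled by 32, with hz = 1000, converted to
      rack's microseconds. hc_restore_50ms() is a hypothetical hook standing in for the
      restore done during cc_conn_init.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed units for illustration only. */
#define SKETCH_RTT_SCALE	32	/* srtt kept as ticks * 32 */
#define SKETCH_HZ		1000	/* ticks per second */

struct tp_sketch {
	uint32_t srtt;	/* rack's format: microseconds */
};

/* Hypothetical: convert a scaled-ticks srtt into microseconds. */
static uint32_t
scaled_ticks_to_usec(uint32_t srtt)
{
	return ((srtt / SKETCH_RTT_SCALE) * (1000000 / SKETCH_HZ));
}

/* Hypothetical restore hook mimicking the HC: fills srtt only if 0. */
static void
hc_restore_50ms(struct tp_sketch *tp)
{
	if (tp->srtt == 0)
		tp->srtt = 50 * SKETCH_RTT_SCALE;	/* 50 ticks, scaled */
}

/* Convert only if srtt was zero before the restore and non-zero after. */
static void
init_with_hostcache(struct tp_sketch *tp,
    void (*hc_restore)(struct tp_sketch *))
{
	int was_zero = (tp->srtt == 0);

	hc_restore(tp);
	if (was_zero && tp->srtt != 0)
		tp->srtt = scaled_ticks_to_usec(tp->srtt);
}
```

      A connection that already has an srtt of 12345 microseconds keeps it untouched.
      Converting it anyway is exactly the mistake the commit avoids.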
      
      Reviewed by:	Michael Tuexen
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30213
  19. 10 May, 2021 1 commit
    • tcp:Host cache and rack ending up with incorrect values. · 9867224b
      Randall Stewart authored
      Up to now, the hostcache has been updated in the discard callback
      without checking whether we are all done (the race where there is
      more than one call and the counter has not yet reached zero). This
      means that when the race occurs, we end up calling hc_update
      more than once. Also, alternate stacks can keep their srtt/rttvar
      in different formats (for example, rack keeps its values in microseconds).
      Since we call hc_update *before* the stack's fini(), the
      values will be in the wrong format.
      
      Rack, on the other hand, needs to convert values pulled from the
      hostcache into its internal format; otherwise it may end up with
      badly incorrect values from the hostcache. In the process,
      let's commonize the update mechanism for srtt/rttvar, since we
      now have more than one place that needs to call it.
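
      A sketch of the "only the last caller updates" rule with C11 atomics follows.
      conn_sketch, discard_cb() and hc_update_common() are hypothetical. The millisecond
      hostcache format is an assumption standing in for whatever common format the
      stacks hand to the hostcache.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct conn_sketch {
	atomic_int refs;	/* outstanding discard callbacks */
	uint32_t srtt_us;	/* stack-internal format: microseconds */
	int hc_updates;		/* times the hostcache was updated */
	uint32_t hc_srtt_ms;	/* value last written to the hostcache */
};

/* Hypothetical common update: always hand the hostcache one format. */
static void
hc_update_common(struct conn_sketch *c, uint32_t srtt_us)
{
	c->hc_srtt_ms = srtt_us / 1000;
	c->hc_updates++;
}

/*
 * Discard callback that may run more than once; only the caller that
 * drops the last reference updates the hostcache.
 */
static void
discard_cb(struct conn_sketch *c)
{
	if (atomic_fetch_sub(&c->refs, 1) != 1)
		return;		/* others still hold references */
	hc_update_common(c, c->srtt_us);
}
```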
      
      Reviewed by: Michael Tuexen
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30172
  20. 07 May, 2021 1 commit
  21. 06 May, 2021 1 commit
  22. 18 Apr, 2021 1 commit
    • tcp: add support for TCP over UDP · 9e644c23
      Michael Tuexen authored
      Adding support for TCP over UDP allows communication with
      TCP stacks which can be implemented in userspace without
      requiring special privileges or specific support by the OS.
      This is joint work with rrs.
      
      Reviewed by:		rrs
      Sponsored by:		Netflix, Inc.
      MFC after:		1 week
      Differential Revision:	https://reviews.freebsd.org/D29469
  23. 17 Apr, 2021 1 commit
  24. 22 Mar, 2021 1 commit
  25. 13 Mar, 2021 1 commit
    • Fix some common typos in comments · 5666643a
      Gordon Bergling authored
      - occured -> occurred
      - normaly -> normally
      - controling -> controlling
      - fileds -> fields
      - insterted -> inserted
      - outputing -> outputting
      
      MFC after:	1 week
  26. 05 Mar, 2021 1 commit
  27. 02 Mar, 2021 1 commit
  28. 28 Jan, 2021 1 commit
  29. 14 Jan, 2021 2 commits
  30. 09 Nov, 2020 1 commit
    • RFC 7323 specifies that: · 283c76c7
      Michael Tuexen authored
      * TCP segments without timestamps should be dropped when support for
        the timestamp option has been negotiated.
      * TCP segments with timestamps should be processed normally if support
        for the timestamp option has not been negotiated.
      This patch enforces the above.
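
      The two rules reduce to a small predicate. ts_should_drop() is hypothetical. The
      exemption for RST segments comes from RFC 7323 itself, and this commit message
      does not say whether the patch applies it.

```c
#include <assert.h>

/*
 * Per RFC 7323: once timestamps are negotiated, a non-RST segment
 * without a TSopt should be silently dropped; if they were not
 * negotiated, a received TSopt is ignored and the segment is processed
 * normally. Returns 1 to drop, 0 to process.
 */
static int
ts_should_drop(int ts_negotiated, int seg_has_ts, int seg_is_rst)
{
	if (ts_negotiated && !seg_has_ts && !seg_is_rst)
		return (1);
	return (0);	/* includes: TSopt present but not negotiated */
}
```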
      
      PR:			250499
      Reviewed by:		gnn, rrs
      MFC after:		1 week
      Sponsored by:		Netflix, Inc
      Differential Revision:	https://reviews.freebsd.org/D27148
  31. 08 Nov, 2020 1 commit
  32. 09 Sep, 2020 1 commit
  33. 01 Sep, 2020 1 commit