1. 02 Dec, 2021 1 commit
  2. 30 Nov, 2021 1 commit
  3. 17 Nov, 2021 1 commit
    • tcp: Rack ack war with a mis-behaving firewall or nat with resets. · 97e28f0f
      Randall Stewart authored
      Previously we added ack-war prevention for misbehaving firewalls. This is
      where the f/w or nat messes up its sequence numbers and causes an ack-war.
      There is yet another, similar type of ack war that we have found in the wild.
      Basically the f/w or nat gets an ack (keep-alive probe or such)
      and instead of turning the ack/seq around and adding a TH_RST it
      sends a new packet with seq=0. This of course triggers the challenge
      ack in the reset processing, which then sends a challenge ack (if the seq=0 is within
      the range of possible sequence numbers allowed by the challenge), and then we rinse-repeat.
      This will add the needed tweaks (similar to the last ack-war prevention, using the same sysctls and counters)
      to prevent it and allow, say, 5 per second by default.
      Reviewed by: Michael Tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D32938
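      The "5 per second" limit can be pictured as a per-second budget. The sketch
      below is plain userland C, not the code from D32938; `struct ack_limiter` and
      `ack_allowed()` are hypothetical names used only for illustration, and the
      fixed one-second window is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; names are illustrative only. */
struct ack_limiter {
	uint64_t window_start_ms;	/* start of the current 1-second window */
	uint32_t sent_in_window;	/* challenge ACKs sent in this window */
	uint32_t max_per_sec;		/* budget, e.g. 5 */
};

/* Return true if another challenge ACK may be sent at time now_ms. */
static bool
ack_allowed(struct ack_limiter *l, uint64_t now_ms)
{
	if (now_ms - l->window_start_ms >= 1000) {
		/* A new window begins: reset the count. */
		l->window_start_ms = now_ms;
		l->sent_in_window = 0;
	}
	if (l->sent_in_window >= l->max_per_sec)
		return (false);		/* over budget: do not respond */
	l->sent_in_window++;
	return (true);
}
```

      With a budget of 5, a sixth seq=0 packet arriving within the same second gets
      no challenge ack, which is what breaks the loop.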
  4. 11 Nov, 2021 1 commit
    • tcp: Rack may still calculate long RTT on persists probes. · 26cbd002
      Randall Stewart authored
      When a persist probe is lost, we end up calculating a long
      RTT measured from the initial probe to the response that comes from the
      second probe (or third, etc.). This means we have a minimum
      confidence level of 3 on an incorrect measurement. This commit changes it
      so that we have one of two options:
      a) Just do not count the RTT of probes where we had a loss.
      b) Count them still but degrade the confidence to 0.
      I have set the default to just not measure them, but I am open
      to having the default be otherwise.
      Reviewed by: Michael Tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D32897
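      The two options a) and b) can be sketched as a small policy function. This
      is an illustration only: the enum, the struct and `persist_probe_rtt()` are
      hypothetical names, not identifiers from the rack stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy switch mirroring options a) and b) above. */
enum probe_rtt_policy {
	PROBE_RTT_SKIP = 0,	/* a) do not count the RTT at all */
	PROBE_RTT_DEGRADE = 1	/* b) count it, but with confidence 0 */
};

struct rtt_sample {
	bool	 valid;		/* false means "do not feed into srtt" */
	uint32_t rtt_us;
	int	 confidence;
};

static struct rtt_sample
persist_probe_rtt(uint32_t rtt_us, int confidence, bool probe_was_lost,
    enum probe_rtt_policy policy)
{
	struct rtt_sample s = { true, rtt_us, confidence };

	if (!probe_was_lost)
		return (s);		/* clean probe: keep the sample as is */
	if (policy == PROBE_RTT_SKIP) {
		s.valid = false;	/* ambiguous sample: discard */
		s.rtt_us = 0;
		s.confidence = 0;
	} else {
		s.confidence = 0;	/* keep the sample, trust it least */
	}
	return (s);
}
```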
  5. 08 Nov, 2021 1 commit
    • tcp: Printf should be removed. · 477aeb3d
      Randall Stewart authored
      There is a printf when a socket option passed down to the CC module fails; this really
      should not be a printf. In fact this whole option needs to be re-thought in coordination
      with some other changes in the CC modules (it's just not right, but what it
      does here if it fails is ok, since it will just use the ECN beta).
      Reviewed by: Michael Tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D32894
  6. 03 Nov, 2021 1 commit
  7. 29 Oct, 2021 2 commits
  8. 27 Oct, 2021 2 commits
  9. 26 Oct, 2021 1 commit
  10. 22 Oct, 2021 1 commit
  11. 01 Oct, 2021 1 commit
    • tcp: Make dsack stats available in netstat and also make sure its aware of TLP's. · a36230f7
      Randall Stewart authored
      DSACK accounting has been under a NETFLIX_STATS ifdef for quite some time. Statistics
      on DSACKs, however, are very useful in figuring out how many bad retransmissions you
      are doing. This is further complicated, however, by stacks that do TLP. A TLP,
      when discovering a lost ack in the reverse path, will cause the generation
      of a DSACK. For this situation we introduce a new dsack-tlp-bytes as well
      as the more traditional dsack-bytes and dsack-packets. These will now
      all display in netstat -p tcp -s. This also updates all stacks that
      are currently built to keep track of these stats.
      Reviewed by: tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D32158
  12. 23 Sep, 2021 2 commits
    • tcp: Rack compressed ack path updates the recv window too easily · 1ca931a5
      Randall Stewart authored
      The compressed ack path of rack is not following proper procedures in updating
      the peer's window. It should be checking the seq and ack values before updating;
      instead it is blindly updating the values. This could in theory leave the wrong window
      in the connection for some length of time.
      Reviewed by: tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D32082
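      For reference, the usual seq/ack check before a send-window update (the
      RFC 793 rule, as commonly written in BSD-derived input code) looks roughly
      like the sketch below. Whether the rack fix uses exactly this form is an
      assumption; `struct win_state` and `maybe_update_window()` are hypothetical
      names for a userland illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wrap-safe "a is before b" comparison on 32-bit sequence numbers. */
#define	SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)

/* Hypothetical subset of the connection state used by the check. */
struct win_state {
	tcp_seq	 snd_wl1;	/* seq of the segment used for last update */
	tcp_seq	 snd_wl2;	/* ack of the segment used for last update */
	uint32_t snd_wnd;	/* current send window */
};

/*
 * Update the window only if the segment is newer than the one used
 * for the last update, or is the same segment advertising a larger window.
 */
static bool
maybe_update_window(struct win_state *w, tcp_seq seq, tcp_seq ack,
    uint32_t tiwin)
{
	if (SEQ_LT(w->snd_wl1, seq) ||
	    (w->snd_wl1 == seq && (SEQ_LT(w->snd_wl2, ack) ||
	    (w->snd_wl2 == ack && tiwin > w->snd_wnd)))) {
		w->snd_wnd = tiwin;
		w->snd_wl1 = seq;
		w->snd_wl2 = ack;
		return (true);
	}
	return (false);		/* stale segment: keep the old window */
}
```

      A reordered, older ack therefore cannot overwrite a window learned from a
      newer segment.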
    • tcp: Two bugs in rack one of which can lead to a panic. · fd69939e
      Randall Stewart authored
      In extensive testing in NF we have found two issues inside
      the rack stack.
      1) An incorrect offset is being generated by the fast send path when a fast send is initiated on
         the end of the socket buffer and, before the fast send runs, the sb_compress macro adds data to the trailing mbuf.
         This fools the fast send code into thinking the sb offset changed, and it miscalculates an "updated offset".
         It should only do that when the mbuf in question got smaller, i.e. an ack was processed. This can lead to
         a panic dereferencing a NULL mbuf if that packet is ever retransmitted. In the best case it leads to invalid data being
         sent to the client, which usually terminates the connection. The fix is to have the proper logic (that is in the rsm fast path)
         to make sure we only update the offset when the mbuf shrinks.
      2) The other issue is more bothersome. The timestamp check in rack needs to use the msec timestamp when
         comparing the timestamp echo to now. It was using a microsecond timestamp, which ends up giving error-prone
         results but causes only small harm in trying to identify which send to use in RTT calculations if it's a retransmit.
      Reviewed by: tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D32062
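      The unit mismatch in item 2 can be shown with a tiny userland sketch. It
      assumes, as the message implies, that the echoed timestamp is in
      milliseconds while the local clock reading is in microseconds;
      `ms_since_echo()` is a hypothetical helper, not a function in rack.

```c
#include <stdint.h>

/*
 * Assumption: tsecr_ms is the echoed timestamp in milliseconds and
 * now_us is the local clock in microseconds.
 */
static uint32_t
ms_since_echo(uint64_t now_us, uint32_t tsecr_ms)
{
	/* Convert the clock to milliseconds before comparing. */
	uint32_t now_ms = (uint32_t)(now_us / 1000);

	/* Unsigned subtraction stays correct across 32-bit wrap. */
	return (now_ms - tsecr_ms);
}
```

      Comparing the microsecond value directly against the echo would give an
      "elapsed time" roughly a thousand times too large.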
  13. 17 Aug, 2021 1 commit
    • tcp: Add support for DSACK based reordering window to rack. · 5baf32c9
      Randall Stewart authored
      The rack stack, with respect to the rack bits in it, was originally built based
      on an early I-D of rack. In fact at that time the TLP bits were in a separate
      I-D. The dynamic reordering window based on DSACK events was not present
      in rack at that time. It is now part of the RFC and we need to update our stack
      to include these features. However we want to have a way to control the feature
      so that we can, if the admin decides, make it stay the same way system wide as
      well as via socket option. The new sysctl and socket option have the following
      meanings:
      00 (0) - Keep the old way, i.e. reordering window is 1 and do not use DSACK bytes to add to reorder window
      01 (1) - Change the reordering window to 1/4 of an RTT but do not use DSACK bytes to add to reorder window
      10 (2) - Keep the reordering window as 1, but do use DSACK bytes to add additional 1/4 RTT delay to the reorder window
      11 (3) - Reordering window is 1/4 of an RTT and add additional DSACK bytes to increase the reordering window (RFC behavior)
      The default currently in the sysctl is 3 so we get standards-based behavior.
      Reviewed by: tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D31506
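      The four settings read as two independent bits. A minimal decoding sketch,
      with hypothetical macro and struct names (the message does not name the
      sysctl or the socket option):

```c
#include <stdbool.h>

/* Bit meanings as listed in the commit message; names are illustrative. */
#define	REORD_QUARTER_RTT	0x1	/* bit 0: window is 1/4 of an RTT */
#define	REORD_USE_DSACK		0x2	/* bit 1: DSACKs extend the window */

struct reorder_mode {
	bool quarter_rtt_window;
	bool dsack_extends_window;
};

static struct reorder_mode
decode_reorder_setting(unsigned int setting)
{
	struct reorder_mode m;

	m.quarter_rtt_window = (setting & REORD_QUARTER_RTT) != 0;
	m.dsack_extends_window = (setting & REORD_USE_DSACK) != 0;
	return (m);
}
```

      Setting 3 turns both bits on, which is the RFC behavior and the stated default.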
  14. 07 Jul, 2021 2 commits
    • tcp: fix alternate stack build with LINT-NO{INET,INET6,IP} · b1e806c0
      Andrew Gallatin authored
      When fixing another bug, I noticed that the alternate
      TCP stacks do not build when various combinations of
      ipv4 and ipv6 are disabled.
      Reviewed by:	rrs, tuexen
      Differential Revision:	https://reviews.freebsd.org/D31094
      Sponsored by: Netflix
    • tcp: HPTS performance enhancements · d7955cc0
      Randall Stewart authored
      HPTS drives both rack and bbr, and yet there have been many complaints
      about performance. This bit of work restructures hpts to help reduce CPU
      overhead. Instead of relying on the timer/callout to drive it, hpts is now
      driven by user return from a system call as well as by lro flushes.
      The timer becomes a backstop that dynamically adjusts
      based on how "late" we are.
      Reviewed by: tuexen, glebius
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D31083
  15. 06 Jul, 2021 1 commit
    • tcp: Address goodput and TLP edge cases. · e834f9a4
      Randall Stewart authored
      There are several cases where we make a goodput measurement and we are running
      out of data when we decide to make the measurement. In reality we should not make
      such a measurement if there is no chance we can have "enough" data. There are also
      some corner-case TLPs that end up not registering as a TLP like they should; we
      fix this by pushing the doing_tlp setup to the actual timeout that knows it did
      a TLP. This makes it so we always have the appropriate flag on the sendmap
      indicating a TLP being done, as well as count correctly so we make no more
      than two TLPs.
      In addressing the goodput, let's also add a "quality" metric that can be viewed via
      blackbox logs so that a casual observer does not have to figure out how good
      of a measurement it is. This is needed due to the fact that we may still make
      a measurement that is of a poorer quality as we run out of data but still have
      a minimal amount of data to make a measurement.
      Reviewed by: tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D31076
  16. 25 Jun, 2021 1 commit
    • tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection... · 9e4d9e4c
      Randall Stewart authored
      tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection that is retransmitting too much out of hardware and back to software.
      Hardware TLS is now supported in some interface cards and it works well, except that
      when we have connections that retransmit a lot we get into trouble with all the retransmits.
      This prep step makes way for a change that Drew will be making so that we can "kick out" a
      session from hardware TLS.
      Reviewed by: mtuexen, gallatin
      Sponsored by: Netflix Inc
      Differential Revision: https://reviews.freebsd.org/D30895
  17. 24 Jun, 2021 1 commit
  18. 11 Jun, 2021 3 commits
    • tcp: remove debug output from RACK · f1536bb5
      Michael Tuexen authored
      Reported by:		iron.udjin@gmail.com, Marek Zarychta
      Reviewed by:		rrs
      PR:			256538
      MFC after:		3 days
      Differential Revision:	https://reviews.freebsd.org/D30723
      Sponsored by:		Netflix, Inc.
    • tcp: Missing mfree in rack and bbr · ba1b3e48
      Randall Stewart authored
      Recently (Nov) we added logic that protects against a peer negotiating a timestamp and
      then not including a timestamp. This involved the input path doing a goto to the done_with_input
      label. Now I suspect the code was cribbed from one in Rack that has to do with the SYN.
      This had a bug, i.e. it should have a m_freem(m) before going to the label (bbr had this
      missing m_freem() but rack did not). This then caused the missing m_freem to show
      up in both BBR and Rack. Also, looking at the code, referencing m->m_pkthdr.lro_nsegs
      later (after processing) is not a good idea, even though it's only for logging. Best to
      copy that off before any frees can take place.
      Reviewed by: mtuexen
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30727
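      The "copy that off before any frees" advice is the usual guard against
      reading freed memory. A userland sketch of the pattern, using malloc/free
      and a hypothetical `struct pkt` in place of mbufs and m_freem():

```c
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf carrying a segment count. */
struct pkt {
	int nsegs;
};

/*
 * Consume (free) the packet and return the segment count for logging.
 * The count is copied first so nothing touches *p after the free.
 */
static int
consume_packet(struct pkt *p)
{
	int nsegs = p->nsegs;	/* copy off before the free */

	free(p);		/* every exit path must release the packet */
	return (nsegs);
}
```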
    • tcp: fix compilation of IPv4-only builds · 224cf7b3
      Michael Tuexen authored
      PR:			256538
      Reported by:		iron.udjin@gmail.com
      MFC after:		3 days
      Sponsored by:		Netflix, Inc.
  19. 10 Jun, 2021 1 commit
    • tcp: Mbuf leak while holding a socket buffer lock. · 67e89281
      Randall Stewart authored
      When running at NF the current Rack and BBR changes with the recent
      commits from Richard that cause the socket buffer lock to be held over
      the ip_output() call and then finally culminating in a call to tcp_handle_wakeup()
      we get a lot of leaked mbufs. I don't think that this leak is actually caused
      by holding the lock or what Richard has done, but is exposing some other
      bug that has probably been lying dormant for a long time. I will continue to
      look (using his changes) at what is going on to try to root-cause the issue.
      In the meantime I can't leave the leaks out there for everyone else. So this commit
      will revert all of Richard's changes and move both Rack and BBR back to just
      doing the old sorwakeup_locked() calls after messing with the so_rcv buffer.
      We may want to look at adding back in Richard's changes after I have pinpointed
      the root cause of the mbuf leak and fixed it.
      Reviewed by: mtuexen,rscheff
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30704
  20. 04 Jun, 2021 1 commit
    • tcp: A better fix for the previously attempted fix of the ack-war issue with tcp. · 4747500d
      Randall Stewart authored
      So it turns out that my fix before was not correct. It ended with us failing
      some of the "improved" SYN tests, since we are not in the correct states.
      With more digging I have figured out the root of the problem is that when
      we receive a SYN|FIN the reassembly code made it so we create a segq entry
      to hold the FIN. In the established state where we were not in order this
      would be correct i.e. a 0 len with a FIN would need to be accepted. But
      if you are in a front state we need to strip the FIN so we correctly handle
      the ACK but ignore the FIN. This gets us into the proper states
      and avoids the previous ack war.
      I back out some of the previous changes but then add a new change
      here in tcp_reass() that fixes the root cause of the issue. We still
      leave the rack panic fixes in place however.
      Reviewed by: mtuexen
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30627
  21. 27 May, 2021 1 commit
    • tcp: When we have an out-of-order FIN we do want to strip off the FIN bit. · 8c69d988
      Randall Stewart authored
      The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr).
      However there was a missing case, i.e. where we get an out-of-order FIN by itself.
      In such a case we don't want to leave the FIN bit set, otherwise we will do the
      wrong thing and ack the FIN incorrectly. Instead we need to go through the
      tcp_reass() code and that way the FIN will be stripped and all will be well.
      Reviewed by: mtuexen,rscheff
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30497
  22. 26 May, 2021 1 commit
  23. 25 May, 2021 1 commit
    • tcp: Fix bugs related to the PUSH bit and rack and an ack war · 13c0e198
      Randall Stewart authored
      Michael's testing with UDP tunneling found an issue with the push bit, which was only partly fixed
      in the last commit. The problem is the left edge gets transmitted before the adjustments are done
      to the send_map; this means that right edge bits must be considered to be added only if
      the entire RSM is being retransmitted.
      Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns
      out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself.
      After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically
      what happens is we go into the reassembly code and lose the FIN bit. The trick here is we
      should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything.
      That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no
      timers running. This is because the usrclosed function gets called and the FINs and such have
      already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get
      stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly
      recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE
      before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2
      timer can speed this up in testing.
      Reviewed by: mtuexen,rscheff
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30451
  24. 24 May, 2021 2 commits
  25. 22 May, 2021 1 commit
  26. 21 May, 2021 3 commits
  27. 13 May, 2021 1 commit
    • tcp: Incorrect KASSERT causes a panic in rack · 02cffbc2
      Randall Stewart authored
      Syzkaller found an interesting panic in rack. When a SYN and FIN are
      both sent together a KASSERT gets tripped where it is validating that
      a mbuf pointer is in the sendmap. But a SYN and FIN often will not
      have a mbuf pointer. So the fix is twofold: a) make sure that the
      SYN and FIN split the right way when cloning an RSM, SYN on left
      edge and FIN on right; b) make sure the KASSERT properly
      accounts for the case that we have a SYN or FIN so we don't
      Reviewed by: mtuexen
      Sponsored by: Netflix Inc.
      Differential Revision:	https://reviews.freebsd.org/D30241
  28. 12 May, 2021 1 commit
  29. 11 May, 2021 1 commit
    • tcp: In rack, we must only convert restored rtt when the hostcache does restore them. · 4b86a24a
      Randall Stewart authored
      Rack now, after the previous commit, is very careful to translate any
      value in the hostcache for srtt/rttvar into its proper format. However
      there is a snafu here: the only time the HC will actually restore
      the srtt is when tp->srtt is 0. We then need to convert
      the srtt only when it is actually restored. We do this by making
      sure it was zero before the call to cc_conn_init and it is non-zero
      Reviewed by:	Michael Tuexen
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30213
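      The "zero before, non-zero after" test can be sketched in userland C as
      follows. `struct conn`, `fake_conn_init()` and the `scale` factor are
      hypothetical stand-ins for the tcpcb, cc_conn_init() and rack's unit
      conversion; only the guard itself is taken from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the connection state. */
struct conn {
	uint32_t srtt;
};

/* Stand-in for the init call: restores srtt from the cache only if it is 0. */
static void
fake_conn_init(struct conn *tp, uint32_t hc_srtt)
{
	if (tp->srtt == 0 && hc_srtt != 0)
		tp->srtt = hc_srtt;
}

/* Convert srtt to the stack's own units only if it was just restored. */
static void
init_with_conversion(struct conn *tp, uint32_t hc_srtt, uint32_t scale)
{
	bool was_zero = (tp->srtt == 0);

	fake_conn_init(tp, hc_srtt);
	if (was_zero && tp->srtt != 0)
		tp->srtt *= scale;	/* hypothetical unit conversion */
}
```

      A connection that already has an srtt in the stack's own units is left
      untouched, so the value is never converted twice.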
  30. 10 May, 2021 1 commit
    • tcp: Host cache and rack ending up with incorrect values. · 9867224b
      Randall Stewart authored
      The hostcache up to now has been updated in the discard callback
      but without checking if we are all done (the race where there is
      more than one call and the counter has not yet reached zero). This
      means that when the race occurs, we end up calling the hc_update
      more than once. Also alternate stacks can keep their srtt/rttvar
      in different formats (for example rack keeps its values in microseconds).
      Since we call the hc_update *before* the stack fini(), the
      values will be in the wrong format.
      Rack on the other hand needs to convert items pulled from the
      hostcache into its internal format, else it may end up with
      very incorrect values from the hostcache. In the process
      let's commonize the update mechanism for srtt/rttvar since we
      now have more than one place that needs to call it.
      Reviewed by: Michael Tuexen
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30172
  31. 07 May, 2021 1 commit