1. 03 Dec, 2021 2 commits
  2. 02 Dec, 2021 4 commits
    • Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" · db0ac6de
      Cy Schubert authored
      This reverts commit 266f97b5, reversing
      changes made to a10253cf.
      A mismerge of a merge to catch up to main resulted in files being
      committed which should not have been.
    • tcp_hpts: rewrite inpcb synchronization · 2e27230f
      Gleb Smirnoff authored
      Just trust the pcb database: if we did in_pcbref(), there is no way
      an inpcb can go away.  And if we never put a dropped inpcb on
      our queue, and tcp_discardcb() always removes an inpcb that is to be
      dropped from the queue, then any inpcb on the queue is valid.
      Now, to solve the LOR (lock order reversal) between the inpcb lock and
      the HPTS queue lock, do the following trick.  When we are about to process
      a certain time slot, move the full queue of the head list onto an on-stack
      list, drop the HPTS lock and work on our queue.  This of course opens
      a race when an inpcb is being removed from the on-stack queue,
      which was already mentioned in comments.  To address this race,
      introduce a generation count into queues.  If we want to remove
      an inpcb with a generation count mismatch, we can't do that; we
      can only mark it with the desired new time slot, or -1 for removal.
      Reviewed by:		rrs
      Differential revision:	https://reviews.freebsd.org/D33026
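      The generation-count trick described above can be illustrated with a minimal userspace sketch. All names here (struct hqueue, struct hentry, hq_take(), hq_remove(), SLOT_REMOVE) are hypothetical and do not mirror the real tcp_hpts.c structures; locking is omitted and only the generation check is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_NONE   (-2)  /* hypothetical: no deferred request */
#define SLOT_REMOVE (-1)  /* hypothetical: deferred request to remove */

/* Hypothetical queue entry standing in for an inpcb on an HPTS slot. */
struct hentry {
    struct hentry *next;
    uint32_t gen;          /* queue generation at insertion time */
    int pending_slot;      /* slot requested while the entry was detached */
};

/* Hypothetical per-slot queue; gen changes whenever the list is detached. */
struct hqueue {
    struct hentry *head;
    uint32_t gen;
};

static void hq_insert(struct hqueue *q, struct hentry *e) {
    e->next = q->head;
    e->gen = q->gen;
    e->pending_slot = SLOT_NONE;
    q->head = e;
}

/* Detach the whole list onto the caller's (on-stack) list and bump gen.
 * In the kernel this would happen under the queue lock, which is then dropped. */
static struct hentry *hq_take(struct hqueue *q) {
    struct hentry *list = q->head;
    q->head = NULL;
    q->gen++;
    return list;
}

/* Returns 1 if e was unlinked, 0 if it could only be marked for later. */
static int hq_remove(struct hqueue *q, struct hentry *e) {
    struct hentry **pp;

    if (e->gen != q->gen) {
        /* e sits on another thread's on-stack list: don't touch its links. */
        e->pending_slot = SLOT_REMOVE;
        return 0;
    }
    for (pp = &q->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            return 1;
        }
    }
    assert(0 && "entry with current generation must be on the queue");
    return 0;
}
```

      The thread that walks the detached list would then honor pending_slot for each entry instead of the remover touching links it no longer owns.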
    • tcp_hpts: rename input queue to drop queue and trim dead code · f971e791
      Gleb Smirnoff authored
      The HPTS input queue is in reality used only for "delayed drops".
      When a TCP stack decides to drop a connection on the output path,
      it can't do that due to the locking protocol between the main tcp_output()
      and the stacks.  So rack/bbr utilize HPTS to drop the connection in
      a different context.
      In the past the queue could also process input packets in the context
      of the HPTS thread, but now no stack uses this, so remove it.
      Reviewed by:		rrs
      Differential revision:	https://reviews.freebsd.org/D33025
    • SMR protection for inpcbs · de2d4784
      Gleb Smirnoff authored
      With the introduction of epoch(9) synchronization to the network stack, the
      inpcb database became protected by the network epoch together with
      static network data (interfaces, addresses, etc).  However, inpcbs
      aren't static in nature; they are created and destroyed all the
      time, which creates some traffic on the epoch(9) garbage collector.
      A fairly new feature of uma(9), Safe Memory Reclamation, allows memory
      to be safely freed in page-sized batches, with virtually zero
      overhead compared to uma_zfree().  However, unlike epoch(9), it
      puts stricter requirements on access to the protected memory,
      needing a critical(9) section to access it.  Details:
      - The database is already built on CK lists, thanks to epoch(9).
      - For write access nothing is changed.
      - For a lookup in the database, an SMR section is now required.
        Once the desired inpcb is found, we need to transition from the SMR
        section to the r/w lock on the inpcb itself, with a check that the
        inpcb isn't yet freed.  This requires some complexity, since the SMR
        section itself is a critical(9) section.  The complexity is hidden from
        KPI users in inp_smr_lock().
      - For an inpcb list traversal (a pcblist sysctl, or broadcast
        notification), a new KPI is also provided that hides the internals of
        the database: inp_next(struct inp_iterator *).
      Reviewed by:		rrs
      Differential revision:	https://reviews.freebsd.org/D33022
  3. 29 Nov, 2021 1 commit
  4. 19 Nov, 2021 1 commit
    • Add tcp_freecb() - single place to free tcpcb. · ff945008
      Gleb Smirnoff authored
      Until this change there were two places where we would free a tcpcb:
      tcp_discardcb() in case all timers are drained, and tcp_timer_discard()
      otherwise.  They were pretty much copy-n-paste, except that in the
      default case we would run tcp_hc_update().  Merge this into a single
      function, tcp_freecb(), and move the new short version of tcp_timer_discard()
      to tcp_timer.c and make it static.
      Reviewed by:		rrs, hselasky
      Differential revision:	https://reviews.freebsd.org/D32965
  5. 14 Nov, 2021 1 commit
    • tcp: Fix a locking issue related to logging · 2f62f92e
      Michael Tuexen authored
      tcp_respond() is sometimes called with only a read lock.
      The logging, however, requires a write lock. So either
      try to upgrade the lock if needed, or don't log the packet.
      Reported by:		syzbot+8151ef969c170f76706b@syzkaller.appspotmail.com
      Reported by:		syzbot+eb679adb3304c511c1e4@syzkaller.appspotmail.com
      Reviewed by:		markj, rrs
      Sponsored by:		Netflix, Inc.
      Differential Revision:	https://reviews.freebsd.org/D32983
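      The "upgrade or skip" policy above can be modeled with a small sketch. The lock is a hypothetical enum-based stand-in (not the kernel's INP lock API), and try_upgrade() simulates a non-blocking read-to-write upgrade that may fail under contention.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock state standing in for the inpcb rwlock. */
enum lock_state { LK_UNLOCKED, LK_READ, LK_WRITE };

struct conn {
    enum lock_state lock;
    bool upgrade_would_succeed; /* simulates whether try-upgrade wins */
    int logged;                 /* number of packets logged */
};

/* Models a non-blocking read-to-write upgrade that can fail. */
static bool try_upgrade(struct conn *c) {
    assert(c->lock == LK_READ);
    if (!c->upgrade_would_succeed)
        return false;
    c->lock = LK_WRITE;
    return true;
}

/* Log only if a write lock is held or can be obtained without blocking. */
static bool maybe_log_packet(struct conn *c) {
    assert(c->lock != LK_UNLOCKED);
    if (c->lock == LK_READ && !try_upgrade(c))
        return false;   /* skip logging rather than block or race */
    c->logged++;
    return true;
}
```

      Skipping the log entry keeps tcp_respond() from ever sleeping on a lock it cannot safely wait for; losing a log record is the cheaper failure.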
  6. 11 Nov, 2021 2 commits
    • tcp: Rack may still calculate long RTT on persists probes. · 26cbd002
      Randall Stewart authored
      When a persist probe is lost, we end up calculating a long
      RTT based on the initial probe and the response that comes from the
      second probe (or third, etc.). This means we have a confidence level of
      at least 3 on an incorrect probe. This commit changes it
      so that we have one of two options:
      a) Just not count the RTT of probes where we had a loss.
      b) Still count them, but degrade the confidence to 0.
      I have set the default to just not measure them, but I am open
      to having the default be otherwise.
      Reviewed by: Michael Tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D32897
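      Options a) and b) can be sketched as a small decision function. The policy enum, struct rtt_sample and the fixed confidence of 3 for a clean sample are hypothetical illustrations, not rack's actual fields or values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy knob mirroring options a) and b). */
enum probe_rtt_policy { PROBE_RTT_SKIP_ON_LOSS, PROBE_RTT_DEGRADE_ON_LOSS };

struct rtt_sample {
    uint32_t rtt_us;
    int confidence;   /* higher means more trustworthy */
};

/*
 * probes_sent: persist probes sent before a response arrived (>= 1).
 * Returns true if the sample should be fed to the RTT estimator.
 */
static bool persist_probe_rtt(enum probe_rtt_policy policy, int probes_sent,
                              uint32_t first_probe_us, uint32_t response_us,
                              struct rtt_sample *out) {
    assert(probes_sent >= 1);
    out->rtt_us = response_us - first_probe_us;
    out->confidence = 3;
    if (probes_sent == 1)
        return true;           /* no probe lost: the measurement is sound */
    if (policy == PROBE_RTT_SKIP_ON_LOSS)
        return false;          /* option a): ignore the inflated sample */
    out->confidence = 0;       /* option b): keep it, but trust it least */
    return true;
}
```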
    • tcp: Congestion control cleanup. · b8d60729
      Randall Stewart authored
      NOTE: HEADS UP, read the note below if your kernel config does not include GENERIC!!
      This patch does a bit of cleanup on TCP congestion control modules. There were some rather
      interesting surprises that one could get, e.g. when you use a socket option to change
      from one CC (say cc_cubic) to another CC (say cc_vegas), you could in theory get
      a memory failure and end up on cc_newreno. This is not what one would expect. The
      new code fixes this by requiring a cc_data_sz() function, so we can malloc with M_WAITOK
      and pass preallocated memory in to the init function. The CC init is expected in this
      case *not* to fail, but if it does and a module breaks the
      "no fail with memory given" contract, we do fall back to the CC that was in place at the time.
      This also fixes up a set of common newreno utilities that can be shared amongst other
      CC modules, instead of the other CC modules reaching into newreno and executing
      what they think is a "common and understood" function. Let's put these functions in
      cc.c so that we have a common place that is easily findable by future developers or
      bug fixers. This also allows newreno to evolve and grow support for its features, e.g. ABE
      and HYSTART++, without having to jump through hoops for other CC modules; instead,
      both newreno and the other modules just call into the common functions if they desire
      that behavior, or roll their own if that makes more sense.
      Note: This commit changes the kernel configuration!! If you are not using GENERIC in
      some form, you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
      CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
      as well if you desire. Note that if you create a kernel configuration that does not
      define a congestion control module and includes INET or INET6, the kernel compile will
      break. You also need to define a default; GENERIC adds 'options CC_DEFAULT=\"newreno\"',
      but you can specify any string that represents the name of the CC module (the same names
      that show up in the CC module list under net.inet.tcp.cc). If you fail to add
      options CC_DEFAULT to your kernel configuration, the kernel build will also break.
      Reviewed by: Michael Tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D32693
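      The "allocate first, then init without failing, else keep the old CC" flow can be sketched in userspace as below. struct cc_algo, cc_switch() and the toy modules are hypothetical simplifications; the real kernel callbacks take a struct cc_var and use malloc(9), where M_WAITOK cannot fail.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified CC module descriptor. */
struct cc_algo {
    const char *name;
    size_t (*data_sz)(void);                 /* per-connection state size */
    int (*init)(void **state, void *mem);    /* must not fail given mem */
    void (*fini)(void *state);
};

struct conn {
    const struct cc_algo *cc;
    void *cc_state;
};

/* Switch CC; on any failure keep the CC that was already in place. */
static int cc_switch(struct conn *c, const struct cc_algo *newcc) {
    void *mem = NULL;
    size_t sz = newcc->data_sz();

    if (sz > 0) {
        mem = calloc(1, sz);   /* stands in for malloc(9) with M_WAITOK */
        if (mem == NULL)
            return -1;         /* only possible in this userspace model */
    }
    void *state = NULL;
    if (newcc->init(&state, mem) != 0) {
        free(mem);             /* module broke the contract: keep old CC */
        return -1;
    }
    if (c->cc != NULL && c->cc->fini != NULL)
        c->cc->fini(c->cc_state);
    c->cc = newcc;
    c->cc_state = state;
    return 0;
}

/* Hypothetical example modules used to exercise cc_switch(). */
struct toy_state { unsigned cwnd; };
static size_t toy_data_sz(void) { return sizeof(struct toy_state); }
static int toy_init(void **state, void *mem) {
    struct toy_state *ts = mem;
    assert(ts != NULL);
    ts->cwnd = 10;
    *state = ts;
    return 0;
}
static void toy_fini(void *state) { free(state); }
static int broken_init(void **state, void *mem) {
    (void)state; (void)mem;
    return -1;                 /* violates "no fail with memory given" */
}

static const struct cc_algo toy_cc = { "toy", toy_data_sz, toy_init, toy_fini };
static const struct cc_algo broken_cc = { "broken", toy_data_sz, broken_init, toy_fini };
```

      Because the allocation happens before the old module is torn down, a failure at any step leaves the connection on its previous algorithm rather than silently on newreno.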
  7. 09 Nov, 2021 1 commit
    • Don't require the socket lock for sorele(). · e3ba94d4
      John Baldwin authored
      Previously, sorele() always required the socket lock and dropped the
      lock if the released reference was not the last reference.  Many
      callers locked the socket lock just before calling sorele() resulting
      in a wasted lock/unlock when not dropping the last reference.
      Move the previous implementation of sorele() into a new
      sorele_locked() function and use it instead of sorele() for various
      places in uipc_socket.c that called sorele() while already holding the
      socket lock.
      The sorele() macro now uses refcount_release_if_not_last() to try to drop
      the socket reference without locking the socket.  If that shortcut
      fails, it locks the socket and calls sorele_locked().
      Reviewed by:	kib, markj
      Sponsored by:	Chelsio Communications
      Differential Revision:	https://reviews.freebsd.org/D32741
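      The fast-path/slow-path split can be modeled with C11 atomics. This is a sketch under assumptions: release_if_not_last() only imitates the semantics of refcount_release_if_not_last(), struct sock_sketch is hypothetical, and "freeing" is reduced to setting a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical socket with a lock and a reference count. */
struct sock_sketch {
    pthread_mutex_t lock;
    atomic_uint refs;
    bool freed;
};

/* Drop a reference unless it is the last one; never takes the lock. */
static bool release_if_not_last(atomic_uint *cnt) {
    unsigned old = atomic_load(cnt);
    for (;;) {
        assert(old > 0);
        if (old == 1)
            return false;          /* last reference: take the slow path */
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return true;           /* on failure, old is reloaded */
    }
}

/* Slow path: caller holds so->lock; this drops it. */
static void sorele_locked_sketch(struct sock_sketch *so) {
    if (atomic_fetch_sub(&so->refs, 1) == 1)
        so->freed = true;          /* stand-in for actually freeing */
    pthread_mutex_unlock(&so->lock);
}

static void sorele_sketch(struct sock_sketch *so) {
    if (release_if_not_last(&so->refs))
        return;                    /* fast path: no lock/unlock pair */
    pthread_mutex_lock(&so->lock);
    sorele_locked_sketch(so);
}
```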
  8. 03 Nov, 2021 1 commit
  9. 27 Oct, 2021 2 commits
  10. 01 Oct, 2021 1 commit
    • tcp: Make dsack stats available in netstat and also make sure its aware of TLP's. · a36230f7
      Randall Stewart authored
      DSACK accounting has for quite some time been under a NETFLIX_STATS ifdef. Statistics
      on DSACKs, however, are very useful in figuring out how many bad retransmissions you
      are doing. This is further complicated by stacks that do TLP. A TLP,
      when discovering a lost ACK in the reverse path, will cause the generation
      of a DSACK. For this situation we introduce a new dsack-tlp-bytes counter, as well
      as the more traditional dsack-bytes and dsack-packets. These will now
      all display in netstat -p tcp -s. This also updates all stacks that
      are currently built to keep track of these stats.
      Reviewed by: tuexen
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D32158
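      A hedged sketch of the accounting split: the struct fields mirror the three counter names above, but the rule that a TLP-triggered DSACK is counted only in the TLP counter is an assumption made for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters named after dsack-bytes/-packets/-tlp-bytes. */
struct dsack_stats {
    uint64_t dsack_bytes;
    uint64_t dsack_packets;
    uint64_t dsack_tlp_bytes;
};

/*
 * Account one DSACK block [start, end). A DSACK for a TLP retransmission
 * most likely means an ACK was lost on the reverse path, not that our
 * retransmission was spurious, so it is kept separate (assumption).
 */
static void dsack_account(struct dsack_stats *st, uint32_t start, uint32_t end,
                          bool was_tlp) {
    uint32_t len = end - start;   /* sequence space wraps modulo 2^32 */
    assert(len > 0);
    if (was_tlp) {
        st->dsack_tlp_bytes += len;
    } else {
        st->dsack_bytes += len;
        st->dsack_packets++;
    }
}
```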
  11. 13 Jul, 2021 1 commit
    • tcp: TCP_LRO getting bad checksums and sending it in to TCP incorrectly. · ca1a7e10
      Randall Stewart authored
      In reviewing tcp_lro.c, we found that some drivers may send an mbuf into
      LRO without making sure that the checksum passes. Some drivers are
      aware of this and do not call LRO when the checksum failed; others do not,
      and thus could end up sending data up that we think has a passing checksum
      when it does not.
      This change fixes that situation by properly verifying that the mbuf
      has the correct markings (the CSUM VALID bits set, as well as the csum in
      the mbuf header set to 0xffff).
      Reviewed by: tuexen, hselasky, gallatin
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D31155
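      The verification described above reduces to a two-part predicate. The flag constants and struct pkt_hdr are hypothetical stand-ins; the kernel uses its own CSUM_* definitions and mbuf packet-header fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the kernel defines its own CSUM_* bits. */
#define SK_CSUM_DATA_VALID  0x1u   /* driver verified the L4 checksum */
#define SK_CSUM_PSEUDO_HDR  0x2u   /* ...including the pseudo-header */

/* Hypothetical subset of an mbuf packet header. */
struct pkt_hdr {
    uint32_t csum_flags;
    uint16_t csum_data;
};

/* Only hand a packet to LRO if the hardware said the checksum is good. */
static bool lro_csum_ok(const struct pkt_hdr *ph) {
    const uint32_t need = SK_CSUM_DATA_VALID | SK_CSUM_PSEUDO_HDR;
    return (ph->csum_flags & need) == need && ph->csum_data == 0xffff;
}
```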
  12. 27 Jun, 2021 1 commit
    • tcp: tolerate missing timestamps · 870af3f4
      Michael Tuexen authored
      Some TCP stacks negotiate TS support, but do not send TS at all,
      or not for keep-alive segments. Since this includes modern, widely
      deployed stacks, tolerate the violation of RFC 7323 by default.
      Reviewed by:		rgrimes, rrs, rscheff
      MFC after:		3 days
      Differential Revision:	https://reviews.freebsd.org/D30740
      Sponsored by:		Netflix, Inc.
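      The acceptance rule can be sketched as follows. The tolerate_missing_ts variable is a hypothetical stand-in for the sysctl knob, defaulting to true as this commit does; the RST exemption comes from RFC 7323, which only asks that non-RST segments without a timestamp be dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical knob; this commit makes tolerance the default. */
static bool tolerate_missing_ts = true;

enum ts_verdict { TS_ACCEPT, TS_DROP };

/* RFC 7323: once TS is negotiated, a non-RST segment without TSopt
 * should be dropped, unless we choose to tolerate broken peers. */
static enum ts_verdict check_timestamp(bool ts_negotiated, bool seg_has_ts,
                                       bool seg_is_rst) {
    if (!ts_negotiated || seg_has_ts || seg_is_rst)
        return TS_ACCEPT;
    return tolerate_missing_ts ? TS_ACCEPT : TS_DROP;
}
```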
  13. 14 Jun, 2021 1 commit
    • Consistently use the SOLISTENING() macro · f4bb1869
      Mark Johnston authored
      Some code was using it already, but in many places we were testing
      SO_ACCEPTCONN directly.  As a small step towards fixing some bugs
      involving synchronization with listen(2), make the kernel consistently
      use SOLISTENING().  No functional change intended.
      MFC after:	1 week
      Sponsored by:	The FreeBSD Foundation
  14. 24 May, 2021 1 commit
  15. 10 May, 2021 2 commits
    • tcp: SACK Lost Retransmission Detection (LRD) · 0471a8c7
      Richard Scheffenegger authored
      Recover from excessive losses without reverting to a
      retransmission timeout (RTO). Disabled by default, enable
      with sysctl net.inet.tcp.do_lrd=1
      Reviewed By: #transport, rrs, tuexen, #manpages
      Sponsored by: Netapp, Inc.
      Differential Revision: https://reviews.freebsd.org/D28931
    • tcp:Host cache and rack ending up with incorrect values. · 9867224b
      Randall Stewart authored
      The hostcache up to now has been updated in the discard callback,
      but without checking if we are all done (the race where there is
      more than one call and the counter has not yet reached zero). This
      means that when the race occurs, we end up calling hc_update
      more than once. Also, alternate stacks can keep their srtt/rttvar
      in different formats (for example, rack keeps its values in microseconds).
      Since we call hc_update *before* the stack fini(), the
      values will be in the wrong format.
      Rack, on the other hand, needs to convert items pulled from the
      hostcache into its internal format, else it may end up with
      very incorrect values from the hostcache. In the process,
      let's commonize the update mechanism for srtt/rttvar, since we
      now have more than one place that needs to call it.
      Reviewed by: Michael Tuexen
      Sponsored by: Netflix Inc
      Differential Revision:	https://reviews.freebsd.org/D30172
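      Both fixes (update only when the last discard reference goes away, and convert stack-native units in one common helper) can be sketched like this. The structures, the millisecond host-cache unit and the non-atomic counter are hypothetical simplifications, not the real tcp_hostcache format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host cache entry; RTTs stored in milliseconds here. */
struct hc_entry { uint32_t srtt_ms, rttvar_ms; int updates; };

/* Hypothetical connection: a rack-like stack keeps microseconds. */
struct conn { int discard_refs; uint32_t srtt_us, rttvar_us; };

/* Common helper: convert stack-native units before storing. */
static void hc_update_rtt(struct hc_entry *hc, uint32_t srtt_us,
                          uint32_t rttvar_us) {
    hc->srtt_ms = srtt_us / 1000;
    hc->rttvar_ms = rttvar_us / 1000;
    hc->updates++;
}

/* Discard callback: only the caller dropping the final reference updates.
 * (A real implementation would decrement atomically.) */
static void discard_cb(struct conn *c, struct hc_entry *hc) {
    assert(c->discard_refs > 0);
    if (--c->discard_refs > 0)
        return;
    hc_update_rtt(hc, c->srtt_us, c->rttvar_us);
}
```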
  16. 06 May, 2021 1 commit
  17. 21 Apr, 2021 1 commit
    • Path MTU discovery hooks for offloaded TCP connections. · 01d74fe1
      Navdeep Parhar authored
      Notify the TOE driver when an ICMP type 3 code 4 (Fragmentation
      needed and DF set) message is received for an offloaded connection.
      This gives the driver an opportunity to lower the path MTU for the
      connection and resume transmission, much like what the kernel does for
      the connections that it handles.
      Reviewed by:	glebius@
      Sponsored by:	Chelsio Communications
      Differential Revision:	https://reviews.freebsd.org/D29755
  18. 20 Apr, 2021 1 commit
    • Add TCP LRO support for VLAN and VxLAN. · 9ca874cf
      Hans Petter Selasky authored
      This change makes the TCP LRO code more generic and flexible with regard
      to supporting multiple different TCP encapsulation protocols, and in general
      lays the groundwork for broader TCP LRO support. The main job of the TCP LRO
      code is to merge TCP packets of the same flow to reduce the number of calls
      to upper layers. This reduces CPU usage and increases performance, because
      larger TSO-offloaded data chunks can be sent at a time. Basically, TCP LRO
      makes it possible to avoid per-packet interaction by the host CPU.
      Because the current TCP LRO code was tightly bound to and optimized for TCP/IP
      over Ethernet only, several larger changes were needed. A minor bug was also
      fixed in the flushing mechanism for inactive entries, where the expire time,
      "le->mtime", was not always properly set.
      To avoid having to re-run time-consuming regression tests for every change,
      it was decided to squash the following list of changes into a single commit:
      - Refactor parsing of all address information into the "lro_parser" structure.
        This allows easy reuse of the parsing code for inner headers.
      - Speed up header data comparison. Don't compare field by field, but
        instead use an unsigned long array into which the fields are packed.
      - Refactor the IPv4/TCP/UDP checksum computations so that they may be computed
        recursively, only applying deltas as the result of updating payload data.
      - Make smaller inline functions doing one operation at a time instead of
        big functions with repeated code.
      - Refactor the TCP ACK compression code to only execute once
        per TCP LRO flush. This gives a minor performance improvement and
        keeps the code simple.
      - Use sbintime() for all time-keeping. This change also fixes flushing
        of inactive entries.
      - Try to shrink the size of the LRO entry, because it is frequently zeroed.
      - Remove unused TCP LRO macros.
      - Clean up unused TCP LRO statistics counters while at it.
      - Try to use __predict_true() and __predict_false() to optimise CPU branch
        prediction.
      Bump the __FreeBSD_version due to changing the "lro_ctrl" structure.
      Tested by:	Netflix
      Reviewed by:	rrs (transport)
      Differential Revision:	https://reviews.freebsd.org/D29564
      MFC after:	2 weeks
      Sponsored by:	Mellanox Technologies // NVIDIA Networking
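      Two of the listed techniques, word-wise comparison of a packed flow key and delta-based checksum updates, can be sketched together. union flow_key and its fields are hypothetical (not the lro_parser layout); csum_update() implements equation 3 of RFC 1624, HC' = ~(~HC + ~m + m'), for a single changed 16-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flow key fields (IPv4 + ports + VLAN). */
struct flow_fields {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint16_t vlan_id;
    uint8_t  proto, pad;
};

#define KEY_WORDS \
    ((sizeof(struct flow_fields) + sizeof(unsigned long) - 1) / sizeof(unsigned long))

/* The same bytes viewed as an array of machine words. */
union flow_key {
    struct flow_fields f;
    unsigned long w[KEY_WORDS];
};

static void flow_key_init(union flow_key *k, uint32_t sip, uint32_t dip,
                          uint16_t sp, uint16_t dp, uint16_t vlan, uint8_t proto) {
    memset(k, 0, sizeof(*k));   /* zero padding so word compares are exact */
    k->f.src_ip = sip;
    k->f.dst_ip = dip;
    k->f.src_port = sp;
    k->f.dst_port = dp;
    k->f.vlan_id = vlan;
    k->f.proto = proto;
}

/* Compare whole words instead of field by field. */
static bool flow_key_eq(const union flow_key *a, const union flow_key *b) {
    unsigned long diff = 0;
    for (size_t i = 0; i < KEY_WORDS; i++)
        diff |= a->w[i] ^ b->w[i];
    return diff == 0;
}

/* Fold carries back in: ones' complement addition. */
static uint16_t csum_fold(uint32_t sum) {
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words (RFC 1071). */
static uint16_t csum_full(const uint16_t *words, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold(sum);
}

/* Incremental update when one word changes from m to m_new (RFC 1624 eqn. 3). */
static uint16_t csum_update(uint16_t hc, uint16_t m, uint16_t m_new) {
    uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~m + m_new;
    return (uint16_t)~csum_fold(sum);
}
```

      The incremental form touches three values regardless of packet length, which is what makes applying deltas cheaper than recomputing over the merged payload.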
  19. 18 Apr, 2021 1 commit
    • tcp: add support for TCP over UDP · 9e644c23
      Michael Tuexen authored
      Adding support for TCP over UDP allows communication with
      TCP stacks which can be implemented in userspace without
      requiring special privileges or specific support by the OS.
      This is joint work with rrs.
      Reviewed by:		rrs
      Sponsored by:		Netflix, Inc.
      MFC after:		1 week
      Differential Revision:	https://reviews.freebsd.org/D29469
  20. 16 Apr, 2021 1 commit
  21. 12 Apr, 2021 1 commit
    • tcp_input/syncache: acquire only read lock on PCB for SYN,!ACK packets · 08d9c920
      Gleb Smirnoff authored
      When a packet is a SYN packet, we don't need to modify any existing PCB.
      Normally a SYN arrives on a listening socket; we either create a syncache
      entry or generate a syncookie, but we don't modify anything in the
      listening socket or its associated PCB. Thus, create a new PCB lookup
      mode: rlock if listening. This removes the primary contention point
      under SYN flood: the listening socket PCB.
      Sidenote: when a SYN arrives on a synchronized connection, we still
      don't need write access to the PCB to send a challenge ACK or just to
      drop. There is only one exception: tcptw recycling. However, the
      existing entanglement of tcp_input and the stacks doesn't allow making
      this change small. Consider this patch a first approach to the problem.
      Reviewed by:	rrs
      Differential revision:	https://reviews.freebsd.org/D29576
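      The new lookup mode can be sketched as two decisions: which mode to request from the packet flags, and which lock to take once the PCB is found. The enum names are hypothetical; the flag values 0x02 (SYN) and 0x10 (ACK) are the standard TCP header bits.

```c
#include <assert.h>
#include <stdbool.h>

#define F_SYN 0x02   /* TCP header SYN bit */
#define F_ACK 0x10   /* TCP header ACK bit */

/* Hypothetical lookup modes and lock kinds. */
enum lookup_mode { LOOKUP_WLOCK, LOOKUP_RLOCK_IF_LISTEN };
enum pcb_lock { PCB_RLOCK, PCB_WLOCK };

/* SYN without ACK normally hits a listener, which is not modified. */
static enum lookup_mode choose_lookup(unsigned thflags) {
    if ((thflags & (F_SYN | F_ACK)) == F_SYN)
        return LOOKUP_RLOCK_IF_LISTEN;
    return LOOKUP_WLOCK;
}

/* Lock taken once the PCB is found: read only for listeners. */
static enum pcb_lock lock_for(enum lookup_mode mode, bool pcb_is_listening) {
    if (mode == LOOKUP_RLOCK_IF_LISTEN && pcb_is_listening)
        return PCB_RLOCK;
    return PCB_WLOCK;   /* synchronized connections still get a write lock */
}
```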
  22. 17 Feb, 2021 1 commit
    • Update the LRO processing code so that we can support · 69a34e8d
      Randall Stewart authored
      further CPU enhancements for compressed ACKs. These
      are ACKs that are compressed into an mbuf. The transport
      has to be aware of how to process these, and an upcoming
      update to rack will do so. You need the rack changes
      to actually test and validate these, since if the transport
      does not support mbuf compression, the old code paths
      stay in place. In this commit we also take out the concept
      of logging if you don't have a lock (which was quite
      dangerous, was only for some early debugging, but had
      been left in the code).
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D28374
  23. 20 Jan, 2021 1 commit
    • Address panic with PRR due to missed initialization of recover_fs · bc7ee8e5
      Richard Scheffenegger authored
      When using the base stack in conjunction with RACK, it appears that
      infrequently, ++tp->t_dupacks is instantly larger than tcprexmtthresh.
      This leaves the recovery flight size (sackhint.recover_fs) uninitialized,
      leading to a div/0 panic.
      Address this by properly initializing the variable just prior to first
      use, if it has not been initialized.
      In order to prevent stale information from a prior recovery from
      negatively impacting the PRR calculations in this event, also clear
      recover_fs once loss recovery is finished.
      Finally, improve the readability of the initialization of recover_fs
      when t_dupacks == tcprexmtthresh by adjusting the indentation and
      using the max(1, snd_nxt - snd_una) macro.
      Reviewers: rrs, kbowling, tuexen, jtl, #transport, gnn!, jmg, manu, #manpages
      Reviewed By: rrs, kbowling, #transport
      Subscribers: bdrewery, andrew, rpokala, ae, emaste, bz, bcran, #linuxkpi, imp, melifaro
      Differential Revision: https://reviews.freebsd.org/D28114
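      The divide-by-zero hazard sits in PRR's proportional step (RFC 6937), sndcnt = ceil(prr_delivered * ssthresh / RecoverFS) - prr_out. The sketch below shows lazy initialization with max(1, snd_nxt - snd_una) and clearing on exit; struct prr_state is hypothetical and uses plain unsigned arithmetic rather than the kernel's sequence-number macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the PRR state. */
struct prr_state {
    uint32_t snd_una, snd_nxt;
    uint32_t recover_fs;        /* flight size at start of recovery; 0 = unset */
    uint32_t prr_delivered, prr_out, ssthresh;
};

static uint32_t recover_fs_init(const struct prr_state *s) {
    uint32_t flight = s->snd_nxt - s->snd_una;
    return flight > 1 ? flight : 1;     /* max(1, ...): never a zero divisor */
}

/* Proportional part of PRR, used while pipe > ssthresh. */
static uint32_t prr_sndcnt(struct prr_state *s) {
    if (s->recover_fs == 0)             /* missed initialization: fix it now */
        s->recover_fs = recover_fs_init(s);
    assert(s->recover_fs > 0);
    uint64_t target = ((uint64_t)s->prr_delivered * s->ssthresh +
                       s->recover_fs - 1) / s->recover_fs;   /* ceiling */
    return target > s->prr_out ? (uint32_t)(target - s->prr_out) : 0;
}

/* Clear on exit so a later recovery never reuses a stale value. */
static void prr_exit_recovery(struct prr_state *s) {
    s->recover_fs = 0;
}
```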
  24. 14 Jan, 2021 1 commit
    • tcp: add sysctl to tolerate TCP segments missing timestamps · d2b3cedd
      Michael Tuexen authored
      When timestamp support has been negotiated, TCP segments received
      without a timestamp should be discarded. However, there are broken
      TCP implementations (for example, stacks used by Omniswitch 63xx and
      64xx models) which send TCP segments without timestamps although
      they negotiated timestamp support.
      This patch adds a sysctl variable which tolerates such TCP segments
      and allows interoperation with broken stacks.
      Reviewed by:		jtl@, rscheff@
      Differential Revision:	https://reviews.freebsd.org/D28142
      Sponsored by:		Netflix, Inc.
      PR:			252449
      MFC after:		1 week
  25. 29 Oct, 2020 1 commit
  26. 09 Oct, 2020 1 commit
  27. 25 Sep, 2020 1 commit
    • TCP: send full initial window when timestamps are in use · e3995661
      Richard Scheffenegger authored
      The fast path in tcp_output tries to send out
      full segments, and avoids sending partial segments by
      comparing against the static t_maxseg variable.
      That value does not account for TCP options like timestamps,
      while the initial window calculation uses
      the correct dynamic tcp_maxseg() function.
      Due to this interaction, the last full-size segment
      is considered too short and is not sent out immediately.
      Reviewed by:	tuexen
      MFC after:	2 weeks
      Sponsored by:	NetApp, Inc.
      Differential Revision:	https://reviews.freebsd.org/D26478
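      A worked illustration of the mismatch, assuming a 1460-byte t_maxseg and the usual 12-byte padded timestamp option. effective_maxseg() is a hypothetical stand-in for tcp_maxseg(), and the two predicates only contrast the comparisons; they are not the actual tcp_output() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TS_OPT_LEN 12   /* 10-byte timestamp option padded to 12 bytes */

/* Hypothetical connection state. */
struct conn {
    uint32_t t_maxseg;  /* static MSS, options not subtracted */
    bool ts_enabled;
};

/* Payload that fits in one segment once options are accounted for. */
static uint32_t effective_maxseg(const struct conn *c) {
    assert(c->t_maxseg > TS_OPT_LEN);
    return c->t_maxseg - (c->ts_enabled ? TS_OPT_LEN : 0);
}

/* Problematic check: a full 1448-byte segment looks "short". */
static bool send_now_static(const struct conn *c, uint32_t len) {
    return len >= c->t_maxseg;
}

/* Option-aware check: a full segment is one filling effective_maxseg(). */
static bool send_now_dynamic(const struct conn *c, uint32_t len) {
    return len >= effective_maxseg(c);
}
```

      With timestamps, each segment of the initial window carries 1448 bytes, so the static comparison holds back the final full segment while the option-aware one sends it.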
  28. 13 Sep, 2020 1 commit
  29. 01 Sep, 2020 1 commit
  30. 31 Jul, 2020 1 commit
    • The recent changes to move the ref count increment · 8315f1ea
      Randall Stewart authored
      back from the end of the function created an issue.
      If one of the routines returns NULL during setup,
      we have inps with extra references (which is why
      the increment was at the end).
      Also, the stack switch return code was being ignored,
      and it actually has meaning: if the stack cannot take over,
      it should return NULL.
      Fix both of these situations by being sure to test the
      return code and, in any case of returning NULL (there
      are 3), making sure we properly reduce the ref count.
      Sponsored by:	Netflix Inc.
      Differential Revision:	https://reviews.freebsd.org/D25903
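      The pattern of taking a reference early and undoing it on every NULL return can be sketched with a single exit label. struct pcb, struct stack and setup_conn() are hypothetical, and the sketch has two failure points rather than the three mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an inpcb and a TCP stack. */
struct pcb { int refcount; };

struct stack {
    /* Returns NULL if the stack cannot take over the connection. */
    void *(*init)(struct pcb *);
};

static void *setup_conn(struct pcb *inp, const struct stack *st, int fail_early) {
    void *ctx = NULL;

    inp->refcount++;                 /* reference taken early */
    if (fail_early)
        goto fail;                   /* e.g. an allocation failed */
    ctx = st->init(inp);
    if (ctx == NULL)                 /* the stack's answer is now checked */
        goto fail;
    return ctx;
fail:
    assert(inp->refcount > 0);
    inp->refcount--;                 /* undo the reference on every NULL return */
    return NULL;
}

/* Hypothetical stacks used to exercise setup_conn(). */
static int dummy_ctx;
static void *ok_init(struct pcb *p) { (void)p; return &dummy_ctx; }
static void *refuse_init(struct pcb *p) { (void)p; return NULL; }
```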
  31. 07 Jul, 2020 1 commit
  32. 28 May, 2020 1 commit
  33. 27 Apr, 2020 1 commit