1. 02 Dec, 2021 2 commits
    • tcp_hpts: rename input queue to drop queue and trim dead code · f971e791
      Gleb Smirnoff authored
      The HPTS input queue is in reality used only for "delayed drops".
      When a TCP stack decides to drop a connection on the output path,
      it can't do that directly due to the locking protocol between the
      main tcp_output() and the stacks.  So rack/bbr utilize HPTS to
      drop the connection in a different context.
      
      In the past the queue could also process input packets in context
      of HPTS thread, but now no stack uses this, so remove this
      functionality.
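      The deferred-drop pattern described above can be modelled in a few lines of userland C. This is only a conceptual sketch, not the kernel's HPTS API; names like `output_path_wants_drop` and `hpts_run` are hypothetical stand-ins:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TCP connection; not the kernel's inpcb. */
struct conn {
    bool output_lock_held;  /* models the lock held across tcp_output() */
    bool dropped;
};

/* A tiny "delayed drop" queue, modelling the HPTS drop queue. */
static struct conn *drop_queue[16];
static size_t drop_queue_len;

/* Output path: we cannot drop here because the output lock is held,
 * so we defer the drop to a different context instead. */
static void output_path_wants_drop(struct conn *c)
{
    assert(c->output_lock_held);
    drop_queue[drop_queue_len++] = c;
}

/* HPTS-like context: runs later, without the output path's lock,
 * so it is safe to actually drop the queued connections here. */
static void hpts_run(void)
{
    for (size_t i = 0; i < drop_queue_len; i++)
        drop_queue[i]->dropped = true;
    drop_queue_len = 0;
}
```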
      
      Reviewed by:		rrs
      Differential revision:	https://reviews.freebsd.org/D33025
    • SMR protection for inpcbs · de2d4784
      Gleb Smirnoff authored
      With the introduction of epoch(9) synchronization to the network
      stack, the inpcb database became protected by the network epoch
      together with static network data (interfaces, addresses, etc).
      However, inpcbs aren't static in nature; they are created and
      destroyed all the time, which creates some traffic for the
      epoch(9) garbage collector.
      
      A fairly new feature of uma(9), Safe Memory Reclamation, allows
      memory to be freed safely in page-sized batches, with virtually
      zero overhead compared to uma_zfree().  However, unlike epoch(9),
      it puts a stricter requirement on access to the protected memory,
      needing a critical(9) section to access it.  Details:
      
      - The database is already built on CK lists, thanks to epoch(9).
      - For write access nothing is changed.
      - For a lookup in the database an SMR section is now required.
        Once the desired inpcb is found, we need to transition from the
        SMR section to the r/w lock on the inpcb itself, with a check
        that the inpcb isn't yet freed.  This requires some complexity,
        since an SMR section is itself a critical(9) section.  The
        complexity is hidden from KPI users in inp_smr_lock().
      - For an inpcb list traversal (a pcblist sysctl, or broadcast
        notification) a new KPI is also provided that hides the internals
        of the database: inp_next(struct inp_iterator *).
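      The lookup transition described above (SMR section to per-pcb lock, with a freed check) can be sketched as a simplified userland model. This is not the real inp_smr_lock(): a flag stands in for the SMR critical section and a pthread mutex for the inpcb lock, and all names here are illustrative:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of an inpcb: a lock plus a "freed" marker. */
struct inp_model {
    pthread_mutex_t lock;
    bool freed;             /* set when the pcb has been released */
};

static bool in_smr_section;     /* models smr_enter()/smr_exit() */

static struct inp_model *
lookup_and_lock(struct inp_model *candidate)
{
    in_smr_section = true;                      /* smr_enter() */
    /* ...the hash lookup would find `candidate` here... */
    if (pthread_mutex_trylock(&candidate->lock) != 0) {
        /* Can't sleep inside the (critical) SMR section: leave it,
         * take the lock the slow way, and recheck the pcb below. */
        in_smr_section = false;                 /* smr_exit() */
        pthread_mutex_lock(&candidate->lock);
    } else {
        in_smr_section = false;                 /* lock held, exit SMR */
    }
    if (candidate->freed) {     /* lost the race: pcb was freed */
        pthread_mutex_unlock(&candidate->lock);
        return NULL;
    }
    return candidate;           /* returned with the lock held */
}
```

The key point the sketch preserves: the blocking lock acquisition must happen outside the SMR section, and after any such transition the pcb must be rechecked for concurrent freeing.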
      
      Reviewed by:		rrs
      Differential revision:	https://reviews.freebsd.org/D33022
  2. 30 Nov, 2021 1 commit
  3. 04 Nov, 2021 1 commit
    • Use layer five checksum flags in the mbuf packet header to pass on crypto state. · 10a62eb1
      Hans Petter Selasky authored
      The mbuf protocol flags get cleared between layers, and it was also
      discovered that M_DECRYPTED conflicts with M_HASFCS when receiving
      ethernet packets.
      
      Add the proper CSUM_TLS_MASK and CSUM_TLS_DECRYPTED defines, and start using
      these instead of M_DECRYPTED inside the TCP LRO code.
      
      This change is needed by upcoming TLS RX hardware offload support patches.
      
      Suggested by:	kib@
      Reviewed by:	jhb@
      MFC after:	1 week
      Sponsored by:	NVIDIA Networking
  4. 25 Aug, 2021 1 commit
  5. 06 Aug, 2021 1 commit
  6. 16 Jul, 2021 1 commit
  7. 13 Jul, 2021 1 commit
    • tcp: TCP_LRO getting bad checksums and sending it in to TCP incorrectly. · ca1a7e10
      Randall Stewart authored
      In reviewing tcp_lro.c we found a possibility that some drivers may
      send an mbuf into LRO without making sure that the checksum passes.
      Some drivers are aware of this and do not call LRO when the csum
      failed; others do not, and thus could end up sending data up that
      we think has a passing checksum when it does not.
      
      This change fixes that situation by properly verifying that the mbuf
      has the correct markings (the CSUM valid bits set, and the csum in
      the mbuf header set to 0xffff).
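      The kind of admission check described above can be sketched in a few lines. The flag names and values here are simplified stand-ins, not FreeBSD's actual CSUM_* constants from sys/mbuf.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the mbuf checksum flags; the real values
 * live in sys/mbuf.h and differ from these. */
#define CSUM_L3_VALID   0x01    /* IP header checksum verified */
#define CSUM_L4_VALID   0x02    /* TCP checksum verified */

struct mbuf_model {
    uint32_t csum_flags;
    uint16_t csum_data;     /* 0xffff: pseudo-header checksum passed */
};

/* Only admit a packet into LRO when the driver marked the checksum
 * as both verified and passing, mirroring the check the commit adds. */
static bool
lro_csum_ok(const struct mbuf_model *m)
{
    if ((m->csum_flags & (CSUM_L3_VALID | CSUM_L4_VALID)) !=
        (CSUM_L3_VALID | CSUM_L4_VALID))
        return false;
    return m->csum_data == 0xffff;
}
```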
      
      Reviewed by: tuexen, hselasky, gallatin
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D31155
  8. 07 Jul, 2021 1 commit
    • tcp: HPTS performance enhancements · d7955cc0
      Randall Stewart authored
      HPTS drives both rack and bbr, and yet there have been many
      complaints about performance. This bit of work restructures HPTS
      to help reduce CPU overhead: instead of relying on the
      timer/callout to drive it, we now use the user return from a
      system call, as well as LRO flushes, to drive HPTS. The timer
      becomes a backstop that dynamically adjusts based on how "late"
      we are.
      
      Reviewed by: tuexen, glebius
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D31083
  9. 09 Jun, 2021 1 commit
  10. 24 Apr, 2021 1 commit
    • Allow the tcp_lro_flush_all() function to be called when the control · a9b66dbd
      Hans Petter Selasky authored
      structure is zeroed, by setting the VNET after checking the mbuf count
      for zero. It appears there are some cases with early interrupts on some
      network devices which still trigger page faults when accessing a NULL
      "ifp" pointer before the TCP LRO control structure has been initialized.
      This basically preserves the old behaviour, prior to 9ca874cf.
      
      No functional change.
      
      Reported by:	rscheff@
      Differential Revision:	https://reviews.freebsd.org/D29564
      MFC after:	2 weeks
      Sponsored by:	Mellanox Technologies // NVIDIA Networking
  11. 20 Apr, 2021 1 commit
    • Add TCP LRO support for VLAN and VxLAN. · 9ca874cf
      Hans Petter Selasky authored
      This change makes the TCP LRO code more generic and flexible with
      regards to supporting multiple different TCP encapsulation protocols,
      and in general lays the groundwork for broader TCP LRO support. The
      main job of the TCP LRO code is to merge TCP packets for the same
      flow, to reduce the number of calls to upper layers. This reduces
      CPU usage and increases performance, due to being able to send larger
      TSO offloaded data chunks at a time. Basically the TCP LRO makes it
      possible to avoid per-packet interaction by the host CPU.
      
      Because the current TCP LRO code was tightly bound and optimized for TCP/IP
      over ethernet only, several larger changes were needed. Also a minor bug was
      fixed in the flushing mechanism for inactive entries, where the expire
      time, "le->mtime", was not always properly set.
      
      To avoid having to re-run time consuming regression tests for every change,
      it was chosen to squash the following list of changes into a single commit:
      - Refactor parsing of all address information into the "lro_parser"
        structure.  This easily allows reusing the parsing code for inner
        headers.
      - Speedup header data comparison. Don't compare field by field, but
        instead use an unsigned long array, where the fields get packed.
      - Refactor the IPv4/TCP/UDP checksum computations, so that they may
        be computed recursively, only applying deltas as the result of
        updating payload data.
      - Make smaller inline functions doing one operation at a time instead of
        big functions having repeated code.
      - Refactor the TCP ACK compression code to only execute once
        per TCP LRO flush. This gives a minor performance improvement and
        keeps the code simple.
      - Use sbintime() for all time-keeping. This change also fixes flushing
        of inactive entries.
      - Try to shrink the size of the LRO entry, because it is frequently zeroed.
      - Removed unused TCP LRO macros.
      - Cleanup unused TCP LRO statistics counters while at it.
      - Try to use __predict_true() and __predict_false() to optimise CPU
        branch predictions.
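      The delta-based checksum update in the list above follows the standard incremental Internet-checksum technique (RFC 1624). A minimal userland sketch, not the LRO code itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full ones'-complement Internet checksum over 16-bit words. */
static uint16_t
csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    while (sum >> 16)                       /* fold carries */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Incremental update per RFC 1624, HC' = ~(~HC + ~m + m'):
 * apply only the delta for one changed word instead of resumming. */
static uint16_t
csum_update(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~old_csum;
    sum += (uint16_t)~old_word;
    sum += new_word;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Changing a payload word and applying csum_update() yields the same result as recomputing the whole checksum, which is what makes the recursive inner/outer header updates cheap.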
      
      Bump the __FreeBSD_version due to changing the "lro_ctrl" structure.
      
      Tested by:	Netflix
      Reviewed by:	rrs (transport)
      Differential Revision:	https://reviews.freebsd.org/D29564
      MFC after:	2 weeks
      Sponsored by:	Mellanox Technologies // NVIDIA Networking
  12. 04 Mar, 2021 1 commit
  13. 18 Feb, 2021 2 commits
  14. 17 Feb, 2021 2 commits
    • Add ifdef TCPHPTS around build_ack_entry and do_bpf_and_csum to avoid · ab4fad4b
      Randall Stewart authored
      warnings when HPTS is not included
      
      Thanks to Gary Jennejohn for pointing this out.
    • Update the LRO processing code so that we can support · 69a34e8d
      Randall Stewart authored
      further CPU enhancements for compressed acks. These
      are acks that are compressed into an mbuf. The transport
      has to be aware of how to process these, and an upcoming
      update to rack will do so. You need the rack changes to
      actually test and validate these, since if the transport
      does not support mbuf compression, the old code paths stay
      in place. In this commit we also remove the concept of
      logging without holding a lock (which was quite dangerous,
      was only for some early debugging, and had been left in
      the code).
      
      Sponsored by: Netflix Inc.
      Differential Revision: https://reviews.freebsd.org/D28374
  15. 01 Sep, 2020 1 commit
  16. 12 Feb, 2020 1 commit
  17. 07 Nov, 2019 1 commit
  18. 09 Oct, 2019 1 commit
    • Fix casting error from newer gcc · b23b156e
      Warner Losh authored
      Cast the pointers to (uintptr_t) before assigning to type
      uint64_t. This eliminates an error from gcc when we cast the pointer
      to a larger integer type.
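      The fix amounts to the following idiom; the helper name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Casting a pointer straight to uint64_t draws a warning from newer
 * gcc when the cast changes size (e.g. on 32-bit targets); going
 * through uintptr_t first is the portable, warning-free spelling. */
static uint64_t
ptr_to_u64(void *p)
{
    return (uint64_t)(uintptr_t)p;
}
```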
  19. 06 Oct, 2019 1 commit
    • Brad Davis identified a problem with the new LRO code, VLAN's · 5b63b220
      Randall Stewart authored
      no longer worked. The problem was that the defines used the
      same space as the VLAN id. This commit does three things.
      1) Move the LRO used fields to the PH_per fields. This is
         safe since the entire PH_per is used for IP reassembly
         which LRO code will not hit.
      2) Remove old pace fields from mbuf.h that are no longer used.
      3) The VLAN processing is not in the mbuf queueing code. Consequently
         if a VLAN submits to Rack or BBR we need to bypass the mbuf queueing
         for now until rack_bbr_common is updated to handle the VLAN properly.
      
      Reported by:	Brad Davis
  20. 06 Sep, 2019 2 commits
    • Fix build after r351934 · 373013b0
      Conrad Meyer authored
      tcp_queue_pkts() is only used if TCPHPTS is defined (and it is not by
      default).
      
      Reported by:	gcc
    • This adds the final tweaks to LRO that will now allow me · e57b2d0e
      Randall Stewart authored
      to add BBR. These changes make it so you can get an
      array of timestamps instead of a compressed ack/data segment.
      BBR uses this to aid with its delivery estimates. We also
      now (via Drew's suggestions) will not go to the expense of
      the tcb lookup if no stack registers interest in this feature. If
      HPTS is not present, the feature is not present either, and you
      just get the compressed behavior.
      
      Sponsored by:	Netflix Inc
      Differential Revision: https://reviews.freebsd.org/D21127
  21. 09 Mar, 2018 1 commit
    • Update tcp_lro with tested bugfixes from Netflix and LLNW: · d7fb35d1
      Sean Bruno authored
          rrs - Let's make the LRO code let true dup-acks and window update
                acks fly on through and combine.
          rrs - Make the LRO engine a bit more aware of ack-only seq space. Lets not
                have it incorrectly wipe out newer acks for older acks when we have
                out-of-order acks (common in wifi environments).
          jeggleston - LRO eating window updates
      
      Based on all of the above I think we are RFC compliant doing it this way:
      
      https://tools.ietf.org/html/rfc1122
      
      section 4.2.2.16
      
      "Note that TCP has a heuristic to select the latest window update despite
      possible datagram reordering; as a result, it may ignore a window update with
      a smaller window than previously offered if neither the sequence number nor the
      acknowledgment number is increased."
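      The heuristic quoted above is commonly implemented with TCP's modular sequence-space comparisons. A hypothetical sketch (function and variable names are illustrative, not the LRO code's; wl1/wl2 are the seq/ack of the segment last used to update the window):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modular 32-bit sequence-space comparisons, in the style of
 * FreeBSD's SEQ_GT/SEQ_GEQ macros. */
#define SEQ_GT(a, b)   ((int32_t)((a) - (b)) > 0)
#define SEQ_GEQ(a, b)  ((int32_t)((a) - (b)) >= 0)

/* Accept a window update only from the "latest" segment: one whose
 * sequence number, or failing that acknowledgment number, advanced.
 * Older, reordered segments carrying smaller windows are ignored. */
static bool
window_update_is_fresh(uint32_t seq, uint32_t ack, uint32_t wl1, uint32_t wl2)
{
    return SEQ_GT(seq, wl1) || (seq == wl1 && SEQ_GEQ(ack, wl2));
}
```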
      
      Submitted by:	Kevin Bowling <kevin.bowling@kev009.com>
      Reviewed by:	rstone gallatin
      Sponsored by:	NetFlix and Limelight Networks
      Differential Revision:	https://reviews.freebsd.org/D14540
  22. 27 Nov, 2017 1 commit
    • sys: general adoption of SPDX licensing ID tags. · fe267a55
      Pedro F. Giffuni authored
      Mainly focus on files that use the BSD 2-Clause license; however,
      the tool I was using misidentified many licenses, so this was mostly
      a manual (and error-prone) task.
      
      The Software Package Data Exchange (SPDX) group provides a
      specification to make it easier for automated tools to detect and
      summarize well-known open-source licenses. We are gradually adopting
      the specification, noting that the tags are considered only advisory
      and do not, in any way, supersede or replace the license texts.
      
      No functional change intended.
  23. 24 Apr, 2017 2 commits
  24. 19 Apr, 2017 3 commits
  25. 25 Aug, 2016 1 commit
  26. 16 Aug, 2016 1 commit
  27. 05 Aug, 2016 1 commit
    • tcp/lro: If timestamps mismatch or it's a FIN, force flush. · b9ec6f0b
      Sepherosa Ziehau authored
      This keeps the segments/ACK/FIN delivery order.
      
      Before this patch, it was observed that if A sent a FIN immediately
      after an ACK, B would deliver the FIN to the TCP stack first, then
      the ACK. This out-of-order delivery causes one unnecessary ACK to
      be sent from B.
      
      Reviewed by:	gallatin, hps
      Obtained from:  rrs, gallatin
      Sponsored by:	Netflix (rrs, gallatin), Microsoft (sephe)
      Differential Revision:	https://reviews.freebsd.org/D7415
  28. 02 Aug, 2016 1 commit
  29. 03 Jun, 2016 1 commit
    • Use insertion sort instead of bubble sort in TCP LRO. · ec668905
      Hans Petter Selasky authored
      Replacing the bubble sort with insertion sort gives an 80% reduction
      in runtime on average, with randomized keys, for small partitions.
      
      If the keys are pre-sorted, insertion sort runs in linear time, and
      even if the keys are reversed, insertion sort is faster than bubble
      sort, although not by much.
      
      Update comment describing "tcp_lro_sort()" while at it.
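      For reference, the algorithm in question has this classic shape (an illustrative sketch on 64-bit keys, not the tcp_lro.c implementation itself):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Insertion sort on 64-bit keys: quadratic in general, but linear on
 * pre-sorted input, which is why it beats bubble sort here. */
static void
insertion_sort(uint64_t *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint64_t key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];    /* shift larger elements right */
            j--;
        }
        a[j] = key;             /* drop the key into its slot */
    }
}
```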
      
      Differential Revision:	https://reviews.freebsd.org/D6619
      Sponsored by:	Mellanox Technologies
      Tested by:	Netflix
      Suggested by:	Pieter de Goeje <pieter@degoeje.nl>
      Reviewed by:	ed, gallatin, gnn, transport
  30. 26 May, 2016 1 commit
    • Use optimised complexity safe sorting routine instead of the kernel's · fc271df3
      Hans Petter Selasky authored
      "qsort()".
      
      The kernel's "qsort()" routine can in the worst case perform O(N*N)
      comparisons before the input array is sorted. It can also recurse a
      significant number of times, using up the kernel's interrupt thread
      stack.
      
      The custom sorting routine takes advantage of the fact that the
      sorting key is only 64 bits. Based on set and cleared bits in the
      sorting key, it partitions the array until it is sorted. This
      process has a recursion limit of 64, due to the number of set and
      cleared bits which can occur. Compiled with -O2, the sorting
      routine was measured to use 64 bytes of stack. Multiplying this by
      64 gives a maximum stack consumption of 4096 bytes on AMD64. The
      same applies to the execution time: the array to be sorted will
      not be traversed more than 64 times.
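      The bit-partitioning idea described above can be sketched as a binary most-significant-bit radix sort. This is only an illustration of the technique, not the kernel's tcp_lro_sort(); recursion depth (and the number of passes over the array) is bounded by the 64 bits of the key:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Partition on one key bit at a time, from the most significant bit
 * down: keys with the bit clear go to the front, keys with it set to
 * the back, then each half is partitioned on the next lower bit. */
static void
bit_sort(uint64_t *a, size_t n, uint64_t bit)
{
    size_t i = 0, j = n;

    if (n < 2 || bit == 0)
        return;
    while (i < j) {
        if (a[i] & bit) {       /* move set-bit keys to the tail */
            uint64_t t = a[--j];
            a[j] = a[i];
            a[i] = t;
        } else {
            i++;                /* clear-bit key stays in the head */
        }
    }
    bit_sort(a, i, bit >> 1);           /* bit-clear partition */
    bit_sort(a + i, n - i, bit >> 1);   /* bit-set partition */
}
```

Calling bit_sort(a, n, 1ULL << 63) sorts the whole array; since each level of recursion strips exactly one bit, the depth can never exceed 64 regardless of the input, which is what bounds both stack use and traversal count.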
      
      When serving roughly 80Gb/s with 80K TCP connections, the old method
      consisting of "qsort()" and "tcp_lro_mbuf_compare_header()" used 1.4%
      CPU, while the new "tcp_lro_sort()" used 1.1% for LRO related sorting
      as measured by Intel Vtune. The testing was done using a sysctl to
      toggle between "qsort()" and "tcp_lro_sort()".
      
      Differential Revision:	https://reviews.freebsd.org/D6472
      Sponsored by:	Mellanox Technologies
      Tested by:	Netflix
      Reviewed by:	gallatin, rrs, sephe, transport
  31. 03 May, 2016 1 commit
  32. 28 Apr, 2016 1 commit
  33. 27 Apr, 2016 1 commit