1. 19 Apr, 2017 2 commits
  2. 25 Aug, 2016 1 commit
  3. 16 Aug, 2016 1 commit
  4. 05 Aug, 2016 1 commit
      tcp/lro: If timestamps mismatch or it's a FIN, force flush. · b9ec6f0b
      Sepherosa Ziehau authored
      This keeps the segments/ACK/FIN delivery order.
      
      Before this patch, it was observed: if A sent FIN immediately after
      an ACK, B would deliver FIN first to the TCP stack, then the ACK.
      This out-of-order delivery causes one unnecessary ACK sent from B.
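
      The ordering rule described above can be sketched as follows. This is
      an illustrative model, not the kernel's actual code; the struct and
      function names are ours:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TH_FIN 0x01	/* TCP FIN flag, as in <netinet/tcp.h> */

/* Hypothetical, simplified state of one LRO entry. */
struct lro_entry_sketch {
	uint32_t tsval;		/* TCP timestamp of the aggregated segments */
	bool	 has_timestamp;	/* segments carried the timestamp option */
};

/*
 * Return true when the entry must be flushed before merging this
 * segment: never merge a FIN, and never merge across a timestamp
 * mismatch, so segments, ACK and FIN reach the stack in order.
 */
static bool
lro_needs_flush(const struct lro_entry_sketch *le, uint32_t seg_tsval,
    bool seg_has_ts, uint8_t tcp_flags)
{
	if (tcp_flags & TH_FIN)
		return (true);
	if (le->has_timestamp != seg_has_ts)
		return (true);
	if (le->has_timestamp && le->tsval != seg_tsval)
		return (true);
	return (false);
}
```

      With such a rule, a FIN arriving right after an ACK forces the pending
      entry (holding the ACK) out first, so the stack sees the ACK before
      the FIN.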
      
      Reviewed by:	gallatin, hps
      Obtained from:  rrs, gallatin
      Sponsored by:	Netflix (rrs, gallatin), Microsoft (sephe)
      Differential Revision:	https://reviews.freebsd.org/D7415
  5. 02 Aug, 2016 1 commit
  6. 03 Jun, 2016 1 commit
      Use insertion sort instead of bubble sort in TCP LRO. · ec668905
      Hans Petter Selasky authored
      Replacing the bubble sort with insertion sort gives an 80% reduction
      in runtime on average, with randomized keys, for small partitions.
      
      If the keys are pre-sorted, insertion sort runs in linear time, and
      even if the keys are reversed, insertion sort is faster than bubble
      sort, although not by much.
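
      The sort in question operates on 64-bit keys. A minimal sketch of
      that shape (illustrative only, not the kernel's helper):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative insertion sort over 64-bit keys.  Each key is shifted
 * directly into its final slot, so pre-sorted input runs in linear
 * time, and even reversed input does only one move per inversion,
 * unlike bubble sort's repeated full passes.
 */
static void
insertion_sort64(uint64_t *key, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		uint64_t v = key[i];
		size_t j = i;

		while (j > 0 && key[j - 1] > v) {
			key[j] = key[j - 1];	/* shift larger keys right */
			j--;
		}
		key[j] = v;
	}
}
```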
      
      Update comment describing "tcp_lro_sort()" while at it.
      
      Differential Revision:	https://reviews.freebsd.org/D6619
      Sponsored by:	Mellanox Technologies
      Tested by:	Netflix
      Suggested by:	Pieter de Goeje <pieter@degoeje.nl>
      Reviewed by:	ed, gallatin, gnn, transport
  7. 26 May, 2016 1 commit
      Use an optimised, complexity-safe sorting routine instead of the
      kernel's "qsort()". · fc271df3
      Hans Petter Selasky authored
      
      The kernel's "qsort()" routine can in the worst case perform O(N*N)
      comparisons before the input array is sorted. It can also recurse a
      significant number of times, using up the kernel's interrupt thread
      stack.
      
      The custom sorting routine takes advantage of the fact that the
      sorting key is only 64 bits wide. Based on set and cleared bits in
      the sorting key it partitions the array until it is sorted. This
      process has a recursion limit of 64, one level per key bit. Compiled
      with -O2 the sorting routine was measured to use 64 bytes of stack
      per level; multiplying this by 64 gives a maximum stack consumption
      of 4096 bytes on AMD64. The same bound applies to the execution
      time: the array to be sorted is not traversed more than 64 times.
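
      The partitioning scheme described above can be sketched as a
      most-significant-bit-first binary radix sort; this illustrates the
      idea and is not the actual "tcp_lro_sort()":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Partition the array on one bit of the 64-bit key at a time, most
 * significant bit first.  Recursion depth is bounded by the 64 key
 * bits, and each level scans its partition once, so both stack use
 * and runtime stay bounded regardless of the input.
 */
static void
bit_sort64(uint64_t *a, size_t n, int bit)
{
	size_t i = 0, j = n;

	if (n < 2 || bit < 0)
		return;
	/* Keys with the bit clear go to the front, set to the back. */
	while (i < j) {
		if ((a[i] >> bit) & 1) {
			uint64_t t = a[i];

			a[i] = a[--j];
			a[j] = t;
		} else
			i++;
	}
	bit_sort64(a, j, bit - 1);		/* the "0" partition */
	bit_sort64(a + j, n - j, bit - 1);	/* the "1" partition */
}
```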
      
      When serving roughly 80Gb/s with 80K TCP connections, the old method
      consisting of "qsort()" and "tcp_lro_mbuf_compare_header()" used 1.4%
      CPU, while the new "tcp_lro_sort()" used 1.1% for LRO related sorting
      as measured by Intel VTune. The testing was done using a sysctl to
      toggle between "qsort()" and "tcp_lro_sort()".
      
      Differential Revision:	https://reviews.freebsd.org/D6472
      Sponsored by:	Mellanox Technologies
      Tested by:	Netflix
      Reviewed by:	gallatin, rrs, sephe, transport
  8. 03 May, 2016 1 commit
  9. 28 Apr, 2016 1 commit
  10. 27 Apr, 2016 1 commit
  11. 01 Apr, 2016 2 commits
  12. 25 Mar, 2016 1 commit
  13. 18 Feb, 2016 1 commit
  14. 11 Feb, 2016 1 commit
  15. 01 Feb, 2016 1 commit
  16. 19 Jan, 2016 1 commit
      Add optimizing LRO wrapper: · e936121d
      Hans Petter Selasky authored
      - Add optimizing LRO wrapper which pre-sorts all incoming packets
        according to the hash type and flowid. This prevents exhaustion of
        the LRO entries due to too many connections at the same time.
        Testing using a larger number of higher bandwidth TCP connections
        showed that the incoming ACK packet aggregation rate increased from
        ~1.3:1 to almost 3:1. Another test showed that for a number of TCP
        connections greater than 16 per hardware receive ring, where 8 TCP
        connections was the LRO active entry limit, there was a significant
        improvement in throughput due to being able to fully aggregate more
        than 8 TCP streams. For very few very high bandwidth TCP streams, the
        optimizing LRO wrapper will add CPU usage instead of reducing CPU
        usage. This is expected. Network drivers which want to use the
        optimizing LRO wrapper need to call "tcp_lro_queue_mbuf()" instead
        of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
        "tcp_lro_flush()". Further the LRO control structure must be
        initialized using "tcp_lro_init_args()" passing a non-zero number
        into the "lro_mbufs" argument.
      
      - Make LRO statistics 64-bit. Previously 32-bit counters were used,
        which are prone to wrap-around. Fix this while at it and update
        all SYSCTLs which expose LRO statistics.
      
      - Ensure all data is freed when destroying an LRO control structure,
        especially leftover LRO entries.
      
      - Reduce number of memory allocations needed when setting up a LRO
        control structure by precomputing the total amount of memory needed.
      
      - Add a dedicated memory allocation counter for LRO.
      
      - Bump the FreeBSD version to force recompilation of all KLDs due to
        change of the LRO control structure size.
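
      A rough sketch of the kind of sort key such a wrapper can pre-sort
      queued mbufs on (the field layout here is our assumption, not the
      exact kernel encoding):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative 64-bit sort key: hash type in the upper 32 bits,
 * flowid in the lower 32.  Sorting queued mbufs on this key makes
 * all packets of one connection contiguous, so each flow passes
 * through the LRO entries in a single burst instead of competing
 * with every other flow for the limited active entries.
 */
static inline uint64_t
lro_sort_key(uint32_t hash_type, uint32_t flowid)
{
	return (((uint64_t)hash_type << 32) | (uint64_t)flowid);
}
```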
      
      Sponsored by:	Mellanox Technologies
      Reviewed by:	gallatin, sbruno, rrs, gnn, transport
      Tested by:	Netflix
      Differential Revision:	https://reviews.freebsd.org/D4914
  17. 30 Jun, 2015 1 commit
  18. 28 Aug, 2013 1 commit
      Merge r254336 from user/np/cxl_tuning. · 7127e6ac
      Navdeep Parhar authored
      Add a last-modified timestamp to each LRO entry and provide an interface
      to flush all inactive entries.  Drivers decide when to flush and what
      the inactivity threshold should be.
      
      Network drivers that process an rx queue to completion can enter a
      livelock type situation when the rate at which packets are received
      reaches equilibrium with the rate at which the rx thread is processing
      them.  When this happens the final LRO flush (normally when the rx
      routine is done) does not occur.  Pure ACKs and segments with total
      payload < 64K can get stuck in an LRO entry.  Symptoms are that TCP
      tx-mostly connections' performance falls off a cliff during heavy,
      unrelated rx on the interface.
      
      Flushing only inactive LRO entries works better than any of these
      alternates that I tried:
      - don't LRO pure ACKs
      - flush _all_ LRO entries periodically (every 'x' microseconds or every
        'y' descriptors)
      - stop rx processing in the driver periodically and schedule remaining
        work for later.
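
      The policy can be modelled with a per-entry timestamp check along
      these lines (types and names are illustrative; the driver supplies
      "now" and its inactivity threshold):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative LRO entry carrying the last-modified timestamp. */
struct lro_entry_ts {
	uint64_t last_append;	/* time the entry last absorbed a packet */
	bool	 active;	/* entry currently holds merged data */
};

/*
 * An entry is flushed only when it has been idle longer than the
 * driver's threshold, so hot entries keep aggregating while stuck
 * pure ACKs and short segments still get delivered promptly.
 */
static bool
lro_entry_expired(const struct lro_entry_ts *le, uint64_t now,
    uint64_t idle_threshold)
{
	return (le->active && now - le->last_append > idle_threshold);
}
```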
      
      Reviewed by:	andre
  19. 21 Feb, 2013 1 commit
  20. 01 Jun, 2012 1 commit
  21. 26 May, 2012 1 commit
  22. 25 May, 2012 1 commit
      In case forwarding is turned on for a given address family, refuse to
      queue the packet for LRO and tell the driver to directly pass it
      on. · 31bfc56e
      Bjoern A. Zeeb authored
      This avoids re-assembly and later re-fragmentation problems when
      forwarding.
      
      It's not the best solution but the simplest and most effective for
      the moment.
      
      Should have been done:	ages ago
      Discussed with and by:	many
      MFC after:		3 days
  23. 24 May, 2012 1 commit
      MFp4 bz_ipv6_fast: · 62b5b6ec
      Bjoern A. Zeeb authored
        Significantly update tcp_lro for mostly two things:
        1) introduce basic support for IPv6 without extension headers.
        2) try hard to also get the incremental checksum updates right,
           especially also in the IPv4 case for the IP and TCP header.
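
        Incremental checksum updating here means adjusting the existing
        one's-complement checksum when a header word changes, instead of
        recomputing it from scratch; a sketch in the style of RFC 1624
        (~C' = ~C + ~m + m'), with function and parameter names of our
        choosing:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Update a 16-bit Internet checksum when one 16-bit header word
 * changes from 'oldw' to 'neww' (RFC 1624: ~C' = ~C + ~m + m').
 * This sketches the technique; it is not the kernel's LRO code.
 */
static uint16_t
cksum_update16(uint16_t cksum, uint16_t oldw, uint16_t neww)
{
	uint32_t sum;

	sum = (uint32_t)(uint16_t)~cksum + (uint16_t)~oldw + (uint32_t)neww;
	sum = (sum & 0xffff) + (sum >> 16);	/* fold the carries ... */
	sum = (sum & 0xffff) + (sum >> 16);	/* ... twice to be safe */
	return ((uint16_t)~sum);
}
```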
      
        Move variables around for better locality, factor things out into
        functions, allow checksum updates to be compiled out, ...
      
        Leave a few comments on further things to look at in the future,
        though that is not the full list.
      
        Update drivers with appropriate #includes as needed for IPv6 data
        type in LRO.
      
        Sponsored by:	The FreeBSD Foundation
        Sponsored by:	iXsystems
      
      Reviewed by:	gnn (as part of the whole)
      MFC After:	3 days
  24. 15 May, 2012 1 commit
  25. 05 Jul, 2011 1 commit
  26. 07 Apr, 2011 1 commit
  27. 07 Jan, 2011 1 commit
  28. 19 Oct, 2010 1 commit
  29. 19 Oct, 2008 1 commit
  30. 24 Aug, 2008 1 commit
  31. 11 Jun, 2008 1 commit
  32. 16 May, 2008 1 commit
      This is driver version 1.4.4 of the Intel ixgbe driver. · 9ca4041b
      Jack F Vogel authored
        -It has new hardware support
        -It uses a new method of TX cleanup called Head Write Back
        -It includes the provisional generic TCP LRO feature contributed
         by Myricom and made general purpose by me. This should move into
         the stack upon approval, but for this driver drop it is included
         here.
        -Also bug fixes, etc.
      
      MFC in a week if no serious issues arise.
  33. 14 Feb, 2008 1 commit
  34. 15 Jan, 2008 1 commit
      Add optional support to mxge for MSI-X interrupts and multiple
      receive queues (which we call slices). · 1e413cf9
      Andrew Gallatin authored
      The NIC will steer traffic into up to hw.mxge.max_slices different
      receive rings based on a configurable hash type
      (hw.mxge.rss_hash_type).
      
      Currently the driver defaults to using a single slice, so the default
      behavior is unchanged.  Also, transmit from non-zero slices is
      disabled currently.
  35. 12 Jul, 2007 1 commit
  36. 11 Jun, 2007 3 commits