    Use optimised complexity-safe sorting routine instead of the kernel's
    "qsort()".

    Hans Petter Selasky authored · fc271df3
    
    In the worst case, the kernel's "qsort()" routine can perform O(N*N)
    comparisons before the input array is sorted. It can also recurse
    many times, using up the kernel's interrupt thread stack.
    
    The custom sorting routine takes advantage of the fact that the sorting
    key is only 64 bits wide. Based on the set and cleared bits in the
    sorting key, it partitions the array until it is sorted. Because the
    key has only 64 bits that can be set or cleared, this process is
    limited to 64 levels of recursion. Compiled with -O2, the sorting
    routine was measured to use 64 bytes of stack. Multiplying this by 64
    gives a maximum stack consumption of 4096 bytes on AMD64. The same
    bound applies to the execution time: the array to be sorted is
    traversed no more than 64 times.
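    To make the partitioning idea above concrete, here is a minimal C
    sketch of MSB-first binary radix partitioning on a 64-bit key. It is
    an illustration under stated assumptions, not the committed
    "tcp_lro_sort()": the names "struct sort_entry", "msb_64()" and
    "bit_partition_sort()" are hypothetical, and the returned depth
    counter exists only to show the 64-level recursion bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element: a 64-bit sort key plus an opaque payload. */
struct sort_entry {
	uint64_t key;
	void *data;	/* e.g. an mbuf pointer; never inspected here */
};

/* Keep only the most significant set bit of x; x must be non-zero. */
static uint64_t
msb_64(uint64_t x)
{
	uint64_t bit = (uint64_t)1 << 63;

	while ((x & bit) == 0)
		bit >>= 1;
	return (bit);
}

/*
 * Sort "array" by ascending key: find the highest bit that is set in
 * some keys and cleared in others, move the "cleared" entries in front
 * of the "set" entries, then sort both halves the same way.
 * Returns the deepest nesting level reached ("depth" is 0 at the top).
 */
static unsigned
bit_partition_sort(struct sort_entry *array, size_t size, unsigned depth)
{
	unsigned max_depth = depth;

	for (;;) {
		uint64_t ones = 0, zeros = 0, bit;
		size_t x, y;

		/* OR together every key and every inverted key. */
		for (x = 0; x != size; x++) {
			ones |= array[x].key;
			zeros |= ~array[x].key;
		}
		/* Bits that vary between keys; none left means sorted. */
		ones &= zeros;
		if (ones == 0)
			return (max_depth);

		bit = msb_64(ones);

		/* Swap entries with "bit" cleared to the front. */
		for (x = y = 0; y != size; y++) {
			struct sort_entry tmp;

			if (array[y].key & bit)
				continue;
			tmp = array[x];
			array[x] = array[y];
			array[y] = tmp;
			x++;
		}
		/* "bit" varies, so both halves must be non-empty. */
		assert(x != 0 && x != size);

		/* Recurse into the "cleared" half ... */
		unsigned d = bit_partition_sort(array, x, depth + 1);
		if (d > max_depth)
			max_depth = d;

		/* ... and loop on the "set" half instead of recursing. */
		array += x;
		size -= x;
	}
}
```

    Every entry in a subarray agrees on all bits at or above the bit it
    was split on, so each nested call splits on a strictly lower bit;
    with 64 bit positions the nesting cannot exceed 64 levels. Handling
    the "set" half by looping rather than by a second recursive call
    avoids an extra stack frame per split without changing that bound.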
    
    When serving roughly 80 Gb/s across 80K TCP connections, the old
    method, consisting of "qsort()" and "tcp_lro_mbuf_compare_header()",
    used 1.4% CPU for LRO-related sorting, while the new "tcp_lro_sort()"
    used 1.1%, as measured by Intel VTune. The testing was done using a
    sysctl to toggle between "qsort()" and "tcp_lro_sort()".
    
    Differential Revision:	https://reviews.freebsd.org/D6472
    Sponsored by:	Mellanox Technologies
    Tested by:	Netflix
    Reviewed by:	gallatin, rrs, sephe, transport