- 19 Apr, 2017 2 commits

Navdeep Parhar authored
MFC after: 1 week

Navdeep Parhar authored
- 25 Aug, 2016 1 commit

Lawrence Stewart authored
tso_segsz pkthdr field during RX processing, and use the information in TCP for more correct accounting and as a congestion control input. This is only a start, and an audit of other uses for the data is left as future work.
Reviewed by: gallatin, rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D7564
- 16 Aug, 2016 1 commit

Sepherosa Ziehau authored
Reviewed by: hps, gallatin
Obtained from: rrs, gallatin
MFC after: 2 weeks
Sponsored by: Netflix (rrs, gallatin), Microsoft (sephe)
Differential Revision: https://reviews.freebsd.org/D7499
- 05 Aug, 2016 1 commit

Sepherosa Ziehau authored
This keeps the segments/ACK/FIN delivery order. Before this patch, it was observed that if A sent a FIN immediately after an ACK, B would deliver the FIN to the TCP stack first, then the ACK. This out-of-order delivery causes one unnecessary ACK to be sent from B.
Reviewed by: gallatin, hps
Obtained from: rrs, gallatin
Sponsored by: Netflix (rrs, gallatin), Microsoft (sephe)
Differential Revision: https://reviews.freebsd.org/D7415
- 02 Aug, 2016 1 commit

Sepherosa Ziehau authored
This significantly improves HTTP workload performance and reduces HTTP workload latency.
Reviewed by: rrs, gallatin, hps
Obtained from: rrs, gallatin
Sponsored by: Netflix (rrs, gallatin), Microsoft (sephe)
Differential Revision: https://reviews.freebsd.org/D6689
- 03 Jun, 2016 1 commit

Hans Petter Selasky authored
Replacing the bubble sort with insertion sort gives an 80% reduction in runtime on average, with randomized keys, for small partitions. If the keys are pre-sorted, insertion sort runs in linear time, and even if the keys are reversed, insertion sort is faster than bubble sort, although not by much. Update the comment describing "tcp_lro_sort()" while at it.
Differential Revision: https://reviews.freebsd.org/D6619
Sponsored by: Mellanox Technologies
Tested by: Netflix
Suggested by: Pieter de Goeje <pieter@degoeje.nl>
Reviewed by: ed, gallatin, gnn, transport
- 26 May, 2016 1 commit

Hans Petter Selasky authored
"qsort()". The kernel's "qsort()" routine can in the worst case spend O(N*N) comparisons before the input array is sorted. It can also recurse a significant number of times, using up the kernel's interrupt thread stack. The custom sorting routine takes advantage of the fact that the sorting key is only 64 bits. Based on set and cleared bits in the sorting key, it partitions the array until it is sorted. This process has a recursion limit of 64, due to the number of set and cleared bits which can occur. Compiled with -O2, the sorting routine was measured to use 64 bytes of stack. Multiplying this by 64 gives a maximum stack consumption of 4096 bytes for AMD64. The same bound applies to the execution time: the array to be sorted is not traversed more than 64 times. When serving roughly 80Gb/s with 80K TCP connections, the old method consisting of "qsort()" and "tcp_lro_mbuf_compare_header()" used 1.4% CPU, while the new "tcp_lro_sort()" used 1.1% for LRO-related sorting, as measured by Intel VTune. The testing was done using a sysctl to toggle between "qsort()" and "tcp_lro_sort()".
Differential Revision: https://reviews.freebsd.org/D6472
Sponsored by: Mellanox Technologies
Tested by: Netflix
Reviewed by: gallatin, rrs, sephe, transport
- 03 May, 2016 1 commit

Sepherosa Ziehau authored
Ease more work concerning the active list, e.g. hash table etc.
Reviewed by: gallatin, rrs (earlier version)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6137
- 28 Apr, 2016 1 commit

Sepherosa Ziehau authored
Noticed by: hiren
MFC after: 1 week
Sponsored by: Microsoft OSTC
- 27 Apr, 2016 1 commit

Sepherosa Ziehau authored
MFC after: 1 week
Sponsored by: Microsoft OSTC
- 01 Apr, 2016 2 commits

Sepherosa Ziehau authored
This is kinda critical to performance when the CPU is slow and the network bandwidth is high, e.g. in a hypervisor.
Reviewed by: rrs, gallatin, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5765

Sepherosa Ziehau authored
And factor out tcp_lro_rx_done, which deduplicates the same logic with netinet/tcp_lro.c.
Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5725
- 25 Mar, 2016 1 commit

Sepherosa Ziehau authored
So that callers could react accordingly.
Reviewed by: gallatin (no objection)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5695
- 18 Feb, 2016 1 commit

Sepherosa Ziehau authored
The ACK aggregation limit is append-count based, while the TCP data segment aggregation limit is length based. Unless the network driver sets these two limits, this is a no-op.
Reviewed by: adrian, gallatin (previous version), hselasky (previous version)
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5185
- 11 Feb, 2016 1 commit

Hans Petter Selasky authored
the sign bit doesn't cause an overflow. The overflow manifests itself as a sorting-index wrap around in the middle of the sorted array, which is not a problem for the LRO code, but might be a problem for the logic inside qsort().
Reviewed by: gnn
Sponsored by: Mellanox Technologies
Differential Revision: https://reviews.freebsd.org/D5239
- 01 Feb, 2016 1 commit

Gleb Smirnoff authored
via sys/mbuf.h
- 19 Jan, 2016 1 commit

Hans Petter Selasky authored
- Add an optimizing LRO wrapper which pre-sorts all incoming packets according to the hash type and flowid. This prevents exhaustion of the LRO entries due to too many connections at the same time. Testing using a larger number of higher-bandwidth TCP connections showed that the incoming ACK packet aggregation rate increased from ~1.3:1 to almost 3:1. Another test showed that for a number of TCP connections greater than 16 per hardware receive ring, where 8 TCP connections was the LRO active entry limit, there was a significant improvement in throughput due to being able to fully aggregate more than 8 TCP streams. For very few very high-bandwidth TCP streams, the optimizing LRO wrapper will add CPU usage instead of reducing it. This is expected. Network drivers which want to use the optimizing LRO wrapper need to call "tcp_lro_queue_mbuf()" instead of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of "tcp_lro_flush()". Further, the LRO control structure must be initialized using "tcp_lro_init_args()", passing a non-zero number in the "lro_mbufs" argument.
- Make LRO statistics 64-bit. Previously, 32-bit integers were used for statistics, which are prone to wrap-around. Fix this while at it and update all SYSCTLs which expose LRO statistics.
- Ensure all data is freed when destroying an LRO control structure, especially leftover LRO entries.
- Reduce the number of memory allocations needed when setting up an LRO control structure by precomputing the total amount of memory needed.
- Add a separate memory allocation counter for LRO.
- Bump the FreeBSD version to force recompilation of all KLDs due to the change of the LRO control structure size.
Sponsored by: Mellanox Technologies
Reviewed by: gallatin, sbruno, rrs, gnn, transport
Tested by: Netflix
Differential Revision: https://reviews.freebsd.org/D4914
- 30 Jun, 2015 1 commit

Navdeep Parhar authored
hanging off the header need to be freed too.
Differential Revision: https://reviews.freebsd.org/D2708
Reviewed by: ae@, hiren@
- 28 Aug, 2013 1 commit

Navdeep Parhar authored
Add a last-modified timestamp to each LRO entry and provide an interface to flush all inactive entries. Drivers decide when to flush and what the inactivity threshold should be. Network drivers that process an rx queue to completion can enter a livelock-type situation when the rate at which packets are received reaches equilibrium with the rate at which the rx thread is processing them. When this happens, the final LRO flush (normally done when the rx routine finishes) does not occur. Pure ACKs and segments with total payload < 64K can get stuck in an LRO entry. The symptom is that TCP tx-mostly connections' performance falls off a cliff during heavy, unrelated rx on the interface. Flushing only inactive LRO entries works better than any of these alternatives that I tried:
- don't LRO pure ACKs
- flush _all_ LRO entries periodically (every 'x' microseconds or every 'y' descriptors)
- stop rx processing in the driver periodically and schedule remaining work for later
Reviewed by: andre
- 21 Feb, 2013 1 commit

Andrew Gallatin authored
Specifically, in_cksum_hdr() returns 0 (not 0xffff) when the IPv4 checksum is correct. Without this fix, the tcp_lro code will reject good IPv4 traffic from drivers that do not implement IPv4 header csum offload in hardware.
Sponsored by: Myricom Inc.
MFC after: 7 days
- 01 Jun, 2012 1 commit

Bjoern A. Zeeb authored
There's no VIMAGE context set there yet, as this is before if_ethersubr.c.
MFC after: 3 days
X-MFC with: r235981
- 26 May, 2012 1 commit

Bjoern A. Zeeb authored
the __FBSDID() macro on the file now instead.
MFC after: 3 days
- 25 May, 2012 1 commit

Bjoern A. Zeeb authored
queue the packet for LRO and tell the driver to directly pass it on. This avoids re-assembly and later re-fragmentation problems when forwarding. It's not the best solution, but the simplest and most effective for the moment.
Should have been done: ages ago
Discussed with and by: many
MFC after: 3 days
- 24 May, 2012 1 commit

Bjoern A. Zeeb authored
Significantly update tcp_lro for mostly two things:
1) introduce basic support for IPv6 without extension headers.
2) try hard to also get the incremental checksum updates right, especially also in the IPv4 case for the IP and TCP header.
Move variables around for better locality, factor things out into functions, allow checksum updates to be compiled out, ... Leave a few comments on further things to look at in the future, though that is not the full list. Update drivers with appropriate #includes as needed for the IPv6 data type in LRO.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC after: 3 days
- 15 May, 2012 1 commit

Bjoern A. Zeeb authored
Approved by: Myricom Inc. (gallatin)
Approved by: Intel Corporation (jfv)
- 05 Jul, 2011 1 commit

Colin Percival authored
when len is inserted back into the synthetic IP packet and cause a multiple of 2^16 bytes of TCP "packet loss". This improves Linux->FreeBSD netperf bandwidth by a factor of 300 in testing on Amazon EC2.
Reviewed by: jfv
MFC after: 2 weeks
- 07 Apr, 2011 1 commit

Jack F Vogel authored
LRO code. Thanks to Andrew Gallatin for the change.
MFC after: 7 days
- 07 Jan, 2011 1 commit

John Baldwin authored
- 19 Oct, 2010 1 commit

Jamie Gritton authored
by /etc/rc.d/jail.
- 19 Oct, 2008 1 commit

Ulf Lilleengen authored
- 24 Aug, 2008 1 commit

Kip Macy authored
Obtained from: Chelsio Inc.
MFC after: 3 days
- 11 Jun, 2008 1 commit

Jack F Vogel authored
- 16 May, 2008 1 commit

Jack F Vogel authored
- It has new hardware support.
- It uses a new method of TX cleanup called Head Write Back.
- It includes the provisional generic TCP LRO feature contributed by Myricom and made general purpose by me. This should move into the stack upon approval, but for this driver drop it is in here.
- Also bug fixes, etc.
MFC in a week if no serious issues arise.
- 14 Feb, 2008 1 commit

Andrew Gallatin authored
as RELENG_6
Sponsored by: Myricom, Inc.
- 15 Jan, 2008 1 commit

Andrew Gallatin authored
queues (which we call slices). The NIC will steer traffic into up to hw.mxge.max_slices different receive rings based on a configurable hash type (hw.mxge.rss_hash_type). Currently the driver defaults to using a single slice, so the default behavior is unchanged. Also, transmit from non-zero slices is currently disabled.
- 12 Jul, 2007 1 commit

Andrew Gallatin authored
the binary distribution clause.
Approved by: re (bmah)
- 11 Jun, 2007 3 commits

Andrew Gallatin authored
to defeat the mtu check in ether_input. Mbuf flags are too scarce.
Discussed with: sam

Andrew Gallatin authored
the MTU check in ether_input() on LRO merged frames.
Discussed with: kmacy

Andrew Gallatin authored
- Allow LRO to be enabled / disabled at runtime
- Fix a double-free at module unload time.
- Only update the timestamp in LRO merge when it is present in the frame
Sponsored by: Myricom