1. 02 Dec, 2021 2 commits
    • Remove "options PCBGROUP" · 93c67567
      Gleb Smirnoff authored
      With upcoming changes to the inpcb synchronisation it is going to be
      broken. Even its current status after the move of PCB synchronization
      to the network epoch is very questionable.
      
      This experimental feature was sponsored by Juniper but ended up never
      being used at Juniper and doesn't exist in their source tree [sjg@, stevek@,
      jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix
      [gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@].
      
      I'm up for resurrecting it if there is any interest from anybody.
      
      Reviewed by:		rrs
      Differential revision:	https://reviews.freebsd.org/D33020
    • Allow compiling RSS without PCBGROUP. · 1cec1c58
      Gleb Smirnoff authored
      Reviewed by:		rrs
      Differential revision:	https://reviews.freebsd.org/D33019
  2. 01 Sep, 2020 1 commit
  3. 26 Feb, 2020 1 commit
    • Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) · 7029da5c
      Pawel Biernacki authored
      r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
      still not MPSAFE (or already are but aren’t properly marked).
      Use it in preparation for a general review of all nodes.
      
      This is a non-functional change that adds annotations to SYSCTL_NODE and
      SYSCTL_PROC nodes using one of the soon-to-be-required flags.
      
      Mark all obvious cases as MPSAFE.  All entries that haven't been marked
      as MPSAFE before are by default marked as NEEDGIANT.
      
      Approved by:	kib (mentor, blanket)
      Commented by:	kib, gallatin, melifaro
      Differential Revision:	https://reviews.freebsd.org/D23718
  4. 11 Oct, 2017 1 commit
  5. 03 May, 2016 1 commit
  6. 29 Aug, 2015 1 commit
  7. 28 Aug, 2015 1 commit
  8. 18 Jan, 2015 1 commit
    • Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific · b2bdc62a
      Adrian Chadd authored
      bits.
      
      The motivation here is to eventually teach netisr and potentially
      other networking subsystems a bit more about how RSS work queues / buckets
      are configured so things have a hope of auto-configuring in the future.
      
      * net/rss_config.[ch] takes care of the generic bits for doing
        configuration, hash function selection, etc;
      * toeplitz.[ch] is now in net/ rather than netinet/ (and would be in
        libkern if it didn't directly include RSS_KEYSIZE; that's a later
        thing to fix up);
      * netinet/in_rss.[ch] now just contains the IPv4 specific methods;
      * and netinet6/in6_rss.[ch] now just contains the IPv6 specific methods.
      
      This should have no functional impact on anyone currently using
      the RSS support.
      
      Differential Revision:	D1383
      Reviewed by:	gnn, jfv (intel driver bits)
  9. 31 Dec, 2014 1 commit
    • Migrate the RSS IPv6 hash code to use pointers to the v6 addresses · 492ccbe1
      Adrian Chadd authored
      rather than passing them in by value.
      
      The eventual aim is to do incremental hash construction rather than
      all of the memcpy()'ing into a contiguous buffer for the hash
      function, which does show up as taking quite a bit of CPU during
      profiling.
      
      Tested:
      
      * a variety of laptops/desktop setups I have, with v6 connectivity
      
      Differential Revision:	D1404
      Reviewed by:	bz, rpaulo
  10. 01 Dec, 2014 1 commit
    • Start process of removing the use of the deprecated "M_FLOWID" flag · c2529042
      Hans Petter Selasky authored
      from the FreeBSD network code. The flag is still kept around in the
      "sys/mbuf.h" header file, but no longer has any users. Instead
      the "m_pkthdr.rsstype" field in the mbuf structure is now used to
      decide the meaning of the "m_pkthdr.flowid" field. To modify the
      "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
      macros as defined in the "sys/mbuf.h" header file.
      
      This patch introduces new behaviour in the transmit direction.
      Previously network drivers checked if "M_FLOWID" was set in "m_flags"
      before using the "m_pkthdr.flowid" field. This check has now been
      replaced by checking if "M_HASHTYPE_GET(m)" is different from
      "M_HASHTYPE_NONE". In the future more hashtypes will be added, for
      example hashtypes for hardware dedicated flows.
      
      "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
      valid and has no particular type. This change removes the need for an
      "if" statement in TCP transmit code checking for the presence of a
      valid flowid value. The "if" statement mentioned above is now a direct
      variable assignment which is then later checked by the respective
      network drivers like before.
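
A minimal userland sketch of the two sides described above. The struct, the constants and both function names are hypothetical stand-ins invented for illustration; the real fields and the "M_HASHTYPE_XXX" macros live in "sys/mbuf.h", and the fallback queue choice is an assumption rather than what any particular driver does.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values; the real constants are defined in sys/mbuf.h. */
enum mock_hashtype {
	MOCK_HASHTYPE_NONE = 0,		/* flowid carries no information */
	MOCK_HASHTYPE_OPAQUE = 1	/* flowid is valid but untyped */
};

/* Hypothetical, cut-down packet header with only the two fields discussed. */
struct mock_pkthdr {
	uint32_t flowid;
	uint8_t  rsstype;
};

/* Stack side: an unconditional assignment replaces the old "if" statement. */
static void
mock_tag_packet(struct mock_pkthdr *ph, uint32_t flowid, uint8_t hashtype)
{
	ph->flowid = flowid;
	ph->rsstype = hashtype;
}

/* Driver side: trust the flowid only when a hash type is set. */
static uint32_t
mock_select_txq(const struct mock_pkthdr *ph, uint32_t nqueues,
    uint32_t fallback)
{
	assert(nqueues > 0);
	if (ph->rsstype != MOCK_HASHTYPE_NONE)
		return (ph->flowid % nqueues);
	return (fallback % nqueues);
}
```

With four queues, a packet tagged with flowid 13 and the opaque type lands on queue 1, while the same packet tagged with the "none" type is sent to the fallback queue.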
      
      Additional notes:
      - The SCTP code changes will be committed as a separate patch.
      - Removal of the "M_FLOWID" flag will also be done separately.
      - The FreeBSD version has been bumped.
      
      MFC after:	1 month
      Sponsored by:	Mellanox Technologies
  11. 16 Sep, 2014 1 commit
  12. 09 Sep, 2014 1 commit
    • Implement IPv4 RSS software hash functions to use during packet ingress · 72d33245
      Adrian Chadd authored
      and egress.
      
      * rss_mbuf_software_hash_v4 - look at the IPv4 mbuf to fetch the IPv4 details
        + direction to calculate a hash.
      * rss_proto_software_hash_v4 - hash the given source/destination IPv4 address,
        port and direction.
      * rss_soft_m2cpuid - map the given mbuf to an RSS CPU ("bucket" for now)
      
      These functions are intended to be used by the stack to support
      the following:
      
      * Not all NICs do RSS hashing, so we should support some way of doing
        a hash in software;
      * The NIC / driver may not hash frames the way we want (eg UDP 4-tuple
        hashing when the stack is only doing 2-tuple hashing for UDP); so we
        may need to re-hash frames;
      * .. same with IPv4 fragments - they will need to be re-hashed after
        reassembly;
      * .. and same with things like IP tunneling and such;
      * The transmit paths for things like UDP, RAW and ICMP don't currently
        have any RSS information attached to them - so they'll need an
        RSS calculation performed before transmit.
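
The 2-tuple versus 4-tuple distinction in the list above comes down to which fields are fed to the hash. A rough sketch of packing that input follows; the function name and signature are hypothetical and are not the rss_proto_software_hash_v4 interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack the fields that are fed to the hash.  Addresses and ports are
 * expected in network byte order.  With use_ports == 0 only the two
 * addresses are packed (2-tuple); otherwise the two ports follow
 * (4-tuple).  Returns the number of bytes written to buf.
 */
static size_t
sketch_pack_v4(uint8_t buf[12], uint32_t src_be, uint32_t dst_be,
    uint16_t sport_be, uint16_t dport_be, int use_ports)
{
	size_t len = 0;

	assert(buf != NULL);
	memcpy(buf + len, &src_be, sizeof(src_be));
	len += sizeof(src_be);
	memcpy(buf + len, &dst_be, sizeof(dst_be));
	len += sizeof(dst_be);
	if (use_ports) {
		memcpy(buf + len, &sport_be, sizeof(sport_be));
		len += sizeof(sport_be);
		memcpy(buf + len, &dport_be, sizeof(dport_be));
		len += sizeof(dport_be);
	}
	return (len);
}
```

An IPv4 fragment would be packed with use_ports set to 0, since only the first fragment carries the ports; that is why a frame hashed as a 4-tuple by the NIC may need re-hashing after reassembly.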
      
      TODO:
      
      * Counters! Everywhere!
      * Add a debug mode that software hashes received frames and compares them
        to the hardware hash provided by the hardware to ensure they match.
      
      The IPv6 part of this is missing - I'm going to do some re-juggling of
      where various parts of the RSS framework live before I add the IPv6
      code (read: the IPv6 code is going to go into netinet6/in6_rss.[ch],
      rather than living here.)
      
      Note: This API is still fluid.  Please keep that in mind.
      
      Differential Revision:	https://reviews.freebsd.org/D527
      Reviewed by:	grehan
  13. 01 Aug, 2014 1 commit
    • Fix byte ordering in default RSS key. · 07b4e383
      Peter Grehan authored
      The rss_key[] array in netinet/in_rss.c has the bytes in incorrect
      order. This results in the RSS test vectors in the Microsoft RSS spec
      and Intel NIC specs giving incorrect results, and makes it difficult
      to verify correct hash operation when RSS functionality is added to
      new NICs.
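
For reference, a minimal userland sketch of a bit-at-a-time Toeplitz hash of the kind those test vectors exercise. The function name is hypothetical and this is not the kernel code; it shows why key byte order matters, since every set input bit XORs in a 32-bit window read straight from the key bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash over `data`, keyed by `key`.  For every set bit of the
 * input (most significant bit first), XOR in the 32-bit window of the
 * key that starts at that bit position.
 */
static uint32_t
toeplitz_sketch(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0, window;
	size_t i;
	int b;

	/* The loop reads key bytes up to index datalen + 3. */
	assert(keylen >= datalen + 4);

	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window one bit along the key. */
			window <<= 1;
			if (key[i + 4] & (1u << b))
				window |= 1;
		}
	}
	return (hash);
}
```

To check a key against the published vectors, feed it the 12-byte input (source address, destination address, source port, destination port, all in network byte order) together with the 40-byte key and compare the result with the value in the spec. A key stored with its bytes in the wrong order changes every window and therefore every result.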
      
      CR:		https://phabric.freebsd.org/D516
      Reviewed by:	adrian
  14. 20 Jul, 2014 3 commits
    • Add hash awareness of the IPv4 and IPv6 UDP 4-tuple. · 9870806c
      Adrian Chadd authored
      Note: it would be nice if the supported hash check would be used here!
    • Implement rss_gethashconfig() - return the currently supported hash methods · 40c753e3
      Adrian Chadd authored
      by the stack.
      
      Right now the stack isn't really set up for RSS with 4-tuple UDP hashing
      for either IPv4 or IPv6.
      
      The specifics:
      
      * The UDP init path udp_init() and udplite_init() specify the hash as
        2-tuple, so the PCBGROUPS code only tries a 2-tuple check;
      * The PCBGROUPS and RSS code doesn't know about the UDP hash types
        just yet, so they're never treated as valid hashes.
      * For correctness, 4-tuple can't be enabled in the general case because
        UDP datagrams can be more fragmented than IP datagrams may be.
      
      Strictly speaking, TCP datagrams may also be fragmented and this could
      cause issues with PCBGROUPS/RSS until the IP defragment path grows some
      code to re-calculate the RSS hash.
      
      I'll follow this commit up with awareness of the UDP 4-tuple for those
      who wish to configure it, but for now it'll stay disabled.
      
      No drivers (yet) know to use this function when RSS is enabled.
    • Update the comment to be more concise. · 85415b47
      Adrian Chadd authored
  15. 18 Jul, 2014 1 commit
  16. 12 Jul, 2014 1 commit
  17. 28 Jun, 2014 1 commit
  18. 27 Jun, 2014 2 commits
    • Revert r267961, r267973: · 37a107a4
      Glen Barber authored
      These changes prevent sysctl(8) from returning proper output,
      such as:
      
       1) no output from sysctl(8)
       2) erroneously returning ENOMEM with tools like truss(1)
          or uname(1)
       truss: can not get etype: Cannot allocate memory
    • Extend the meaning of the CTLFLAG_TUN flag to automatically check if · 3da1cf1e
      Hans Petter Selasky authored
      there is an environment variable which shall initialize the SYSCTL
      during early boot. This works for all SYSCTL types both statically and
      dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
      which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
      be used in the case a tunable sysctl has a custom initialisation
      function allowing the sysctl to still be marked as a tunable. The
      kernel SYSCTL API is mostly the same, with a few exceptions for some
      special operations like iterating children of a static/extern SYSCTL
      node. This operation should probably be made into a factored out
      common macro, since some device drivers use this. The reason for
      changing the SYSCTL API was the need for a SYSCTL parent OID pointer
      and not only the SYSCTL parent OID list pointer in order to quickly
      generate the sysctl path. The motivation behind this patch is to avoid
      parameter loading kludges inside the OFED driver subsystem. Instead of
      adding special code to the OFED driver subsystem to post-load tunables
      into dynamically created sysctls, we generalize this in the kernel.
      
      Other changes:
      - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
      to "hw.pcic.intr_mask".
      - Removed redundant TUNABLE statements throughout the kernel.
      - Some minor code rewrites in connection with removing unneeded
      TUNABLE statements.
      - Added a missing SYSCTL_DECL().
      - Wrapped two very long lines.
      - Avoid malloc()/free() inside sysctl string handling, in case it is
      called to initialize a sysctl from a tunable, since malloc()/free() are
      not ready when sysctls from the sysctl dataset are registered.
      - Bumped FreeBSD version to indicate SYSCTL API change.
      
      MFC after:	2 weeks
      Sponsored by:	Mellanox Technologies
  19. 26 Jun, 2014 1 commit
    • Add another RSS method to query the indirection table entries. · a6c88ec4
      Adrian Chadd authored
      There are 128 indirection table entries which correspond to the
      low 7 bits of the 32 bit RSS hash.  Each value will correspond
      to an RSS bucket.  (Then each RSS bucket currently will map
      to a CPU.)
      
      This is a more explicit way of figuring out which RSS bucket
      is in each RSS indirection slot.  It can be inferred by the other
      methods but I'd rather drivers use something more simplified and
      explicit.
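
As a rough illustration of the lookup described above: the low 7 bits of the hash pick one of 128 slots, and the slot's value is the bucket. The names below are hypothetical, and the round-robin fill is an assumed policy chosen only to make the sketch concrete.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_INDIR_ENTRIES	128	/* 128 slots, as stated above */
#define SKETCH_INDIR_MASK	(SKETCH_INDIR_ENTRIES - 1)	/* low 7 bits */

/* Fill the table round-robin over nbuckets (an assumed policy). */
static void
sketch_indir_fill(uint8_t table[SKETCH_INDIR_ENTRIES], unsigned nbuckets)
{
	unsigned i;

	assert(nbuckets > 0 && nbuckets <= SKETCH_INDIR_ENTRIES);
	for (i = 0; i < SKETCH_INDIR_ENTRIES; i++)
		table[i] = (uint8_t)(i % nbuckets);
}

/* Slot = low 7 bits of the 32-bit hash; the slot's value is the bucket. */
static unsigned
sketch_hash2bucket(const uint8_t table[SKETCH_INDIR_ENTRIES], uint32_t hash)
{
	return (table[hash & SKETCH_INDIR_MASK]);
}
```

With 8 buckets, a hash of 0xdeadbeef selects slot 0x6f (111), which the round-robin fill assigns to bucket 7.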
  20. 27 May, 2014 1 commit
    • The users of RSS shouldn't be directly concerned about hash -> CPU ID · 8bde802a
      Adrian Chadd authored
      mappings.  Instead, they should be first mapping to an RSS bucket and
      then querying the RSS bucket -> CPU ID mapping to figure out the target
      CPU.
      
      When (if?) RSS rebalancing is implemented or some other (non round-robin)
      distribution of work from buckets to CPU IDs, various bits of code - both
      userland and kernel - will need to know how this mapping works.
      
      So, to support this:
      
      * Add a new function rss_m2bucket() - this maps an mbuf to a given bucket.
        Anything which is currently doing hash -> CPU work may instead wish to
        do hash -> bucket, and then query the bucket->cpuid map for which
        CPU it belongs on.  Or, map it to a bucket, then re-pin that bucket ->
        CPU during a rebalance operation.
      
      * For userland applications which wish to exploit affinity to RSS buckets,
        the bucket -> CPU ID mapping is now available via a sysctl.
        net.inet.rss.bucket_mapping lists the bucket to CPU ID mapping via
        a list of bucket:cpu pairs.
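
A userland consumer could turn that list into a lookup table along these lines. This is only a sketch: the parser name is hypothetical, and it assumes the pairs are separated by whitespace, which the text above does not specify.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "bucket:cpu" pairs into cpus[bucket].  Returns the number of
 * pairs parsed, or -1 on malformed input or an out-of-range bucket.
 * Whitespace-separated pairs are an assumption of this sketch.
 */
static int
sketch_parse_bucket_map(const char *s, int *cpus, int maxbuckets)
{
	int bucket, cpu, consumed, n = 0;

	assert(s != NULL && cpus != NULL);
	for (;;) {
		s += strspn(s, " \t\n");	/* skip separators */
		if (*s == '\0')
			break;
		if (sscanf(s, "%d:%d%n", &bucket, &cpu, &consumed) != 2)
			return (-1);
		if (bucket < 0 || bucket >= maxbuckets)
			return (-1);
		cpus[bucket] = cpu;
		s += consumed;
		n++;
	}
	return (n);
}
```

An application would read the string from net.inet.rss.bucket_mapping (for example with sysctlbyname(3) on FreeBSD), parse it once, and then pin the thread serving a given bucket to cpus[bucket].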
  21. 22 May, 2014 1 commit
  22. 18 May, 2014 1 commit
  23. 15 Mar, 2014 1 commit
    • Several years after initial development, merge prototype support for · 7527624e
      Robert Watson authored
      linking NIC Receive Side Scaling (RSS) to the network stack's
      connection-group implementation.  This prototype (and derived patches)
      are in use at Juniper and several other FreeBSD-using companies, so
      despite some reservations about its maturity, merge the patch to the
      base tree so that it can be iteratively refined in collaboration rather
      than maintained as a set of gradually diverging patch sets.
      
      (1) Merge a software implementation of the Toeplitz hash specified in
          RSS implemented by David Malone.  This is used to allow suitable
          pcbgroup placement of connections before the first packet is
          received from the NIC.  Software hashing is generally avoided,
          however, due to high cost of the hash on general-purpose CPUs.
      
      (2) In in_rss.c, maintain authoritative versions of RSS state intended
          to be pushed to each NIC, including keying material, hash
          algorithm/ configuration, and buckets.  Provide software-facing
          interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both
          the RSS standardised Toeplitz and a 'naive' variation with a hash
          efficient in software but with poor distribution properties.
          Implement rss_m2cpuid() to be used by netisr and other load
          balancing code to look up the CPU on which an mbuf should be
          processed.
      
      (3) In the Ethernet link layer, allow netisr distribution using RSS as
          a source of policy as an alternative to source ordering; continue
          to default to direct dispatch (i.e., don't try and requeue packets
          for processing on the 'right' CPU if they arrive in a directly
          dispatchable context).
      
      (4) Allow RSS to control tuning of connection groups in order to align
          groups with RSS buckets.  If a packet arrives on a protocol using
          connection groups, and contains a suitable hardware-generated
          hash, use that hash value to select the connection group for pcb
          lookup for both IPv4 and IPv6.  If no hardware-generated Toeplitz
          hash is available, we fall back on regular PCB lookup risking
          contention rather than pay the cost of Toeplitz in software --
          this is a less scalable but, at my last measurement, faster
          approach.  As core counts go up, we may want to revise this
          strategy despite CPU overhead.
      
      Where device drivers suitably configure NICs, and connection groups /
      RSS are enabled, this should avoid both lock and line contention during
      connection lookup for TCP.  This commit does not modify any device
      drivers to tune device RSS configuration to the global RSS
      configuration; patches are in circulation to do this for at least
      Chelsio T3 and Intel 1G/10G drivers.  Currently, the KPI for device
      drivers is not particularly robust, nor aware of more advanced features
      such as runtime reconfiguration/rebalancing.  This will hopefully prove
      a useful starting point for refinement.
      
      No MFC is scheduled as we will first want to nail down a more mature
      and maintainable KPI/KBI for device drivers.
      
      Sponsored by:   Juniper Networks (original work)
      Sponsored by:   EMC/Isilon (patch update and merge)