1. 28 Aug, 2021 1 commit
  2. 13 Jun, 2018 1 commit
  3. 27 Nov, 2017 1 commit
    • sys: general adoption of SPDX licensing ID tags. · fe267a55
      Pedro F. Giffuni authored
      Mainly focus on files that use the BSD 2-Clause license; however, the tool I
      was using misidentified many licenses so this was mostly a manual - error
      prone - task.
      The Software Package Data Exchange (SPDX) group provides a specification
      to make it easier for automated tools to detect and summarize well known
      open source licenses. We are gradually adopting the specification, noting
      that the tags are considered only advisory and do not, in any way,
      supersede or replace the license texts.
      No functional change intended.
  4. 31 Mar, 2016 1 commit
  5. 18 Jan, 2015 1 commit
    • Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific · b2bdc62a
      Adrian Chadd authored
      The motivation here is to eventually teach netisr and potentially
      other networking subsystems a bit more about how RSS work queues / buckets
      are configured so things have a hope of auto-configuring in the future.
      * net/rss_config.[ch] takes care of the generic bits for doing
        configuration, hash function selection, etc;
      * toeplitz.[ch] is now in net/ rather than netinet/;
      * (and would be in libkern if it didn't directly include RSS_KEYSIZE;
        that's a later thing to fix up.)
      * netinet/in_rss.[ch] now just contains the IPv4 specific methods;
      * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods.
      This should have no functional impact on anyone currently using
      the RSS support.
      Differential Revision:	D1383
      Reviewed by:	gnn, jfv (intel driver bits)
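      The generic / protocol-specific split described above can be illustrated
      with a minimal sketch. It is not the code from this commit: every
      identifier below is hypothetical, the byte-summing hash is only a
      stand-in for whichever function the generic layer selects, and the input
      layout (source address, destination address, source port, destination
      port, all big-endian) is an assumption based on the usual RSS convention.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic layer: a pluggable hash over an opaque byte string. */
typedef uint32_t (*hash_fn_sketch)(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen);

/* Stand-in hash: sums key and data bytes; cheap, poorly distributed. */
uint32_t
naive_hash_sketch(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t v = 0;
	size_t i;

	for (i = 0; i < keylen; i++)
		v += key[i];
	for (i = 0; i < datalen; i++)
		v += data[i];
	return (v);
}

/* Hypothetical IPv4 layer: flatten the 4-tuple into 12 bytes. */
size_t
pack_ipv4_4tuple_sketch(uint8_t out[12], const uint8_t src[4],
    const uint8_t dst[4], uint16_t sport, uint16_t dport)
{
	memcpy(out, src, 4);
	memcpy(out + 4, dst, 4);
	out[8] = (uint8_t)(sport >> 8);		/* ports in big-endian order */
	out[9] = (uint8_t)(sport & 0xff);
	out[10] = (uint8_t)(dport >> 8);
	out[11] = (uint8_t)(dport & 0xff);
	return (12);
}

/* The IPv4 code only packs the tuple; the algorithm is chosen elsewhere. */
uint32_t
hash_ipv4_4tuple_sketch(hash_fn_sketch fn, const uint8_t *key, size_t keylen,
    const uint8_t src[4], const uint8_t dst[4], uint16_t sport, uint16_t dport)
{
	uint8_t buf[12];
	size_t len;

	len = pack_ipv4_4tuple_sketch(buf, src, dst, sport, dport);
	return (fn(key, keylen, buf, len));
}
```

      An IPv6 counterpart would differ only in the packing step (two 16-byte
      addresses instead of two 4-byte ones), which is the point of keeping the
      configuration and hash selection in a generic layer.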
  6. 10 Sep, 2014 1 commit
  7. 20 Jul, 2014 1 commit
  8. 10 Jul, 2014 1 commit
    • Implement the first stage of multi-bind listen sockets and RSS socket · 0a100a6f
      Adrian Chadd authored
      * Introduce IP_BINDMULTI - indicating that it's okay to bind multiple
        sockets on the same bind details.
        Although the PCB code has been taught about this (see below) this patch
        doesn't introduce the rest of the PCB changes necessary to distribute
        lookups among multiple PCB entries in the global wildcard table.
      * Introduce IP_RSS_LISTEN_BUCKET - placing a listen socket into the
        given RSS bucket (and thus a single PCBGROUP hash.)
      * Modify the PCB add path to be aware of IP_BINDMULTI:
        + Only allow further PCB entries to be added if the owner credentials
          match and IP_BINDMULTI has been specified.  I.e., only allow further
          IP_BINDMULTI sockets to appear if the first bind() was IP_BINDMULTI.
      * Teach the PCBGROUP code about IP_RSS_LISTEN_BUCKET marked PCB entries.
        Instead of using the wildcard logic and hashing, these sockets are
        simply placed into the PCBGROUP and _not_ in the wildcard hash.
      * When doing a PCBGROUP lookup, also do a wildcard match.
        This allows for an RSS bucket PCB entry to appear in a PCBGROUP
        rather than having to exist in the wildcard list.
      Tested:
      * TCP IPv4 server testing with igb(4)
      * TCP IPv4 server testing with ix(4)
      TODO:
      * The pcbgroup lookup code duplicated the wildcard and wildcard-PCB
        logic.  This could be refactored into a single function.
      * This doesn't yet work for IPv6 (The PCBGROUP code in netinet6/ doesn't
        yet know about this); nor does it yet fully work for UDP.
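      The IP_BINDMULTI admission rule in the PCB add path can be modelled with
      a small sketch. This is a reading of the commit message, not the kernel
      code: the structure and function names are hypothetical, and comparing
      owner UIDs is an assumption about what the credentials check amounts to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a PCB: just what the rule needs. */
struct pcb_sketch {
	unsigned int	owner_uid;	/* credential of the binding process */
	bool		bindmulti;	/* IP_BINDMULTI was set before bind() */
};

/*
 * May 'newpcb' be added on bind details already held by 'first'?
 * 'first' is NULL when nothing is bound to those details yet.
 */
bool
bindmulti_may_share_sketch(const struct pcb_sketch *first,
    const struct pcb_sketch *newpcb)
{
	if (first == NULL)
		return (true);		/* first bind() always proceeds */
	if (!first->bindmulti || !newpcb->bindmulti)
		return (false);		/* both sides must opt in */
	return (first->owner_uid == newpcb->owner_uid);
}
```

      Under this model a plain bind() followed by an IP_BINDMULTI bind() on the
      same details is refused, which matches the "only if the first bind() was
      IP_BINDMULTI" wording above.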
  9. 15 Mar, 2014 1 commit
    • Several years after initial development, merge prototype support for · 7527624e
      Robert Watson authored
      linking NIC Receive Side Scaling (RSS) to the network stack's
      connection-group implementation.  This prototype (and derived patches)
      are in use at Juniper and several other FreeBSD-using companies, so
      despite some reservations about its maturity, merge the patch to the
      base tree so that it can be iteratively refined in collaboration rather
      than maintained as a set of gradually diverging patch sets.
      (1) Merge a software implementation of the Toeplitz hash specified in
          RSS implemented by David Malone.  This is used to allow suitable
          pcbgroup placement of connections before the first packet is
          received from the NIC.  Software hashing is generally avoided,
          however, due to high cost of the hash on general-purpose CPUs.
      (2) In in_rss.c, maintain authoritative versions of RSS state intended
          to be pushed to each NIC, including keying material, hash
          algorithm/configuration, and buckets.  Provide software-facing
          interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both
          the RSS standardised Toeplitz and a 'naive' variation with a hash
          efficient in software but with poor distribution properties.
          Implement rss_m2cpuid() to be used by netisr and other load
          balancing code to look up the CPU on which an mbuf should be
          processed.
      (3) In the Ethernet link layer, allow netisr distribution using RSS as
          a source of policy as an alternative to source ordering; continue
          to default to direct dispatch (i.e., don't try and requeue packets
          for processing on the 'right' CPU if they arrive in a directly
          dispatchable context).
      (4) Allow RSS to control tuning of connection groups in order to align
          groups with RSS buckets.  If a packet arrives on a protocol using
          connection groups, and contains a suitable hardware-generated
          hash, use that hash value to select the connection group for pcb
          lookup for both IPv4 and IPv6.  If no hardware-generated Toeplitz
          hash is available, we fall back on regular PCB lookup risking
          contention rather than pay the cost of Toeplitz in software --
          this is a less scalable but, at my last measurement, faster
          approach.  As core counts go up, we may want to revise this
          strategy despite CPU overhead.
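      For reference, the software Toeplitz hash mentioned in (1) can be
      sketched as follows. This is an independent illustration of the standard
      construction, not the merged implementation; the function name and
      argument order are hypothetical, and the key is assumed to be at least
      four bytes long (ideally at least four bytes longer than the input).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash sketch: for every set bit of the input, taken most
 * significant bit first, XOR in the 32-bit window of the key that
 * starts at the same bit offset.
 */
uint32_t
toeplitz_hash_sketch(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0, window;
	size_t i;
	int b;

	/* The window starts as the first 32 bits of the key. */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window one bit, pulling in the next key bit. */
			window <<= 1;
			if (i + 4 < keylen && (key[i + 4] & (1u << b)))
				window |= 1;
		}
	}
	return (hash);
}
```

      The inner loop runs once per input bit, which is why the text above
      treats software hashing as something to avoid on the per-packet path.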
      Where device drivers suitably configure NICs, and connection groups /
      RSS are enabled, this should avoid both lock and line contention during
      connection lookup for TCP.  This commit does not modify any device
      drivers to tune device RSS configuration to the global RSS
      configuration; patches are in circulation to do this for at least
      Chelsio T3 and Intel 1G/10G drivers.  Currently, the KPI for device
      drivers is not particularly robust, nor aware of more advanced features
      such as runtime reconfiguration/rebalancing.  This will hopefully prove
      a useful starting point for refinement.
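      The policy described in (2) and (4), using a hardware hash to pick a
      bucket and CPU when one is present and falling back otherwise, might
      look roughly like the sketch below. All names, the bucket count, and the
      bucket-to-CPU table are hypothetical; the real state lives in in_rss.c
      and is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NBUCKETS	8	/* assumed to be a power of two */
#define SKETCH_CPU_NONE	(-1)	/* "no RSS opinion", caller falls back */

/* Hypothetical stand-in for the hash metadata carried by a packet. */
struct pkt_sketch {
	bool		has_hw_hash;	/* NIC supplied a Toeplitz hash */
	uint32_t	hw_hash;
};

/* Hypothetical indirection table: bucket -> CPU. */
static const int sketch_bucket_to_cpu[SKETCH_NBUCKETS] = {
	0, 1, 2, 3, 0, 1, 2, 3
};

unsigned int
sketch_hash2bucket(uint32_t hash)
{
	return (hash & (SKETCH_NBUCKETS - 1));
}

/*
 * Return the CPU that should process the packet, or SKETCH_CPU_NONE
 * when there is no hardware hash; the hash is never computed in
 * software here.
 */
int
sketch_m2cpuid(const struct pkt_sketch *p)
{
	if (!p->has_hw_hash)
		return (SKETCH_CPU_NONE);
	return (sketch_bucket_to_cpu[sketch_hash2bucket(p->hw_hash)]);
}
```

      A caller seeing SKETCH_CPU_NONE would use the regular lookup path, which
      mirrors the trade-off stated above: accept possible contention rather
      than pay for Toeplitz in software.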
      No MFC is scheduled as we will first want to nail down a more mature
      and maintainable KPI/KBI for device drivers.
      Sponsored by:   Juniper Networks (original work)
      Sponsored by:   EMC/Isilon (patch update and merge)
  10. 06 Jun, 2011 1 commit
    • Implement a CPU-affine TCP and UDP connection lookup data structure, · 52cd27cb
      Robert Watson authored
      struct inpcbgroup.  pcbgroups, or "connection groups", supplement the
      existing inpcbinfo connection hash table, which when pcbgroups are
      enabled, might now be thought of more usefully as a per-protocol
      4-tuple reservation table.
      Connections are assigned to connection groups based on a hash of their
      4-tuple; wildcard sockets require special handling, and are members
      of all connection groups.  During a connection lookup, a
      per-connection group lock is employed rather than the global pcbinfo
      lock.  By aligning connection groups with input path processing,
      connection groups take on an effective CPU affinity, especially when
      aligned with RSS work placement (see a forthcoming commit for
      details).  This eliminates cache line migration associated with
      global, protocol-layer data structures in steady state TCP and UDP
      processing (with the exception of protocol-layer statistics; further
      commit to follow).
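      The group assignment described above can be sketched in a few lines.
      This is a simplified model, not struct inpcbgroup itself: the names,
      fixed table sizes, and the modulo mapping are assumptions, and the
      per-group lock is only indicated by a comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NGROUPS	4
#define SKETCH_MAXCONN	8

struct conn_sketch {
	uint32_t	tuple_hash;	/* hash of the 4-tuple */
	bool		wildcard;	/* e.g. a listen socket */
};

struct group_sketch {
	/* A real group would carry its own lock here. */
	const struct conn_sketch *members[SKETCH_MAXCONN];
	size_t			  nmembers;
};

struct table_sketch {
	struct group_sketch groups[SKETCH_NGROUPS];
};

size_t
sketch_group_index(uint32_t hash)
{
	return (hash % SKETCH_NGROUPS);
}

static bool
sketch_group_add(struct group_sketch *g, const struct conn_sketch *c)
{
	if (g->nmembers == SKETCH_MAXCONN)
		return (false);
	g->members[g->nmembers++] = c;
	return (true);
}

/* Non-wildcard: one group by hash.  Wildcard: every group. */
bool
sketch_conn_insert(struct table_sketch *t, const struct conn_sketch *c)
{
	size_t i;

	if (!c->wildcard)
		return (sketch_group_add(
		    &t->groups[sketch_group_index(c->tuple_hash)], c));
	for (i = 0; i < SKETCH_NGROUPS; i++)
		if (!sketch_group_add(&t->groups[i], c))
			return (false);
	return (true);
}

/* A lookup touches (and would lock) only the one group it indexes. */
bool
sketch_group_contains(const struct table_sketch *t, size_t idx,
    const struct conn_sketch *c)
{
	size_t i;

	for (i = 0; i < t->groups[idx].nmembers; i++)
		if (t->groups[idx].members[i] == c)
			return (true);
	return (false);
}
```

      Because a packet's hash selects exactly one group, two CPUs handling
      different groups never contend on the same lock in this model, while a
      wildcard socket stays reachable from whichever group is consulted.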
      Elements of this approach were inspired by Willman, Rixner, and Cox's
      2006 USENIX paper, "An Evaluation of Network Stack Parallelization
      Strategies in Modern Operating Systems".  However, there are also
      significant differences: we maintain the inpcb lock, rather than using
      the connection group lock for per-connection state.
      Likewise, the focus of this implementation is alignment with NIC
      packet distribution strategies such as RSS, rather than pure software
      strategies.  Despite that focus, software distribution is supported
      through the parallel netisr implementation, and works well in
      configurations where the number of hardware threads is greater than
      the number of NIC input queues, such as in the RMI XLR threaded MIPS
      architecture.
      Another important difference is the continued maintenance of existing
      hash tables as "reservation tables" -- these are useful to
      distinguish the resource allocation aspect of protocol name management
      from the more common-case lookup aspect.  In configurations where
      connection tables are aligned with hardware hashes, it is desirable to
      use the traditional lookup tables for loopback or encapsulated traffic
      rather than take the expense of hardware hashes that are hard to
      implement efficiently in software (such as RSS Toeplitz).
      Connection group support is enabled by compiling "options PCBGROUP"
      into your kernel configuration; for the time being, this is an
      experimental feature, and hence is not enabled by default.
      Subject to the limited MFCability of change dependencies in inpcb,
      and its change to the inpcbinfo init function signature, this change
      in principle could be merged to FreeBSD 8.x.
      Reviewed by:    bz
      Sponsored by:   Juniper Networks, Inc.