1. 14 Mar, 2021 1 commit
    • vm_reserv: Fix list locking in vm_reserv_reclaim_contig() · cec3990d
      Mark Johnston authored
      The per-domain partpop queue is locked by the combination of the
      per-domain lock and individual reservation mutexes.
      vm_reserv_reclaim_contig() scans the queue looking for partially
      populated reservations that can be reclaimed in order to satisfy the
      caller's allocation.
      
      During the scan, we drop the per-domain lock.  At this point, the rvn
      pointer may be invalidated.  Take care to load rvn after re-acquiring
      the per-domain lock.
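
The pattern described above can be illustrated with a small userspace sketch. This is not the vm_reserv code: `struct node`, `struct queue`, `queue_scan()` and the `victim` callback are hypothetical stand-ins, and a pthread mutex plays the role of the per-domain lock. The point is only that the successor pointer is loaded after the lock is re-acquired, because the queue may change while the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical queue element; stands in for a reservation on the queue. */
struct node {
	struct node *prev, *next;
	int value;
};

/* Circular list with a sentinel head, protected by one lock. */
struct queue {
	pthread_mutex_t lock;
	struct node head;
};

static void
queue_init(struct queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head.prev = q->head.next = &q->head;
}

/* Caller holds q->lock. */
static void
queue_insert_tail(struct queue *q, struct node *n)
{
	n->prev = q->head.prev;
	n->next = &q->head;
	q->head.prev->next = n;
	q->head.prev = n;
}

/* Caller holds the queue lock; n must currently be linked. */
static void
queue_remove(struct node *n)
{
	assert(n->prev != NULL && n->next != NULL);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;	/* poison the stale links */
}

/*
 * Visit every node and return the sum of the visited values.  work() runs
 * with the lock dropped and may dequeue any node except the current one.
 */
static int
queue_scan(struct queue *q, void (*work)(struct queue *, struct node *))
{
	struct node *n, *next;
	int sum = 0;

	pthread_mutex_lock(&q->lock);
	for (n = q->head.next; n != &q->head; n = next) {
		/* Loading next here, before the unlock, would be the bug. */
		pthread_mutex_unlock(&q->lock);
		work(q, n);
		pthread_mutex_lock(&q->lock);
		sum += n->value;
		next = n->next;		/* loaded with the lock held again */
	}
	pthread_mutex_unlock(&q->lock);
	return (sum);
}

/* Demo: while the first node is being processed, its successor is dequeued. */
static struct node *victim;

static void
remove_victim(struct queue *q, struct node *cur)
{
	pthread_mutex_lock(&q->lock);
	if (victim != NULL && victim != cur) {
		queue_remove(victim);
		victim = NULL;
	}
	pthread_mutex_unlock(&q->lock);
}

int
demo_scan(void)
{
	struct queue q;
	struct node a = { .value = 1 }, b = { .value = 2 }, c = { .value = 4 };
	int sum;

	queue_init(&q);
	pthread_mutex_lock(&q.lock);
	queue_insert_tail(&q, &a);
	queue_insert_tail(&q, &b);
	queue_insert_tail(&q, &c);
	pthread_mutex_unlock(&q.lock);

	victim = &b;
	sum = queue_scan(&q, remove_victim);
	pthread_mutex_destroy(&q.lock);
	return (sum);
}
```

In this sketch `demo_scan()` visits the first and third nodes only. Had `next` been captured before the unlock, the scan would have followed the dequeued node's poisoned links.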
      
      While here, simplify the condition used to check whether a reservation
      was dequeued while the per-domain lock was dropped.
      
      Reviewed by:	alc, kib
      Reported by:	gallatin
      Sponsored by:	The FreeBSD Foundation
      Differential Revision:	https://reviews.freebsd.org/D29203
      
      (cherry picked from commit 968079f2)
  2. 13 Mar, 2021 5 commits
    • Flush remaining routes from the routing table during VNET shutdown. · 8aafa7a0
      Alexander V. Chernikov authored
      Summary:
      This fixes rtentry leak for the cloned interfaces created inside the
       VNET.
      
      Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
      Thus, any route table operations are too late to schedule.
      As the intent of the vnet teardown procedures is to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`.
      It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.
      
      Test Plan:
      ```
      set_skip:set_skip_group_lo  ->  passed  [0.053s]
      tail -n 200 /var/log/messages | grep rtentry
      ```
      
      PR:	253998
      Reported by:	rashey at superbox.pl
      Reviewed By: kp
      Differential Revision: https://reviews.freebsd.org/D29116
      
      (cherry picked from commit b1d63265)
    • Fix various NOINET* builds broken by 145bf6c0. · d81b3bb4
      Alexander V. Chernikov authored
      Reported by:	mjg, bdragon
      
      (cherry picked from commit 8ca99aec)
    • Fix blackhole/reject routes. · 3489286a
      Alexander V. Chernikov authored
      Traditionally, the *BSD routing stack required callers to supply some
       interface data for blackhole/reject routes. This led to a
       variety of hacks in routing daemons when inserting such routes.
      With the recent routing stack changes, a gateway sockaddr without
       RTF_GATEWAY started to be treated differently, purely as a link
       identifier.
      
      This change broke net/bird, which installs blackhole routes with
       127.0.0.1 gateway without RTF_GATEWAY flags.
      
      Fix this by automatically constructing necessary gateway data at
       rtsock level if RTF_REJECT/RTF_BLACKHOLE is set.
      
      Reported by:	Marek Zarychta <zarychtam at plan-b.pwste.edu.pl>
      Reviewed by:	donner
      
      (cherry picked from commit 145bf6c0)
    • Partially revert libcxxrt changes to avoid _Unwind_Exception change · 0b452906
      Dimitry Andric authored
      (Note I am also applying this to main and stable/13, to restore the old
      libcxxrt ABI and to avoid having to maintain a compat library.)
      
      After the recent cherry-picking of libcxxrt commits 0ee0dbfb and
      d2b3fadf, users reported that editors/libreoffice packages from the
      official package builders did not start anymore. It turns out that the
      combination of these commits subtly changes the ABI, requiring all
      applications that depend on internal details of struct _Unwind_Exception
      (available via unwind-arm.h and unwind-itanium.h) to be recompiled.
      
      However, the FreeBSD package builders always use -RELEASE jails, so
      these still use the old declaration of struct _Unwind_Exception, which
      is not entirely compatible. In particular, LibreOffice uses this struct
      in its internal "uno bridge" component, where it attempts to setup its
      own exception handling mechanism.
      
      To fix this incompatibility, go back to the old declarations of struct
      _Unwind_Exception, and restore the __LP64__ specific workaround we had
      in place before (which was to cope with yet another, older ABI bug).
      
      Effectively, this reverts upstream libcxxrt commits 88bdf6b290da
      ("Specify double-word alignment for ARM unwind") and b96169641f79
      ("Updated Itanium unwind"), and reapplies our commit 3c4fd246
      ("libcxxrt: add padding in __cxa_allocate_* to fix alignment").
      
      PR:		253840
    • Restore AT_RESOLVE_BENEATH support for funlinkat(2)/unlinkat(2). · 62694ac4
      Konstantin Belousov authored
      (cherry picked from commit ead7697f)
  3. 12 Mar, 2021 10 commits
    • MFC jail: Don't allow jails under dying parents · d2bbfc37
      Jamie Gritton authored
      If a jail is created with jail_set(...JAIL_DYING), and it has a parent
      currently in a dying state, that will bring the parent jail back to
      life.  Restrict that to require that the parent itself be explicitly
      brought back first, and not implicitly created along with the new
      child jail.
      
      Differential Revision:	https://reviews.freebsd.org/D28515
      
      (cherry picked from commit 0a2a96f3)
      
      MFC jail: Fix locking on an early jail_set error.
      
      I had locked allprison_lock without immediately setting PD_LIST_LOCKED.
      
      (cherry picked from commit 108a9384)
    • MFC jail: Add PD_KILL to remove a prison in prison_deref(). · 24633953
      Jamie Gritton authored
      Add the PD_KILL flag that instructs prison_deref() to take steps
      to actively kill a prison and its descendents, namely marking it
      PRISON_STATE_DYING, clearing its PR_PERSIST flag, and killing any
      attached processes.
      
      This replaces a similar loop in sys_jail_remove(), bringing the
      operation under the same single hold on allprison_lock that it already
      has. It is also used to clean up failed jail (re-)creations in
      kern_jail_set(), which didn't generally take all the proper steps.
      
      Differential Revision:  https://reviews.freebsd.org/D28473
      
      (cherry picked from commit 811e27fa)
      
      MFC jail: back out 811e27fa until it doesn't break Jenkins
      
      Reported by:	arichardson
      
      (cherry picked from commit ddfffb41)
      
      MFC jail: re-commit 811e27fa with fixes
      
      Make sure PD_KILL isn't passed to do_jail_attach, where it might end
      up trying to kill the caller's prison (even prison0).
      
      Fix the child jail loop in prison_deref_kill, which was doing the
      post-order part during the pre-order part.  That's not a
      system-killer, but it made jails not always die correctly.
      
      (cherry picked from commit c861373b)
      
      MFC jail: Add safety around prison_deref() flags.
      
      do_jail_attach() now only uses the PD_XXX flags that refer to lock
      status, so make sure that something else like PD_KILL doesn't slip
      through.
      
      Add a KASSERT() in prison_deref() to catch any further PD_KILL misuse.
      
      (cherry picked from commit 589e4c1d)
    • MFC jail: Add pr_state to struct prison · 2bfecbef
      Jamie Gritton authored
      Rather than using references (pr_ref and pr_uref) to deduce the state
      of a prison, keep track of its state explicitly.  A prison is either
      "invalid" (pr_ref == 0), "alive" (pr_uref > 0) or "dying"
      (pr_uref == 0).
      
      State transitions are generally tied to the reference counts, but with
      some flexibility: a new prison is "invalid" even though it now starts
      with a reference, and jail_remove(2) sets the state to "dying" before
      the user reference count drops to zero (which was previously
      accomplished via the PR_REMOVE flag).
      
      pr_state is protected by both the prison mutex and allprison_lock, so
      it has the same availability guarantees as the reference counts do.
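
As a rough illustration of why an explicit state field helps, the sketch below compares a state deduced from the counters with one tracked explicitly. Everything here is hypothetical (`struct toy_prison`, the `TOY_*` states, the helper names) and no locking is modelled; it only mirrors the two cases the message calls out, namely a new prison that already holds a reference and a removal that marks the prison dying before the user count reaches zero.

```c
#include <assert.h>

/* Hypothetical mirror of the three states named in the commit message. */
enum toy_state { TOY_INVALID, TOY_ALIVE, TOY_DYING };

struct toy_prison {
	int ref;		/* models pr_ref */
	int uref;		/* models pr_uref */
	enum toy_state state;	/* models pr_state */
};

/* Old approach: deduce the state purely from the reference counts. */
static enum toy_state
toy_deduce_state(const struct toy_prison *pr)
{
	if (pr->ref == 0)
		return (TOY_INVALID);
	return (pr->uref > 0 ? TOY_ALIVE : TOY_DYING);
}

/* A new prison starts with a reference but is not yet valid. */
static void
toy_create(struct toy_prison *pr)
{
	pr->ref = 1;
	pr->uref = 0;
	pr->state = TOY_INVALID;
}

/* Creation finished: the prison gains a user reference and becomes alive. */
static void
toy_activate(struct toy_prison *pr)
{
	assert(pr->state == TOY_INVALID);
	pr->uref = 1;
	pr->state = TOY_ALIVE;
}

/* Removal marks the prison dying before the user count drops to zero. */
static void
toy_remove(struct toy_prison *pr)
{
	assert(pr->state == TOY_ALIVE);
	pr->state = TOY_DYING;
}

/* Count the steps where the deduced and explicit states disagree. */
int
demo_state_mismatches(void)
{
	struct toy_prison pr;
	int mismatches = 0;

	toy_create(&pr);
	mismatches += (toy_deduce_state(&pr) != pr.state);
	toy_activate(&pr);
	mismatches += (toy_deduce_state(&pr) != pr.state);
	toy_remove(&pr);
	mismatches += (toy_deduce_state(&pr) != pr.state);
	return (mismatches);
}
```

In this toy model the two views disagree right after creation and right after removal, which are the cases where counters alone cannot express the intended state.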
      
      Differential Revision:	https://reviews.freebsd.org/D27876
      
      (cherry picked from commit 1158508a)
      
      MFC jail: Fix a LOR introduced in 1158508a
      
      (cherry picked from commit 701d6b50)
    • x86: tsc: deprioritize TSC on VirtualBox · ec24f78e
      Kyle Evans authored
      Misbehavior has been observed with TSC under VirtualBox, where threads
      doing small sleeps (~1 second) may miss their wake up and hang around
      in a sleep state indefinitely.  Switching back to ACPI-fast decidedly
      fixes it, so stop using TSC on VirtualBox at least for the time being.
      
      This partially reverts 84eaf2cc, applying it only to VirtualBox and
      increasing the quality to 0. Negative qualities can never be chosen and
      cannot be chosen with the tunable recently added. If we do not have a
      timecounter with a higher quality than 0, then TSC does at least leave
      the system mostly usable.
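
The effect of the quality value can be sketched as follows. This is a simplified model, not the kernel's timecounter selection code: `struct toy_tc`, `toy_tc_pick()` and the quality numbers other than 0 are assumptions made for illustration. It shows a counter with quality 0 losing to any better counter while still being eligible as a last resort, unlike a counter with negative quality.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified timecounter descriptor. */
struct toy_tc {
	const char *name;
	int quality;
};

/* Pick the highest-quality counter; negative qualities are never chosen. */
static const struct toy_tc *
toy_tc_pick(const struct toy_tc *tcs, size_t n)
{
	const struct toy_tc *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tcs[i].quality < 0)
			continue;
		if (best == NULL || tcs[i].quality > best->quality)
			best = &tcs[i];
	}
	return (best);
}

/* Return the quality of the chosen counter, with or without ACPI-fast. */
int
demo_pick_quality(int have_acpi)
{
	struct toy_tc tcs[3];
	const struct toy_tc *best;
	size_t n = 0;

	tcs[n++] = (struct toy_tc){ "negative-quality", -100 };	/* illustrative */
	tcs[n++] = (struct toy_tc){ "TSC", 0 };
	if (have_acpi)
		tcs[n++] = (struct toy_tc){ "ACPI-fast", 900 };	/* illustrative */

	best = toy_tc_pick(tcs, n);
	assert(best != NULL);
	return (best->quality);
}
```

With ACPI-fast present the sketch selects it; without it, the quality-0 TSC entry is still selected rather than leaving no counter at all.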
      
      PR:		253087
      
      (cherry picked from commit 8cc15b0d)
    • MFC jail: Change the locking around pr_ref and pr_uref · ad259c47
      Jamie Gritton authored
      Require both the prison mutex and allprison_lock when pr_ref or
      pr_uref go to/from zero.  Adding a non-first or removing a non-last
      reference remain lock-free.  This means that a shared hold on
      allprison_lock is sufficient for prison_isalive() to be useful, which
      removes a number of cases of lock/check/unlock on the prison mutex.
      
      Expand the locking in kern_jail_set() to keep allprison_lock held
      exclusive until the new prison is valid, thus making invalid prisons
      invisible to any thread holding allprison_lock (except of course the
      one creating or destroying the prison).  This renders prison_isvalid()
      nearly redundant, now used only in asserts.
      
      Differential Revision:	https://reviews.freebsd.org/D28419
      Differential Revision:	https://reviews.freebsd.org/D28458
      
      (cherry picked from commit f7496dca)
      
      MFC jail: fix build after the previous commit
      Noted by: Michael Butler <imb protected-networks.net>
      
      (cherry picked from commit ee9b37ae)
    • MFC jail: Improve locking when removing prisons · fe6b360a
      Jamie Gritton authored
      Change the flow of prison_deref() so it doesn't let go of allprison_lock
      until it's completely done using it (except for a possible drop as part
      of an upgrade on its first try).
      
      Differential Revision:	https://reviews.freebsd.org/D28458
      
      (cherry picked from commit 6e1d1bfc)
    • opencrypto: Make cryptosoft attach silently · 9788aa5e
      Mark Johnston authored
      cryptosoft is always present and doesn't print any useful information
      when it attaches.
      
      Reviewed by:	jhb
      Sponsored by:	The FreeBSD Foundation
      Differential Revision:	https://reviews.freebsd.org/D29098
      
      (cherry picked from commit 4fc60fa9)
    • netmap: Stop printing a line to the dmesg in netmap_init() · 2b0aa583
      Mark Johnston authored
      netmap is compiled into the kernel by default so initialization was
      always reported, and netmap uses a formatting convention not used in the
      rest of the kernel.
      
      Reviewed by:	vmaffione
      Sponsored by:	The FreeBSD Foundation
      Differential Revision:	https://reviews.freebsd.org/D29099
      
      (cherry picked from commit fef84509)
    • ktls: Hide initialization message behind bootverbose · 5e7c99c0
      Mark Johnston authored
      We don't typically print anything when a subsystem initializes itself,
      and KTLS is currently disabled by default anyway.
      
      Reviewed by:	jhb
      Sponsored by:	The FreeBSD Foundation
      Differential Revision:	https://reviews.freebsd.org/D29097
      
      (cherry picked from commit 89b65087)
    • acpi: Make nexus_acpi quiet on amd64 and i386 · e46427de
      Mark Johnston authored
      Otherwise during attach newbus prints "nexus0", which is not very
      useful.
      
      The generic nexus device is already quiet, as is nexus_acpi on arm64.
      
      Sponsored by:	The FreeBSD Foundation
      
      (cherry picked from commit 732b69c9)
  4. 11 Mar, 2021 8 commits
  5. 10 Mar, 2021 16 commits
    • Enforce net epoch in in6_selectsrc(). · 9cd7f222
      Alexander V. Chernikov authored
      in6_selectsrc() may call fib6_lookup() in some cases, which requires
       epoch. Wrap in6_selectsrc* calls into epoch inside its users.
      Mark it as requiring epoch by adding NET_EPOCH_ASSERT().
      
      Differential Revision:	https://reviews.freebsd.org/D28647
      
      (cherry picked from commit 605284b8)
    • Fix dpdk/ldradix fib lookup algorithm preference calculation. · 8a25d3f6
      Alexander V. Chernikov authored
      The current preference numbers were copied from IPv4 code,
       assuming 500k routes to be the full-view. Adjust to the current
       reality (100k full-view).
      
      Reported by:	Marek Zarychta <zarychtam at plan-b.pwste.edu.pl>
      
      (cherry picked from commit d5be41be)
    • Fix setting static entries for arp/ndp. · 5b64694c
      Alexander V. Chernikov authored
      rtsock message validation changes committed in 2fe5a794
       did not take llinfo messages into account.
      
      Add a special validation case for RTA_GATEWAY llinfo messages.
      
      (cherry picked from commit e5b394f2)
    • Fix arp/ndp deletion broken by 2fe5a794. · 22f24233
      Alexander V. Chernikov authored
      Changes in 2fe5a794 moved dst sockaddr masking from the
       routing control plane to the rtsock code.
      
      It broke arp/ndp deletion.
      It turns out, arp/ndp perform RTM_GET request first to get an
       interface index necessary for the deletion.
      Then they simply stamp the reply with RTF_LLDATA and set the
       command to RTM_DELETE.
      As a result, the kernel receives a request with a non-empty RTA_NETMASK
       and clears RTA_DST host bits before passing the message to the
       lla code.
      
      De facto, the only needed bits are RTA_DST, RTA_GATEWAY and the
       subset of rtm_flags.
      
      With that in mind, fix the interface by clearing RTA_NETMASK
       for every message with RTF_LLDATA.
      
      While here, cleanup arp/ndp code a bit.
      
      Reviewed by:	gnn
      Differential Revision:	https://reviews.freebsd.org/D28804
      
      (cherry picked from commit f9e1cd6c)
    • Fix NOINET6 build broken by 2fe5a794. · 6c3f613b
      Alexander V. Chernikov authored
      Reported by:	mjg
      
      (cherry picked from commit a4513bac)
    • Fix dst/netmask handling in routing socket code. · e1bdecd9
      Alexander V. Chernikov authored
      Traditionally routing socket code did almost zero checks on
       the input message except for the most basic size checks.
      
      This resulted in the unclear KPI boundary for the routing system code
       (`rtrequest*` and now `rib_action()`) w.r.t message validness.
      
      Multiple potential problems and nuances exist:
      * Host bits in RTAX_DST sockaddr. Existing applications do send prefixes
       with hostbits uncleared. Even `route(8)` does this, as they hope the kernel
       would do the job of fixing it. Code inside `rib_action()` needs to handle
       it on its own (see `rt_maskedcopy()` ugly hack).
      * There are multiple ways of adding a host route: it can be DST without
       netmask or DST with /32(/128) netmask. Also, RTF_HOST has to be set correspondingly.
       Currently, these 2 options create 2 DIFFERENT routes in the kernel.
      * No sockaddr length/content checking for the "secondary" fields exists: nothing
       stops an rtsock application from sending a sockaddr_in with a length of 25 (instead of 16).
       Kernel will accept it, install to RIB as is and propagate to all rtsock consumers,
       potentially triggering bugs in their code. Same goes for sin_port, sin_zero, etc.
      
      The goal of this change is to make rtsock verify all sockaddr and prefix consistency.
      Said differently, `rib_action()` or its internals should NOT need to change any of the
       sockaddrs supplied by the `rt_addrinfo` structure due to incorrectness.
      
      To be more specific, this change implements the following:
      * sockaddr cleanup/validation check is added immediately after getting sockaddrs from rtm.
      * Per-family dst/netmask checks clear host bits in dst and zero all dst/netmask "secondary" fields.
      * The same netmask checking code converts /32(/128) netmasks to "host" route case
       (NULL netmask, RTF_HOST), removing the dualism.
      * Instead of allowing ANY "known" sockaddr families (0<..<AF_MAX), allow only actually
       supported ones (inet, inet6, link).
      * Automatically convert `sockaddr_sdl` (AF_LINK) gateways to
        `sockaddr_sdl_short`.
      
      Reported by:	Guy Yur <guyyur at gmail.com>
      Reviewed By:	donner
      Differential Revision: https://reviews.freebsd.org/D28668
      
      (cherry picked from commit 2fe5a794)
    • Add ifa_try_ref() to simplify ifa handling inside epoch. · d9bcd8e7
      Alexander V. Chernikov authored
      More and more code migrates from lock-based protection to the NET_EPOCH
       umbrella. It requires some logic changes, including, notably, refcount
       handling.
      
      When we have an `ifa` pointer and we're running inside epoch we're
       guaranteed that this pointer will not be freed.
      However, the following case can still happen:
       * in thread 1 we drop to 0 refcount for ifa and schedule its deletion.
       * in thread 2 we use this ifa and reference it
       * destroy callout kicks in
       * unhappy user reports bug
      
      To address it, a new `ifa_try_ref()` function is added, which returns
       failure when we try to reference an `ifa` with 0 refcount.
      Additionally, the existing `ifa_ref()` is enforced with `KASSERT` to provide
       a cleaner error in such scenarios.
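
The "reference only if still non-zero" idea can be sketched in portable C11 as follows. This is an illustration rather than the kernel implementation: `struct toy_obj` and the `toy_*` functions are hypothetical, C11 atomics stand in for the kernel's refcount primitives, and `assert()` plays the role of the `KASSERT`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object; stands in for an ifa and its reference count. */
struct toy_obj {
	atomic_uint refcount;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool
toy_try_ref(struct toy_obj *o)
{
	unsigned int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return (true);
	}
	return (false);		/* object is already on its way out */
}

/* Unconditional reference; the caller must already hold one. */
static void
toy_ref(struct toy_obj *o)
{
	unsigned int old = atomic_fetch_add(&o->refcount, 1);

	assert(old > 0);	/* referencing a dead object is a bug */
	(void)old;
}

/* Drop a reference; returns true if this was the last one. */
static bool
toy_rele(struct toy_obj *o)
{
	return (atomic_fetch_sub(&o->refcount, 1) == 1);
}

/* Returns a bitmask: 1 = try_ref worked, 2 = last rele seen, 4 = refused. */
int
demo_try_ref(void)
{
	struct toy_obj o;
	int result = 0;

	atomic_init(&o.refcount, 1);
	if (toy_try_ref(&o))		/* 1 -> 2 */
		result |= 1;
	toy_ref(&o);			/* 2 -> 3 */
	toy_rele(&o);			/* 3 -> 2 */
	toy_rele(&o);			/* 2 -> 1 */
	if (toy_rele(&o))		/* 1 -> 0, last reference */
		result |= 2;
	if (!toy_try_ref(&o))		/* must not resurrect the object */
		result |= 4;
	return (result);
}
```

The compare-and-swap loop is what closes the window in the scenario listed above: once thread 1 has dropped the count to zero, thread 2's attempt fails instead of reviving an object that is scheduled for destruction.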
      
      Reviewed By: rstone, donner
      Differential Revision: https://reviews.freebsd.org/D28639
      
      (cherry picked from commit 600eade2)
    • Make in_localip_more() fib-aware. · f6764167
      Alexander V. Chernikov authored
      It fixes loopback route installation for the interfaces
       in the different fibs using the same prefix.
      
      Reviewed By:	donner
      PR:		189088
      Differential Revision: https://reviews.freebsd.org/D28673
      
      (cherry picked from commit 9fdbf7ee)
    • Remove per-packet ifa refcounting from IPv6 fast path. · 3f241e7a
      Alexander V. Chernikov authored
      Currently ip6_input() calls in6ifa_ifwithaddr() for
       every local packet, in order to check if the target ip
       belongs to the local ifa in proper state and increase
       its counters.
      
      in6ifa_ifwithaddr() references found ifa.
      With epoch changes, both `ip6_input()` and all other current callers
       of `in6ifa_ifwithaddr()` do not need this reference
       anymore, as epoch provides stability guarantee.
      
      Given that, update `in6ifa_ifwithaddr()` to allow
       it to return ifa without referencing it, while preserving
       option for getting referenced ifa if so desired.
      
      Differential Revision:	https://reviews.freebsd.org/D28648
      
      (cherry picked from commit 8268d82c)
    • Remove now-unused RTF_RNH_LOCKED route flag. · 4904fbfc
      Alexander V. Chernikov authored
      (cherry picked from commit 64d5c277)
    • Do not reference returned ifa in in6_ifawithifp(). · 3cff9b2c
      Alexander V. Chernikov authored
      The only place where in6_ifawithifp() is used is ip6_output(),
       which uses the returned ifa to bump traffic counters.
      Given that ifa stability guarantees are provided by epoch, do not refcount ifa.
      
      This eliminates 2 atomic ops from IPv6 fast path.
      
      Reviewed By:	rstone
      Differential Revision:	https://reviews.freebsd.org/D28649
      
      (cherry picked from commit 1bd44b11)
    • backlight(8): Add note that with option it prints the current brightness. · f21c0366
      Emmanuel Vadot authored
      MFC after:    3 days
      PR: 	      253737
      
      (cherry picked from commit 1df30489)
    • backlight: Fix incr/decr with percent value of 0 · 9ba393f2
      David Schlachter authored
      This now does nothing instead of incr/decr by 10%
      
      MFC After:    3 days
      PR: 	      253736
      
      (cherry picked from commit 3b005d51)
    • zfs: update openzfs version reference to bedbc13d · 29805b31
      Martin Matuska authored
      It was missed in the latest merge.
      
      (cherry picked from commit 6781b8a3)
    • zfs: merge OpenZFS master-bedbc13d · f2d3322c
      Martin Matuska authored
      Notable upstream commits:
        8e43fa12 Fix vdev_rebuild_thread deadlock
        03ef8f09 Add missing checks for unsupported features
        2e160dee Fix assert in FreeBSD-specific dmu_read_pages
        bedbc13d Cancel TRIM / initialize on FAULTED non-writeable vdevs
      
      Obtained from:	OpenZFS
      
      (cherry picked from commit caed7b1c)
    • openzfs: attach pam_zfs_key to build · 603f1c3d
      Greg V authored
      This PAM module allows unlocking encrypted user home datasets when
      logging in (and changing passphrase when changing the account password),
      see https://github.com/openzfs/zfs/pull/9903
      
      Also supposed to unload the key when the last session for the user is
      done, but there are EBUSY issues:
      https://github.com/openzfs/zfs/issues/11222#issuecomment-731897858
      
      Submitted by:	Greg V <greg_unrelenting.technology>
      Reviewed by:	mm
      Differential Revision:	https://reviews.freebsd.org/D28018
      
      (cherry picked from commit ee21ee15)