1. 15 Mar, 2021 18 commits
    • ns8250: don't drop IER_TXRDY on bus_grab/ungrab · 17d301f7
      Mitchell Horne authored
      It has been observed that some systems are often unable to resume from
      ddb after entering with debug.kdb.enter=1. Checking the status further
      shows the terminal is blocked waiting in tty_drain(), but it never makes
      progress in clearing the output queue, because sc->sc_txbusy is high.
      
      I noticed that when entering polling mode for the debugger, IER_TXRDY is
      set in the failure case. Since this bit is never tracked by the softc,
      it will not be restored by ns8250_bus_ungrab(). This creates a race in
      which a TX interrupt can be lost, creating the hang described above.
      Ensuring that this bit is restored is enough to prevent this, and resume
      from ddb as expected.
      
      The solution is to track this bit in the sc->ier field, for the same
      lifetime that TX interrupts are enabled.
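The idea of tracking the bit in the software copy can be sketched as below. This is a simplified simulation, not the driver code: `struct uart_sim`, `write_ier()`, `tx_start()` and `tx_done()` are hypothetical names, the "register" is just a struct field, and the grab step is assumed to mask every interrupt.

```c
#include <stdint.h>

/* 16550-style interrupt enable bits (RX data available, TX holding empty). */
#define IER_RXRDY 0x01
#define IER_TXRDY 0x02

/* Hypothetical stand-in for the driver softc; hw_ier simulates the register. */
struct uart_sim {
	uint8_t hw_ier;	/* what the simulated hardware currently has enabled */
	uint8_t ier;	/* software copy, the only state ungrab can restore */
};

static void
write_ier(struct uart_sim *sc, uint8_t val)
{
	sc->hw_ier = val;
}

/* Enable the TX interrupt and record it in the software copy. */
void
tx_start(struct uart_sim *sc)
{
	sc->ier |= IER_TXRDY;
	write_ier(sc, sc->ier);
}

/* TX finished: drop the bit from both the register and the software copy. */
void
tx_done(struct uart_sim *sc)
{
	sc->ier = (uint8_t)(sc->ier & ~IER_TXRDY);
	write_ier(sc, sc->ier);
}

/* Enter polling mode (assumed here to mask all interrupts). */
void
bus_grab(struct uart_sim *sc)
{
	write_ier(sc, 0);
}

/* Leave polling mode: restore whatever the software copy says. */
void
bus_ungrab(struct uart_sim *sc)
{
	write_ier(sc, sc->ier);
}
```

Because `tx_start()` records the bit in `ier`, a grab/ungrab cycle in the middle of a transmission brings the TX interrupt back; if the bit lived only in the register, `bus_ungrab()` would have nothing to restore it from.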
      
      PR:		223917, 240122
      Sponsored by:	The FreeBSD Foundation
      
      (cherry picked from commit 7e7f7bee)
    • development(7): update to reflect Git transition · d7ef665e
      Edward Tomasz Napierala authored
      Reviewed By:	debdrup, imp (earlier version)
      Sponsored By:	EPSRC
      Differential Revision:	https://reviews.freebsd.org/D28939
      
      (cherry picked from commit d28cbb79)
    • linux(4): make getcwd(2) return ERANGE instead of ENOMEM · ab1a91d9
      Edward Tomasz Napierala authored
      For native FreeBSD binaries, the return value from __getcwd(2)
      doesn't really matter, as the libc wrapper takes over and returns
      the proper errno.
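Why the errno value matters to callers can be illustrated with a common userland pattern: retry with a larger buffer only when getcwd() fails with ERANGE, which is the error POSIX specifies for a buffer that is too small. This is a sketch of such a caller, not code from the commit; `getcwd_grow()` is a hypothetical helper.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the current directory in a malloc'd buffer,
 * doubling the buffer whenever getcwd() reports ERANGE.  Any other errno
 * (ENOMEM included) is treated as a hard failure and NULL is returned.
 */
char *
getcwd_grow(size_t initial)
{
	size_t size = initial > 0 ? initial : 1;
	char *buf = NULL;

	for (;;) {
		char *tmp = realloc(buf, size);
		if (tmp == NULL) {
			int saved = errno;
			free(buf);
			errno = saved;
			return (NULL);
		}
		buf = tmp;
		if (getcwd(buf, size) != NULL)
			return (buf);
		if (errno != ERANGE) {
			int saved = errno;
			free(buf);
			errno = saved;
			return (NULL);
		}
		size *= 2;	/* buffer too small: grow and retry */
	}
}
```

A caller written this way gives up immediately if the system call reports ENOMEM for a short buffer, which is the kind of mismatch the change addresses for Linux binaries.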
      
      PR:		kern/254120
      Reported By:	Alex S <iwtcex@gmail.com>
      Reviewed By:	kib
      Sponsored By:	The FreeBSD Foundation
      Differential Revision:	https://reviews.freebsd.org/D29217
      
      (cherry picked from commit 0dfbdd9f)
    • Make kern.timecounter.hardware tunable · 0264d12e
      Konstantin Belousov authored
      (cherry picked from commit 56b9bee6)
    • Move ic_check_send_space clear to the actual check. · aae8e02d
      Alexander Motin authored
      This closes a tiny race in which the flag could be set between being
      cleared and the space being checked, which would create some extra work.
      Setting the flag is protected by both locks, so we can clear it in either
      place, but both locks are dropped in between.
      
      MFC after:	1 week
      
      (cherry picked from commit afc3e54e)
    • Restore condition removed in df3747c6. · 8b0101da
      Alexander Motin authored
      I think it allowed us to avoid some TX thread wakeups while the socket
      buffer is full.  But add another option there: if ic_check_send_space
      is set, which means the socket just reported that new space appeared,
      it may make sense to pull more data from ic_to_send for better TX
      coalescing.
      
      MFC after:	1 week
      
      (cherry picked from commit aff9b9ee)
    • Replace STAILQ_SWAP() with simpler STAILQ_CONCAT(). · 2f77e281
      Alexander Motin authored
      Also remove stray STAILQ_REMOVE_AFTER(), not causing problems only
      because STAILQ_SWAP() fixed corrupted stqh_last.
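For readers unfamiliar with the macro, here is a minimal sketch of what STAILQ_CONCAT() does, using the `<sys/queue.h>` macros. The `struct pdu` type and the helper names are hypothetical and are not taken from the iSCSI code.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical queue element. */
struct pdu {
	int id;
	STAILQ_ENTRY(pdu) link;
};
STAILQ_HEAD(pdu_queue, pdu);

/* Move every element of src to the tail of dst; src is left empty. */
void
drain_into(struct pdu_queue *dst, struct pdu_queue *src)
{
	STAILQ_CONCAT(dst, src);
}

/* Count elements by walking the list. */
int
queue_length(struct pdu_queue *q)
{
	struct pdu *p;
	int n = 0;

	STAILQ_FOREACH(p, q, link)
		n++;
	return (n);
}
```

When the destination is an empty local queue, concatenating into it has the same effect as swapping, and the macro leaves both heads with a consistent tail pointer.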
      
      MFC after:	1 week
      
      (cherry picked from commit df3747c6)
    • Fix initiator panic after 6895f89f. · fb419e0f
      Alexander Motin authored
      There are sessions without socket that are not disconnecting yet.
      
      MFC after:	3 weeks
      
      (cherry picked from commit 06e9c710)
    • Optimize TX coalescing by keeping pointer to last mbuf. · 6b409a8b
      Alexander Motin authored
      Previously, m_cat() traversed the entire coalesced chain each time.
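The general technique, remembering the tail so that each append does not re-walk the chain, can be sketched with a plain linked list. `struct buf` and `struct chain` are hypothetical stand-ins, not the mbuf API.

```c
#include <stddef.h>

/* Hypothetical buffer with a next pointer, loosely modelled on a chain. */
struct buf {
	struct buf *next;
	size_t len;
};

/* Chain head plus a cached pointer to the last buffer. */
struct chain {
	struct buf *head;
	struct buf *tail;
};

/*
 * Append b (which may itself be a short chain) without walking the
 * buffers that are already coalesced: only the new part is traversed.
 */
void
chain_append(struct chain *c, struct buf *b)
{
	if (c->head == NULL)
		c->head = b;
	else
		c->tail->next = b;
	while (b->next != NULL)
		b = b->next;
	c->tail = b;
}

/* Total payload length, computed by walking the whole chain. */
size_t
chain_length(const struct chain *c)
{
	size_t total = 0;
	const struct buf *b;

	for (b = c->head; b != NULL; b = b->next)
		total += b->len;
	return (total);
}
```

With the cached tail, appending k buffers costs time proportional to the new buffers only, instead of re-walking an ever longer chain on every append.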
      
      MFC after:	1 week
      
      (cherry picked from commit b85a67f5)
    • Optimize out few extra memory accesses. · 76d8d700
      Alexander Motin authored
      MFC after:	1 week
      
      (cherry picked from commit a59e2982)
    • Micro-optimize OOA queue processing. · 8db80adb
      Alexander Motin authored
      - Move ctl_get_cmd_entry() calls from every OOA traversal to when
        the requests are first inserted, storing seridx in struct ctl_scsiio.
      - Move some checks out of the loop in ctl_check_ooa().
      - Replace checks for errors that can not happen with asserts.
      - Transpose ctl_serialize_table, so that any OOA traversal accesses
        only one row (cache line).  Compact it from enum to uint8_t.
      - Optimize static branch predictions in hottest places.
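The effect of the table transposition can be sketched as follows. The table contents, the `SER_*` actions, and `first_blocker()` are hypothetical; only the indexing idea (row chosen once by the new request, column by each older request) reflects the description above.

```c
#include <stddef.h>
#include <stdint.h>

enum { SER_PASS = 0, SER_BLOCK = 1 };	/* hypothetical actions */
#define NSER 3				/* hypothetical number of command classes */

/* Row = class of the new request, column = class of an older queued one. */
static const uint8_t ser_table[NSER][NSER] = {
	/* new class 0 */ { SER_PASS,  SER_BLOCK, SER_BLOCK },
	/* new class 1 */ { SER_BLOCK, SER_BLOCK, SER_BLOCK },
	/* new class 2 */ { SER_BLOCK, SER_BLOCK, SER_PASS  },
};

/*
 * Return the index of the first older request that blocks the new one,
 * or -1 if none does.  The row is fetched once, so the whole traversal
 * touches a single compact row of uint8_t entries.
 */
int
first_blocker(uint8_t new_idx, const uint8_t *older, size_t n)
{
	const uint8_t *row = ser_table[new_idx];
	size_t i;

	for (i = 0; i < n; i++)
		if (row[older[i]] == SER_BLOCK)
			return ((int)i);
	return (-1);
}
```

Storing the class index with each request when it is queued (the seridx mentioned above) is what makes the inner loop a plain array lookup.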
      
      Because traversal is O(n) on deep LUN queues, this can be the hottest
      code path in CTL, and the additional 20% of IOPS I see in some 4KB I/O
      tests is good to have in reserve.  According to the profiles, about 50%
      of the CPU time here is now spent in two memory accesses per traversed
      request in OOA.
      
      Sponsored by:	iXsystems, Inc.
      MFC after:	2 weeks
      
      (cherry picked from commit 9d9fd8b7)
    • Coalesce socket reads in software iSCSI. · 31d41a6a
      Alexander Motin authored
      Instead of 2-4 socket reads per PDU this can do as few as one read
      per megabyte, dramatically reducing TCP overhead and lock contention.
      
      With this on iSCSI target I can write more than 4GB/s through a
      single connection.
      
      MFC after:	1 month
      
      (cherry picked from commit 6895f89f)
    • Fix build after 2c7dc6ba. · 26953f59
      Alexander Motin authored
      MFC after:	1 month
      
      (cherry picked from commit c02a2875)
    • Refactor CTL datamove KPI. · a4bea2f2
      Alexander Motin authored
       - Make frontends call unified CTL core method ctl_datamove_done()
      to report move completion.  This reduces code duplication
      in different backends by accounting DMA time in common code.
       - Add a samethr argument to ctl_datamove_done() and the be_move_done()
      callback, reporting whether the callback is called in the same
      context as ctl_datamove().  This lets some cases, like iSCSI
      write with immediate data or camsim frontend write, save one context
      switch, since we know that the context is sleepable.
       - Remove data_move_done() methods from struct ctl_backend_driver,
      unused since forever.
      
      MFC after:	1 month
      
      (cherry picked from commit 2c7dc6ba)
    • Microoptimize CTL I/O queues. · d0b1f461
      Alexander Motin authored
      Switch OOA queue from TAILQ to LIST and change its direction, so that
      we traverse it forward, not backward.  There is only one place where
      we really need other direction, and it is not critical.
      
      Use STAILQ_REMOVE_HEAD() instead of STAILQ_REMOVE() in backends.
      
      Replace few impossible conditions with assertions.
      
      MFC after:	1 month
      
      (cherry picked from commit 05d882b7)
    • Save context switch per I/O for iSCSI and IOCTL frontends. · 6fb753b9
      Alexander Motin authored
      Introduce new CTL core KPI ctl_run(), preprocessing I/Os in the caller
      context instead of scheduling another thread just for that.  This call
      may sleep, which is not acceptable for some frontends like the original
      CAM/FC one, but iSCSI already has separate sleepable per-connection RX
      threads, and scheduling another thread is mostly just a waste of time.
      IOCTL frontend actually waits for the I/O completion in the caller
      thread, so the use of another thread for this makes even less sense.
      
      With this change I can measure ~5% IOPS improvement on 4KB iSCSI I/Os
      to ZFS.
      
      MFC after:	1 month
      
      (cherry picked from commit 812c9f48)
    • Move XPT_IMMEDIATE_NOTIFY handling out of periph lock. · 90ac5cb7
      Alexander Motin authored
      It is rare, but it is still better not to have lock dependencies.
      
      MFC after:	1 month
      
      (cherry picked from commit c67a2909)
    • newsyslog(8): Implement a new 'E' flag to not rotate empty log files · 78619cc2
      Juraj Lutter authored
      Based on an idea from dvl's coworker, László DANIELISZ, implement
      a new flag, 'E', that prevents newsyslog(8) from rotating empty
      log files. The 'E' flag is mostly useful in conjunction with the 'B'
      flag, which instructs newsyslog(8) not to insert an informational
      message into the log file after rotation, keeping it empty.
      
      Reviewed by:	markj, ian, manpages (rpokala)
      Approved by:	markj, ian, manpages (rpokala)
      MFC after:	2 weeks
      Differential Revision:	https://reviews.freebsd.org/D28940
      
      (cherry picked from commit c7d27b22)
  2. 14 Mar, 2021 12 commits
  3. 13 Mar, 2021 5 commits
    • Flush remaining routes from the routing table during VNET shutdown. · 8aafa7a0
      Alexander V. Chernikov authored
      Summary:
      This fixes an rtentry leak for cloned interfaces created inside the VNET.
      
      Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
      Thus, any route table operations are too late to schedule.
      As the intent of the vnet teardown procedures is to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`.
      It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.
      
      Test Plan:
      ```
      set_skip:set_skip_group_lo  ->  passed  [0.053s]
      tail -n 200 /var/log/messages | grep rtentry
      ```
      
      PR:	253998
      Reported by:	rashey at superbox.pl
      Reviewed By: kp
      Differential Revision: https://reviews.freebsd.org/D29116
      
      (cherry picked from commit b1d63265)
    • Fix various NOINET* builds broken by 145bf6c0. · d81b3bb4
      Alexander V. Chernikov authored
      Reported by:	mjg, bdragon
      
      (cherry picked from commit 8ca99aec)
    • Fix blackhole/reject routes. · 3489286a
      Alexander V. Chernikov authored
      Traditionally, the *BSD routing stack required some
       interface data to be supplied for blackhole/reject routes. This led to
       a variety of hacks in routing daemons when inserting such routes.
      With the recent routing stack changes, a gateway sockaddr without
       RTF_GATEWAY started to be treated differently, purely as a link
       identifier.
      
      This change broke net/bird, which installs blackhole routes with
       127.0.0.1 gateway without RTF_GATEWAY flags.
      
      Fix this by automatically constructing necessary gateway data at
       rtsock level if RTF_REJECT/RTF_BLACKHOLE is set.
      
      Reported by:	Marek Zarychta <zarychtam at plan-b.pwste.edu.pl>
      Reviewed by:	donner
      
      (cherry picked from commit 145bf6c0)
    • Partially revert libcxxrt changes to avoid _Unwind_Exception change · 0b452906
      Dimitry Andric authored
      (Note I am also applying this to main and stable/13, to restore the old
      libcxxrt ABI and to avoid having to maintain a compat library.)
      
      After the recent cherry-picking of libcxxrt commits 0ee0dbfb and
      d2b3fadf, users reported that editors/libreoffice packages from the
      official package builders did not start anymore. It turns out that the
      combination of these commits subtly changes the ABI, requiring all
      applications that depend on internal details of struct _Unwind_Exception
      (available via unwind-arm.h and unwind-itanium.h) to be recompiled.
      
      However, the FreeBSD package builders always use -RELEASE jails, so
      these still use the old declaration of struct _Unwind_Exception, which
      is not entirely compatible. In particular, LibreOffice uses this struct
      in its internal "uno bridge" component, where it attempts to set up its
      own exception handling mechanism.
      
      To fix this incompatibility, go back to the old declarations of struct
      _Unwind_Exception, and restore the __LP64__ specific workaround we had
      in place before (which was to cope with yet another, older ABI bug).
      
      Effectively, this reverts upstream libcxxrt commits 88bdf6b290da
      ("Specify double-word alignment for ARM unwind") and b96169641f79
      ("Updated Itanium unwind"), and reapplies our commit 3c4fd246
      ("libcxxrt: add padding in __cxa_allocate_* to fix alignment").
      
      PR:		253840
    • Restore AT_RESOLVE_BENEATH support for funlinkat(2)/unlinkat(2). · 62694ac4
      Konstantin Belousov authored
      (cherry picked from commit ead7697f)
  4. 12 Mar, 2021 5 commits
    • MFC jail: Don't allow jails under dying parents · d2bbfc37
      Jamie Gritton authored
      If a jail is created with jail_set(...JAIL_DYING), and it has a parent
      currently in a dying state, that will bring the parent jail back to
      life.  Restrict that to require that the parent itself be explicitly
      brought back first, and not implicitly created along with the new
      child jail.
      
      Differential Revision:	https://reviews.freebsd.org/D28515
      
      (cherry picked from commit 0a2a96f3)
      
      MFC jail: Fix locking on an early jail_set error.
      
      I had locked allprison_lock without immediately setting PD_LIST_LOCKED.
      
      (cherry picked from commit 108a9384)
    • MFC jail: Add PD_KILL to remove a prison in prison_deref(). · 24633953
      Jamie Gritton authored
      Add the PD_KILL flag that instructs prison_deref() to take steps
      to actively kill a prison and its descendants, namely marking it
      PRISON_STATE_DYING, clearing its PR_PERSIST flag, and killing any
      attached processes.
      
      This replaces a similar loop in sys_jail_remove(), bringing the
      operation under the same single hold on allprison_lock that it already
      has. It is also used to clean up failed jail (re-)creations in
      kern_jail_set(), which didn't generally take all the proper steps.
      
      Differential Revision:  https://reviews.freebsd.org/D28473
      
      (cherry picked from commit 811e27fa)
      
      MFC jail: back out 811e27fa until it doesn't break Jenkins
      
      Reported by:	arichardson
      
      (cherry picked from commit ddfffb41)
      
      MFC jail: re-commit 811e27fa with fixes
      
      Make sure PD_KILL isn't passed to do_jail_attach, where it might end
      up trying to kill the caller's prison (even prison0).
      
      Fix the child jail loop in prison_deref_kill, which was doing the
      post-order part during the pre-order part.  That's not a system-killer,
      but it means jails do not always die correctly.
      
      (cherry picked from commit c861373b)
      
      MFC jail: Add safety around prison_deref() flags.
      
      do_jail_attach() now only uses the PD_XXX flags that refer to lock
      status, so make sure that something else like PD_KILL doesn't slip
      through.
      
      Add a KASSERT() in prison_deref() to catch any further PD_KILL misuse.
      
      (cherry picked from commit 589e4c1d)
    • MFC jail: Add pr_state to struct prison · 2bfecbef
      Jamie Gritton authored
      Rather than using references (pr_ref and pr_uref) to deduce the state
      of a prison, keep track of its state explicitly.  A prison is either
      "invalid" (pr_ref == 0), "alive" (pr_uref > 0) or "dying"
      (pr_uref == 0).
      
      State transitions are generally tied to the reference counts, but with
      some flexibility: a new prison is "invalid" even though it now starts
      with a reference, and jail_remove(2) sets the state to "dying" before
      the user reference count drops to zero (which was previously
      accomplished via the PR_REMOVE flag).
      
      pr_state is protected by both the prison mutex and allprison_lock, so
      it has the same availability guarantees as the reference counts do.
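The difference between a deduced and an explicit state can be sketched as follows. The struct is a hypothetical reduction of struct prison to the three fields discussed here, the enumerator names other than PRISON_STATE_DYING are assumed, and locking is omitted entirely.

```c
/* Enumerator names other than PRISON_STATE_DYING are assumptions. */
enum prison_state {
	PRISON_STATE_INVALID,
	PRISON_STATE_ALIVE,
	PRISON_STATE_DYING
};

/* Hypothetical reduction of struct prison to the relevant fields. */
struct prison_sketch {
	int pr_ref;			/* all references */
	int pr_uref;			/* user (visible) references */
	enum prison_state pr_state;	/* explicit state */
};

/* The older approach: deduce the state from the counts alone. */
enum prison_state
state_from_refs(const struct prison_sketch *pr)
{
	if (pr->pr_ref == 0)
		return (PRISON_STATE_INVALID);
	if (pr->pr_uref > 0)
		return (PRISON_STATE_ALIVE);
	return (PRISON_STATE_DYING);
}

/* Removal marks the prison dying before pr_uref has dropped to zero. */
void
mark_dying(struct prison_sketch *pr)
{
	pr->pr_state = PRISON_STATE_DYING;
}
```

The two cases called out above are exactly where the counts and the explicit field disagree: a freshly created prison already holds a reference but is still invalid, and a prison being removed is dying while user references remain.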
      
      Differential Revision:	https://reviews.freebsd.org/D27876
      
      (cherry picked from commit 1158508a)
      
      MFC jail: Fix a LOR introduced in 1158508a
      
      (cherry picked from commit 701d6b50)
    • x86: tsc: deprioritize TSC on VirtualBox · ec24f78e
      Kyle Evans authored
      Misbehavior has been observed with TSC under VirtualBox, where threads
      doing small sleeps (~1 second) may miss their wake up and hang around
      in a sleep state indefinitely.  Switching back to ACPI-fast decidedly
      fixes it, so stop using TSC on VirtualBox at least for the time being.
      
      This partially reverts 84eaf2cc, applying it only to VirtualBox and
      increasing the quality to 0. Negative qualities can never be chosen and
      cannot be chosen with the tunable recently added. If we do not have a
      timecounter with a higher quality than 0, then TSC does at least leave
      the system mostly usable.
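The role of the quality value can be illustrated with a simplified selection routine. This is not the kernel's timecounter code: `struct tc_sketch` and `pick_timecounter()` are hypothetical, and the quality numbers in any usage are illustrative only.

```c
#include <stddef.h>

/* Hypothetical, simplified timecounter descriptor. */
struct tc_sketch {
	const char *name;
	int quality;
};

/*
 * Pick the candidate with the highest quality.  Candidates with a
 * negative quality are never selected; a quality of 0 is selectable,
 * but only wins when nothing better is available.
 */
const struct tc_sketch *
pick_timecounter(const struct tc_sketch *tcs, size_t n)
{
	const struct tc_sketch *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tcs[i].quality < 0)
			continue;
		if (best == NULL || tcs[i].quality > best->quality)
			best = &tcs[i];
	}
	return (best);
}
```

Under this model, a TSC entry with quality 0 loses to any positive-quality alternative such as an ACPI timer, yet remains a fallback when it is the only non-negative candidate.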
      
      PR:		253087
      
      (cherry picked from commit 8cc15b0d)
    • MFC jail: Change the locking around pr_ref and pr_uref · ad259c47
      Jamie Gritton authored
      Require both the prison mutex and allprison_lock when pr_ref or
      pr_uref go to/from zero.  Adding a non-first or removing a non-last
      reference remain lock-free.  This means that a shared hold on
      allprison_lock is sufficient for prison_isalive() to be useful, which
      removes a number of cases of lock/check/unlock on the prison mutex.
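The lock-free fast path for non-first and non-last references can be sketched with C11 atomics. These helpers are illustrative and assume nothing about the kernel's own refcount primitives; the locked slow path for transitions to and from zero is deliberately left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the count is already non-zero.  Returns false
 * when the count is zero; the caller must then use the locked slow path
 * (not shown), because 0 -> 1 is a state transition.
 */
bool
ref_acquire_if_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old != 0) {
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return (true);
	}
	return (false);
}

/*
 * Drop a reference only if it is not the last one.  Returns false when
 * the count is 1 (or 0); the caller must then take the locks before
 * performing the 1 -> 0 transition.
 */
bool
ref_release_if_not_last(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old > 1) {
		if (atomic_compare_exchange_weak(ref, &old, old - 1))
			return (true);
	}
	return (false);
}
```

Since only the locked path can move the count to or from zero, a thread that holds the list lock shared can rely on a non-zero count staying non-zero, which is the property the paragraph above attributes to prison_isalive().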
      
      Expand the locking in kern_jail_set() to keep allprison_lock held
      exclusive until the new prison is valid, thus making invalid prisons
      invisible to any thread holding allprison_lock (except of course the
      one creating or destroying the prison).  This renders prison_isvalid()
      nearly redundant, now used only in asserts.
      
      Differential Revision:	https://reviews.freebsd.org/D28419
      Differential Revision:	https://reviews.freebsd.org/D28458
      
      (cherry picked from commit f7496dca)
      
      MFC jail: fix build after the previous commit
      Noted by: Michael Butler <imb protected-networks.net>
      
      (cherry picked from commit ee9b37ae)