1. 16 Mar, 2021 2 commits
      ns8250: don't drop IER_TXRDY on bus_grab/ungrab · a54c346f
      Mitchell Horne authored
      It has been observed that some systems are often unable to resume from
      ddb after entering with debug.kdb.enter=1. Checking the status further
      shows the terminal is blocked waiting in tty_drain(), but it never makes
      progress in clearing the output queue, because sc->sc_txbusy is high.
      
      I noticed that when entering polling mode for the debugger, IER_TXRDY is
      set in the failure case. Since this bit is never tracked by the softc,
      it will not be restored by ns8250_bus_ungrab(). This creates a race in
      which a TX interrupt can be lost, creating the hang described above.
      Ensuring that this bit is restored is enough to prevent the hang and
      allow resuming from ddb as expected.
      
      The solution is to track this bit in the sc->ier field, for the same
      lifetime that TX interrupts are enabled.
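      A minimal C sketch of the idea, under stated assumptions: the names
      hyp_uart_softc, hyp_bus_grab() and friends are hypothetical, and the
      IER hardware register is simulated by a plain byte, so this is not the
      ns8250 driver code. It only shows why recording the TX-ready enable
      bit in the software copy of IER lets the ungrab path restore it.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical names; 0x01/0x02 are the standard 16550 IER bits for
       * "RX data available" and "TX holding register empty" interrupts. */
      #define HYP_IER_RXRDY 0x01
      #define HYP_IER_TXRDY 0x02

      struct hyp_uart_softc {
          uint8_t hw_ier; /* simulated IER hardware register */
          uint8_t ier;    /* software copy, used to restore IER on ungrab */
      };

      /* Enable TX interrupts and track the bit in sc->ier for exactly as
       * long as TX interrupts stay enabled. */
      static void hyp_tx_enable(struct hyp_uart_softc *sc) {
          sc->ier |= HYP_IER_TXRDY;
          sc->hw_ier = sc->ier;
      }

      static void hyp_tx_disable(struct hyp_uart_softc *sc) {
          sc->ier &= (uint8_t)~HYP_IER_TXRDY;
          sc->hw_ier = sc->ier;
      }

      /* Debugger entry: mask all UART interrupts for polled I/O. */
      static void hyp_bus_grab(struct hyp_uart_softc *sc) {
          sc->hw_ier = 0;
      }

      /* Debugger exit: restore from the software copy. If TXRDY were not
       * tracked in sc->ier, it would be lost here. */
      static void hyp_bus_ungrab(struct hyp_uart_softc *sc) {
          sc->hw_ier = sc->ier;
      }
      ```

      With the bit tracked, a grab/ungrab cycle while a transmit is pending
      leaves TX interrupts enabled, so the pending TX interrupt is not lost.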
      
      PR:		223917, 240122
      Sponsored by:	The FreeBSD Foundation
      
      (cherry picked from commit 7e7f7bee)
      Fix bug 253158 - Panic: snapacct_ufs2: bad block - mksnap_ffs(8) crash · cf0310df
      Kirk McKusick authored
      PR:           253158
      
      (cherry picked from commit 8563de2f)
      (cherry picked from commit c31480a1)
  2. 15 Mar, 2021 28 commits
      newsyslog(8): Implement a new 'E' flag to not rotate empty log files · ffdcad75
      Juraj Lutter authored
      Based on an idea from dvl's coworker, László DANIELISZ, implement
      a new flag, 'E', that prevents newsyslog(8) from rotating empty
      log files. This 'E' flag is mostly useful in conjunction with the 'B'
      flag, which instructs newsyslog(8) not to insert an informational
      message into the log file after rotation, so the file stays empty.
      
      Reviewed by:	markj, ian, manpages (rpokala)
      Approved by:	markj, ian, manpages (rpokala)
      MFC after:	2 weeks
      Differential Revision:	https://reviews.freebsd.org/D28940
      
      (cherry picked from commit c7d27b22)
      nfsclient: Fix ReadDS/WriteDS/CommitDS nfsstats RPC counts for a NFSv3 DS · 960f07a4
      Rick Macklem authored
      During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
      was detected when doing I/O DS operations on a Flexible File Layout pNFS
      server.  For an NFSv3 DS, the Read/Write/Commit nfsstats were incremented
      instead of the ReadDS/WriteDS/CommitDS counts.
      This patch fixes this.
      
      Only the RPC counts reported by nfsstat(1) were affected by this bug;
      the I/O operations were performed correctly.
      
      MFC after:	2 weeks
      
      (cherry picked from commit c04199af)
      nfsclient: Fix the stripe unit size for a File Layout pNFS layout · f419fd9a
      Rick Macklem authored
      During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
      was detected when doing a File Layout pNFS DS I/O operation.
      The size of the I/O operation was smaller than expected.
      The I/O size is specified as a stripe unit size in bits 6->31 of nflh_util
      in the layout.  I had misinterpreted RFC5661 and had shifted the value
      right by 6 bits. The correct interpretation is to use the value as
      presented (it is always an exact multiple of 64), clearing bits 0->5.
      This patch fixes this.
      
      Without the patch, I/O through the DSs works, but the I/O size is 1/64th
      of what is optimal.
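      A small C illustration of the two interpretations, assuming nflh_util
      is a 32-bit value whose bits 0->5 carry flags and whose bits 6->31
      hold the stripe unit size (always a multiple of 64). The function
      names are hypothetical, not the FreeBSD client's.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Keep bits 6->31, clear flag bits 0->5. */
      #define HYP_STRIPE_UNIT_MASK 0xffffffc0u

      /* Buggy reading: treats bits 6->31 as a count and shifts it down,
       * giving a stripe unit 64 times too small. */
      static uint32_t hyp_stripe_unit_buggy(uint32_t nflh_util) {
          return nflh_util >> 6;
      }

      /* Fixed reading: use the value as presented, clearing bits 0->5. */
      static uint32_t hyp_stripe_unit(uint32_t nflh_util) {
          return nflh_util & HYP_STRIPE_UNIT_MASK;
      }
      ```

      For example, a 64KB stripe unit with flag bit 0 set (0x10001) yields
      0x10000 with the mask but only 0x400 with the shift.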
      
      (cherry picked from commit 94f2e42f)
      nfsclient: add nfs node locking around uses of n_direofoffset · a1224bee
      Rick Macklem authored
      During code inspection I noticed that the n_direofoffset field
      of the NFS node was being manipulated without any lock being
      held to make it SMP safe.
      This patch adds locking of the NFS node's mutex around
      handling of n_direofoffset to make it SMP safe.
      
      I have not seen any failure that could be attributed to n_direofoffset
      being manipulated concurrently by multiple processors, but I think this
      is possible, since directories are read with shared vnode
      locking, plus locks only on individual buffer cache blocks.
      However, there have been as yet unexplained issues w.r.t. reading
      large directories over NFS that could have conceivably been caused
      by concurrent manipulation of n_direofoffset.
      
      (cherry picked from commit 15bed8c4)
      nfsclient: add checks for a server returning the current directory · b6b901f8
      Rick Macklem authored
      Commit 3fe2c68b dealt with a panic in cache_enter_time() where
      the vnode referred to the directory argument.
      It would also be possible to get these panics if a broken
      NFS server were to return the directory as a new object being
      created within the directory or in a Lookup reply.
      
      This patch adds checks to avoid the panics and logs
      messages to indicate that the server is broken for the
      file object creation cases.
      
      (cherry picked from commit 3e04ab36)
      nfsclient: fix panic in cache_enter_time() · c33a5277
      Rick Macklem authored
      Juraj Lutter (otis@) reported a panic "dvp != vp not true" in
      cache_enter_time() called from the NFS client's nfsrpc_readdirplus()
      function.
      This is specific to an NFSv3 mount with the "rdirplus" mount
      option. Unlike NFSv4, NFSv3 replies to ReaddirPlus
      include entries for the current directory.
      
      This trivial patch avoids doing a cache_enter_time()
      call for the current directory to avoid the panic.
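      A minimal C sketch of the check, with hypothetical types standing in
      for vnodes and ReaddirPlus entries (not the nfsrpc_readdirplus()
      code). The stand-in for cache_enter_time() asserts dvp != vp, which
      mirrors the "dvp != vp" assertion that fired in the reported panic.

      ```c
      #include <assert.h>
      #include <stddef.h>

      struct hyp_vnode { int id; };

      struct hyp_dirent {
          const char *name;
          struct hyp_vnode *vp;
      };

      /* Hypothetical stand-in for cache_enter_time(). */
      static int hyp_cache_enter(struct hyp_vnode *dvp, struct hyp_vnode *vp) {
          assert(dvp != vp);
          return 1;
      }

      /* Enter each ReaddirPlus entry into the name cache, skipping any
       * entry that refers to the directory itself (an NFSv3 reply may
       * include "."). Returns the number of entries cached. */
      static size_t hyp_cache_readdirplus(struct hyp_vnode *dvp,
          const struct hyp_dirent *ents, size_t n) {
          size_t cached = 0;
          for (size_t i = 0; i < n; i++) {
              if (ents[i].vp == dvp)
                  continue;
              cached += (size_t)hyp_cache_enter(dvp, ents[i].vp);
          }
          return cached;
      }
      ```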
      
      (cherry picked from commit 3fe2c68b)
      fhlink(2): the syscalls do not take flag · 2428cc63
      Konstantin Belousov authored
      (cherry picked from commit 600756af)
      Make kern.timecounter.hardware tunable · 791b89da
      Konstantin Belousov authored
      (cherry picked from commit 56b9bee6)
      Hyper-V: hn: Relinquish cpu in HN_LOCK to avoid deadlock · 70f7b247
      Wei Hu authored
      The try lock loop in HN_LOCK puts the thread into a busy spin on the
      CPU if the lock is not available. This can deadlock if the thread
      holding the lock is sleeping. Relinquish the CPU to work around this
      problem, even though it doesn't completely solve the issue: priority
      inversion could still cause a livelock, however unlikely. A more
      complete solution may be needed in the future.
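      A userland analogue of the pattern, using POSIX pthread_mutex_trylock()
      and sched_yield(); it is not the hn(4) HN_LOCK code, and the function
      name is hypothetical. As the commit notes, yielding only makes the
      deadlock less likely: it does not rule out priority inversion.

      ```c
      #include <assert.h>
      #include <pthread.h>
      #include <sched.h>

      /* Spin on a try-lock, but give up the CPU after each failed attempt
       * so that a runnable lock holder can make progress. Returns the
       * number of failed attempts. */
      static unsigned hyp_lock_yielding(pthread_mutex_t *m) {
          unsigned retries = 0;
          while (pthread_mutex_trylock(m) != 0) {
              retries++;
              sched_yield(); /* relinquish the CPU instead of busy-spinning */
          }
          return retries;
      }
      ```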
      
      Reported by:	Microsoft, Netapp
      MFC after:	2 weeks
      Sponsored by:	Microsoft
      
      (cherry picked from commit b3460f44)
      Hyper-V: pcib: Check revoke status during device attach · 27904109
      Wei Hu authored
      It is possible that the vmbus pcib channel is revoked during the attach
      path. The attach path could be waiting for a response from the host
      that will never arrive, since the channel has already been revoked
      from the host's point of view. Check for this situation while waiting
      for completion and return failure if it happens.
      
      Reported by:	Netapp
      MFC after:	2 weeks
      Sponsored by:	Microsoft
      Differential Revision:	https://reviews.freebsd.org/D26486
      
      (cherry picked from commit 75c2786c)
      Hyper-V: storvsc: Enhance srb_status code handling. · 919a160f
      Wei Hu authored
      In hv_storvsc_io_request(), when dumping core, prevent the send channel
      from being changed from the base channel to another one, since
      storvsc_poll always probes on the base channel.
      
      Based upon conversations with Microsoft, changed the handling of
      srb_status codes. Most of them we should never get, but some we may.
      All are treated as retryable except for two: we should not get these
      statuses either, but if we ever do, the I/O state is not known.
      
      Submitted by:	Alexander Sideropoulos <Alexander.Sideropoulos@netapp.com>
      Reviewed by:	trasz, allanjude, whu
      MFC after:	1 week
      Sponsored by:	Netapp Inc
      Differential Revision:	https://reviews.freebsd.org/D25756
      
      (cherry picked from commit 2a0ce39d)
      hyperv/vmbus: Fix the wrong size in ndis_offload structure · b05c1dd0
      Wei Hu authored
      Submitted by:	whu
      MFC after:	2 weeks
      Sponsored by:	Microsoft
      
      (cherry picked from commit 23a49920)
      hyperv/vmbus: Update VMBus version 4.0 and 5.0 support. · c38b9b80
      Wei Hu authored
      Add VMBus protocol versions 4.0 and 5.0 to support Windows 10 and newer Hyper-V hosts.
      
      For VMBus 4.0 and newer Hyper-V, the netvsc gpadl teardown must be done after vmbus close.
      
      Submitted by:	whu
      MFC after:	2 weeks
      Sponsored by:	Microsoft
      
      (cherry picked from commit ace5ce7e)
      Prevent framebuffer mmio space from being allocated to other devices on HyperV. · e801c980
      Wei Hu authored
      On Gen2 VMs, Hyper-V provides mmio space for framebuffer.
      This mmio address range is not usable by other PCI devices.
      Currently only the efifb driver uses this range, without reserving
      it from the system.
      Therefore, have the vmbus driver reserve it before any other PCI
      device drivers start to request mmio addresses.
      
      PR:		222996
      Submitted by:	weh@microsoft.com
      Reported by:	dmitry_kuleshov@ukr.net
      Reviewed by:	decui@microsoft.com
      Sponsored by:	Microsoft
      
      (cherry picked from commit c5657761)
      Make software iSCSI more configurable. · 3ef86cd7
      Alexander Motin authored
      Move software iSCSI tunables/sysctls into kern.icl.soft subtree.
      Replace several hardcoded length constants there with variables.
      
      While there, stretch the limits to better match Linux' open-iscsi
      and our own initiator with the new MAXPHYS of 1MB.  Our CTL target is
      also optimized for up to 1MB I/Os, so there is also a match now.
      For Windows 10 and VMware 6.7 initiators at default settings it
      should make no change, since previous limits were sufficient there.
      
      Tests of QD1 1MB writes from FreeBSD over a 10GigE link show a throughput
      increase of 29% on an idle connection and 132% with concurrent QD8 reads.
      
      MFC after:	3 days
      Sponsored by:	iXsystems, Inc.
      
      (cherry picked from commit b75168ed)
      Move ic_check_send_space clear to the actual check. · 6cd45427
      Alexander Motin authored
      This closes a tiny race in which the flag could be set between being
      cleared and the space being checked, which would create some extra
      work for us.  The flag setting is protected by both locks, so we can
      clear it in either place, but in between both locks are dropped.
      
      MFC after:	1 week
      
      (cherry picked from commit afc3e54e)
      Restore condition removed in df3747c6. · 2cd7a99c
      Alexander Motin authored
      I think it allowed us to avoid some TX thread wakeups while the socket
      buffer is full.  But add another condition there: if ic_check_send_space
      is set, meaning the socket just reported that new space appeared, it
      may make sense to pull more data from ic_to_send for better TX
      coalescing.
      
      MFC after:	1 week
      
      (cherry picked from commit aff9b9ee)
      Replace STAILQ_SWAP() with simpler STAILQ_CONCAT(). · cec95c50
      Alexander Motin authored
      Also remove a stray STAILQ_REMOVE_AFTER(), which caused no problems
      only because STAILQ_SWAP() fixed the corrupted stqh_last.
      
      MFC after:	1 week
      
      (cherry picked from commit df3747c6)
      Fix initiator panic after 6895f89f. · cb89ac5a
      Alexander Motin authored
      There are sessions without a socket that are not disconnecting yet.
      
      MFC after:	3 weeks
      
      (cherry picked from commit 06e9c710)
      Optimize TX coalescing by keeping pointer to last mbuf. · 3034c0da
      Alexander Motin authored
      Previously, each m_cat() call traversed the entire coalesced chain.
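      A minimal C sketch of the tail-pointer technique, using a hypothetical
      hyp_buf type instead of the real struct mbuf and m_cat(). Caching the
      last buffer makes each append cost proportional to the appended chain
      rather than to the whole coalesced chain.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical mbuf-like buffer; only the chain link matters here. */
      struct hyp_buf {
          struct hyp_buf *next;
          size_t len;
      };

      /* A coalesced chain that remembers its last buffer. */
      struct hyp_chain {
          struct hyp_buf *head;
          struct hyp_buf *tail;
          size_t total;
      };

      /* Append chain m starting from the cached tail, instead of walking
       * from c->head on every call. */
      static void hyp_chain_append(struct hyp_chain *c, struct hyp_buf *m) {
          if (m == NULL)
              return;
          if (c->head == NULL)
              c->head = m;
          else
              c->tail->next = m;
          /* Walk only the newly appended buffers to find the new tail. */
          for (; m != NULL; m = m->next) {
              c->total += m->len;
              c->tail = m;
          }
      }
      ```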
      
      MFC after:	1 week
      
      (cherry picked from commit b85a67f5)
      Optimize out few extra memory accesses. · 748feb19
      Alexander Motin authored
      MFC after:	1 week
      
      (cherry picked from commit a59e2982)
      Micro-optimize OOA queue processing. · 7b4859b4
      Alexander Motin authored
      - Move ctl_get_cmd_entry() calls from every OOA traversal to when
        the requests are first inserted, storing seridx in struct ctl_scsiio.
      - Move some checks out of the loop in ctl_check_ooa().
      - Replace checks for errors that cannot happen with asserts.
      - Transpose ctl_serialize_table, so that any OOA traversal accesses
        only one row (cache line).  Compact it from enum to uint8_t.
      - Optimize static branch predictions in the hottest places.
      
      Due to its O(n) nature on deep LUN queues, this can be the hottest code
      path in CTL, and the additional 20% IOPS I see in some 4KB I/O tests
      is good to have in reserve.  According to the profiles, about 50% of
      CPU time here is now spent in two memory accesses per traversed
      request in OOA.
      
      Sponsored by:	iXsystems, Inc.
      MFC after:	2 weeks
      
      (cherry picked from commit 9d9fd8b7)
      Coalesce socket reads in software iSCSI. · 6469aab0
      Alexander Motin authored
      Instead of 2-4 socket reads per PDU, this can do as few as one read
      per megabyte, dramatically reducing TCP overhead and lock contention.
      
      With this on iSCSI target I can write more than 4GB/s through a
      single connection.
      
      MFC after:	1 month
      
      (cherry picked from commit 6895f89f)
      Fix build after 2c7dc6ba. · 4d5d50ed
      Alexander Motin authored
      MFC after:	1 month
      
      (cherry picked from commit c02a2875)
      Refactor CTL datamove KPI. · c4a81e64
      Alexander Motin authored
       - Make frontends call the unified CTL core method ctl_datamove_done()
      to report move completion.  This reduces code duplication in the
      different backends by accounting DMA time in common code.
       - Add a samethr argument to ctl_datamove_done() and the be_move_done()
      callback, reporting whether the callback is called in the same
      context as ctl_datamove().  In some cases, like an iSCSI write with
      immediate data or a camsim frontend write, this saves one context
      switch, since we know that the context is sleepable.
       - Remove the data_move_done() methods from struct ctl_backend_driver,
      unused since forever.
      
      MFC after:	 1 month
      
      (cherry picked from commit 2c7dc6ba)
      Microoptimize CTL I/O queues. · 2a99726f
      Alexander Motin authored
      Switch OOA queue from TAILQ to LIST and change its direction, so that
      we traverse it forward, not backward.  There is only one place where
      we really need other direction, and it is not critical.
      
      Use STAILQ_REMOVE_HEAD() instead of STAILQ_REMOVE() in backends.
      
      Replace a few impossible conditions with assertions.
      
      MFC after:	1 month
      
      (cherry picked from commit 05d882b7)
      Save context switch per I/O for iSCSI and IOCTL frontends. · cfd358d9
      Alexander Motin authored
      Introduce new CTL core KPI ctl_run(), preprocessing I/Os in the caller
      context instead of scheduling another thread just for that.  This call
      may sleep, that is not acceptable for some frontends like the original
      CAM/FC one, but iSCSI already has separate sleepable per-connection RX
      threads, and another thread scheduling is mostly just a waste of time.
      IOCTL frontend actually waits for the I/O completion in the caller
      thread, so the use of another thread for this has even less sense.
      
      With this change I can measure ~5% IOPS improvement on 4KB iSCSI I/Os
      to ZFS.
      
      MFC after:	1 month
      
      (cherry picked from commit 812c9f48)
      Move XPT_IMMEDIATE_NOTIFY handling out of periph lock. · 15fe13c8
      Alexander Motin authored
      It is rare, but it is still better not to have lock dependencies.
      
      MFC after:	1 month
      
      (cherry picked from commit c67a2909)
  3. 11 Mar, 2021 1 commit
  4. 10 Mar, 2021 1 commit
      Partially revert libcxxrt changes to avoid _Unwind_Exception change · 3ef7b71f
      Dimitry Andric authored
      After the recent cherry-picking of libcxxrt commits 0ee0dbfb and
      d2b3fadf, users reported that editors/libreoffice packages from the
      official package builders did not start anymore. It turns out that the
      combination of these commits subtly changes the ABI, requiring all
      applications that depend on internal details of struct _Unwind_Exception
      (available via unwind-arm.h and unwind-itanium.h) to be recompiled.
      
      However, the FreeBSD package builders always use -RELEASE jails, so
      these still use the old declaration of struct _Unwind_Exception, which
      is not entirely compatible. In particular, LibreOffice uses this struct
      in its internal "uno bridge" component, where it attempts to set up its
      own exception handling mechanism.
      
      To fix this incompatibility, go back to the old declarations of struct
      _Unwind_Exception, and restore the __LP64__ specific workaround we had
      in place before (which was to cope with yet another, older ABI bug).
      
      Effectively, this reverts upstream libcxxrt commits 88bdf6b290da
      ("Specify double-word alignment for ARM unwind") and b96169641f79
      ("Updated Itanium unwind"), and reapplies our commit 3c4fd246
      ("libcxxrt: add padding in __cxa_allocate_* to fix alignment").
      
      PR:		253840
  5. 09 Mar, 2021 4 commits
  6. 08 Mar, 2021 4 commits
      Move back the isa non-PNP driver deadline to FreeBSD 14. · f58d396c
      Warner Losh authored
      (cherry picked from commit 6ffdaa5f)
      [PowerPC] Allow traversal of oversize OF properties. · 6d7145a2
      Brandon Bergren authored
      In standards such as LoPAPR, property names in excess of the usual 31
      characters exist.
      
      This breaks property traversal.
      
      While in IEEE 1275-1994, nextprop is defined explicitly to work with a
      32-byte region of memory, using a larger buffer should be fine. There is
      actually no way to pass a buffer length to the nextprop call in the OF
      client interface, so SLOF actually just blindly overflows the buffer.
      
      So we have to defensively make the buffer larger, to avoid memory
      corruption when reading out long properties on live OF systems.
      
      Note also that on real-mode OF, things are pretty tight because we are
      allocating against a static bounce buffer in low memory, so we can't just
      use a huge buffer to work around this without it being wasteful of our
      limited amount of 32-bit physical memory.
      
      This allows a patched ofwdump to operate properly on SLOF (i.e. pseries)
      systems, as well as any other PowerPC systems with overlength properties.
      
      Reviewed by:	jhibbits
      Sponsored by:	Tag1 Consulting, Inc.
      Differential Revision:	https://reviews.freebsd.org/D26669
      
      (cherry picked from commit 26869ad1)
      [PowerPC64] Fix multiple issues in fpsetmask(). · 015a3712
      Brandon Bergren authored
      Building R exposed a problem in fpsetmask() whereby we were not properly
      clamping the provided mask to the valid range.
      
      R initializes the mask by calling fpsetmask(~0) on FreeBSD. Since we
      recently enabled precise exceptions, this was causing an immediate
      SIGFPE because we were attempting to set invalid bits in the fpscr.
      
      Properly limit the range of bits that can be set via fpsetmask().
      
      While here, use the correct fp_except_t type instead of fp_rnd_t.
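      A minimal C sketch of the clamping, under stated assumptions: the
      five exception-enable bit values and the hyp_ names are hypothetical
      (they are not taken from FreeBSD's ieeefp.h), and the fpscr is
      simulated by a variable. It shows why masking the argument keeps
      fpsetmask(~0) from setting bits outside the valid range.

      ```c
      #include <assert.h>

      typedef int hyp_fp_except_t;

      /* Hypothetical enable bits: invalid, overflow, underflow,
       * divide-by-zero, inexact. */
      #define HYP_FP_X_INV 0x10
      #define HYP_FP_X_OFL 0x08
      #define HYP_FP_X_UFL 0x04
      #define HYP_FP_X_DZ  0x02
      #define HYP_FP_X_IMP 0x01
      #define HYP_FP_X_VALID (HYP_FP_X_INV | HYP_FP_X_OFL | HYP_FP_X_UFL | \
                              HYP_FP_X_DZ | HYP_FP_X_IMP)

      /* Simulated exception-enable field of the fpscr. */
      static hyp_fp_except_t hyp_fpscr_enables;

      /* Install a new mask and return the old one; bits outside the valid
       * range are dropped, so ~0 enables every exception and nothing else. */
      static hyp_fp_except_t hyp_fpsetmask(hyp_fp_except_t mask) {
          hyp_fp_except_t old = hyp_fpscr_enables;
          hyp_fpscr_enables = mask & HYP_FP_X_VALID;
          return old;
      }
      ```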
      
      Reported by:	pkubaj (in IRC)
      Sponsored by:	Tag1 Consulting, Inc.
      
      (cherry picked from commit dd95b392)
      (cherry picked from commit a7973538)
      [PowerPC] [PowerPCSPE] Fix multiple issues in fpsetmask(). · 1a4b9c28
      Brandon Bergren authored
      Building R on powerpc64 exposed a problem in fpsetmask() whereby we
      were not properly clamping the provided mask to the valid range.
      
      This same issue affects powerpc and powerpcspe.
      
      Properly limit the range of bits that can be set via fpsetmask().
      
      While here, use the correct fp_except_t type instead of fp_rnd_t.
      
      Reported by:	pkubaj, jhibbits (in IRC)
      Sponsored by:	Tag1 Consulting, Inc.
      
      (cherry picked from commit 384ee7cc)
      (cherry picked from commit 8b96d6ac)