1. 27 Oct, 2021 1 commit
  2. 07 Oct, 2021 1 commit
  3. 22 May, 2021 1 commit
  4. 17 Nov, 2020 1 commit
  5. 23 Sep, 2020 1 commit
  6. 18 Sep, 2020 1 commit
  7. 10 Sep, 2020 1 commit
    • Konstantin Belousov
      Fix interaction between largepages and seals/writes. · 79783634
      On write with SHM_GROW_ON_WRITE, use a proper truncate.
      Do not allow growing a largepage shm if F_SEAL_GROW is set. Note that
      shrinks are not supported at all due to unmanaged mappings.
      A call to vm_pager_update_writecount() is only valid for swap objects,
      so skip it for unmanaged largepages.
      Largepages cannot support write sealing.
      Do not writecnt largepage mappings.
      
      Reported by:	kevans
      Reviewed by:	kevans, markj
      Sponsored by:	The FreeBSD Foundation
      MFC after:	1 week
      Differential revision:	https://reviews.freebsd.org/D26394
  8. 09 Sep, 2020 2 commits
  9. 01 Sep, 2020 1 commit
  10. 31 Aug, 2020 1 commit
    • Kyle Evans
      posixshm: fix setting of shm_flags · 5dd47b52
      As noted in D24652, we currently set shmfd->shm_flags on every
      shm_open()/shm_open2(). This wasn't properly thought out; one shouldn't be
      able to specify incompatible flags on subsequent opens of non-anon shm.
      
      Move setting of shm_flags explicitly to the two places shmfd are created, as
      we do with seals, and validate when we're opening a pre-existing mapping
      that we've either passed no flags or we've passed the exact same flags as
      the first time.
      
      Reviewed by:	kib, markj
      Differential Revision:	https://reviews.freebsd.org/D26242
  11. 05 Aug, 2020 1 commit
  12. 10 Jul, 2020 1 commit
  13. 25 Jun, 2020 1 commit
    • Mark Johnston
      Call swap_pager_freespace() from vm_object_page_remove(). · 84242cf6
      All vm_object_page_remove() callers, except
      linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
      removing a range of pages from an object.  The LinuxKPI case appears to
      be an unintentional omission that could result in leaked swap blocks, so
      unconditionally free swap space in vm_object_page_remove() to protect
      against similar bugs in the future.
      
      Reviewed by:	alc, kib
      Tested by:	pho
      Sponsored by:	The FreeBSD Foundation
      Differential Revision:	https://reviews.freebsd.org/D25329
  14. 14 Apr, 2020 1 commit
    • Kyle Evans
      posixshm: fix counting of writable mappings · 51a16c84
      Similar to mmap'ing vnodes, posixshm should count any mapping where maxprot
      contains VM_PROT_WRITE (i.e. an fd opened r/w with no write seal applied)
      as writable, and thus as blocking any write seal.
      
      The memfd tests have been amended to reflect the fixes here, which notably
      includes:
      
      1. Fix for error return bug; EPERM is not a documented failure mode for mmap
      2. Fix rejection of write-seal with active mappings that can be upgraded via
          mprotect(2).
      
      Reported by:	markj
      Discussed with:	markj, kib
  15. 13 Apr, 2020 1 commit
    • Mark Johnston
      Relax restrictions on private mappings of POSIX shm objects. · c7841c6b
      When creating a private mapping of a POSIX shared memory object,
      VM_PROT_WRITE should always be included in maxprot regardless of
      permissions on the underlying FD.  Otherwise it is possible to open a
      shm object read-only, map it with MAP_PRIVATE and PROT_WRITE, and
      violate the invariant in vm_map_insert() that (prot & maxprot) == prot.
      
      Reported by:	syzkaller
      Reviewed by:	kevans, kib
      MFC after:	1 week
      Sponsored by:	The FreeBSD Foundation
      Differential Revision:	https://reviews.freebsd.org/D24398
  16. 03 Mar, 2020 1 commit
  17. 28 Feb, 2020 1 commit
  18. 19 Jan, 2020 1 commit
  19. 09 Jan, 2020 1 commit
    • Kyle Evans
      shmfd: posix_fallocate(2): only take rangelock for section we need · 39eae263
      Other mechanisms that resize the shmfd grab a write lock from 0 to OFF_MAX
      for safety, so we still get proper synchronization of shmfd->shm_size.
      There's no need to block readers/writers of earlier segments when we're
      just reserving more space, so narrow the scope. It would likely be safe to
      narrow it further, to just the part of the range that extends beyond the
      current size, but that isn't worth it, since the size isn't stable until
      the write lock is granted for the first time.
      
      Suggested by:	cem (passing comment)
  20. 08 Jan, 2020 1 commit
  21. 05 Jan, 2020 2 commits
    • Kyle Evans
      shm: correct KPI mistake introduced around memfd_create · 535b1df9
      When file sealing and shm_open2 were introduced, we should have grown a new
      kern_shm_open2 helper that did the brunt of the work with the new interface
      while kern_shm_open remains the same. Instead, more complexity was
      introduced to kern_shm_open to handle the additional features and consumers
      had to keep changing in somewhat awkward ways, and a kern_shm_open2 was
      added to wrap kern_shm_open.
      
      Backpedal on this and correct the situation: kern_shm_open returns to the
      interface it had prior to file sealing being introduced, and neither
      function needs an initial_seals argument anymore, as it's handled in
      kern_shm_open2 based on the shmflags.
    • Kyle Evans
      shmfd/mmap: restrict maxprot with MAP_SHARED + F_SEAL_WRITE · 58366f05
      If a write seal is set on a shared mapping, we must exclude VM_PROT_WRITE,
      as the fd is effectively read-only. This was discovered by running
      devel/linux-ltp, which mmaps with acceptable protections specified and then
      attempts to raise them to PROT_READ|PROT_WRITE with mprotect(2), which we
      allowed.
      
      Reviewed by:	kib
      Differential Revision:	https://reviews.freebsd.org/D22978
  22. 28 Dec, 2019 1 commit
    • Mark Johnston
      Remove page locking for queue operations. · 9f5632e6
      With the previous reviews, the page lock is no longer required in order
      to perform queue operations on a page.  It is also no longer needed in
      the page queue scans.  This change effectively eliminates remaining uses
      of the page lock and also the false sharing caused by multiple pages
      sharing a page lock.
      
      Reviewed by:	jeff
      Tested by:	pho
      Sponsored by:	Netflix, Intel
      Differential Revision:	https://reviews.freebsd.org/D22885
  23. 15 Dec, 2019 2 commits
  24. 19 Nov, 2019 1 commit
  25. 18 Nov, 2019 1 commit
    • David Bright
      Jail and capability mode for shm_rename; add audit support for shm_rename · 2d5603fe
      Co-mingling two things here:
      
        * Addressing some feedback from Konstantin and Kyle re: jail,
          capability mode, and a few other things
        * Adding audit support as promised.
      
      The audit support change includes a partial refresh of OpenBSM from
      upstream, where the change to add shm_rename has already been
      accepted. Matthew doesn't plan to work on refreshing anything else to
      support audit for those new event types.
      
      Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
      Reviewed by:	kib
      Relnotes:	Yes
      Sponsored by:	Dell EMC Isilon
      Differential Revision:	https://reviews.freebsd.org/D22083
  26. 15 Oct, 2019 2 commits
  27. 02 Oct, 2019 1 commit
    • Kyle Evans
      shm_open2(2): completely unbreak · 5a391b57
      kern_shm_open2() has, since its conception, completely failed to pass the
      mode along to kern_shm_open(). This breaks most uses of it.
      
      Add tests alongside this that actually check the mode of the returned
      files.
      
      PR:		240934 [pulseaudio breakage]
      Reported by:	ler, Andrew Gierth [postgres breakage]
      Diagnosed by:	Andrew Gierth (great catch)
      Tested by:	ler, tmunro
      Pointy hat to:	kevans
  28. 26 Sep, 2019 1 commit
    • David Bright
      Add an shm_rename syscall · 9afb12ba
      Add an atomic shm rename operation, similar in spirit to a file
      rename. Atomically unlink an shm from a source path and link it to a
      destination path. If an existing shm is linked at the destination
      path, unlink it as part of the same atomic operation. The caller needs
      the same permissions as shm_unlink to the shm being renamed, and the
      same permissions for the shm at the destination which is being
      unlinked, if it exists. If those fail, EACCES is returned, as with the
      other shm_* syscalls.
      
      truss support is included; audit support will come later.
      
      This commit includes only the implementation; the sysent-generated
      bits will come in a follow-on commit.
      
      Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
      Reviewed by:	jilles (earlier revision)
      Reviewed by:	brueffer (manpages, earlier revision)
      Relnotes:	yes
      Sponsored by:	Dell EMC Isilon
      Differential Revision:	https://reviews.freebsd.org/D21423
  29. 25 Sep, 2019 4 commits
    • Kyle Evans
      sysent: regenerate after r352705 · a9ac5e14
      This also implements it, fixes kdump, and removes no longer needed bits from
      lib/libc/sys/shm_open.c for the interim.
    • Kyle Evans
      Add a shm_open2 syscall to support upcoming memfd_create · 20f70576
      shm_open2 allows a little more flexibility than the original shm_open.
      shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate
      shmflag argument that can be expanded later. Currently the only shmflag is
      to allow file sealing on the returned fd.
      
      shm_open and memfd_create will both be implemented in libc to use this new
      syscall.
      
      __FreeBSD_version is bumped to indicate the new syscall's presence.
      
      Reviewed by:	kib, markj
      Differential Revision:	https://reviews.freebsd.org/D21393
    • Kyle Evans
      [2/3] Add an initial seal argument to kern_shm_open() · 0cd95859
      Now that flags may be set on posixshm, add an argument to kern_shm_open()
      for the initial seals. To maintain past behavior where callers of
      shm_open(2) are guaranteed to not have any seals applied to the fd they're
      given, apply F_SEAL_SEAL for existing callers of kern_shm_open. A special
      flag could be opened later for shm_open(2) to indicate that sealing should
      be allowed.
      
      We currently restrict initial seals to F_SEAL_SEAL. We cannot error out if
      F_SEAL_SEAL is re-applied, as this would break calling shm_open() twice on
      a shmfd that already existed. A note has been added about the assumptions
      made here as a hint to anyone wanting to allow other seals to be applied
      at creation.
      
      Reviewed by:	kib, markj
      Differential Revision:	https://reviews.freebsd.org/D21392
    • Kyle Evans
      [1/3] Add mostly Linux-compatible file sealing support · af755d3e
      File sealing applies protections against certain actions
      (currently: write, grow, shrink) at the inode level. New fileops are added
      to accommodate seals; EINVAL is returned by fcntl(2) if they are not
      implemented.
      
      Reviewed by:	markj, kib
      Differential Revision:	https://reviews.freebsd.org/D21391
  30. 10 Sep, 2019 1 commit
  31. 09 Sep, 2019 1 commit
    • Mark Johnston
      Change synchronization rules for vm_page reference counting. · fee2a2fa
      There are several mechanisms by which a vm_page reference is held,
      preventing the page from being freed back to the page allocator.  In
      particular, holding the page's object lock is sufficient to prevent the
      page from being freed; holding the busy lock or a wiring is sufficient as
      well.  These references are protected by the page lock, which must
      therefore be acquired for many per-page operations.  This results in
      false sharing since the page locks are external to the vm_page
      structures themselves and each lock protects multiple structures.
      
      Transition to using an atomically updated per-page reference counter.
      The object's reference is counted using a flag bit in the counter.  A
      second flag bit is used to atomically block new references via
      pmap_extract_and_hold() while removing managed mappings of a page.
      Thus, the reference count of a page is guaranteed not to increase if the
      page is unbusied, unmapped, and the object's write lock is held.  As
      a consequence of this, the page lock no longer protects a page's
      identity; operations which move pages between objects are now
      synchronized solely by the objects' locks.
      
      The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
      requires that either the object lock or the busy lock is held.  The
      latter no longer has a return value and may free the page if it releases
      the last reference to that page.  vm_page_unwire_noq() behaves the same
      as before; the caller is responsible for checking its return value and
      freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
      introduced for use in pmap_extract_and_hold().  It fails if the page is
      concurrently being unmapped, typically triggering a fallback to the
      fault handler.  vm_page_wire() no longer requires the page lock and
      vm_page_unwire() now internally acquires the page lock when releasing
      the last wiring of a page (since the page lock still protects a page's
      queue state).  In particular, synchronization details are no longer
      leaked into the caller.
      
      The change excises the page lock from several frequently executed code
      paths.  In particular, vm_object_terminate() no longer bounces between
      page locks as it releases an object's pages, and direct I/O and
      sendfile(SF_NOCACHE) completions no longer require the page lock.  In
      these latter cases we now get linear scalability in the common scenario
      where different threads are operating on different files.
      
      __FreeBSD_version is bumped.  The DRM ports have been updated to
      accommodate the KPI changes.
      
      Reviewed by:	jeff (earlier version)
      Tested by:	gallatin (earlier version), pho
      Sponsored by:	Netflix
      Differential Revision:	https://reviews.freebsd.org/D20486
  32. 03 Sep, 2019 1 commit
    • Kyle Evans
      posixshm: start counting writeable mappings · dca52ab4
      r351650 switched posixshm to using OBJT_SWAP for shm_object
      
      r351795 added support to the swap_pager for tracking writeable mappings
      
      Take advantage of this and start tracking writeable mappings; fd sealing
      will use this to reject a seal on writing with EBUSY if any such mapping
      exists.
      
      Reviewed by:	kib, markj
      Differential Revision:	https://reviews.freebsd.org/D21456
  33. 01 Sep, 2019 1 commit
    • Kyle Evans
      posixshm: switch to OBJT_SWAP in advance of other changes · 32287ea7
      Future changes to posixshm will start tracking writeable mappings in order
      to support file sealing. Tracking writeable mappings for an OBJT_DEFAULT
      object is complicated, as it may be swapped out and converted to an
      OBJT_SWAP object. One could add this tracking generically to vm_object,
      but that is difficult to do without increasing the memory footprint of
      vm_object and blowing up memory usage by a significant amount.
      
      On the other hand, the swap pager can be expanded to track writeable
      mappings without increasing vm_object size. This change is currently in
      D21456. Switch over to OBJT_SWAP in advance of the other changes to the
      swap pager and posixshm.