1. 15 Dec, 2019 2 commits
  2. 19 Nov, 2019 1 commit
  3. 18 Nov, 2019 1 commit
    • Jail and capability mode for shm_rename; add audit support for shm_rename · 2d5603fe
      David Bright authored
      Co-mingling two things here:
      
        * Addressing some feedback from Konstantin and Kyle re: jail,
          capability mode, and a few other things
        * Adding audit support as promised.
      
      The audit support change includes a partial refresh of OpenBSM from
      upstream, where the change to add shm_rename has already been
      accepted. Matthew doesn't plan to work on refreshing anything else to
      support audit for those new event types.
      
      Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
      Reviewed by:	kib
      Relnotes:	Yes
      Sponsored by:	Dell EMC Isilon
      Differential Revision:	https://reviews.freebsd.org/D22083
  4. 15 Oct, 2019 2 commits
  5. 02 Oct, 2019 1 commit
    • shm_open2(2): completely unbreak · 5a391b57
      Kyle Evans authored
      kern_shm_open2() has, since its conception, completely failed to pass the
      mode along to kern_shm_open(). This breaks most uses of it.
      
      Add tests alongside this that actually check the mode of the returned
      files.
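
      A minimal userland check in the spirit of those tests might look like
      this sketch (the path and mode are illustrative, and a permissive
      umask is assumed):

        #include <sys/types.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <assert.h>
        #include <fcntl.h>

        int
        main(void)
        {
                struct stat sb;
                int fd;

                fd = shm_open("/modecheck", O_RDWR | O_CREAT, 0640);
                assert(fd >= 0);
                assert(fstat(fd, &sb) == 0);
                /* Before the fix, the requested mode was silently dropped. */
                assert((sb.st_mode & ACCESSPERMS) == 0640);
                shm_unlink("/modecheck");
                return (0);
        }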
      
      PR:		240934 [pulseaudio breakage]
      Reported by:	ler, Andrew Gierth [postgres breakage]
      Diagnosed by:	Andrew Gierth (great catch)
      Tested by:	ler, tmunro
      Pointy hat to:	kevans
  6. 26 Sep, 2019 1 commit
    • Add an shm_rename syscall · 9afb12ba
      David Bright authored
      Add an atomic shm rename operation, similar in spirit to a file
      rename. Atomically unlink an shm from a source path and link it to a
      destination path. If an existing shm is linked at the destination
      path, unlink it as part of the same atomic operation. The caller needs
      the same permissions as shm_unlink to the shm being renamed, and the
      same permissions for the shm at the destination which is being
      unlinked, if it exists. If those fail, EACCES is returned, as with the
      other shm_* syscalls.
      
      truss support is included; audit support will come later.
      
      This commit includes only the implementation; the sysent-generated
      bits will come in a follow-on commit.
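
      Once the sysent-generated bits land, usage should look roughly like
      the following sketch (paths are illustrative; a flags argument of 0
      gives the replace-if-exists behavior described above):

        #include <sys/mman.h>
        #include <err.h>
        #include <fcntl.h>

        int
        main(void)
        {
                int fd;

                fd = shm_open("/shm_src", O_RDWR | O_CREAT, 0600);
                if (fd < 0)
                        err(1, "shm_open");
                /*
                 * Atomically unlink "/shm_src" and link the object at
                 * "/shm_dst", unlinking any shm already at the destination.
                 */
                if (shm_rename("/shm_src", "/shm_dst", 0) != 0)
                        err(1, "shm_rename");
                return (0);
        }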
      
      Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
      Reviewed by:	jilles (earlier revision)
      Reviewed by:	brueffer (manpages, earlier revision)
      Relnotes:	yes
      Sponsored by:	Dell EMC Isilon
      Differential Revision:	https://reviews.freebsd.org/D21423
  7. 25 Sep, 2019 4 commits
    • sysent: regenerate after r352705 · a9ac5e14
      Kyle Evans authored
      This also implements the new syscall, fixes kdump, and removes
      no-longer-needed bits from lib/libc/sys/shm_open.c for the interim.
      a9ac5e14
    • Add a shm_open2 syscall to support upcoming memfd_create · 20f70576
      Kyle Evans authored
      shm_open2 allows a little more flexibility than the original shm_open.
      shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate
      shmflag argument that can be expanded later. Currently the only shmflag is
      to allow file sealing on the returned fd.
      
      shm_open and memfd_create will both be implemented in libc to use this new
      syscall.
      
      __FreeBSD_version is bumped to indicate the new syscall's presence.
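
      A sketch of how the libc shm_open() wrapper can sit on top of the new
      syscall; the stub name __sys_shm_open2 and the exact argument order
      here are assumptions based on the description above, not the
      committed code:

        #include <sys/types.h>
        #include <sys/mman.h>
        #include <fcntl.h>

        /* Hypothetical raw syscall stub; see the note above. */
        int __sys_shm_open2(const char *, int, mode_t, int, const char *);

        int
        shm_open(const char *path, int flags, mode_t mode)
        {
                /* POSIX requires CLOEXEC; shm_open2 no longer forces it. */
                return (__sys_shm_open2(path, flags | O_CLOEXEC, mode, 0, NULL));
        }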
      
      Reviewed by:	kib, markj
      Differential Revision:	https://reviews.freebsd.org/D21393
    • [2/3] Add an initial seal argument to kern_shm_open() · 0cd95859
      Kyle Evans authored
      Now that flags may be set on posixshm, add an argument to kern_shm_open()
      for the initial seals. To maintain past behavior where callers of
      shm_open(2) are guaranteed to not have any seals applied to the fd they're
      given, apply F_SEAL_SEAL for existing callers of kern_shm_open. A special
      flag could be added later for shm_open(2) to indicate that sealing should
      be allowed.
      
      We currently restrict initial seals to F_SEAL_SEAL. We cannot error out if
      F_SEAL_SEAL is re-applied, as that would break a second shm_open() of a
      shmfd that already exists. A note has been added about the assumptions we've
      made here as a hint towards anyone wanting to allow other seals to be
      applied at creation.
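
      The invariant being preserved, as a userland fragment (headers elided;
      the path is illustrative):

        /*
         * Both opens must succeed even though each shm_open() implicitly
         * applies F_SEAL_SEAL to the object.
         */
        int fd1 = shm_open("/twice", O_RDWR | O_CREAT, 0600);
        int fd2 = shm_open("/twice", O_RDWR | O_CREAT, 0600);

        assert(fd1 >= 0);
        assert(fd2 >= 0);       /* would fail if re-sealing were an error */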
      
      Reviewed by:	kib, markj
      Differential Revision:	https://reviews.freebsd.org/D21392
    • [1/3] Add mostly Linux-compatible file sealing support · af755d3e
      Kyle Evans authored
      File sealing applies protections against certain actions
      (currently: write, growth, shrink) at the inode level. New fileops are added
      to accommodate seals - EINVAL is returned by fcntl(2) if they are not
      implemented.
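
      Combined with the follow-up shm_open2()/memfd_create() work in this
      series, the interface looks roughly like this sketch:

        #include <sys/mman.h>
        #include <err.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int
        main(void)
        {
                int fd;

                fd = memfd_create("seal-demo", MFD_ALLOW_SEALING);
                if (fd < 0)
                        err(1, "memfd_create");
                if (ftruncate(fd, 4096) != 0)
                        err(1, "ftruncate");
                /* Freeze the object's size. */
                if (fcntl(fd, F_ADD_SEALS, F_SEAL_GROW | F_SEAL_SHRINK) == -1)
                        err(1, "F_ADD_SEALS");
                /* F_GET_SEALS reports the seals now in force. */
                printf("seals: %#x\n", fcntl(fd, F_GET_SEALS));
                return (0);
        }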
      
      Reviewed by:	markj, kib
      Differential Revision:	https://reviews.freebsd.org/D21391
  8. 10 Sep, 2019 1 commit
  9. 09 Sep, 2019 1 commit
    • Change synchronization rules for vm_page reference counting. · fee2a2fa
      Mark Johnston authored
      There are several mechanisms by which a vm_page reference is held,
      preventing the page from being freed back to the page allocator.  In
      particular, holding the page's object lock is sufficient to prevent the
      page from being freed; holding the busy lock or a wiring is sufficient as
      well.  These references are protected by the page lock, which must
      therefore be acquired for many per-page operations.  This results in
      false sharing since the page locks are external to the vm_page
      structures themselves and each lock protects multiple structures.
      
      Transition to using an atomically updated per-page reference counter.
      The object's reference is counted using a flag bit in the counter.  A
      second flag bit is used to atomically block new references via
      pmap_extract_and_hold() while removing managed mappings of a page.
      Thus, the reference count of a page is guaranteed not to increase if the
      page is unbusied, unmapped, and the object's write lock is held.  As
      a consequence of this, the page lock no longer protects a page's
      identity; operations which move pages between objects are now
      synchronized solely by the objects' locks.
      
      The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
      requires that either the object lock or the busy lock is held.  The
      latter no longer has a return value and may free the page if it releases
      the last reference to that page.  vm_page_unwire_noq() behaves the same
      as before; the caller is responsible for checking its return value and
      freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
      introduced for use in pmap_extract_and_hold().  It fails if the page is
      concurrently being unmapped, typically triggering a fallback to the
      fault handler.  vm_page_wire() no longer requires the page lock and
      vm_page_unwire() now internally acquires the page lock when releasing
      the last wiring of a page (since the page lock still protects a page's
      queue state).  In particular, synchronization details are no longer
      leaked into the caller.
      
      The change excises the page lock from several frequently executed code
      paths.  In particular, vm_object_terminate() no longer bounces between
      page locks as it releases an object's pages, and direct I/O and
      sendfile(SF_NOCACHE) completions no longer require the page lock.  In
      these latter cases we now get linear scalability in the common scenario
      where different threads are operating on different files.
      
      __FreeBSD_version is bumped.  The DRM ports have been updated to
      accommodate the KPI changes.
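
      A sketch of the changed KPIs under the new rules (kernel code,
      simplified):

        /* Wiring: the object lock (or the busy lock) now suffices. */
        VM_OBJECT_WLOCK(object);
        m = vm_page_lookup(object, pindex);
        if (m != NULL)
                vm_page_wire(m);        /* no page lock needed anymore */
        VM_OBJECT_WUNLOCK(object);

        /*
         * Unwiring: no return value; the page may be freed right here
         * if this drops the last reference.
         */
        vm_page_unwire(m, PQ_ACTIVE);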
      
      Reviewed by:	jeff (earlier version)
      Tested by:	gallatin (earlier version), pho
      Sponsored by:	Netflix
      Differential Revision:	https://reviews.freebsd.org/D20486
  10. 03 Sep, 2019 1 commit
    • posixshm: start counting writeable mappings · dca52ab4
      Kyle Evans authored
      r351650 switched posixshm to using OBJT_SWAP for shm_object
      
      r351795 added support to the swap_pager for tracking writeable mappings
      
      Take advantage of this and start tracking writeable mappings; fd sealing
      will use this to reject a seal on writing with EBUSY if any such mappings
      exist.
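
      The intended effect, sketched from userland (fragment; assumes a
      sealable fd from the memfd_create()/shm_open2() work, headers and
      error handling elided):

        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
            fd, 0);

        /* A writable mapping is outstanding: sealing writes must fail. */
        assert(fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == -1 && errno == EBUSY);

        munmap(p, 4096);
        /* With the mapping gone, the seal can be applied. */
        assert(fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == 0);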
      
      Reviewed by:	kib, markj
      Differential Revision:	https://reviews.freebsd.org/D21456
  11. 01 Sep, 2019 1 commit
    • posixshm: switch to OBJT_SWAP in advance of other changes · 32287ea7
      Kyle Evans authored
      Future changes to posixshm will start tracking writeable mappings in order
      to support file sealing. Tracking writeable mappings for an OBJT_DEFAULT
      object is complicated as it may be swapped out and converted to an
      OBJT_SWAP. One could add this tracking generically to vm_object, but that
      is difficult to do without increasing the memory footprint of vm_object and
      thus blowing up total memory usage by a significant amount.
      
      On the other hand, the swap pager can be expanded to track writeable
      mappings without increasing vm_object size. This change is currently in
      D21456. Switch over to OBJT_SWAP in advance of the other changes to the
      swap pager and posixshm.
  12. 28 Aug, 2019 1 commit
    • Wire pages in vm_page_grab() when appropriate. · b5d239cb
      Mark Johnston authored
      uiomove_object_page() and exec_map_first_page() would previously wire a
      page after having grabbed it.  Ask vm_page_grab() to perform the wiring
      instead: this removes some redundant code, and is cheaper in the case
      where the requested page is not resident since the page allocator can be
      asked to initialize the page as wired, whereas a separate vm_page_wire()
      call requires the page lock.
      
      In vm_imgact_hold_page(), use vm_page_unwire_noq() instead of
      vm_page_unwire(PQ_NONE).  The latter ensures that the page is dequeued
      before returning, but this is unnecessary since vm_page_free() will
      trigger a batched dequeue of the page.
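
      The pattern change, sketched (kernel code, simplified):

        /* Before: grab the page, then wire it under the page lock. */
        m = vm_page_grab(object, pindex, VM_ALLOC_NORMAL);
        vm_page_lock(m);
        vm_page_wire(m);
        vm_page_unlock(m);

        /* After: ask the allocator for an already-wired page. */
        m = vm_page_grab(object, pindex, VM_ALLOC_NORMAL | VM_ALLOC_WIRED);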
      
      Reviewed by:	alc, kib
      Tested by:	pho (part of a larger patch)
      MFC after:	1 week
      Sponsored by:	Netflix
      Differential Revision:	https://reviews.freebsd.org/D21440
  13. 31 Jul, 2019 1 commit
    • kern_shm_open: push O_CLOEXEC into caller control · b5a7ac99
      Kyle Evans authored
      The motivation for this change is to allow wrappers around shm to be written
      that don't set CLOEXEC. kern_shm_open currently accepts O_CLOEXEC but sets
      it unconditionally. kern_shm_open is used by the shm_open(2) syscall, which
      is mandated by POSIX to set CLOEXEC, and CloudABI's sys_fd_create1().
      Presumably O_CLOEXEC is intended in the latter caller, but it's unclear from
      the context.
      
      sys_shm_open() now unconditionally sets O_CLOEXEC to meet POSIX
      requirements, and a comment has been added in kern_shm_open() to explain
      the situation and add a pointer to where O_CLOEXEC setting is maintained for
      shm_open(2) correctness. CloudABI's sys_fd_create1() also unconditionally
      sets O_CLOEXEC to match previous behavior.
      
      This also has the side-effect of making flags correctly reflect the
      O_CLOEXEC status on this fd for the rest of kern_shm_open(), but a
      glance-over leads me to believe that it didn't really matter.
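
      A sketch of the resulting split; the exact argument list of
      kern_shm_open() shown here is an assumption based on the description
      above:

        int
        sys_shm_open(struct thread *td, struct shm_open_args *uap)
        {
                /* POSIX mandates close-on-exec for shm_open(2). */
                return (kern_shm_open(td, uap->path, uap->flags | O_CLOEXEC,
                    uap->mode, NULL));
        }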
      
      Reviewed by:	kib, markj
      MFC after:	1 week
      Differential Revision:	https://reviews.freebsd.org/D21119
  14. 29 Jul, 2019 1 commit
  15. 08 Jul, 2019 1 commit
    • Merge the vm_page hold and wire mechanisms. · eeacb3b0
      Mark Johnston authored
      The hold_count and wire_count fields of struct vm_page are separate
      reference counters with similar semantics.  The remaining essential
      differences are that holds are not counted as a reference with respect
      to LRU, and holds have an implicit free-on-last-unhold semantic whereas
      vm_page_unwire() callers must explicitly determine whether to free the
      page once the last reference to the page is released.
      
      This change removes the KPIs which directly manipulate hold_count.
      Functions such as vm_fault_quick_hold_pages() now return wired pages
      instead.  Since r328977 the overhead of maintaining LRU for wired pages
      is lower, and in many cases vm_fault_quick_hold_pages() callers would
      swap holds for wirings on the returned pages anyway, so with this change
      we remove a number of page lock acquisitions.
      
      No functional change is intended.  __FreeBSD_version is bumped.
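
      For callers, the shape of the change is roughly this sketch (locking
      and error handling elided):

        /* Pages now come back wired rather than held... */
        count = vm_fault_quick_hold_pages(map, addr, len, VM_PROT_READ,
            ma, nitems(ma));

        /* ...so each page is released by dropping its wiring. */
        for (i = 0; i < count; i++)
                vm_page_unwire(ma[i], PQ_ACTIVE);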
      
      Reviewed by:	alc, kib
      Discussed with:	jeff
      Discussed with:	jhb, np (cxgbe)
      Tested by:	pho (previous version)
      Sponsored by:	Netflix
      Differential Revision:	https://reviews.freebsd.org/D19247
  16. 30 May, 2019 1 commit
  17. 23 May, 2019 2 commits
  18. 28 Feb, 2019 1 commit
  19. 11 Dec, 2018 1 commit
  20. 21 Nov, 2018 1 commit
  21. 27 Nov, 2017 1 commit
    • sys/kern: adoption of SPDX licensing ID tags. · 8a36da99
      Pedro F. Giffuni authored
      Mainly focus on files that use the BSD 2-Clause license; however, the tool
      I was using misidentified many licenses, so this was mostly a manual (and
      error-prone) task.
      
      The Software Package Data Exchange (SPDX) group provides a specification
      to make it easier for automated tools to detect and summarize well known
      opensource licenses. We are gradually adopting the specification, noting
      that the tags are considered only advisory and do not, in any way,
      supersede or replace the license texts.
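
      For reference, the tag form adopted in these files is:

        /*-
         * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
         */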
  22. 08 Nov, 2017 1 commit
    • Replace many instances of VM_WAIT with blocking page allocation flags · 8d6fbbb8
      Jeff Roberson authored
      similar to the kernel memory allocator.
      
      This simplifies NUMA allocation because the domain will be known at wait
      time and races between failure and sleeping are eliminated.  This also
      reduces boilerplate code and simplifies callers.
      
      A wait primitive is supplied for uma zones for similar reasons.  This
      eliminates some non-specific VM_WAIT calls in favor of more explicit
      sleeps that may be satisfied without new pages.
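
      The shape of the change, sketched (simplified; real callers also had
      to juggle object locks around the sleep):

        /* Before: bounce between allocation failure and VM_WAIT. */
        while ((m = vm_page_alloc(object, pindex, VM_ALLOC_NORMAL)) == NULL)
                VM_WAIT;

        /* After: let the allocator itself sleep for a free page. */
        m = vm_page_alloc(object, pindex, VM_ALLOC_NORMAL | VM_ALLOC_WAITOK);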
      
      Reviewed by:	alc, kib, markj
      Tested by:	pho
      Sponsored by:	Netflix, Dell/EMC Isilon
  23. 02 Oct, 2017 1 commit
  24. 30 Sep, 2017 1 commit
    • Have uiomove_object_page() keep accessed pages in the active queue. · 0ffc7ed7
      Mark Johnston authored
      Previously, uiomove_object_page() would maintain LRU by requeuing the
      accessed page. This involves acquiring one of the heavily contended page
      queue locks. Moreover, it is unnecessarily expensive for pages in the
      active queue.
      
      As of r254304 the page daemon continually performs a slow scan of the
      active queue, with the effect that unreferenced pages are gradually
      moved to the inactive queue, from which they can be reclaimed. Prior to
      that revision, the active queue was scanned only during shortages of
      free and inactive pages, meaning that unreferenced pages could get
      "stuck" in the queue. Thus, tmpfs was required to use the inactive queue
      and requeue pages in order to maintain LRU. Now that this is no longer
      the case, tmpfs I/O operations can use the active queue and avoid the
      page queue locks in most cases, instead setting PGA_REFERENCED on
      referenced pages to provide pseudo-LRU.
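
      Roughly, the new per-page step is this sketch (kernel code of that
      era):

        vm_page_lock(m);
        if (m->queue == PQ_ACTIVE)
                vm_page_reference(m);   /* set PGA_REFERENCED; no requeue */
        else
                vm_page_activate(m);
        vm_page_unlock(m);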
      
      Reviewed by:	alc (previous version)
      MFC after:	2 weeks
  25. 27 Jun, 2017 1 commit
  26. 31 Mar, 2017 1 commit
    • Audit arguments to POSIX message queues, semaphores, and shared memory. · 15bcf785
      Robert Watson authored
      This requires minor changes to the audit framework to allow capturing
      paths that are not filesystem paths (i.e., that will not be canonicalised
      relative to the process's current working directory and/or filesystem
      root).
      
      Obtained from:	TrustedBSD Project
      MFC after:	3 weeks
      Sponsored by:	DARPA, AFRL
  27. 20 Mar, 2017 1 commit
  28. 12 Feb, 2017 1 commit
    • Consistently handle negative or wrapping offsets in the mmap(2) syscalls. · 987ff181
      Konstantin Belousov authored
      For regular files and POSIX shared memory, POSIX requires that the
      [offset, offset + size) range is legitimate.  At mapping time, check
      that the offset is not negative.  Allowing negative offsets might
      expose data that the filesystem put into the vm_object for internal
      use, especially due to the signedness treatment in OFF_TO_IDX().  The
      fault handler verifies that the mapped range is valid, assuming that
      mmap(2) checked that the arithmetic gives no undefined results.
      
      For device mappings, leave the semantics of negative offsets to the
      driver.  Correct the object page index calculation to not erroneously
      propagate the sign.
      
      In either case, disallow overflow of offset + size.
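
      The checks amount to this sketch (types simplified; the kernel works
      in terms of off_t and vm_ooffset_t):

        /* Reject negative offsets up front... */
        if (foff < 0)
                return (EINVAL);
        /* ...and reject offset + size wraparound. */
        if (foff + size < foff)
                return (EINVAL);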
      
      Update mmap(2) man page to explain the requirement of the range
      validity, and behaviour when the range becomes invalid after mapping.
      
      Reported and tested by:	royger (previous version)
      Reviewed by:	alc
      Sponsored by:	The FreeBSD Foundation
      MFC after:	2 weeks
  29. 11 Dec, 2016 1 commit
  30. 15 Nov, 2016 1 commit
  31. 14 Aug, 2016 1 commit
  32. 23 Jun, 2016 1 commit
  33. 26 Apr, 2016 1 commit
  34. 14 Apr, 2016 1 commit