- 13 Apr, 2020 1 commit
-
-
Mark Johnston authored
When creating a private mapping of a POSIX shared memory object, VM_PROT_WRITE should always be included in maxprot regardless of permissions on the underlying FD. Otherwise it is possible to open a shm object read-only, map it with MAP_PRIVATE and PROT_WRITE, and violate the invariant in vm_map_insert() that (prot & maxprot) == prot. Reported by: syzkaller Reviewed by: kevans, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24398
-
- 03 Mar, 2020 1 commit
-
-
Mark Johnston authored
PR: 244563 Reported by: swills
-
- 28 Feb, 2020 1 commit
-
-
Jeff Roberson authored
Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23865
-
- 19 Jan, 2020 1 commit
-
-
Jeff Roberson authored
The vnode pager does not want the object lock held. Moving this out allows further object lock scope reduction in callers. While here add some missing paging in progress calls and an assert. The object handle is now protected explicitly with pip. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23033
-
- 09 Jan, 2020 1 commit
-
-
Kyle Evans authored
Other mechanisms that resize the shmfd grab a write lock from 0 to OFF_MAX for safety, so we still get proper synchronization of shmfd->shm_size in effect. There's no need to block readers/writers of earlier segments when we're just reserving more space, so narrow the scope -- it would likely be safe to narrow it completely to just the section of the range that extends beyond our current size, but this likely isn't worth it since the size isn't stable until the writelock is granted the first time. Suggested by: cem (passing comment)
-
- 08 Jan, 2020 1 commit
-
-
Kyle Evans authored
Linux expects to be able to use posix_fallocate(2) on a memfd. Other places would use this with shm_open(2) to act as a smarter ftruncate(2). Test has been added to go along with this. Reviewed by: kib (earlier version) Differential Revision: https://reviews.freebsd.org/D23042
-
- 05 Jan, 2020 2 commits
-
-
Kyle Evans authored
When file sealing and shm_open2 were introduced, we should have grown a new kern_shm_open2 helper that did the brunt of the work with the new interface while kern_shm_open remains the same. Instead, more complexity was introduced to kern_shm_open to handle the additional features and consumers had to keep changing in somewhat awkward ways, and a kern_shm_open2 was added to wrap kern_shm_open. Backpedal on this and correct the situation- kern_shm_open returns to the interface it had prior to file sealing being introduced, and neither function needs an initial_seals argument anymore as it's handled in kern_shm_open2 based on the shmflags.
-
Kyle Evans authored
If a write seal is set on a shared mapping, we must exclude VM_PROT_WRITE as the fd is effectively read-only. This was discovered by running devel/linux-ltp, which mmap's with acceptable protections specified then attempts to raise to PROT_READ|PROT_WRITE with mprotect(2), which we allowed. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22978
-
- 28 Dec, 2019 1 commit
-
-
Mark Johnston authored
With the previous reviews, the page lock is no longer required in order to perform queue operations on a page. It is also no longer needed in the page queue scans. This change effectively eliminates remaining uses of the page lock and also the false sharing caused by multiple pages sharing a page lock. Reviewed by: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22885
-
- 15 Dec, 2019 2 commits
-
-
Jeff Roberson authored
on a pagequeue. Reported by: pho
-
Jeff Roberson authored
an exclusive object lock. Previously swap space was freed on a best effort basis when a page that had valid swap was dirtied, thus invalidating the swap copy. This may be done inconsistently and requires the object lock which is not always convenient. Instead, track when swap space is present. The first dirty is responsible for deleting space or setting PGA_SWAP_FREE which will trigger background scans to free the swap space. Simplify the locking in vm_fault_dirty() now that we can reliably identify the first dirty. Discussed with: alc, kib, markj Differential Revision: https://reviews.freebsd.org/D22654
-
- 19 Nov, 2019 1 commit
-
-
Jeff Roberson authored
reudundant complicated checks and additional locking required only for anonymous memory. Introduce vm_object_allocate_anon() to create these objects. DEFAULT and SWAP objects now have the correct settings for non-anonymous consumers and so individual consumers need not modify the default flags to create super-pages and avoid ONEMAPPING/NOSPLIT. Reviewed by: alc, dougm, kib, markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D22119
-
- 18 Nov, 2019 1 commit
-
-
David Bright authored
Co-mingling two things here: * Addressing some feedback from Konstantin and Kyle re: jail, capability mode, and a few other things * Adding audit support as promised. The audit support change includes a partial refresh of OpenBSM from upstream, where the change to add shm_rename has already been accepted. Matthew doesn't plan to work on refreshing anything else to support audit for those new event types. Submitted by: Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: kib Relnotes: Yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22083
-
- 15 Oct, 2019 2 commits
-
-
Jeff Roberson authored
Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594
-
Jeff Roberson authored
This is the first in a series of patches that promotes the page busy field to a first class lock that no longer requires the object lock for consistency. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21548
-
- 02 Oct, 2019 1 commit
-
-
Kyle Evans authored
kern_shm_open2(), since conception, completely fails to pass the mode along to kern_shm_open(). This breaks most uses of it. Add tests alongside this that actually check the mode of the returned files. PR: 240934 [pulseaudio breakage] Reported by: ler, Andrew Gierth [postgres breakage] Diagnosed by: Andrew Gierth (great catch) Tested by: ler, tmunro Pointy hat to: kevans
-
- 26 Sep, 2019 1 commit
-
-
David Bright authored
Add an atomic shm rename operation, similar in spirit to a file rename. Atomically unlink an shm from a source path and link it to a destination path. If an existing shm is linked at the destination path, unlink it as part of the same atomic operation. The caller needs the same permissions as shm_unlink to the shm being renamed, and the same permissions for the shm at the destination which is being unlinked, if it exists. If those fail, EACCES is returned, as with the other shm_* syscalls. truss support is included; audit support will come later. This commit includes only the implementation; the sysent-generated bits will come in a follow-on commit. Submitted by: Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: jilles (earlier revision) Reviewed by: brueffer (manpages, earlier revision) Relnotes: yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21423
-
- 25 Sep, 2019 4 commits
-
-
Kyle Evans authored
This also implements it, fixes kdump, and removes no longer needed bits from lib/libc/sys/shm_open.c for the interim.
-
Kyle Evans authored
shm_open2 allows a little more flexibility than the original shm_open. shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate shmflag argument that can be expanded later. Currently the only shmflag is to allow file sealing on the returned fd. shm_open and memfd_create will both be implemented in libc to use this new syscall. __FreeBSD_version is bumped to indicate the presence. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21393
-
Kyle Evans authored
Now that flags may be set on posixshm, add an argument to kern_shm_open() for the initial seals. To maintain past behavior where callers of shm_open(2) are guaranteed to not have any seals applied to the fd they're given, apply F_SEAL_SEAL for existing callers of kern_shm_open. A special flag could be opened later for shm_open(2) to indicate that sealing should be allowed. We currently restrict initial seals to F_SEAL_SEAL. We cannot error out if F_SEAL_SEAL is re-applied, as this would easily break shm_open() twice to a shmfd that already existed. A note's been added about the assumptions we've made here as a hint towards anyone wanting to allow other seals to be applied at creation. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21392
-
Kyle Evans authored
File sealing applies protections against certain actions (currently: write, growth, shrink) at the inode level. New fileops are added to accommodate seals - EINVAL is returned by fcntl(2) if they are not implemented. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D21391
-
- 10 Sep, 2019 1 commit
-
-
Jeff Roberson authored
- VM_ALLOC_NOCREAT will grab without creating a page. - vm_page_grab_valid() will grab and page in if necessary. - vm_page_busy_acquire() automates some busy acquire loops. Discussed with: alc, kib, markj Tested by: pho (part of larger branch) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21546
-
- 09 Sep, 2019 1 commit
-
-
Mark Johnston authored
There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficent as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accomodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486
-
- 03 Sep, 2019 1 commit
-
-
Kyle Evans authored
r351650 switched posixshm to using OBJT_SWAP for shm_object r351795 added support to the swap_pager for tracking writeable mappings Take advantage of this and start tracking writeable mappings; fd sealing will use this to reject a seal on writing with EBUSY if any such mapping exist. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21456
-
- 01 Sep, 2019 1 commit
-
-
Kyle Evans authored
Future changes to posixshm will start tracking writeable mappings in order to support file sealing. Tracking writeable mappings for an OBJT_DEFAULT object is complicated as it may be swapped out and converted to an OBJT_SWAP. One may generically add this tracking for vm_object, but this is difficult to do without increasing memory footprint of vm_object and blowing up memory usage by a significant amount. On the other hand, the swap pager can be expanded to track writeable mappings without increasing vm_object size. This change is currently in D21456. Switch over to OBJT_SWAP in advance of the other changes to the swap pager and posixshm.
-
- 28 Aug, 2019 1 commit
-
-
Mark Johnston authored
uiomove_object_page() and exec_map_first_page() would previously wire a page after having grabbed it. Ask vm_page_grab() to perform the wiring instead: this removes some redundant code, and is cheaper in the case where the requested page is not resident since the page allocator can be asked to initialize the page as wired, whereas a separate vm_page_wire() call requires the page lock. In vm_imgact_hold_page(), use vm_page_unwire_noq() instead of vm_page_unwire(PQ_NONE). The latter ensures that the page is dequeued before returning, but this is unnecessary since vm_page_free() will trigger a batched dequeue of the page. Reviewed by: alc, kib Tested by: pho (part of a larger patch) MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21440
-
- 31 Jul, 2019 1 commit
-
-
Kyle Evans authored
The motivation for this change is to allow wrappers around shm to be written that don't set CLOEXEC. kern_shm_open currently accepts O_CLOEXEC but sets it unconditionally. kern_shm_open is used by the shm_open(2) syscall, which is mandated by POSIX to set CLOEXEC, and CloudABI's sys_fd_create1(). Presumably O_CLOEXEC is intended in the latter caller, but it's unclear from the context. sys_shm_open() now unconditionally sets O_CLOEXEC to meet POSIX requirements, and a comment has been dropped in to kern_fd_open() to explain the situation and add a pointer to where O_CLOEXEC setting is maintained for shm_open(2) correctness. CloudABI's sys_fd_create1() also unconditionally sets O_CLOEXEC to match previous behavior. This also has the side-effect of making flags correctly reflect the O_CLOEXEC status on this fd for the rest of kern_shm_open(), but a glance-over leads me to believe that it didn't really matter. Reviewed by: kib, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D21119
-
- 29 Jul, 2019 1 commit
-
-
Mark Johnston authored
MFC after: 3 days Sponsored by: The FreeBSD Foundation
-
- 08 Jul, 2019 1 commit
-
-
Mark Johnston authored
The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last unhold semantic whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released. This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions. No functional change is intended. __FreeBSD_version is bumped. Reviewed by: alc, kib Discussed with: jeff Discussed with: jhb, np (cxgbe) Tested by: pho (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19247
-
- 30 May, 2019 1 commit
-
-
Konstantin Belousov authored
Sponsored by: The FreeBSD Foundation MFC after: 3 days
-
- 23 May, 2019 2 commits
-
-
Konstantin Belousov authored
The sysctl provides the listing on named linked posix shared memory segments existing in the system. Reuse shm_fill_kinfo() for filling individual struct kinfo_file. Remove unneeded lock around reading of shmfd->shm_mode. Reviewed by: jilles, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20258
-
Konstantin Belousov authored
Unless there are transient references to the object, the ref count is equal to the number of the shared memory segment mappings plus one. Reviewed by: jilles, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20258
-
- 28 Feb, 2019 1 commit
-
-
Mark Johnston authored
They have no effect, as with filesystem file descriptors. This improves compatibility with some existing userspace code. Submitted by: Greg V <greg@unrelenting.technology> Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D19330
-
- 11 Dec, 2018 1 commit
-
-
Mateusz Guzik authored
Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation
-
- 21 Nov, 2018 1 commit
-
-
Mateusz Guzik authored
Sponsored by: The FreeBSD Foundation
-
- 27 Nov, 2017 1 commit
-
-
Pedro F. Giffuni authored
Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
-
- 08 Nov, 2017 1 commit
-
-
Jeff Roberson authored
similar to the kernel memory allocator. This simplifies NUMA allocation because the domain will be known at wait time and races between failure and sleeping are eliminated. This also reduces boilerplate code and simplifies callers. A wait primitive is supplied for uma zones for similar reasons. This eliminates some non-specific VM_WAIT calls in favor of more explicit sleeps that may be satisfied without new pages. Reviewed by: alc, kib, markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon
-
- 02 Oct, 2017 1 commit
-
-
Alan Cox authored
field. Reviewed by: kib, markj MFC after: 2 weeks X-MFC with: r324146
-
- 30 Sep, 2017 1 commit
-
-
Mark Johnston authored
Previously, uiomove_object_page() would maintain LRU by requeuing the accessed page. This involves acquiring one of the heavily contended page queue locks. Moreover, it is unnecessarily expensive for pages in the active queue. As of r254304 the page daemon continually performs a slow scan of the active queue, with the effect that unreferenced pages are gradually moved to the inactive queue, from which they can be reclaimed. Prior to that revision, the active queue was scanned only during shortages of free and inactive pages, meaning that unreferenced pages could get "stuck" in the queue. Thus, tmpfs was required to use the inactive queue and requeue pages in order to maintain LRU. Now that this is no longer the case, tmpfs I/O operations can use the active queue and avoid the page queue locks in most cases, instead setting PGA_REFERENCED on referenced pages to provide pseudo-LRU. Reviewed by: alc (previous version) MFC after: 2 weeks
-
- 27 Jun, 2017 1 commit
-
-
Konstantin Belousov authored
Found and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
-