1. 19 Oct, 2010 24 commits
    • Justin T. Gibbs's avatar
      Improve the Xen para-virtualized device infrastructure of FreeBSD: · ff662b5c
      Justin T. Gibbs authored
       o Add support for backend devices (e.g. blkback)
       o Implement extensions to the Xen para-virtualized block API to allow
         for larger and more outstanding I/Os.
       o Import a completely rewritten block back driver with support for fronting
         I/O to both raw devices and files.
       o General cleanup and documentation of the XenBus and XenStore support code.
       o Robustness and performance updates for the block front driver.
       o Fixes to the netfront driver.
      Sponsored by: Spectra Logic Corporation
      	Deleted: This file explains the Linux method for XenBus device
      	enumeration and thus does not apply to FreeBSD's NewBus approach.
      	Deleted: Linux version of backend XenBus service routines.  It
      	was never ported to FreeBSD.  See xenbusb.c, xenbusb_if.m,
      	xenbusb_front.c xenbusb_back.c for details of FreeBSD's XenBus
      	Split XenStore into its own tree.  XenBus is a software layer built
      	on top of XenStore.  The old arrangement and the naming of some
      	structures and functions blurred these lines making it difficult to
      	discern what services are provided by which layer and at what times
      	these services are available (e.g. during system startup and shutdown).
      	Split up XenBus code into methods available for use by client
      	drivers (xenbus.c) and code used by the XenBus "bus code" to
      	enumerate, attach, detach, and service bus drivers.
      	Add a XenBus front driver for handling shutdown, reboot, suspend, and
      	resume events published in the XenStore.  Move all PV suspend/reboot
      	support from reboot.c into this driver.
      	New file from Xen vendor with macros and structures used by
      	a block back driver to service requests from a VM running a
      	different ABI (e.g. amd64 back with i386 front).
      	Adjust kernel build spec for new XenBus/XenStore layout and added
      	Xen functionality.
      	o Rename XenStore APIs and structures from xenbus_* to xs_*.
      	o Adjust to use of M_XENBUS and M_XENSTORE malloc types for allocation
      	  of objects returned by these APIs.
      	o Adjust for changes in the bus interface for Xen drivers.
      	Add Doxygen comments for these interfaces and the code that
      	implements them.
      	o Rewrite the Block Back driver to attach properly via newbus,
      	  operate correctly in both PV and HVM mode regardless of domain
      	  (e.g. can be in a DOM other than 0), and to deal with the latest
      	  metadata available in XenStore for block devices.
      	o Allow users to specify a file as a backend to blkback, in addition
      	  to character devices.  Use the namei lookup of the backend path
      	  to automatically configure, based on file type, the appropriate
      	  backend method.
      	The current implementation is limited to a single outstanding I/O
      	at a time to file backed storage.
      	Extend the Xen blkif API: Negotiable request size and number of
      	This change extends the information recorded in the XenStore
      	allowing block front/back devices to negotiate for optimal I/O
      	parameters.  This has been achieved without sacrificing backward
      	compatibility with drivers that are unaware of these protocol
      	enhancements.  The extensions center around the connection protocol
      	which now includes these additions:
      	o The back-end device publishes its maximum supported values for
      	  request I/O size, the number of page segments that can be
      	  associated with a request, the maximum number of requests that
      	  can be concurrently active, and the maximum number of pages that
      	  can be in the shared request ring.  These values are published
      	  before the back-end enters the XenbusStateInitWait state.
      	o The front-end waits for the back-end to enter either the InitWait
      	  or Initialize state.  At this point, the front end limits its
      	  own capabilities to the lesser of the values it finds published
      	  by the backend, its own maximums, or, should any back-end data
      	  be missing in the store, the values supported by the original
      	  protocol.  It then initializes its internal data structures
      	  including allocation of the shared ring, publishes its maximum
      	  capabilities to the XenStore and transitions to the Initialized
      	  state.
      	o The back-end waits for the front-end to enter the Initialized
      	  state.  At this point, the back end limits its own capabilities
      	  to the lesser of the values it finds published by the frontend,
      	  its own maximums, or, should any front-end data be missing in
      	  the store, the values supported by the original protocol.  It
      	  then initializes its internal data structures, attaches to the
      	  shared ring and transitions to the Connected state.
      	o The front-end waits for the back-end to enter the Connected
      	  state, transitions itself to the Connected state, and can
      	  commence I/O.
      	Although an updated front-end driver must be aware of the back-end's
      	InitWait state, the back-end has been coded such that it can
      	tolerate a front-end that skips this step and transitions directly
      	to the Initialized state without waiting for the back-end.
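      The clamping rule in the connection protocol above can be sketched
      in plain C.  This is only an illustration of the rule as described,
      not the driver's code: struct blk_limits, BLK_LIMIT_MISSING,
      negotiate_one() and blk_negotiate() are hypothetical names, and a
      value of 0 is assumed to mean "the peer did not publish this node".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four negotiable limits. */
struct blk_limits {
	uint32_t max_request_size;	/* bytes of I/O per request */
	uint32_t max_request_segments;	/* page segments per request */
	uint32_t max_requests;		/* concurrently active requests */
	uint32_t max_ring_pages;	/* pages in the shared request ring */
};

/* Assumption: 0 marks a value the other end did not publish. */
#define BLK_LIMIT_MISSING	0u

/*
 * Negotiate one limit: a missing peer value is replaced by the value
 * of the original protocol, then the lesser of the two sides wins.
 */
uint32_t
negotiate_one(uint32_t ours, uint32_t peer, uint32_t legacy)
{
	if (peer == BLK_LIMIT_MISSING)
		peer = legacy;
	return (ours < peer ? ours : peer);
}

/* Apply the same rule to every limit. */
struct blk_limits
blk_negotiate(const struct blk_limits *ours, const struct blk_limits *peer,
    const struct blk_limits *legacy)
{
	struct blk_limits r;

	assert(ours != NULL && peer != NULL && legacy != NULL);
	r.max_request_size = negotiate_one(ours->max_request_size,
	    peer->max_request_size, legacy->max_request_size);
	r.max_request_segments = negotiate_one(ours->max_request_segments,
	    peer->max_request_segments, legacy->max_request_segments);
	r.max_requests = negotiate_one(ours->max_requests,
	    peer->max_requests, legacy->max_requests);
	r.max_ring_pages = negotiate_one(ours->max_ring_pages,
	    peer->max_ring_pages, legacy->max_ring_pages);
	return (r);
}
```

      Both ends run the same rule against what the other end published,
      so an end that is unaware of the extensions simply leaves the peer
      with the original protocol's values.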
      	o Increase BLKIF_MAX_SEGMENTS_PER_REQUEST to 255.  This is
      	  the maximum number possible without changing the blkif
      	  request header structure (nr_segs is a uint8_t).
      	o Add two new constants:
      	  BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK.  These respectively
      	  indicate the number of segments that can fit in the first
      	  ring-buffer entry of a request, and for each subsequent
      	  (sg element only) ring-buffer entry associated with the
      	  "header" ring-buffer entry of the request.
      	o Add the blkif_request_segment_t typedef for segment
      	o Add the BLKRING_GET_SG_REQUEST() macro which wraps the
      	  RING_GET_REQUEST() macro and returns a properly cast
      	  pointer to an array of blkif_request_segment_ts.
      	o Add the BLKIF_SEGS_TO_BLOCKS() macro which calculates the
      	  number of ring entries that will be consumed by a blkif
      	  request with the given number of segments.
      	o Update for changes in interface/io/blkif.h macros.
      	o Update the BLKIF_MAX_RING_REQUESTS() macro to take the
      	  ring size as an argument to allow this calculation on
      	  multi-page rings.
      	o Add a companion macro to BLKIF_MAX_RING_REQUESTS(),
      	  BLKIF_RING_PAGES().  This macro determines the number of
      	  ring pages required in order to support a ring with the
      	  supplied number of request blocks.
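      The calculation that BLKIF_SEGS_TO_BLOCKS() is described as
      performing can be sketched as a function.  This is a sketch under
      assumptions, not the macro from blkif.h: segs_to_blocks() is a
      hypothetical name, and the per-entry capacities are passed in as
      parameters rather than taken from the header's constants.

```c
#include <assert.h>

/*
 * Number of ring entries consumed by one request.
 *
 * nsegs:    segments in the request
 * hdr_segs: segments that fit in the first ("header") ring entry
 * blk_segs: segments that fit in each following segment-only entry
 */
unsigned int
segs_to_blocks(unsigned int nsegs, unsigned int hdr_segs,
    unsigned int blk_segs)
{
	assert(hdr_segs > 0 && blk_segs > 0);

	/* The header entry is always consumed, even with few segments. */
	if (nsegs <= hdr_segs)
		return (1);

	/* Round the overflow up to whole segment-only entries. */
	return (1 + (nsegs - hdr_segs + blk_segs - 1) / blk_segs);
}
```

      With illustrative capacities of 11 segments in the header entry and
      14 in each following entry, a 12-segment request consumes two ring
      entries and a 255-segment request consumes 19.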
      	o Negotiate with the other-end with the following limits:
      	      Request Size:   MAXPHYS
      	      Max Segments:   (MAXPHYS/PAGE_SIZE) + 1
      	      Max Requests:   256
      	      Max Ring Pages: Sufficient to support Max Requests with
      	                      Max Segments.
      	o Dynamically allocate request pools and segments-per-request.
      	o Update ring allocation/attachment code to support a
      	  multi-page shared ring.
      	o Update routines that access the shared ring to handle
      	  multi-block requests.
      	o Track blkfront allocations in a blkfront driver specific
      	  malloc pool.
      	o Strip out XenStore transaction retry logic in the
      	  connection code.  Transactions only need to be used when
      	  the update to multiple XenStore nodes must be atomic.
      	  That is not the case here.
      	o Fully disable blkif_resume() until it can be fixed
      	  properly (it didn't work before this change).
      	o Destroy bus-dma objects during device instance tear-down.
      	o Properly handle backend devices with power-of-2 sector
      	  sizes larger than 512 bytes.
      	Advertise support for and implement the BLKIF_OP_WRITE_BARRIER
      	and BLKIF_OP_FLUSH_DISKCACHE blkif opcodes using BIO_FLUSH and
      	the BIO_ORDERED attribute of bios.
      	Fix various bugs in blkfront.
      	o gnttab_alloc_grant_references() returns 0 for success and
      	  non-zero for failure.  The check for < 0 is a leftover
      	o When we negotiate with blkback and have to reduce some of our
      	  capabilities, print out the original and reduced capability before
      	  changing the local capability.  So the user now gets the correct
      	o Fix blkif_restart_queue_callback() formatting.  Make sure we hold
      	  the mutex in that function before calling xb_startio().
      	o Fix a couple of KASSERT()s.
      	o Fix a check in the xb_remove_* macro to be a little more specific.
      	Define GNTTAB_LIST_END publicly as GRANT_REF_INVALID.
      	Use GRANT_REF_INVALID instead of driver private definitions of the
      	same constant.
      	Add the gnttab_end_foreign_access_references() API.
      	This API allows a client to batch the release of an array of grant
      	references, instead of coding a private for loop.  The implementation
      	takes advantage of this batching to reduce lock overhead to one
      	acquisition and release per-batch instead of per-freed grant reference.
      	While here, reduce the duration the gnttab_list_lock is held during
      	gnttab_free_grant_references() operations.  The search to find the
      	tail of the incoming free list does not rely on global state and so
      	can be performed without holding the lock.
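      The locking saving described for batched release can be shown with
      a small user-space model.  Everything here is a stand-in and not the
      grant table code: struct ref_pool, struct counting_lock and the
      pool_*() functions are hypothetical, and the "lock" only counts
      acquisitions so the difference is visible.

```c
#include <assert.h>
#include <stddef.h>

#define REF_LIST_END	(-1)	/* hypothetical end-of-list marker */
#define NREFS		16

/* Stand-in for a mutex; it only records how often it is taken. */
struct counting_lock {
	unsigned int acquisitions;
	int held;
};

void
lock_acquire(struct counting_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

void
lock_release(struct counting_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct ref_pool {
	struct counting_lock lock;
	int next[NREFS];	/* next[i] links free entry i onward */
	int free_head;
	unsigned int free_count;
};

void
pool_init(struct ref_pool *p)
{
	p->lock.acquisitions = 0;
	p->lock.held = 0;
	p->free_head = REF_LIST_END;
	p->free_count = 0;
}

/* Per-reference release: one lock round trip for every reference. */
void
pool_release_one(struct ref_pool *p, int ref)
{
	lock_acquire(&p->lock);
	p->next[ref] = p->free_head;
	p->free_head = ref;
	p->free_count++;
	lock_release(&p->lock);
}

/* Batched release: build the chain privately, splice it in once. */
void
pool_release_batch(struct ref_pool *p, const int *refs, size_t n)
{
	size_t i;

	if (n == 0)
		return;

	/* These entries still belong to the caller, so no lock is needed. */
	for (i = 0; i + 1 < n; i++)
		p->next[refs[i]] = refs[i + 1];

	lock_acquire(&p->lock);
	p->next[refs[n - 1]] = p->free_head;
	p->free_head = refs[0];
	p->free_count += (unsigned int)n;
	lock_release(&p->lock);
}
```

      Releasing four references one at a time takes the lock four times
      in this model; the batched path takes it once, and the linking done
      before the lock is taken mirrors the idea of keeping work that does
      not depend on global state outside the critical section.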
      	o Implement the bind_interdomain_evtchn_to_irqhandler API for HVM mode.
      	  This allows an HVM domain to serve back end devices to other domains.
      	  This API is already implemented for PV mode.
      	o Synchronize the API between HVM and PV.
      	o Scan the full region of CPUID space in which the Xen VMM interface
      	  may be implemented.  On systems using SuSE as a Dom0 where the
      	  Viridian API is also exported, the VMM interface is above the region
      	  we used to search.
      	o Pass through bus_alloc_resource() calls so that XenBus drivers
      	  attaching on an HVM system can allocate unused physical address
      	  space from the nexus.  The block back driver makes use of this
      	Use the correct type for accessing the statically mapped xenstore
      	Move hvm_get_parameter() to the correct global header file instead
      	of as a private method to the XenStore.
      	Sync with vendor.
      	Add macro for calculating the number of ring pages needed for an N
      	deep ring.
      	To avoid duplication within the macros, create and use the new
      	__RING_HEADER_SIZE() macro.  This macro calculates the size of the
      	ring bookkeeping struct (producer/consumer indexes, etc.) that
      	resides at the head of the ring.
      	Add the __RING_PAGES() macro which calculates the number of shared
      	ring pages required to support a ring with the given number of
      	These APIs are used to support the multi-page ring version of the
      	Xen block API.
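      The page count that __RING_PAGES() is described as computing can be
      sketched as follows.  This is an assumption-laden illustration, not
      the vendor macro: ring_pages() is a hypothetical function, and the
      header size, entry size and page size are plain parameters instead
      of being derived from the ring types.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pages needed for a shared ring: the bookkeeping header at the head
 * of the ring plus nentries fixed-size entries, rounded up to whole
 * pages.
 */
size_t
ring_pages(size_t header_size, size_t entry_size, size_t nentries,
    size_t page_size)
{
	size_t bytes;

	assert(page_size > 0);
	bytes = header_size + entry_size * nentries;
	return ((bytes + page_size - 1) / page_size);
}
```

      With an illustrative 64-byte header, 112-byte entries and 4096-byte
      pages, 36 entries fill exactly one page and a 256-entry ring needs
      8 pages.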
      	Add Comments.
      	o Refactor the FreeBSD XenBus support code to allow for both front and
      	  backend device attachments.
      	o Make use of new config_intr_hook capabilities to allow front and back
      	  devices to be probed/attached in parallel.
      	o Fix bugs in probe/attach state machine that could cause the system to
      	  hang when confronted with a failure either in the local domain or in
      	  a remote domain to which one of our driver instances is attaching.
      	o Publish all required state to the XenStore on device detach and
      	  failure.  The majority of the missing functionality was for serving
      	  as a back end since the typical "hot-plug" scripts in Dom0 don't
      	  handle the case of cleaning up for a "service domain" that is not
      	o Add dynamic sysctl nodes exposing the generic ivars of
      	  XenBus devices.
      	o Add doxygen style comments to the majority of the code.
      	o Clean up types, formatting, etc.
      	Common code used by both front and back XenBus busses.
      	Method definitions for a XenBus bus.
      	XenBus bus specialization for front and back devices.
      MFC after:	1 month
    • Jung-uk Kim's avatar
      Remove undocumented and stale debug.acpi.do_powerstate tunable. It was · 22066615
      Jung-uk Kim authored
      added with hw.pci.do_powerstate, but the PCI version was later split into two
      separate tunables and this one is now completely stale.  To make matters worse,
      PCI devices enumerated in ACPI tree ignore this tunable as it is handled by
      a function in acpi_pci.c instead.
    • Rebecca Cran's avatar
      Stop disallowing device nodes to be passed to camcontrol(8) since libcam · f0129ea8
      Rebecca Cran authored
      already allows both device names and nodes to be specified.
      Reviewed by:	avg
    • Jung-uk Kim's avatar
      Remove PCI_SET_POWERSTATE method from acpi.c and eradicate all PCI-specific · a7a3177f
      Jung-uk Kim authored
      knowledge from the file.  All PCI devices enumerated in ACPI tree must use
      the correct one from acpi_pci.c anyway.  Reduce duplicate code as we did for
      pci.c in r213905.  Do not return ESRCH from PCIB_POWER_FOR_SLEEP method.
      When the method is not found, just return zero without modifying the given
      default value as it is completely optional.  As a side effect, the return
      state must not be NULL.  Note there is actually no functional change by
      removing ESRCH because acpi_pcib_power_for_sleep() always returns zero.
      Adjust debugging messages and add new ones under bootverbose to help
      debugging device power state related issues.
      Reviewed by:	jhb, imp (earlier versions)
    • Marius Strobl's avatar
      - Wrap exchanging td_intr_frame and calling the event timer callback in · 10c2bb0a
      Marius Strobl authored
        a critical section, as apparently required by both. I don't think either
        belongs in the event timer front-ends; the callback should instead handle
        this as necessary, just as intr_event_handle() does, for example. However,
        this is how the other architectures currently handle it, either
        explicitly or implicitly.
      - Further rename and reword references to hardclock as this front-end no
        longer has a notion of actually calling it.
    • Bernhard Schmidt's avatar
      There is no reason to call rt_ifmsg(), remove it. · 96a911f6
      Bernhard Schmidt authored
      Submitted by:	Paul B Mahol <onemda at gmail.com>
      MFC after:	1 week
    • Bernhard Schmidt's avatar
      Fix an undefined behaviour if the desired ratectl algo is not available. · 9a9a302f
      Bernhard Schmidt authored
      This can happen if the algos are built as modules but are not loaded. If
      the selected ratectl algo is not available, try to load it (the load
      module function currently does nothing). Add a dummy ratectl algo which
      always selects the first available rate. Use that one if the desired algo
      is not available.
      MFC after:	1 week
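      The fallback described above can be sketched in user-space C.  This
      is a minimal model and not the net80211 code: struct ratectl_algo,
      ratectl_lookup(), the "highest" algorithm and the name "none" for
      the dummy are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shape of a rate control algorithm. */
struct ratectl_algo {
	const char *name;
	int (*select_rate)(const int *rates, size_t nrates);
};

/* Dummy algorithm: always selects the first available rate. */
int
dummy_select_rate(const int *rates, size_t nrates)
{
	assert(nrates > 0);
	return (rates[0]);
}

/* Example "real" algorithm for the sketch: picks the highest rate. */
int
highest_select_rate(const int *rates, size_t nrates)
{
	size_t i;
	int best;

	assert(nrates > 0);
	best = rates[0];
	for (i = 1; i < nrates; i++)
		if (rates[i] > best)
			best = rates[i];
	return (best);
}

const struct ratectl_algo dummy_algo = { "none", dummy_select_rate };
const struct ratectl_algo highest_algo = { "highest", highest_select_rate };

/*
 * Return the registered algorithm with the wanted name.  A NULL slot
 * models a module that is not loaded.  If nothing matches, return the
 * dummy instead of an invalid pointer.
 */
const struct ratectl_algo *
ratectl_lookup(const struct ratectl_algo *const *registered, size_t n,
    const char *wanted)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (registered[i] != NULL &&
		    strcmp(registered[i]->name, wanted) == 0)
			return (registered[i]);
	return (&dummy_algo);
}
```

      Callers can then invoke select_rate() unconditionally, because the
      lookup never hands back an unusable algorithm.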
    • Andrey V. Elsukov's avatar
      ZFS pool name is not a real device in devfs. Do not wait for · 366523d1
      Andrey V. Elsukov authored
      the device to appear when mounting root from ZFS.
      Reviewed by:	marcel
      Approved by:	mav (mentor)
    • Xin LI's avatar
      Clarify that lagg(4) sends/receives on active port, not the master port. · 145e5188
      Xin LI authored
      Note that this still seems to be a little bit confusing as the concept of
      "master" is different from what people would expect on a networking
    • Jung-uk Kim's avatar
      Remove PCI header type 0 restriction from power state changes. PCI config. · 6d018c85
      Jung-uk Kim authored
      registers for bridges are saved and restored since r200341.
      OK'ed by:	imp, jhb
    • Jung-uk Kim's avatar
      Do not apply do_power_resume for suspending case. When do_powerstate was · b56b7525
      Jung-uk Kim authored
      split into do_power_resume and do_power_nodriver, it became stale.
    • Jaakko Heinonen's avatar
      Use make_dev_p(9) with the MAKEDEV_CHECKNAME flag instead of make_dev(9) · bc2589f5
      Jaakko Heinonen authored
      and print a diagnostic if the call fails.
      This avoids a panic when an attempt is made to register a device with
      an invalid name. For example, the label class gets device names from
      untrusted input.
      Reviewed by:	freebsd-geom
    • Matthew D Fleming's avatar
      uma_zfree(zone, NULL) should do nothing, to match free(9). · 20ed0cb0
      Matthew D Fleming authored
      Noticed by:	Ron Steinke <rsteinke at isilon dot com>
      MFC after:	3 days
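      The behaviour being matched is that free(9), like C's free(), does
      nothing when passed NULL.  A user-space sketch of the same
      convention follows; struct toy_zone and toy_zfree() are hypothetical
      stand-ins and not the UMA implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy zone that only counts the items actually released. */
struct toy_zone {
	unsigned int frees;
};

/* Release an item to the zone; a NULL item is silently ignored. */
void
toy_zfree(struct toy_zone *zone, void *item)
{
	assert(zone != NULL);
	if (item == NULL)
		return;
	zone->frees++;
	free(item);
}
```

      Error paths can then release every pointer unconditionally instead
      of guarding each call with a NULL check.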
    • Ulrich Spörlein's avatar
      mdoc: fix markup typo · 52e9e8dc
      Ulrich Spörlein authored
      MFC after:	1 week (together with r213983)
    • Ed Maste's avatar
      Simplify and significantly speed up the timezone listing backend script. · bb5c5f84
      Ed Maste authored
      Reviewed by:	imp
    • Ed Maste's avatar
      Minor cleanup, including sysctl -n instead of sed to remove the sysctl · 161a621b
      Ed Maste authored
      Reviewed by:	imp
    • Rui Paulo's avatar
      Revert r206418 · e09a0bdb
      Rui Paulo authored
    • Ulrich Spörlein's avatar
      mdoc: drop even more redundant .Pp calls · 7cc1fde0
      Ulrich Spörlein authored
      No change in rendered output, fewer mandoc lint warnings.
      Tool provided by:	Nobuyuki Koganemaru n-kogane at syd.odn.ne.jp
    • Rick Macklem's avatar
      Fix the type of the 3rd argument for nm_getinfo so that it works · 4d4f9a37
      Rick Macklem authored
      for architectures like sparc64.
      Suggested by:	kib
      MFC after:	2 weeks
    • Konstantin Belousov's avatar
      When readdirplus() is handled on the exported filesystem that does · bcc5a93f
      Konstantin Belousov authored
      not support VFS_VGET, like msdosfs, do not call VOP_LOOKUP() for
      dotdot on the root directory. Our filesystems expect that VFS handles
      dotdot lookups on root on its own.
      Reported and tested by:	kevlo
      MFC after:   2 weeks
    • Rick Macklem's avatar
      Modify the NFS clients and the NLM so that the NLM can be used · ca27c028
      Rick Macklem authored
      by both clients. Since the NLM uses various fields of the
      nfsmount structure, those fields were extracted and put in a
      separate nfs_mountcommon structure stored in sys/nfs/nfs_mountcommon.h.
      This structure also has a function pointer for a function that
      extracts the required information from the mount point and nfs vnode
      for that particular client, for information stored differently by the
      Reviewed by:	jhb
      MFC after:	2 weeks
    • Xin LI's avatar
      MFV: nc(1) from OpenBSD 4.8. · 4f2bbc00
      Xin LI authored
      While I'm here, bump WARNS level to 2 as the vendor
      has the right printf format string now.
      MFC after:	1 month
      Obtained from:	OpenBSD
  2. 18 Oct, 2010 16 commits