  • Split dmu_zfetch() speculation and execution parts · 891568c9
    Alexander Motin authored
    
    
    To make better predictions on parallel workloads, dmu_zfetch() should
    be called as early as possible to reduce possible request reordering.
    In particular, it should be called before dmu_buf_hold_array_by_dnode()
    calls dbuf_hold(), which may sleep waiting for indirect blocks and
    wake up multiple threads at the same time on completion.  That can
    significantly reorder the requests, making the stream look random.
    But we should not issue prefetch requests before the on-demand ones,
    since they may get to the disks first despite the I/O scheduler,
    increasing on-demand request latency.
    
    This patch splits dmu_zfetch() into two functions: dmu_zfetch_prepare()
    and dmu_zfetch_run().  The first can be executed as early as needed.
    It only updates statistics and makes predictions without issuing any
    I/Os.  The I/O issuance is handled by dmu_zfetch_run(), which can be
    called later when all on-demand I/Os are already issued.  It even
    tracks the activity of other concurrent threads, issuing the prefetch
    only when _all_ on-demand requests are issued.
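
    The split described above can be illustrated with a minimal,
    single-threaded toy model.  This is only a sketch of the idea, not
    the code from this patch: toy_stream_t, toy_prepare(), toy_run() and
    TOY_DISTANCE are hypothetical names, the "I/O" is just a counter, and
    the locking that the real code needs is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical prefetch distance in blocks; an arbitrary value. */
#define	TOY_DISTANCE	8

/* Hypothetical toy stream; not the real stream structure. */
typedef struct toy_stream {
	uint64_t next_blkid;	/* block a sequential reader should ask for next */
	uint64_t pf_start;	/* first block not yet prefetched */
	uint64_t pf_ahead;	/* block up to which we predict (exclusive) */
	int pending;		/* prepare calls without a matching run */
	uint64_t issued;	/* stand-in for prefetch I/Os issued */
} toy_stream_t;

/*
 * Phase 1: speculation only.  Updates the prediction and never issues I/O.
 * Returns 1 if the access continued the stream, 0 otherwise.
 */
int
toy_prepare(toy_stream_t *s, uint64_t blkid, uint64_t nblks)
{
	if (blkid != s->next_blkid)
		return (0);	/* not sequential: nothing to predict */
	s->next_blkid = blkid + nblks;
	if (s->next_blkid + TOY_DISTANCE > s->pf_ahead)
		s->pf_ahead = s->next_blkid + TOY_DISTANCE;
	if (s->pf_start < s->next_blkid)
		s->pf_start = s->next_blkid;	/* caller reads these itself */
	s->pending++;
	return (1);
}

/*
 * Phase 2: execution.  Call only after the caller's on-demand I/O has been
 * issued.  Prefetch goes out only when the last pending caller arrives.
 * Returns the number of blocks "prefetched" by this call.
 */
uint64_t
toy_run(toy_stream_t *s)
{
	assert(s->pending > 0);
	if (--s->pending > 0)
		return (0);	/* others are still issuing on-demand reads */
	uint64_t n = s->pf_ahead - s->pf_start;
	s->issued += n;
	s->pf_start = s->pf_ahead;
	return (n);
}
```

    In this model two readers that both call toy_prepare() before either
    calls toy_run() get a single batch of prefetch, issued by whichever
    of them calls toy_run() last.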
    
    For many years this was a big problem for storage servers handling
    deeper request queues from their clients.  They had to either
    serialize sequential reads to make the ZFS prefetcher usable, or
    execute the incoming requests as-is and get almost no prefetch from
    ZFS, relying only on deep enough prefetch by the clients.  The
    benefits of those approaches varied, but neither was perfect.  With
    this patch, deeper-queue sequential read benchmarks with
    CrystalDiskMark from Windows via iSCSI to a FreeBSD target show me
    much better throughput with an almost 100% prefetcher hit rate,
    compared to almost zero before.
    
    While there, I also removed the per-stream zs_lock as useless, since
    it is completely covered by the parent zf_lock.  I also reused the
    zs_blocks refcount to track zf_stream linkage of the stream, since I
    believe the previous zs_fetch == NULL check in
    dmu_zfetch_stream_done() was racy.
    
    Delete prefetch streams when they reach the ends of files.  This saves
    up to 1KB of RAM per file and reduces searches through the stream list.
    
    Skip data prefetch if all dbufs of the stream are already in the DMU
    cache (speculation and indirect block prefetch are still done, since
    they are cheaper).  The first cache miss immediately fires all the
    prefetch that would have been done for the stream by that time.  This
    saves some CPU time if the same files within DMU cache capacity are
    read over and over.
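
    The deferred data prefetch can be pictured with another small toy
    model.  Again this is only a sketch under assumptions: the
    toy_cstream_t type and the toy_predict() and toy_issue() functions
    are hypothetical, and the returned block count stands in for real
    prefetch I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy stream; not the real stream structure. */
typedef struct toy_cstream {
	uint64_t pf_start;	/* first block with no data prefetch issued */
	uint64_t pf_end;	/* end of the predicted window (exclusive) */
	bool missed;		/* has any on-demand read missed the cache? */
} toy_cstream_t;

/* Speculation: always runs, only grows the predicted window. */
void
toy_predict(toy_cstream_t *s, uint64_t new_end)
{
	if (new_end > s->pf_end)
		s->pf_end = new_end;
}

/*
 * Execution: while every on-demand read has hit the cache, issue nothing
 * and let the window accumulate.  The first miss fires the whole
 * accumulated window at once; later calls issue only the new part.
 * Returns the number of data blocks to prefetch now.
 */
uint64_t
toy_issue(toy_cstream_t *s, bool miss)
{
	assert(s->pf_end >= s->pf_start);
	if (miss)
		s->missed = true;
	if (!s->missed)
		return (0);
	uint64_t n = s->pf_end - s->pf_start;
	s->pf_start = s->pf_end;
	return (n);
}
```

    For a file that stays fully cached, toy_issue() keeps returning 0 in
    this model, so repeated reads pay only for the bookkeeping in
    toy_predict().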
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Adam Moss <c@yotes.com>
    Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes #11652 