commit 4a365e86
    Author: Kirk McKusick

    Avoid "consumer not attached in g_io_request" panic when disk lost
    while using a UFS snapshot.
    
    The UFS filesystem supports snapshots. Each snapshot is a file whose
    contents are a frozen image of the disk partition on which the filesystem
    resides. Each time an existing block in the filesystem is modified,
    the filesystem checks whether that block was in use at the time that
    the snapshot was taken. If so, and if it has not already been copied,
    a new block is allocated from among the blocks that were not in use
    at the time that the snapshot was taken and placed in the snapshot file
    to replace the entry that has not yet been copied. The previous contents
    of the block are copied to the newly allocated snapshot file block,
    and the write to the original is then allowed to proceed.
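
    In outline, the check made on each write of an existing block behaves
    like the sketch below. The helper names are hypothetical, invented for
    illustration; the real logic lives in ffs_copyonwrite() in
    sys/ufs/ffs/ffs_snapshot.c.

    /*
     * Hypothetical sketch of the per-write snapshot check; the helper
     * functions are invented for illustration only.
     */
    static int
    snapshot_check_sketch(struct vnode *snapvp, ufs2_daddr_t lbn)
    {

    	/* Block not in use when the snapshot was taken: nothing to save. */
    	if (!block_in_use_at_snapshot(snapvp, lbn))
    		return (0);

    	/* Previous contents already copied into the snapshot file. */
    	if (block_already_copied(snapvp, lbn))
    		return (0);

    	/*
    	 * Allocate a block that was free at snapshot time, copy the
    	 * old contents into it, then let the original write proceed.
    	 */
    	return (copy_block_to_snapshot(snapvp, lbn));
    }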
    
    The block allocation is done using the usual UFS_BALLOC() routine
    which allocates the needed block in the snapshot and returns a
    buffer that is set up to write data into the newly allocated block.
In usual filesystem operation, the contents of the new block are
    copied from user space into the buffer and the buffer is then written
    to the file using bwrite(), bawrite(), or bdwrite(). In the case of a
    snapshot, the new block must instead be filled from the disk block that
    is about to be rewritten. The snapshot code has a function readblock()
    that it uses to read the `about to be rewritten' disk block.
    
    /*
     * Read the specified block into the given buffer.
     */
    static int
    readblock(snapvp, bp, lbn)
    	struct vnode *snapvp;
    	struct buf *bp;
    	ufs2_daddr_t lbn;
    {
    	struct inode *ip;
    	struct bio *bip;
    	struct fs *fs;
    
    	ip = VTOI(snapvp);
    	fs = ITOFS(ip);
    
    	bip = g_alloc_bio();
    	bip->bio_cmd = BIO_READ;
    	bip->bio_offset = dbtob(fsbtodb(fs, blkstofrags(fs, lbn)));
    	bip->bio_data = bp->b_data;
    	bip->bio_length = bp->b_bcount;
    	bip->bio_done = NULL;
    
    	g_io_request(bip, ITODEVVP(ip)->v_bufobj.bo_private);
    	bp->b_error = biowait(bip, "snaprdb");
    	g_destroy_bio(bip);
    	return (bp->b_error);
    }
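
    For context, the snapshot copy-on-write path invokes these pieces
    roughly as follows. This is a simplified sketch with locking and error
    recovery omitted, not the exact code in ffs_snapshot.c; KERNCRED is
    the kernel credential used by the snapshot code.

    /*
     * Simplified sketch of saving the old contents of logical block
     * lbn into the snapshot file.
     */
    static int
    save_old_block_sketch(struct vnode *snapvp, struct fs *fs,
        ufs2_daddr_t lbn)
    {
    	struct buf *bp;
    	int error;

    	/* Allocate the snapshot-file block that will hold the copy. */
    	error = UFS_BALLOC(snapvp, lblktosize(fs, (off_t)lbn),
    	    fs->fs_bsize, KERNCRED, 0, &bp);
    	if (error != 0)
    		return (error);
    	/* Fill it from the disk block that is about to be rewritten. */
    	if ((error = readblock(snapvp, bp, lbn)) != 0) {
    		brelse(bp);
    		return (error);
    	}
    	/* Write the saved copy; the original write may then proceed. */
    	bawrite(bp);
    	return (0);
    }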
    
    When the underlying disk fails, its GEOM module is removed.
    Subsequent attempts to access it should return the ENXIO error.
    The functionality of checking for the lost disk and returning
    ENXIO is handled by the g_vfs_strategy() routine:
    
    void
    g_vfs_strategy(struct bufobj *bo, struct buf *bp)
    {
    	struct g_vfs_softc *sc;
    	struct g_consumer *cp;
    	struct bio *bip;
    
    	cp = bo->bo_private;
    	sc = cp->geom->softc;
    
    	/*
    	 * If the provider has orphaned us, just return ENXIO.
    	 */
    	mtx_lock(&sc->sc_mtx);
    	if (sc->sc_orphaned || sc->sc_enxio_active) {
    		mtx_unlock(&sc->sc_mtx);
    		bp->b_error = ENXIO;
    		bp->b_ioflags |= BIO_ERROR;
    		bufdone(bp);
    		return;
    	}
    	sc->sc_active++;
    	mtx_unlock(&sc->sc_mtx);
    
    	bip = g_alloc_bio();
    	bip->bio_cmd = bp->b_iocmd;
    	bip->bio_offset = bp->b_iooffset;
    	bip->bio_length = bp->b_bcount;
    	bdata2bio(bp, bip);
    	if ((bp->b_flags & B_BARRIER) != 0) {
    		bip->bio_flags |= BIO_ORDERED;
    		bp->b_flags &= ~B_BARRIER;
    	}
    	if (bp->b_iocmd == BIO_SPEEDUP)
    		bip->bio_flags |= bp->b_ioflags;
    	bip->bio_done = g_vfs_done;
    	bip->bio_caller2 = bp;
    	g_io_request(bip, cp);
    }
    
Only after checking that the device is still present does
    g_vfs_strategy() construct the "bio" request and call g_io_request().
    When readblock() constructs its own "bio" request and calls
    g_io_request() directly, the kernel panics with "consumer not attached
    in g_io_request" once the underlying device no longer exists.
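
    The panic string comes from an assertion near the top of
    g_io_request(); paraphrased from sys/geom/geom_io.c:

    void
    g_io_request(struct bio *bp, struct g_consumer *cp)
    {
    	struct g_provider *pp;

    	/* A consumer whose disk is gone has no attached provider. */
    	pp = cp->provider;
    	KASSERT(pp != NULL, ("consumer not attached in g_io_request"));
    	/* ... validate and queue the request ... */
    }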
    
    The fix is to have readblock() call g_vfs_strategy() rather than
    constructing its own "bio" request:
    
    /*
     * Read the specified block into the given buffer.
     */
    static int
    readblock(snapvp, bp, lbn)
    	struct vnode *snapvp;
    	struct buf *bp;
    	ufs2_daddr_t lbn;
    {
    	struct inode *ip;
    	struct fs *fs;
    
    	ip = VTOI(snapvp);
    	fs = ITOFS(ip);
    
    	bp->b_iocmd = BIO_READ;
    	bp->b_iooffset = dbtob(fsbtodb(fs, blkstofrags(fs, lbn)));
    	bp->b_iodone = bdone;
    	g_vfs_strategy(&ITODEVVP(ip)->v_bufobj, bp);
    	bufwait(bp);
    	return (bp->b_error);
    }
    
    Here it uses the buffer that will eventually be written to the disk.
    The g_vfs_strategy() routine uses four parts of the buffer: b_bcount,
    b_iocmd, b_iooffset, and b_data.
    
The b_bcount field is already correctly set for the buffer. It is
    safe to set the b_iocmd and b_iooffset fields, since they will be set
    correctly again when the later write is done. The write path will also
    clear the B_DONE flag that our use of the buffer sets. The b_iodone
    callback has to be set to bdone(), which limits bufdone() to just
    notifying that the I/O is done. The rest of the bufdone() processing,
    which includes work such as handling the softdeps associated with the
    buffer, should not be done until the buffer is actually written.
    Bufdone() sets b_iodone back to NULL after calling it, so the full
    bufdone() processing will be done when the buffer is later written.
    The final change from the previous version of readblock() is that the
    old version read directly into b_data, while g_vfs_strategy() uses the
    bdata2bio() function to take advantage of VMIO when it is available.
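
    For reference, bdone() just marks the buffer done and wakes the
    thread sleeping in bufwait(), while bufdone() consumes the b_iodone
    callback before doing anything else; paraphrased from
    kern/vfs_bio.c:

    /* Paraphrased from kern/vfs_bio.c. */
    void
    bdone(struct buf *bp)
    {
    	struct mtx *mtxp;

    	/* Mark the I/O complete and wake any thread in bufwait(). */
    	mtxp = mtx_pool_find(mtxpool_sleep, bp);
    	mtx_lock(mtxp);
    	bp->b_flags |= B_DONE;
    	wakeup(bp);
    	mtx_unlock(mtxp);
    }

    void
    bufdone(struct buf *bp)
    {
    	void (*biodone)(struct buf *);

    	/* ... error and flag handling elided ... */
    	biodone = bp->b_iodone;
    	if (biodone != NULL) {
    		/* One-shot callback: clear it, call it, and stop here. */
    		bp->b_iodone = NULL;
    		(*biodone)(bp);
    		return;
    	}
    	/* Full completion processing (softdeps and friends) follows. */
    	/* ... */
    }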
    
    Differential revision: https://reviews.freebsd.org/D32150
    Reviewed by:  kib, chs
    MFC after:    1 week
    Sponsored by: Netflix