Scrub/Resilver Performance

Saso Kiselkov of Nexenta gave a talk on Scrub/Resilver Performance at the OpenZFS Developer Summit 2016:

Video, Slides

Since its inception, ZFS has included a facility both to preemptively check the correctness of stored data and to recover from drive failures. This mechanism, however, has always been fairly inefficient, and as hard drives have grown in capacity it has become common for resilver operations to take days or even weeks. The cause is largely inherent in the design of the resilver algorithm, which simply traverses the filesystem tree structures rather than taking into account that disks strongly prefer sequential access in on-disk block order.

This talk discusses some forthcoming work for OpenZFS, in which we have improved resilver performance by implementing an intelligent block-sorting pre-scanner that runs ahead of the regular resilver repair code. This allows us to reorder I/O so as to achieve near-sequential resilver throughput even with limited memory usage.