Delphix Brainstorming

September 18, 2013
Leading up to Delphix's semi-annual Engineering Kick Off (EKO), Delphix employees held a ZFS brainstorming meeting. The ideas below came out of that meeting; they range from projects that can be pursued immediately to longer-term, strategic thoughts.

Bold indicates high-priority items. (?) indicates that more investigation is needed before a project can be defined.


 * ZFS self-tuning
 * Estimate performance and space consumption after destroys
 * Predict performance improvement of freeing up space
 * Ping-pong write, re-use blocks for the same file as it gets re-written
 * Investigate effectiveness of pre-fetch (?)
 * Is lock contention a problem?
 * Pure Storage collaboration
 * Performance tests with compression
 * Using compression histograms gathered from customers
 * Missing metrics on ZFS performance
 * DMU changes for async read, larger block sizes
 * TRIM (issuing TRIM commands to the storage)
 * Write performance
 * Multi-block ZIL writer (for RAC) (?)
 * How many metaslabs (?)
 * DTrace provider
 * Compressed ARC - George Wilson
 * ARC observability
 * ARC sizing stats
 * Hit rate, theoretical optimal hit rate based on ghost lists, projection of hit rate given more memory
 * Channel programs
 * LUN removal
 * Cross-pool cloning / distributed DSL
 * Shadow replication, shadow blocks
 * Streaming replication, send out blocks in syncing context
 * Lightweight replication, remove some responsibility from app stack
 * One pool, different vdev “classes”?
 * Resumable send - Max Grossman
 * Compressed send (?)
 * Data masking / block differencing (?)
 * Efficiently store transformed data using a bit function from original data to transformed data
 * Pool fragmentation analytics
 * Provide feedback on when to add storage
 * Provide “% fragmented” metric
 * Data rebalancing/redistribution/defrag/placement
 * Do we care about defrag on SSD?
 * Testing
 * Adding coverage for new features
 * Realistic compression ratios in testing
 * Automated tests for every change
 * Better userland test coverage (full stack from IOCTL down)
 * Better performance tests
 * Scrub should be better (some kind of SLA)
 * Some kind of guarantee on how quickly data corruption will be caught?
 * Better error reporting
 * zfs scrub (on specific filesystems)
 * LBA ordered traversals
 * Pause/resume scrub (already have stop/restart)
 * Fix broken blkptrs (automated?)
 * Include estimates on reconstruction time
 * Dataset property that sets owner of all contained files in constant time
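
As an illustration of the ARC observability item above (projecting the hit rate given more memory from ghost lists), here is a minimal sketch. It is a toy model, not ARC code: it assumes a plain LRU cache rather than ARC's MRU/MFU split, and the class and method names (GhostLRU, projected_hit_rate) are hypothetical. The idea is that a ghost list remembers the identities of recently evicted blocks, so a miss that lands in the ghost list would have been a hit if the cache had been larger by the ghost list's size. For pure LRU that projection is exact; for ARC it would only be an approximation.

```python
from collections import OrderedDict


class GhostLRU:
    """Toy LRU cache with a ghost list of recently evicted block ids.

    Hypothetical illustration only; this is not how ARC is implemented.
    """

    def __init__(self, capacity, ghost_capacity):
        self.capacity = capacity              # resident blocks the cache can hold
        self.ghost_capacity = ghost_capacity  # evicted ids remembered (no data)
        self.cache = OrderedDict()            # resident blocks, oldest first
        self.ghost = OrderedDict()            # evicted block ids, oldest first
        self.accesses = 0
        self.hits = 0
        self.ghost_hits = 0

    def access(self, block):
        """Record one access to the given block id."""
        self.accesses += 1
        if block in self.cache:
            # Real hit: refresh recency.
            self.hits += 1
            self.cache.move_to_end(block)
            return
        if block in self.ghost:
            # Miss that a cache of size capacity + ghost_capacity would have hit.
            self.ghost_hits += 1
            del self.ghost[block]
        # Bring the block in, evicting the least recently used one if full.
        self.cache[block] = True
        if len(self.cache) > self.capacity:
            evicted, _ = self.cache.popitem(last=False)
            self.ghost[evicted] = True
            if len(self.ghost) > self.ghost_capacity:
                self.ghost.popitem(last=False)

    def hit_rate(self):
        """Observed hit rate at the current cache size."""
        return self.hits / self.accesses if self.accesses else 0.0

    def projected_hit_rate(self):
        """Hit rate an LRU cache of capacity + ghost_capacity would have seen."""
        if not self.accesses:
            return 0.0
        return (self.hits + self.ghost_hits) / self.accesses
```

Feeding a cyclic trace such as a, b, c, a, b, c into GhostLRU(capacity=2, ghost_capacity=2) shows the gap this metric is meant to expose: every access misses the two-block cache, yet half of them land in the ghost list, so the projection for a four-block cache is a 50% hit rate.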