Work in progress by Chris Siden.
Storage of small files in dnode
Work in progress by Matt Ahrens.
Notes from meetings
Brainstorm, 18th September 2013
- immediately pursuable ideas, plus long-term and strategic thoughts.
Inter-platform coordination ideas
Ideas for projects that would help coordinate changes between platforms …
Mechanism for pulling changes from one place to another
Make it easier to build, test, code review, and integrate ZFS changes into illumos.
Cross-platform test suite
One sourcebase, rather than porting STF to every platform?
Maybe integrate XFS Test Suite.
We already have ztest / libzpool and want to:
- expand this to also be able to test more of zfs in userland
- be able to run /sbin/zfs, /sbin/zpool against userland implementation
- be able to run most of testrunner (and/or STF) test suite against userland implementation
ZFS (ZPL) version feature flags
Import ZFS on Linux sa=xattr into illumos.
/dev/zfs ioctl interface versioning
Ensure that future additions/changes to the interface maintain maximum compatibility with userland tools.
Enable FreeBSD Linux jails / illumos lx brandz to use ZFS on Linux utilities.
Port ZPIOS from ZFS on Linux to illumos
This would require a rewrite to not use Linux interfaces.
Virtual machine images with OpenZFS
To easily try OpenZFS on a choice of distributions within a virtual machine:
- images could be built for running on public clouds
- images for installing to real hardware.
General feature ideas
Possible Channel Programs:
- Recursive rollback (revert to a snapshot on a dataset and all its children; needs a new command line flag, because -r is already taken)
Based on indirect vdevs, rather than bprewrite.
Unified ashift handling
[illumos-zfs] Specifying ashift when creating vdevs (2013-07-03)
Preferably compatible with pool version 32, as pool-feature-flag.
RAID-Z hybrid allocator
Preferably compatible with pool version 29 for Solaris 10u11 compatibility.
Replace larger ZIO caches with explicit pages
Subproject: document useful kernel interfaces for page manipulation on various platforms
Improved SPA namespace collision management
Needed mostly by virtual machine hosts. Work in progress in Gentoo.
Temporary pool names in zpool import
Temporary pool names in zpool create.
FreeBSD already has real-time TRIM support. Saso has an implementation for illumos at Nexenta, which he hopes to upstream in the next month or two (2015-10-08).
For more info see: http://www.open-zfs.org/wiki/Features#TRIM_Support
Free space TRIM
- walk metaslab space maps and issue discard commands to the vdevs.
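A minimal sketch of the walk described above, assuming the free segments of a space map are available as (offset, length) pairs in bytes. The function names and the `min_length` threshold are hypothetical, and the sketch only plans the discard ranges; it does not issue commands to a real vdev.

```python
from typing import Iterable, List, Tuple

Extent = Tuple[int, int]  # (offset, length) in bytes; a hypothetical representation


def coalesce_free_extents(extents: Iterable[Extent]) -> List[Extent]:
    """Merge adjacent or overlapping free extents into larger ranges."""
    merged: List[Extent] = []
    for offset, length in sorted(extents):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This extent touches or overlaps the previous one: extend it.
            prev_off, prev_len = merged[-1]
            end = max(prev_off + prev_len, offset + length)
            merged[-1] = (prev_off, end - prev_off)
        else:
            merged.append((offset, length))
    return merged


def plan_discards(extents: Iterable[Extent], min_length: int = 0) -> List[Extent]:
    """Return the ranges that would be discarded, skipping very small ones."""
    return [(off, length) for off, length in coalesce_free_extents(extents)
            if length >= min_length]
```

Coalescing first means fewer, larger discard requests; the threshold is one possible way to avoid issuing many tiny discards.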
Platform agnostic encryption support
Preferably compatible with pool version 30, as pool-feature-flag.
Developer resources include a link to a November 2010 blog post by Oracle.
The early ZFS encryption code published in the zfs-crypto repository of OpenSolaris.org could be a starting point. A copy is available from Richard Yao upon request.
Convert synchronous writes to asynchronous writes when an ARC miss occurs during a lookup against the DDT.
Use dedicated kmem_cache for deduplication table entries:
- easy to implement
- will reduce DDT entries from 512 bytes to 320 bytes.
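To give a feel for the saving, here is a rough calculation using the two entry sizes above. The 1 TiB of unique data and the 128 KiB block size are assumptions chosen only for illustration, and the helper name is hypothetical.

```python
def ddt_memory_bytes(unique_bytes: int, block_size: int, entry_size: int) -> int:
    """Approximate in-memory DDT size: one entry per unique block."""
    entries = unique_bytes // block_size
    return entries * entry_size


TIB = 2 ** 40
GIB = 2 ** 30

# Assumed workload: 1 TiB of unique data stored in 128 KiB blocks.
before = ddt_memory_bytes(TIB, 128 * 1024, 512)
after = ddt_memory_bytes(TIB, 128 * 1024, 320)

print(before / GIB, after / GIB, (before - after) / GIB)
```

Under these assumptions the table shrinks from 4 GiB to 2.5 GiB, a saving of 1.5 GiB.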
ZFS Compression / Dedup to favour provider
Currently, if a customer of a storage provider has 100MB of quota available and uploads 50MB of data that compresses/dedups to 25MB, the customer's quota is only reduced by 25MB. The reward favours the customer. As a provider, it is desirable to be able to reverse this logic, so that the customer's quota is reduced by 50MB and the 25MB saved by compression/dedup is to the provider's benefit. This is similar to how Google/Amazon/Cloud-Feature.acme already handle it: you get 2G of quota, and any compression savings are to Google's benefit.
- property(?) to charge quota usage by before-compression-dedup size.
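The two accounting modes can be sketched as follows, using the 100MB / 50MB / 25MB example above. The class and the `charge_logical` switch are hypothetical stand-ins for the proposed property, not an existing ZFS interface.

```python
from dataclasses import dataclass

MB = 10 ** 6


@dataclass
class QuotaAccount:
    quota_bytes: int
    charge_logical: bool = False  # hypothetical property: charge pre-compression size
    used_bytes: int = 0

    def write(self, logical_bytes: int, physical_bytes: int) -> int:
        """Charge one write against the quota and return the bytes remaining."""
        charge = logical_bytes if self.charge_logical else physical_bytes
        if self.used_bytes + charge > self.quota_bytes:
            raise ValueError("quota exceeded")
        self.used_bytes += charge
        return self.quota_bytes - self.used_bytes


# Current behaviour: the customer keeps the savings.
customer_favoured = QuotaAccount(100 * MB)
# Proposed behaviour: the provider keeps the savings.
provider_favoured = QuotaAccount(100 * MB, charge_logical=True)

print(customer_favoured.write(50 * MB, 25 * MB) // MB)  # 75
print(provider_favoured.write(50 * MB, 25 * MB) // MB)  # 50
```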
Periodic Data Validation
Problem: ZFS does a great job detecting data errors due to lost writes, media errors, and storage bugs, but only when the user actually accesses the data. Scrub in its current form can take a very long time and can have highly deleterious impacts on overall performance.
Data validation in ZFS should be specified according to data or business needs. Kicking off a scrub every day, week, or month doesn’t directly express that need. More likely, the user wants to express their requirements like this:
- “Check all old data at least once per month”
- “Make sure all new writes are verified within 1 day”
- “Don’t consume more than 50% of my IOPS capacity”
Note that constraints like these may overlap, but that’s fine — the user just must indicate priority and the system must alert the user of violations.
I suggest a new type of scrub. Constraints should be expressed and persisted with the pool. Execution of the scrub should tie into the ZFS IO scheduler. That subsystem is ideally situated to identify a relatively idle system. Further, we should order scrub IOs to be minimally impactful. That may mean having a small queue of outstanding scrub IOs that we’d send to the device, or it might mean that we try to organize large, dense contiguous scrub reads by sorting by LBA.
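One way to picture the scheduling idea is a function that decides how many scrub reads to issue in the next interval and returns them in LBA order. This is only a sketch: the function name, the parameters, and the per-interval IOPS model are assumptions, not part of the ZFS IO scheduler.

```python
from typing import List, Tuple

ScrubRead = Tuple[int, int]  # (lba, nblocks); a hypothetical representation


def scrub_batch(pending: List[ScrubRead],
                capacity_iops: int,
                user_iops: int,
                max_fraction: float = 0.5,
                max_queue: int = 8) -> List[ScrubRead]:
    """Pick the scrub reads to issue in the next interval, sorted by LBA."""
    # Only use capacity that user IO leaves idle.
    idle = max(0, capacity_iops - user_iops)
    # Respect the user's cap on scrub IOPS and keep the outstanding queue small.
    budget = min(int(capacity_iops * max_fraction), idle, max_queue)
    # Lowest LBAs first, so the device sees an ascending sweep.
    return sorted(pending)[:budget]
```

With a busy device the batch shrinks towards zero, and on an idle device it is bounded by the queue limit and by the "no more than 50% of my IOPS" style constraint.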
Further, after writing data to disk, there’s a window for repair while the data is still in the ARC. If ZFS could read that data back, then it could not only detect the failure, but correct it even in a system without redundant on-disk data.
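The read-back idea might look roughly like this. `MemDevice` is a hypothetical in-memory stand-in for a vdev, the cached buffer stands in for the copy still held in the ARC, and SHA-256 is used only as a convenient checksum; a real implementation would also need to make sure the read is served by the media rather than a device cache.

```python
import hashlib


class MemDevice:
    """Hypothetical in-memory device used only for this illustration."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def write(self, offset: int, buf: bytes) -> None:
        self.data[offset:offset + len(buf)] = buf

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])


def verify_and_repair(device: MemDevice, offset: int, cached: bytes) -> str:
    """Read a recently written block back and rewrite it from cache if it is bad."""
    expected = hashlib.sha256(cached).digest()
    on_disk = device.read(offset, len(cached))
    if hashlib.sha256(on_disk).digest() == expected:
        return "ok"
    # The cached copy is still good, so repair without any on-disk redundancy.
    device.write(offset, cached)
    reread = device.read(offset, len(cached))
    return "repaired" if hashlib.sha256(reread).digest() == expected else "failed"
```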
Lustre feature ideas
The Lustre project supports the use of ZFS as an Object Storage Target. They maintain their own feature request page with ZFS project ideas. Below is a list of project ideas that are well defined, benefit Lustre and have no clear benefit outside of that context.
Collapsible ZAP objects
E.g. fatzap -> microzap downgrades.
Data on separate devices
… awareness of the quality, utility, and availability of open source implementations of ZFS.
Please add or discuss your ideas.
ZFS and OpenZFS in three minutes (or less)
A very short and light video/animation to grab the attention of people who don't yet realise why ZFS is an extraordinarily good thing.
For an entertaining example of how a little history (completely unrelated to storage) can be taught in ninety seconds, see Hohenwerfen Fortress - The Imprisoned Prince Bishop (context) (part of the ZONE Media portfolio).
A very short video for ZFS and OpenZFS might throw in all that's good, using plain English wherever possible, including:
- very close to the beginning, the word resilience
- verifiable integrity of data and so on
- some basic comparisons (NTFS, HFS Plus, ReFS)
– with the 2010 fork in the midst but (blink and you'll miss that milestone) the lasting impression from the video is that ZFS is great (years ahead of the alternatives) and OpenZFS is rapidly making it better for a broader user base.
Hint: there exist many ZFS-related videos but many are a tad dry, and cover a huge amount of content. Aim for two minutes :-) … discuss…
Please add or discuss your ideas.
https://twitter.com/DeirdreS/status/322422786184314881 (2013-02) draws attention to ZFS-related content amongst videos listed by Deirdré Straughan.