Platform code differences

It is useful to have a list of code differences between illumos and other platforms. Please separate changes that are trivially portable to other platforms (mainly illumos) from those that are not. Also, please include information on the rationale for each change.

ZFS on Linux

Platform independent (portable)

  • Switched from C99 to C89
    • Linux's build system passes -std=gnu89 to GCC.
  • Converted large stack allocations to dynamic allocations
    • Linux has an 8KB stack in comparison to illumos' luxurious 24KB stacks.
  • Constify structures containing function pointers
    • The PaX effort to harden the Linux kernel treats writable function pointers as potential exploit targets. The PaX developers modified the Linux kernel build system to report these as section mismatches, so structures containing function pointers were made const.
  • Switched various allocations from KM_SLEEP to KM_PUSHPAGE
    • These allocations occur in code paths critical to swap on zvols. Swap on zvols would readily deadlock until they were changed.
  • Drive Identifier database
    • This belongs in a different layer, but we do not have the option of modifying the kernel itself, especially older kernels. The database can be ported to illumos' sd.conf with little difficulty. Entries can also be ported to a similar database in FreeBSD (although not in the reverse direction).
  • -o ashift= in zpool create/attach/replace commands
    • The sector size determines ashift at vdev creation. This is a manual override that lets the system administrator work around, with relative ease, drives that misreport their sector size. It complements the drive database.
  • SA based xattrs
    • Improves get/set performance for small xattr values.
    • This would have resulted in a ZFS version change had it been imported into OpenSolaris before the project was discontinued. It is off by default.
  • Better queuing of read IOs to leaves of mirror vdevs
    • Improves throughput and IOPS on mirrored vdevs
  • FASTWRITE algorithm
    • Greedy selection of least busy top-level vdev when queuing writes. Improves IOPS performance.
    • Patch being tested to remove mc_fastwrite_lock.
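
The relationship between sector size and ashift noted for the -o ashift= override can be illustrated with a little arithmetic: ashift is the base-2 logarithm of the sector size, so 512-byte sectors give 9 and 4096-byte sectors give 12. The following is a minimal sketch of that calculation only. The function name sector_size_to_ashift is hypothetical and is not taken from the ZFS source.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not ZFS source code).
 * Returns log2(sector_size) for a power-of-two sector size, or -1 if
 * the size is zero or not a power of two.
 */
int
sector_size_to_ashift(uint64_t sector_size)
{
	int ashift = 0;

	/* A power of two has exactly one bit set. */
	if (sector_size == 0 || (sector_size & (sector_size - 1)) != 0)
		return (-1);

	/* Count how many times the size can be halved before reaching zero. */
	while ((sector_size >>= 1) != 0)
		ashift++;

	return (ashift);
}
```

A drive with 4096-byte physical sectors that reports 512-byte sectors would be given ashift 9 under this calculation, which is the case the manual override is meant to correct.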
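
The greedy selection described for FASTWRITE can be sketched as picking the top-level vdev with the fewest pending writes. This is only an illustration of the idea under stated assumptions: the pending array and the function name pick_least_busy are hypothetical, and the real implementation's bookkeeping and locking (such as mc_fastwrite_lock) are not modelled here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not ZFS source code.
 * pending[i] is assumed to hold the number of writes currently queued
 * on top-level vdev i. Returns the index of the least busy vdev; the
 * lowest index wins ties. The caller must pass count > 0.
 */
size_t
pick_least_busy(const uint64_t *pending, size_t count)
{
	size_t best = 0;
	size_t i;

	for (i = 1; i < count; i++) {
		if (pending[i] < pending[best])
			best = i;
	}

	return (best);
}
```

The loop variable is declared at the top of the function rather than inside the for statement, in keeping with the C89 style noted earlier in the list.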

Platform specific (non-portable)

  • Autotools build system
    • This could be adapted to other platforms, but the current code is extremely Linux-specific.
  • ZPIOS
    • Benchmark designed to exercise the ZFS Transaction Object Layer
    • This could be adapted to other platforms with a rewrite to use illumos interfaces.
  • ZFS POSIX Layer
    • Linux VFS hooks that attempt to wrap the functions (zfs_vnops.c) used on illumos.
  • ZVOL code
    • A roughly 90% rewrite for Linux. Very little code shared with illumos.
  • PF_NOFS thread flag
    • A thread specific flag to indicate that we are in a path that might involve swap. Implemented in compatibility layer.
    • KM_SLEEP allocations made in the presence of PF_NOFS will be converted to KM_PUSHPAGE. A stack trace is also printed to dmesg.
  • cv_wait_io()
    • Compatibility layer extension to hook into Linux's I/O time accounting infrastructure. Otherwise identical to cv_wait().
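
The PF_NOFS behaviour described above amounts to rewriting the allocation flag when the calling thread is marked. The following is a minimal userland sketch of that decision. Every name in it (the sketch_ prefix, the enum values, the thread struct) is a stand-in invented for illustration; the real constants and the compatibility layer's data structures differ, and the stack trace printed to dmesg is omitted.

```c
#include <stdbool.h>

/* Stand-in flag values for illustration only; the real constants differ. */
enum sketch_kmflag {
	SKETCH_KM_SLEEP = 1,
	SKETCH_KM_PUSHPAGE = 2
};

/* Stand-in for per-thread state; only the PF_NOFS marker is modelled. */
struct sketch_thread {
	bool pf_nofs;
};

/*
 * Return the flag an allocation should actually use: a sleeping
 * allocation made by a thread marked PF_NOFS is converted to the
 * PUSHPAGE variant, and everything else passes through unchanged.
 */
int
sketch_adjust_kmflags(const struct sketch_thread *t, int flags)
{
	if (t->pf_nofs && flags == SKETCH_KM_SLEEP)
		return (SKETCH_KM_PUSHPAGE);

	return (flags);
}
```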

ZFS-OSX

  • Mostly based on ZFS on Linux
    • Clone of source tree
    • Uses ZFS on Linux autoconf
  • Prefers clang or llvm-gcc for kernel module
    • The IOKit kernel module requires C++, so variables named private in the Linux code are stripped (private is a reserved word in C++)
  • All UIO operations are kernel API calls (uio_create(), uio_setoffset(), ...); the struct is opaque
  • zfs_vnops.c, zfs_vfops.c, zfs_acl.c, zfs_znode.c, zfs_fuid.c are FreeBSD based with OS X wrappers in zfs_vnops_osx.c
  • vnode_t is defined as struct vnode * on OS X, so most variables are now defined as struct vnode *vp
    • All vnode operations go through API calls (for example, vnode_fsnode(vp) in place of vp->v_data); the struct is opaque
  • zvols are mostly untouched from FreeBSD, but call wrappers into the IOKit C++ layer
  • vnode_create needs ALL information at call time (vtype, private pointer (znode), vnode_ops); it cannot preallocate like FreeBSD
    • vnode_create can call both reclaim and fdsync, causing locking issues. The OS X port has a reclaim thread to defer reclaims.
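
The deferred-reclaim idea mentioned above can be sketched as a simple FIFO queue: the reclaim callback only records the request, and a separate worker later takes requests off the queue and does the real work outside the caller's locking context. This is a hedged illustration, not the port's actual code. All type and function names are hypothetical, and the mutex and condition variable a real reclaim thread would need are left out.

```c
#include <stddef.h>

/* Hypothetical record of one pending reclaim (not the port's real type). */
struct reclaim_req {
	void *fsnode;			/* stand-in for the znode to release */
	struct reclaim_req *next;
};

/* FIFO of pending requests; initialise both pointers to NULL. */
struct reclaim_queue {
	struct reclaim_req *head;
	struct reclaim_req *tail;
};

/*
 * Called from the reclaim callback: append the request and return at
 * once, so no filesystem locks are taken in the caller's context.
 * Synchronisation is omitted in this sketch.
 */
void
reclaim_defer(struct reclaim_queue *q, struct reclaim_req *req)
{
	req->next = NULL;
	if (q->tail != NULL)
		q->tail->next = req;
	else
		q->head = req;
	q->tail = req;
}

/*
 * Called by the worker: remove and return the oldest pending request,
 * or NULL when the queue is empty.
 */
struct reclaim_req *
reclaim_next(struct reclaim_queue *q)
{
	struct reclaim_req *req = q->head;

	if (req != NULL) {
		q->head = req->next;
		if (q->head == NULL)
			q->tail = NULL;
		req->next = NULL;
	}

	return (req);
}
```

A worker would loop on reclaim_next() and release each fsnode in turn, sleeping when the queue is empty.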

OSv

  • Based on FreeBSD ZFS
  • Removes jail and geom support
  • vfs integration (ZPL) modified for OSv
  • Thread-local support uses __thread