Platform code differences

From OpenZFS
Latest revision as of 19:42, 27 June 2017

It is useful to have a list of code differences between illumos and other platforms. Please separate changes that are trivially portable to other platforms (mainly illumos) from those that are not. Also, please include information on the rationale for each change.

ZFS on Linux

Platform independent (portable)

  • Converted large stack allocations to dynamic allocations
    • Linux kernel threads have 8KB stacks, compared with illumos' luxurious 24KB stacks.
  • Constify structures containing function pointers (https://github.com/zfsonlinux/zfs/commit/b01615d5ac86913da1e092d0378bfb8f0e72af30)
    • The PaX effort (http://pax.grsecurity.net/) to harden the Linux kernel considers writable function pointers to be potential exploit targets. The PaX developers modified the Linux kernel build system to report these as section mismatches. Function pointers were constified as a result.
  • Switched various allocations from KM_SLEEP to KM_PUSHPAGE
    • These were found to occur in code paths critical to swap on zvols. Swap on zvols would readily deadlock until they were changed.
  • Drive Identifier database (https://github.com/zfsonlinux/zfs/commit/bff32e0972bbc07ba5f2b9ce5b965813d8edcf78)
    • This belongs in a different layer, but we do not have the option of modifying the kernel itself, especially older ones. The database can be ported to illumos' sd.conf with little difficulty. Entries can also be ported to a similar database in FreeBSD (although not in the reverse direction).
  • -o ashift= in zpool create/attach/replace commands
    • The sector size determines ashift at vdev creation. This manual override lets the system administrator work around drives that misreport their sector size with relative ease. It complements the drive database.
  • SA based xattrs
    • Improves get/set performance for small xattr values.
    • This would have resulted in a ZFS version change had it been imported into OpenSolaris before the project was discontinued. It is off by default.
  • Better queuing of read IOs to leaves of mirror vdevs
    • Improves throughput and IOPS on mirrored vdevs.
  • FASTWRITE algorithm
    • Greedy selection of least busy top-level vdev when queuing writes. Improves IOPS performance.
    • Patch (https://github.com/ryao/zfs/commit/858822a04b4563657b2267131e90d9687d67e31b) being tested to remove mc_fastwrite_lock.
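
The relationship between sector size and ashift noted in the list above is a base-2 logarithm: 512-byte sectors correspond to ashift=9 and 4096-byte sectors to ashift=12. The following is a minimal C sketch of that mapping. The function name ashift_for_sector_size is hypothetical and is not taken from the ZFS source.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of ZFS: return the ashift (log2 of the
 * sector size) for a power-of-two sector size, or -1 if the size is zero
 * or not a power of two.
 */
int
ashift_for_sector_size(uint64_t sector_size)
{
	int ashift = 0;

	/* A power of two has exactly one bit set. */
	if (sector_size == 0 || (sector_size & (sector_size - 1)) != 0)
		return (-1);

	/* Count how many times the size can be halved before reaching 1. */
	while ((sector_size >>= 1) != 0)
		ashift++;

	return (ashift);
}
```

A drive that reports 512-byte sectors while using 4096-byte sectors internally is the case where an administrator would pass -o ashift=12 rather than rely on the detected value.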

Platform specific (non-portable)

  • Autotools build system
    • This could be adapted to other platforms, but the current code is extremely Linux-specific.
  • ZPIOS
    • Benchmark designed to exercise the ZFS Transaction Object Layer.
    • This could be adapted to other platforms with a rewrite to use illumos interfaces.
  • ZFS POSIX Layer
    • Linux VFS hooks that attempt to wrap the functions (zfs_vnops.c) used on illumos.
  • ZVOL code
    • A roughly 90% rewrite for Linux. Very little code shared with illumos.
  • PF_NOFS thread flag (https://github.com/zfsonlinux/spl/commit/eb0f407a2b9089113ef6f2402ebd887511315b43)
    • A thread-specific flag to indicate that we are in a path that might involve swap. Implemented in the compatibility layer.
    • KM_SLEEP allocations made in the presence of PF_NOFS will be converted to KM_PUSHPAGE. A stack trace is also printed to dmesg.
  • cv_wait_io() (https://github.com/zfsonlinux/spl/commit/46a75aadb7c08085a4ad2e55dcf5b6fb387c1253)
    • Compatibility layer extension to hook into Linux's I/O time accounting infrastructure. Otherwise identical to cv_wait().
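
The PF_NOFS behaviour described in the list above can be pictured as a check in the allocation path. What follows is a userspace sketch only: the enum values, the thread-local marker and every function name are hypothetical stand-ins chosen for illustration, and the message written to stderr merely stands in for the stack trace that the real code prints to dmesg.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the allocation flags named above. */
enum sketch_kmflag { SKETCH_KM_SLEEP, SKETCH_KM_PUSHPAGE };

/* Hypothetical per-thread marker playing the role of PF_NOFS. */
static __thread int sketch_pf_nofs = 0;

/* Mark the current thread as being in a path that might involve swap. */
void
sketch_enter_nofs(void)
{
	sketch_pf_nofs = 1;
}

/* Clear the marker again on the way out of that path. */
void
sketch_exit_nofs(void)
{
	sketch_pf_nofs = 0;
}

/*
 * Return the flag an allocation should actually use: a KM_SLEEP request
 * made while the marker is set is converted to KM_PUSHPAGE and reported.
 */
enum sketch_kmflag
sketch_effective_kmflag(enum sketch_kmflag requested)
{
	if (sketch_pf_nofs && requested == SKETCH_KM_SLEEP) {
		fprintf(stderr, "KM_SLEEP under PF_NOFS, using KM_PUSHPAGE\n");
		return (SKETCH_KM_PUSHPAGE);
	}
	return (requested);
}
```

Keeping the marker per thread means that only allocations made by the thread on the swap-sensitive path are converted; other threads keep their requested flags.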

OpenZFS on OS X

  • Mostly based on ZFS on Linux
    • Clone of source tree
    • Uses ZFS on Linux autoconf
  • Prefers clang or llvm-gcc for kernel module
    • The IOKit kernel module requires C++, so Linux's variables named private (a C++ keyword) are stripped.
  • All UIO operations are kernel API calls (uio_create(), uio_setoffset(), ...); the struct is opaque.
  • zfs_vnops.c, zfs_vfsops.c, zfs_acl.c, zfs_znode.c and zfs_fuid.c are FreeBSD-based, with OS X wrappers in zfs_vnops_osx.c
  • vnode_t is defined as struct vnode * on OS X, so most variables are now defined as struct vnode *vp
    • All vnode operations have API calls (vnode_fsnode(vp) == vp->v_data); the struct is opaque.
  • zvols are mostly untouched from FreeBSD, but call wrappers to the IOKit C++ layer
  • vnode_create needs ALL information at call time (vtype, private pointer (znode), vnode_ops); it cannot pre-allocate as FreeBSD does
    • vnode_create can call both reclaim and fdsync, causing locking issues. The OS X port attaches the vnode pointer after the VNOP's dmu_tx_commit is called to ensure no ZFS locks are held on entering the VFS layer.
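
The opaque-struct point in the list above (vnode_fsnode(vp) in place of vp->v_data) follows a common C pattern: callers see only an incomplete type plus accessor functions, while the full layout stays private to the implementation. The sketch below illustrates the pattern in a single file. The sketch_ names and the struct layout are hypothetical and do not reproduce the OS X kernel definitions.

```c
#include <stddef.h>

/* What callers would see: an incomplete type and an accessor only. */
struct sketch_vnode;
void *sketch_vnode_fsnode(struct sketch_vnode *vp);

/*
 * What the implementation side would see: the full definition.
 * This layout is a hypothetical stand-in for illustration.
 */
struct sketch_vnode {
	int v_type;	/* kind of file the vnode represents */
	void *v_data;	/* filesystem-private pointer, e.g. a znode */
};

/* Accessor that callers use instead of touching vp->v_data directly. */
void *
sketch_vnode_fsnode(struct sketch_vnode *vp)
{
	return (vp->v_data);
}
```

Code written against the accessor keeps compiling if the private layout changes, which is why direct member accesses inherited from other platforms have to be replaced by such calls.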

OSv

  • Based on FreeBSD ZFS
  • Removes jail and geom support
  • VFS integration (ZPL) modified for OSv
  • Thread-local support uses __thread