This page describes the code flow when running zfs administrative commands (/sbin/zfs subcommands that change state). We will look at the example of <code>zfs snapshot -r</code> and examine what each layer of code is responsible for. This is intended as an introduction to the many layers of ZFS, so we won't go into detail on how snapshots are implemented. You can read more about snapshots in an [https://blogs.oracle.com/ahrens/entry/is_it_magic old blog post].


== /sbin/zfs infrastructure (<code>main()</code>) ==
The generic (subcommand-agnostic) infrastructure of the zfs command does the following:
* Create a libzfs handle (<code>libzfs_init()</code>).
* Determine which subcommand should be executed and run it (see the dispatch sketch below).
** Each zfs subcommand has a callback, typically named <code>zfs_do_''subcommand-name''()</code>.
* Call a libzfs function (<code>zpool_log_history()</code>) to log the command (see below for details).
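
A minimal sketch of this dispatch in C. The table layout and helper names are illustrative (the real code lives in <code>zfs_main.c</code>); <code>libzfs_init()</code>, <code>libzfs_fini()</code>, and <code>zpool_log_history()</code> are the actual libzfs entry points named above.

<pre>
#include <string.h>
#include <libzfs.h>

static libzfs_handle_t *g_zfs;

/* the real callback, implemented elsewhere in zfs_main.c */
static int zfs_do_snapshot(int argc, char **argv);

typedef struct zfs_command {
	const char	*name;
	int		(*func)(int argc, char **argv);
} zfs_command_t;

/* hypothetical table; the real one has an entry per subcommand */
static zfs_command_t command_table[] = {
	{ "snapshot",	zfs_do_snapshot },
	/* ... */
};

int
main(int argc, char **argv)
{
	int i, error = 1;

	if (argc < 2)
		return (1);	/* real code prints usage */

	if ((g_zfs = libzfs_init()) == NULL)
		return (1);

	/* find and run the subcommand's callback */
	for (i = 0; i < sizeof (command_table) / sizeof (command_table[0]); i++) {
		if (strcmp(argv[1], command_table[i].name) == 0) {
			error = command_table[i].func(argc - 1, argv + 1);
			break;
		}
	}

	/*
	 * Log the command line in the pool's history (see "CLI history
	 * logging" below); the real code passes the joined argv.
	 */
	(void) zpool_log_history(g_zfs, "zfs snapshot -r pool/fs@today");

	libzfs_fini(g_zfs);
	return (error);
}
</pre>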


== snapshot subcommand (<code>zfs_do_snapshot()</code>) ==
The snapshot subcommand's callback is <code>zfs_do_snapshot()</code>. It does the following:
* Parse the command line arguments.
* Create a list of the snapshots that need to be created (sketched below).
** Call a libzfs function (<code>zfs_iter_filesystems()</code>) to iterate over the descendent filesystems, adding the snapshot of each filesystem to the list.
* Call a libzfs function (<code>zfs_snapshot_nvl()</code>) to create the snapshots and handle any errors.
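
A rough sketch of the shape of this callback, assuming the <code>g_zfs</code> handle from the sketch above. The "list" is an nvlist keyed by full snapshot name; <code>zfs_snapshot_nvl()</code> is the real libzfs function (its third argument is an optional nvlist of properties to set on the new snapshots), but the surrounding code is illustrative.

<pre>
nvlist_t *snaps = fnvlist_alloc();
int error = 0;

/* one boolean-valued entry per snapshot to create */
fnvlist_add_boolean(snaps, "pool/fs@today");
fnvlist_add_boolean(snaps, "pool/fs/child@today");	/* added by -r */

/* create them all in one call; libzfs prints any error message */
if (zfs_snapshot_nvl(g_zfs, snaps, NULL) != 0)
	error = 1;
fnvlist_free(snaps);
</pre>
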
== libzfs ==
We saw two uses of libzfs: '''iterating''' over the descendent filesystems, and '''creating''' the snapshots.
=== filesystem iteration (<code>zfs_iter_filesystems()</code>) ===
libzfs provides "handles" to zfs datasets, represented by a <code>zfs_handle_t</code>. A handle is created by getting stats on a dataset from the kernel, and it caches these stats (e.g. property values) in userland. Note that handles are a purely userland (libzfs) concept; the kernel doesn't know about them, and a handle doesn't prevent any concurrent activity (e.g. destroying the dataset, changing properties, etc.).

To iterate over a filesystem's children, libzfs issues the ZFS_IOC_DATASET_LIST_NEXT ioctl to the kernel. Each call to this ioctl returns the next child of the specified dataset, along with that child's stats (e.g. properties). libzfs uses this information to make a <code>zfs_handle_t</code>, and passes the handle to a callback provided by the caller (<code>zfs_do_snapshot()</code> in this case).
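
A minimal usage sketch of this iteration. <code>zfs_open()</code>, <code>zfs_iter_filesystems()</code>, <code>zfs_get_name()</code>, and <code>zfs_close()</code> are the real libzfs interfaces; the callback here is a hypothetical one that just prints each descendent's name.

<pre>
/* callback invoked once per child; matches the zfs_iter_f type */
static int
print_child_cb(zfs_handle_t *zhp, void *arg)
{
	(void) printf("child: %s\n", zfs_get_name(zhp));

	/* recurse so that grandchildren are visited too (as with -r) */
	(void) zfs_iter_filesystems(zhp, print_child_cb, arg);

	zfs_close(zhp);		/* release the userland-only handle */
	return (0);		/* nonzero would abort the iteration */
}

/* ... */
zfs_handle_t *zhp = zfs_open(g_zfs, "pool/fs", ZFS_TYPE_FILESYSTEM);
if (zhp != NULL) {
	(void) zfs_iter_filesystems(zhp, print_child_cb, NULL);
	zfs_close(zhp);
}
</pre>
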
=== snapshot creation (<code>zfs_snapshot_nvl()</code>) ===
/sbin/zfs provides the list of snapshots to create, so this is a relatively thin layer in libzfs. Other subcommands have substantially more of their logic implemented in libzfs. The one interesting part of what libzfs does here is handle errors from the kernel: it prints a human-readable error message depending on which error code the kernel returned.

libzfs calls into libzfs_core to do the actual ioctl to the kernel; a sketch of the error translation follows.
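
An illustrative sketch of the kind of errno-to-message translation that happens here. The specific error codes and wording are examples, not the actual <code>zfs_snapshot_nvl()</code> code.

<pre>
switch (error) {
case EEXIST:
	(void) fprintf(stderr,
	    "cannot create snapshot: dataset already exists\n");
	break;
case ENOENT:
	(void) fprintf(stderr,
	    "cannot create snapshot: no such parent dataset\n");
	break;
default:
	(void) fprintf(stderr,
	    "cannot create snapshot: %s\n", strerror(error));
	break;
}
</pre>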
 
== libzfs_core (<code>lzc_snapshot()</code>) ==
libzfs calls <code>lzc_snapshot()</code> in libzfs_core. libzfs_core is a very thin layer which essentially just marshals the arguments and issues the ioctl to the kernel; in this case, ZFS_IOC_SNAPSHOT.
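
A usage sketch, assuming the <code>snaps</code> nvlist built earlier. <code>lzc_snapshot()</code> is the real libzfs_core function; on failure it can return a per-snapshot error list.

<pre>
#include <libzfs_core.h>

nvlist_t *errlist = NULL;

/* second argument: optional properties to set on the snapshots */
int err = lzc_snapshot(snaps, NULL, &errlist);
if (err != 0) {
	/* errlist maps each failed snapshot name to its error code */
}
if (errlist != NULL)
	nvlist_free(errlist);
</pre>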
 
== ioctl infrastructure (<code>zfsdev_ioctl()</code>) ==
When userland calls <code>ioctl()</code> on /dev/zfs, the kernel infrastructure calls <code>zfsdev_ioctl()</code>. This contains code which applies to all zfs ioctls. It does the following:
* Marshal the arguments, copying them in from the user address space.
* Determine which ioctl function should be called.
** Specific ioctl functions are typically named <code>zfs_ioc_''name-of-ioctl''()</code>.
** ioctl functions are stored in <code>zfs_ioc_vec[]</code>, which is populated by calling <code>zfs_ioctl_register()</code> from <code>zfs_ioctl_init()</code> (see the sketch below).
* Call the ioctl-specific permission-checking function, <code>zfs_secpolicy_snapshot()</code>.
* Call the ioctl-specific function, <code>zfs_ioc_snapshot()</code>.
* If this is a new-style ioctl (which ZFS_IOC_SNAPSHOT is), and it was successful, log the ioctl and its arguments on disk in the pool's history.
** This history log can be printed by running <tt>zpool history -i</tt>.
* If this ioctl allows it (which ZFS_IOC_SNAPSHOT does), and the ioctl was successful, remember that this thread is allowed to log the CLI history, which will be done as a separate ioctl (see below).
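
An approximate sketch of how ZFS_IOC_SNAPSHOT gets registered from <code>zfs_ioctl_init()</code>. The exact argument list of <code>zfs_ioctl_register()</code> varies between ZFS versions, so treat this as illustrative rather than verbatim.

<pre>
zfs_ioctl_register("snapshot", ZFS_IOC_SNAPSHOT,
    zfs_ioc_snapshot,		/* the ioctl-specific function */
    zfs_secpolicy_snapshot,	/* the permission-checking function */
    POOL_NAME,			/* the ioctl names a pool */
    B_TRUE,			/* smush_outnvlist: shrink oversized output */
    B_TRUE);			/* allow_log: may log CLI history (see below) */
</pre>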
 
== snapshot ioctl (<code>zfs_ioc_snapshot()</code>) ==
This is a relatively thin layer which typically checks that the arguments are well-formed. For example, all of the snapshots must be in the same pool.
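
A sketch of the kind of well-formedness check done here (not the exact code): every entry in the <code>snaps</code> nvlist must name a snapshot in the pool the ioctl was issued against. <code>SET_ERROR()</code> is the real ZFS error-reporting macro; <code>poolname</code> is assumed to hold the pool's name.

<pre>
for (nvpair_t *pair = nvlist_next_nvpair(snaps, NULL);
    pair != NULL; pair = nvlist_next_nvpair(snaps, pair)) {
	const char *name = nvpair_name(pair);
	const char *cp = strchr(name, '@');

	/* the name must contain exactly one '@' ... */
	if (cp == NULL || strchr(cp + 1, '@') != NULL)
		return (SET_ERROR(EINVAL));
	/* ... and must be in the same pool as the others */
	if (strncmp(name, poolname, strlen(poolname)) != 0)
		return (SET_ERROR(EXDEV));
}
</pre>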
 
== DSL (<code>dsl_dataset_snapshot()</code>) ==
This layer is also relatively thin: it marshals its arguments into structs and creates a synctask to execute a callback from syncing context. For snapshots, there is also some code to suspend the ZIL on old-version pools.
 
== synctask infrastructure (<code>dsl_sync_task()</code>) ==
The synctask infrastructure allows a thread executing in open context (i.e. from an <code>ioctl()</code>) to execute a callback in syncing context (i.e. from <code>spa_sync()</code>). The MOS (Meta Object Set), which contains all the pool-wide metadata, can only be modified from syncing context.
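
A sketch of how <code>dsl_dataset_snapshot()</code> uses this, with the arguments marshalled into a <code>dsl_dataset_snapshot_arg_t</code>. The exact <code>dsl_sync_task()</code> signature varies between ZFS versions, so this is illustrative; <code>firstname</code> is an assumed variable naming any dataset in the target pool.

<pre>
dsl_dataset_snapshot_arg_t ddsa;

ddsa.ddsa_snaps = snaps;	/* the marshalled arguments */
/* ... other fields ... */

error = dsl_sync_task(firstname,	/* identifies the pool */
    dsl_dataset_snapshot_check,		/* validates the operation */
    dsl_dataset_snapshot_sync,		/* runs in syncing context */
    &ddsa,				/* passed to both callbacks */
    fnvlist_num_pairs(snaps));		/* hint: # of blocks modified */
</pre>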
 
== snapshot synctask (<code>dsl_dataset_snapshot_sync()</code>) ==
This code creates each of the snapshots by modifying the MOS: specifically, by creating a new object in the MOS to represent each snapshot. Each snapshot's object stores a <code>dsl_dataset_phys_t</code>, which is filled in by this code. Related datasets will also be modified. Note that in this phase, we are only modifying and dirtying the in-memory copy of data that will be written out to disk in the next phase.
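
An abridged sketch of the MOS modification, modeled on the pattern in <code>dsl_dataset.c</code> (not a verbatim excerpt): allocate a new MOS object for the snapshot, then fill in the <code>dsl_dataset_phys_t</code> that lives in that object's bonus buffer.

<pre>
/* allocate the snapshot's object in the MOS */
dsobj = dmu_object_alloc(mos, DMU_OT_DSL_DATASET, 0,
    DMU_OT_DSL_DATASET, sizeof (dsl_dataset_phys_t), tx);

/* get at the bonus buffer, and dirty the in-memory copy */
VERIFY0(dmu_bonus_hold(mos, dsobj, FTAG, &dbuf));
dmu_buf_will_dirty(dbuf, tx);

dsphys = dbuf->db_data;
dsphys->ds_creation_time = gethrestime_sec();
dsphys->ds_creation_txg = tx->tx_txg;
/* ... many more fields ... */

dmu_buf_rele(dbuf, FTAG);
</pre>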
 
== DMU layer: MOS sync (<code>dmu_objset_sync()</code>) ==
In the previous phase, we only modified the in-memory copy of data which is represented on disk. Subsequently (in the same TXG), we write out all the dirty data in the MOS. The MOS is an object set (objset) like any other; the primary difference is that it is only dirtied (modified) in syncing context.
 
== CLI history logging ==
After the subcommand completes, the generic /sbin/zfs infrastructure calls <code>zpool_log_history()</code> to record the command line in the pool's history, which can be printed by running <tt>zpool history</tt>. As noted above, the kernel permits this separate logging ioctl only because the preceding ZFS_IOC_SNAPSHOT succeeded and allows CLI history logging.
