add local-zfs.rst
content is > 90% same as local-zfs.adoc in pve-docs. adapted the format for .rst fixed some typos and wrote some parts slightly different (wording). Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
This commit is contained in:
parent
2107a5aebc
commit
859fe9c1fb
|
@ -0,0 +1,374 @@
|
|||
ZFS on Linux
|
||||
=============
|
||||
.. code-block:: console.. code-block:: console.. code-block:: console
|
||||
|
||||
ZFS is a combined file system and logical volume manager designed by
|
||||
Sun Microsystems. There is no need for manually compile ZFS modules - all
|
||||
packages are included.
|
||||
|
||||
By using ZFS, its possible to achieve maximum enterprise features with
|
||||
low budget hardware, but also high performance systems by leveraging
|
||||
SSD caching or even SSD only setups. ZFS can replace cost intense
|
||||
hardware raid cards by moderate CPU and memory load combined with easy
|
||||
management.
|
||||
|
||||
General ZFS advantages
|
||||
|
||||
* Easy configuration and management with GUI and CLI.
|
||||
* Reliable
|
||||
* Protection against data corruption
|
||||
* Data compression on file system level
|
||||
* Snapshots
|
||||
* Copy-on-write clone
|
||||
* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
|
||||
* Can use SSD for cache
|
||||
* Self healing
|
||||
* Continuous integrity checking
|
||||
* Designed for high storage capacities
|
||||
* Protection against data corruption
|
||||
* Asynchronous replication over network
|
||||
* Open Source
|
||||
* Encryption
|
||||
|
||||
Hardware
|
||||
---------
|
||||
|
||||
ZFS depends heavily on memory, so you need at least 8GB to start. In
|
||||
practice, use as much you can get for your hardware/budget. To prevent
|
||||
data corruption, we recommend the use of high quality ECC RAM.
|
||||
|
||||
If you use a dedicated cache and/or log disk, you should use an
|
||||
enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
|
||||
increase the overall performance significantly.
|
||||
|
||||
IMPORTANT: Do not use ZFS on top of hardware controller which has its
|
||||
own cache management. ZFS needs to directly communicate with disks. An
|
||||
HBA adapter is the way to go, or something like LSI controller flashed
|
||||
in ``IT`` mode.
|
||||
|
||||
|
||||
|
||||
|
||||
ZFS Administration
|
||||
------------------
|
||||
|
||||
This section gives you some usage examples for common tasks. ZFS
|
||||
itself is really powerful and provides many options. The main commands
|
||||
to manage ZFS are `zfs` and `zpool`. Both commands come with great
|
||||
manual pages, which can be read with:
|
||||
|
||||
.. code-block:: console
|
||||
# man zpool
|
||||
# man zfs
|
||||
|
||||
Create a new zpool
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
To create a new pool, at least one disk is needed. The `ashift` should
|
||||
have the same sector-size (2 power of `ashift`) or larger as the
|
||||
underlying disk.
|
||||
|
||||
.. code-block:: console
|
||||
# zpool create -f -o ashift=12 <pool> <device>
|
||||
|
||||
Create a new pool with RAID-0
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Minimum 1 disk
|
||||
|
||||
.. code-block:: console
|
||||
# zpool create -f -o ashift=12 <pool> <device1> <device2>
|
||||
|
||||
Create a new pool with RAID-1
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Minimum 2 disks
|
||||
|
||||
.. code-block:: console
|
||||
# zpool create -f -o ashift=12 <pool> mirror <device1> <device2>
|
||||
|
||||
Create a new pool with RAID-10
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Minimum 4 disks
|
||||
|
||||
.. code-block:: console
|
||||
# zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>
|
||||
|
||||
Create a new pool with RAIDZ-1
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Minimum 3 disks
|
||||
|
||||
.. code-block:: console
|
||||
# zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>
|
||||
|
||||
Create a new pool with RAIDZ-2
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Minimum 4 disks
|
||||
|
||||
.. code-block:: console
|
||||
# zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
|
||||
|
||||
Create a new pool with cache (L2ARC)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
It is possible to use a dedicated cache drive partition to increase
|
||||
the performance (use SSD).
|
||||
|
||||
As `<device>` it is possible to use more devices, like it's shown in
|
||||
"Create a new pool with RAID*".
|
||||
|
||||
.. code-block:: console
|
||||
# zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
|
||||
|
||||
Create a new pool with log (ZIL)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
It is possible to use a dedicated cache drive partition to increase
|
||||
the performance (SSD).
|
||||
|
||||
As `<device>` it is possible to use more devices, like it's shown in
|
||||
"Create a new pool with RAID*".
|
||||
|
||||
.. code-block:: console
|
||||
# zpool create -f -o ashift=12 <pool> <device> log <log_device>
|
||||
|
||||
Add cache and log to an existing pool
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
If you have a pool without cache and log. First partition the SSD in
|
||||
2 partition with `parted` or `gdisk`
|
||||
|
||||
.. important:: Always use GPT partition tables.
|
||||
|
||||
The maximum size of a log device should be about half the size of
|
||||
physical memory, so this is usually quite small. The rest of the SSD
|
||||
can be used as cache.
|
||||
|
||||
.. code-block:: console
|
||||
# zpool add -f <pool> log <device-part1> cache <device-part2>
|
||||
|
||||
|
||||
Changing a failed device
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: console
|
||||
# zpool replace -f <pool> <old device> <new device>
|
||||
|
||||
|
||||
Changing a failed bootable device
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Depending on how Proxmox Backup was installed it is either using `grub` or `systemd-boot`
|
||||
as bootloader.
|
||||
|
||||
The first steps of copying the partition table, reissuing GUIDs and replacing
|
||||
the ZFS partition are the same. To make the system bootable from the new disk,
|
||||
different steps are needed which depend on the bootloader in use.
|
||||
|
||||
.. code-block:: console
|
||||
# sgdisk <healthy bootable device> -R <new device>
|
||||
# sgdisk -G <new device>
|
||||
# zpool replace -f <pool> <old zfs partition> <new zfs partition>
|
||||
|
||||
.. NOTE:: Use the `zpool status -v` command to monitor how far the resilvering process of the new disk has progressed.
|
||||
|
||||
With `systemd-boot`:
|
||||
|
||||
.. code-block:: console
|
||||
# pve-efiboot-tool format <new disk's ESP>
|
||||
# pve-efiboot-tool init <new disk's ESP>
|
||||
|
||||
.. NOTE:: `ESP` stands for EFI System Partition, which is setup as partition #2 on
|
||||
bootable disks setup by the {pve} installer since version 5.4. For details, see
|
||||
xref:sysboot_systemd_boot_setup[Setting up a new partition for use as synced ESP].
|
||||
|
||||
With `grub`:
|
||||
|
||||
Usually `grub.cfg` is located in `/boot/grub/grub.cfg`
|
||||
|
||||
.. code-block:: console
|
||||
# grub-install <new disk>
|
||||
# grub-mkconfig -o /path/to/grub.cfg
|
||||
|
||||
|
||||
Activate E-Mail Notification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
ZFS comes with an event daemon, which monitors events generated by the
|
||||
ZFS kernel module. The daemon can also send emails on ZFS events like
|
||||
pool errors. Newer ZFS packages ship the daemon in a separate package,
|
||||
and you can install it using `apt-get`:
|
||||
|
||||
.. code-block:: console
|
||||
# apt-get install zfs-zed
|
||||
|
||||
To activate the daemon it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
|
||||
favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:
|
||||
|
||||
.. code-block:: console
|
||||
ZED_EMAIL_ADDR="root"
|
||||
|
||||
Please note Proxmox Backup forwards mails to `root` to the email address
|
||||
configured for the root user.
|
||||
|
||||
IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All
|
||||
other settings are optional.
|
||||
|
||||
Limit ZFS Memory Usage
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
It is good to use at most 50 percent (which is the default) of the
|
||||
system memory for ZFS ARC to prevent performance shortage of the
|
||||
host. Use your preferred editor to change the configuration in
|
||||
`/etc/modprobe.d/zfs.conf` and insert:
|
||||
|
||||
.. code-block:: console
|
||||
options zfs zfs_arc_max=8589934592
|
||||
|
||||
This example setting limits the usage to 8GB.
|
||||
|
||||
.. IMPORTANT:: If your root file system is ZFS you must update your initramfs every time this value changes:
|
||||
|
||||
.. code-block:: console
|
||||
# update-initramfs -u
|
||||
|
||||
|
||||
SWAP on ZFS
|
||||
~~~~~~~~~~~
|
||||
|
||||
Swap-space created on a zvol may generate some troubles, like blocking the
|
||||
server or generating a high IO load, often seen when starting a Backup
|
||||
to an external Storage.
|
||||
|
||||
We strongly recommend to use enough memory, so that you normally do not
|
||||
run into low memory situations. Should you need or want to add swap, it is
|
||||
preferred to create a partition on a physical disk and use it as swapdevice.
|
||||
You can leave some space free for this purpose in the advanced options of the
|
||||
installer. Additionally, you can lower the `swappiness` value.
|
||||
A good value for servers is 10:
|
||||
|
||||
.. code-block:: console
|
||||
# sysctl -w vm.swappiness=10
|
||||
|
||||
To make the swappiness persistent, open `/etc/sysctl.conf` with
|
||||
an editor of your choice and add the following line:
|
||||
|
||||
.. code-block:: console
|
||||
vm.swappiness = 10
|
||||
|
||||
.. table:: Linux kernel `swappiness` parameter values
|
||||
:widths:auto
|
||||
========= ============
|
||||
Value Strategy
|
||||
========= ============
|
||||
vm.swappiness = 0 The kernel will swap only to avoid an 'out of memory' condition
|
||||
vm.swappiness = 1 Minimum amount of swapping without disabling it entirely.
|
||||
vm.swappiness = 10 This value is sometimes recommended to improve performance when sufficient memory exists in a system.
|
||||
vm.swappiness = 60 The default value.
|
||||
vm.swappiness = 100 The kernel will swap aggressively.
|
||||
========= ============
|
||||
|
||||
ZFS Compression
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
To activate compression:
|
||||
.. code-block:: console
|
||||
# zpool set compression=lz4 <pool>
|
||||
|
||||
We recommend using the `lz4` algorithm, since it adds very little CPU overhead.
|
||||
Other algorithms such as `lzjb` and `gzip-N` (where `N` is an integer `1-9` representing
|
||||
the compression ratio, 1 is fastest and 9 is best compression) are also available.
|
||||
Depending on the algorithm and how compressible the data is, having compression enabled can even increase
|
||||
I/O performance.
|
||||
|
||||
You can disable compression at any time with:
|
||||
.. code-block:: console
|
||||
# zfs set compression=off <dataset>
|
||||
|
||||
Only new blocks will be affected by this change.
|
||||
|
||||
ZFS Special Device
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Since version 0.8.0 ZFS supports `special` devices. A `special` device in a
|
||||
pool is used to store metadata, deduplication tables, and optionally small
|
||||
file blocks.
|
||||
|
||||
A `special` device can improve the speed of a pool consisting of slow spinning
|
||||
hard disks with a lot of metadata changes. For example workloads that involve
|
||||
creating, updating or deleting a large number of files will benefit from the
|
||||
presence of a `special` device. ZFS datasets can also be configured to store
|
||||
whole small files on the `special` device which can further improve the
|
||||
performance. Use fast SSDs for the `special` device.
|
||||
|
||||
.. IMPORTANT:: The redundancy of the `special` device should match the one of the
|
||||
pool, since the `special` device is a point of failure for the whole pool.
|
||||
|
||||
.. WARNING:: Adding a `special` device to a pool cannot be undone!
|
||||
|
||||
Create a pool with `special` device and RAID-1:
|
||||
|
||||
.. code-block:: console
|
||||
# zpool create -f -o ashift=12 <pool> mirror <device1> <device2> special mirror <device3> <device4>
|
||||
|
||||
Adding a `special` device to an existing pool with RAID-1:
|
||||
|
||||
.. code-block:: console
|
||||
# zpool add <pool> special mirror <device1> <device2>
|
||||
|
||||
ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
|
||||
`0` to disable storing small file blocks on the `special` device or a power of
|
||||
two in the range between `512B` to `128K`. After setting the property new file
|
||||
blocks smaller than `size` will be allocated on the `special` device.
|
||||
|
||||
.. IMPORTANT:: If the value for `special_small_blocks` is greater than or equal to
|
||||
the `recordsize` (default `128K`) of the dataset, *all* data will be written to
|
||||
the `special` device, so be careful!
|
||||
|
||||
Setting the `special_small_blocks` property on a pool will change the default
|
||||
value of that property for all child ZFS datasets (for example all containers
|
||||
in the pool will opt in for small file blocks).
|
||||
|
||||
Opt in for all file smaller than 4K-blocks pool-wide:
|
||||
|
||||
.. code-block:: console
|
||||
# zfs set special_small_blocks=4K <pool>
|
||||
|
||||
Opt in for small file blocks for a single dataset:
|
||||
|
||||
.. code-block:: console
|
||||
# zfs set special_small_blocks=4K <pool>/<filesystem>
|
||||
|
||||
Opt out from small file blocks for a single dataset:
|
||||
|
||||
.. code-block:: console
|
||||
# zfs set special_small_blocks=0 <pool>/<filesystem>
|
||||
|
||||
Troubleshooting
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
Corrupted cachefile
|
||||
|
||||
In case of a corrupted ZFS cachefile, some volumes may not be mounted during
|
||||
boot until mounted manually later.
|
||||
|
||||
For each pool, run:
|
||||
|
||||
.. code-block:: console
|
||||
# zpool set cachefile=/etc/zfs/zpool.cache POOLNAME
|
||||
|
||||
and afterwards update the `initramfs` by running:
|
||||
|
||||
.. code-block:: console
|
||||
# update-initramfs -u -k all
|
||||
|
||||
and finally reboot your node.
|
||||
|
||||
Sometimes the ZFS cachefile can get corrupted, and `zfs-import-cache.service`
|
||||
doesn't import the pools that aren't present in the cachefile.
|
||||
|
||||
Another workaround to this problem is enabling the `zfs-import-scan.service`,
|
||||
which searches and imports pools via device scanning (usually slower).
|
Loading…
Reference in New Issue