ZFS: An explanation of different pool layouts

This post contains some examples and short descriptions of different ZFS pool layouts. I’m assuming you’re already familiar with the features of ZFS. If not, you may want to check out the Wikipedia article on ZFS.

General information

It’s recommended never to have more than 9 drives in a single vdev, as wider vdevs take a noticeable performance hit, especially when resilvering. Resilvering may become so slow that the chance of losing additional drives while the process runs, and your data with them, becomes significant. If you want more than 9 drives, it’s therefore recommended to use multiple vdevs in the same pool.
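
If you have, say, twelve drives, you could create two six-drive RaidZ-2 vdevs in the same pool instead of one wide vdev. A sketch, using hypothetical disk labels:

zpool create POOLNAME \
  raidz2 /dev/gpt/Disk1 /dev/gpt/Disk2 /dev/gpt/Disk3 /dev/gpt/Disk4 /dev/gpt/Disk5 /dev/gpt/Disk6 \
  raidz2 /dev/gpt/Disk7 /dev/gpt/Disk8 /dev/gpt/Disk9 /dev/gpt/Disk10 /dev/gpt/Disk11 /dev/gpt/Disk12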

It’s generally not recommended to use hardware raid arrays with ZFS, as the controller hides the individual disks from ZFS, preventing it from guaranteeing data integrity. There are exceptions, but that’s a topic for another post. You may, however, use raid controllers configured in pass-through/JBOD mode. A raid card configured in this mode, combined with a BBU, is often recommended for ZFS because of the advantages of a non-volatile write cache. You should disable the read cache on the controller, as ZFS handles that bit itself.

When a vdev loses redundancy, it will still detect bad data, but it won’t be able to correct it automatically. If there’s bad data on the vdev at the time it loses redundancy, that data is lost. Setting the dataset property ‘copies’ to a value higher than 1 helps reduce the chance of this happening, but it stores the content of that dataset multiple times, thus using more space. Changing this property will not affect already stored data. If the vdev loses one additional device while in a non-redundant state, all of its data is considered lost.
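
For example, to keep two copies of every block in a dataset (POOLNAME/DATASETNAME being a placeholder), and to have ZFS read back and verify everything in the pool so bad data is found while redundancy is still intact:

zfs set copies=2 POOLNAME/DATASETNAME
zpool scrub POOLNAME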

Layouts

Stripe

zpool create POOLNAME /dev/gpt/Disk1 /dev/gpt/Disk2
A stripe has no redundancy, but you get to utilize the full capacity of all the drives involved. Useful when speed and/or capacity are the only concerns.
May be compared to “raid-0”.

zpool create POOLNAME VDEV_A /dev/gpt/Disk1 /dev/gpt/Disk2 (…) VDEV_B /dev/gpt/Disk3 /dev/gpt/Disk4 (…)
Here VDEV_A and VDEV_B stand for vdev type keywords such as mirror, raidz, or raidz2. The command creates a pool containing two vdevs in stripe mode, so the total storage capacity of the pool is the sum of the capacities of its vdevs. Write and read speeds increase, as the load is spread across multiple vdevs. If one vdev dies, the pool is considered ‘faulty’, and all of its data is considered lost.
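
As a concrete example, substituting the mirror keyword for both placeholders gives two striped mirrors, often compared to “raid-10”:

zpool create POOLNAME mirror /dev/gpt/Disk1 /dev/gpt/Disk2 mirror /dev/gpt/Disk3 /dev/gpt/Disk4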

Mirror

zpool create POOLNAME mirror /dev/gpt/Disk1 /dev/gpt/Disk2
You can attach as many drives as you’d like to a mirror. However, the general consensus around the internet seems to be that anything beyond three drives is a waste of resources.
Mirrors are quick to resilver, as rebuilding is a straight copy rather than a parity reconstruction.
May be compared to “raid-1”.

A two-way mirror will lose redundancy with one drive loss.
A three-way mirror will lose redundancy with two concurrent drive losses.
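
If you start out with the two-way mirror above, you can later attach a third drive to one of its members to make it three-way (Disk3 here being a hypothetical new drive):

zpool attach POOLNAME /dev/gpt/Disk1 /dev/gpt/Disk3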

Capacity: smallest drive.
Utilization: capacity / (capacity of all drives) × 100%.

RaidZ-1

zpool create POOLNAME raidz /dev/gpt/Disk1 /dev/gpt/Disk2 /dev/gpt/Disk3
This vdev type will lose redundancy at the first drive loss.
May be compared to “raid-5”, except there’s no “write-hole”.
I find this ideal for storing things that can be reproduced with little effort.

Capacity: (smallest drive) × ((number of drives) − 1)
Utilization: capacity / (capacity of all drives) × 100%. For example, three 4 TB drives give 8 TB usable, i.e. about 67%.
Min. no. of drives: 3
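
When a drive in a RaidZ vdev fails, you restore redundancy by replacing it, which triggers a resilver. A sketch, assuming Disk2 has failed and Disk4 is a hypothetical replacement:

zpool replace POOLNAME /dev/gpt/Disk2 /dev/gpt/Disk4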

RaidZ-2

zpool create POOLNAME raidz2 /dev/gpt/Disk1 /dev/gpt/Disk2 /dev/gpt/Disk3 /dev/gpt/Disk4 /dev/gpt/Disk5
This vdev type will lose redundancy with two concurrent drive losses.
May be compared to “raid-6”, except there’s no “write-hole”.

Capacity: (smallest drive) × ((number of drives) − 2)
Utilization: capacity / (capacity of all drives) × 100%.
Min. no. of drives: 4

RaidZ-3

zpool create POOLNAME raidz3 /dev/gpt/Disk1 /dev/gpt/Disk2 /dev/gpt/Disk3 /dev/gpt/Disk4 /dev/gpt/Disk5 /dev/gpt/Disk6 /dev/gpt/Disk7
This vdev type will lose redundancy with three concurrent drive losses.
Resilvering is slow compared to RaidZ-1 and RaidZ-2.
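
You can keep an eye on a running resilver, including its progress and estimated completion time, with:

zpool status POOLNAME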

Capacity: (smallest drive) × ((number of drives) − 3)
Utilization: capacity / (capacity of all drives) × 100%.
Min. no. of drives: 5
