Historically, hard drives have had a sector size of 512 bytes. This changed when drives became large enough that the overhead of keeping track of so many small sectors consumed too much storage space, making hard drives more expensive to produce than strictly necessary. Many modern drives are tagged as “advanced format” drives. Right now, this means they have a sector size of 4096 bytes (4KiB). This includes most if not all SSDs, and most 2TB+ magnetic drives.
If you create a partition on such a drive without ensuring the partition begins on a physical sector boundary, the device firmware will have to do some “magic” to map misaligned writes onto its physical sectors. That takes more time than not doing the magic in the first place, resulting in reduced performance. It is therefore important to make sure you align partitions correctly on these devices. I generally align partitions to the 1MiB mark for the sake of being future-proof. Even though my current drives have 512B and 4KiB sector sizes, I don’t want to encounter any problems when larger sector sizes are introduced.
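To make the alignment arithmetic concrete, here is a minimal sketch in plain sh. The helper names `is_aligned` and `align_up` are made up for illustration; they are not part of gpart or any other tool. The example offset assumes a GPT drive with 512-byte logical sectors, where the first usable sector is 34.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a byte offset is a multiple of the alignment.
is_aligned() {
    offset_bytes=$1
    align_bytes=$2
    [ $((offset_bytes % align_bytes)) -eq 0 ]
}

# Hypothetical helper: round a byte offset up to the next aligned boundary.
align_up() {
    offset_bytes=$1
    align_bytes=$2
    echo $(( (offset_bytes + align_bytes - 1) / align_bytes * align_bytes ))
}

ONE_MIB=$((1024 * 1024))

# A partition starting at sector 34 with 512-byte sectors begins at byte 17408,
# which is not a multiple of 4096.
if is_aligned $((34 * 512)) 4096; then
    echo "aligned"
else
    echo "not aligned"
fi

# Rounding that offset up to the 1MiB mark and converting back to 512-byte
# sectors gives the first aligned start sector.
echo $(( $(align_up $((34 * 512)) "$ONE_MIB") / 512 ))
```

Run as written, this prints "not aligned" followed by 2048. A 1MiB boundary is also a multiple of both 512 bytes and 4KiB, which is why aligning to it covers both kinds of drive.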
Although ZFS can use entire devices without partitioning, I use GPT to partition and label my drives. My labels generally refer to the drive's physical location in the server. For example, Bay1.2 would mean the drive is located in bay one, slot two. This makes it much easier to figure out which drive to replace when the need arises.
ZFS is smart enough to query the underlying device to see how large its sectors are, and use this information to determine the size of its dynamic-width stripes. This is all fine and dandy as long as the hardware isn’t lying. Sadly, hardware currently lies more often than not. My drives claim to have a logical sector size of 512 bytes (ashift=9 because 2^9=512) while the physical sectors are 4KiB (ashift=12 because 2^12=4096). As such, ZFS will make stripes aligned to 512 bytes. This means stripes will almost always be non-aligned, forcing the underlying device to work its magic, which in turn degrades performance.
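The relationship between sector size and ashift is just a base-2 logarithm. The following sh sketch shows the calculation; `ashift_for` is a hypothetical helper name of my own, not a ZFS command.

```shell
#!/bin/sh
# Hypothetical helper: print the ashift (log2) for a power-of-two sector size.
ashift_for() {
    size=$1
    shift_val=0
    # Reject anything that is not a positive power of two.
    if [ "$size" -le 0 ] || [ $((size & (size - 1))) -ne 0 ]; then
        echo "sector size must be a power of two: $size" >&2
        return 1
    fi
    # Halve the size until it reaches 1, counting the steps.
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        shift_val=$((shift_val + 1))
    done
    echo "$shift_val"
}

ashift_for 512    # prints 9
ashift_for 4096   # prints 12
```

On FreeBSD, `diskinfo -v ada0` should show what the drive itself reports (look at the sectorsize and stripesize fields), which you can then feed into a calculation like this one.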
ZFS does not currently seem to have any way of manually configuring the underlying block size, which means we’ll have to apply a workaround if the drives are lying.
Update 2014-11-23: As of FreeBSD 10.1-RELEASE, there is a new sysctl to force the ashift value of new vdevs:
The zfs(8) filesystem has been updated to allow tuning the minimum “ashift” value when creating new top-level virtual devices (vdevs). To set the minimum ashift value, for example when creating a zpool(8) on “Advanced Format” drives, set the vfs.zfs.min_auto_ashift sysctl(8) accordingly. [r266122]
# Enforce an ashift of at least 12, meaning at least 4KiB blocks
sysctl vfs.zfs.min_auto_ashift=12
# Create your zpool, or add new vdevs, as you normally would.
Original entry continues below.
On FreeBSD, you have to create a virtual device that reports the physical sector size to ZFS as its own sector size. The following is exactly how I set up the pool on my prototyping server named Lou, including partitioning. My drives have a 4KiB sector size. Your mileage may vary!
# Create the gpt structure on the drives.
# Data drives:
gpart create -s gpt ada0
gpart create -s gpt ada1
gpart create -s gpt ada2

# Create partitions on data drives
gpart add -a 1m -t freebsd-zfs -l Bay1.1 ada0
gpart add -a 1m -t freebsd-zfs -l Bay1.2 ada1
gpart add -a 1m -t freebsd-zfs -l Bay1.3 ada2

# Create virtual devices which define 4K sector size
gnop create -S 4k gpt/Bay1.1
gnop create -S 4k gpt/Bay1.2
gnop create -S 4k gpt/Bay1.3

# Create the pool and define some general settings:
zpool create LouTank raidz /dev/gpt/Bay1.1.nop /dev/gpt/Bay1.2.nop /dev/gpt/Bay1.3.nop
zfs set atime=off LouTank
zfs set checksum=fletcher4 LouTank

# Export pool and remove virtual devices
zpool export LouTank
gnop destroy gpt/Bay1.1.nop
gnop destroy gpt/Bay1.2.nop
gnop destroy gpt/Bay1.3.nop

# Import pool. Tell zpool to look for devices in /dev/gpt, in order to keep labels.
zpool import -d /dev/gpt LouTank
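Once the pool is imported, it is worth checking that the ashift actually stuck. A minimal sketch, assuming the pool configuration printed by zdb contains lines of the form "ashift: N": `check_ashift` is a hypothetical helper of my own that reads such text on stdin and only succeeds if every ashift it finds is at least the minimum you ask for.

```shell
#!/bin/sh
# Hypothetical helper: read pool configuration text on stdin and succeed only
# if at least one "ashift: N" line is present and every N is >= the minimum.
check_ashift() {
    min=$1
    awk -v min="$min" '
        $1 == "ashift:" { seen = 1; if ($2 + 0 < min + 0) bad = 1 }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}

# Assumed usage on the live system (not run here):
#   zdb -C LouTank | check_ashift 12 && echo "ashift OK"
```

If the check fails, the pool was most likely created with ashift=9, and as far as I know the only fix is to destroy and recreate the affected vdevs.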