Multi-device Benchmarks

From btrfs Wiki

(Note: The benchmarks on this page were run a long time ago and are not representative of the current Btrfs codebase.)

Raid Comparison Benchmarking

Btrfs v0.14 brings support for managing multiple devices, including support for raid0, raid1, and raid10. Some benchmarking was done to ensure the implementation performed well in comparison with both hardware and software raid. In general these numbers show that Btrfs does a good job of scaling to this storage configuration, and that it is on par with both HW raid and MD.

There is still quite a bit of room for IO scalability improvements, however, including spreading the CPU-intensive checksumming across more CPUs. The disk format has fields to group chunks and to set a preferred group for specific files. This will allow drives to be split up for parallel access more effectively.

Initial tests on larger IO subsystems show a few different CPU bottlenecks because O_DIRECT support is not yet implemented.

Benchmark Setup

The benchmarking machine was a dual-core x86-64 system with a 3GHz CPU. Four SATA drives were connected via a 3ware 9650 raid card.

This is a simple throughput test, and so fio was used to run sequential IO across all four drives in different raid configs. XFS was used as a baseline, and roughly represents disk speed.

The benchmark had a few phases:

  • Sequentially create two 4GB files
  • Sequentially read one of the files
  • Read both of the files in parallel

Cache flushes were done between each phase. The graphs below were generated by seekwatcher and capture IO done across all devices during the entire run. In general, the four devices together are able to write at 190MB/s and read at 200MB/s.
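The three phases can be sketched in plain Python. This is only an illustration of the sequence, not the fio job that was actually run: the block size, the file names, the helper names (`write_file`, `read_file`, `run_phases`), and the choice to create the two files one after the other are all assumptions. It also does not flush caches between phases; on Linux that would typically mean writing to /proc/sys/vm/drop_caches as root.

```python
import os
import tempfile
import threading
import time

BLOCK = 1024 * 1024  # 1MiB per IO; an assumed block size, not taken from the benchmark


def write_file(path, size):
    """Sequentially write `size` bytes, then fsync so the data reaches the device."""
    buf = b"\0" * BLOCK
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(BLOCK, remaining)
            f.write(buf[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def read_file(path, totals, key):
    """Sequentially read the whole file and record the byte count under `key`."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            count += len(chunk)
    totals[key] = count


def timed(fn, *args):
    """Return the wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def run_phases(directory, size):
    """Run the three phases and return (seconds per phase, bytes read per reader)."""
    paths = [os.path.join(directory, name) for name in ("file1", "file2")]
    totals = {}
    seconds = {}

    def create_both():
        # Phase 1: create two files (assumed here to be written one after the other).
        for p in paths:
            write_file(p, size)

    def read_both():
        # Phase 3: read both files in parallel, one thread per file.
        threads = [
            threading.Thread(target=read_file, args=(p, totals, p)) for p in paths
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    seconds["create"] = timed(create_both)
    # NOTE: the real benchmark flushed caches between phases; this sketch does not.
    # Phase 2: read one of the files sequentially.
    seconds["read_one"] = timed(read_file, paths[0], totals, "single")
    seconds["read_both"] = timed(read_both)
    return seconds, totals


if __name__ == "__main__":
    # Small files keep the demo quick; the benchmark itself used 4GB files.
    with tempfile.TemporaryDirectory() as d:
        seconds, _ = run_phases(d, 8 * BLOCK)
        for phase, secs in seconds.items():
            print(f"{phase}: {secs:.3f}s")
```

Without the cache flushes, the read phases in this sketch will mostly be served from memory, so the timings say little about the disks. That is the reason the benchmark flushed caches between phases.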

Btrfs Raid0

Btrfs raid0 is able to run at very close to the limits of the hardware, even though data checksumming is enabled. It should be noted that data checksumming does generate a large amount of metadata, and the resulting metadata writes can be seen as seeks during the write phases.

Image:multi-benchmark-braid0.png

Raid0 Comparison

Including the full seekwatcher graphs for XFS and Btrfs on top of MD raid0 becomes very confusing. The graph below is just the seek count and throughput portion, but it shows that Btrfs performs well in comparison to software raid0.

Image:multi-benchmark-raid0-compare.png

Btrfs Raid1

Btrfs raid1 dynamically allocates mirrored storage from the pool of devices. The IO graph below shows it writing to two devices at a time, and then reading from all four during the read phase. The graph also highlights the current simplistic read balancing code: Btrfs only operates at 100MB/s while a single process is reading.

Image:multi-benchmark-braid1.png

Btrfs Raid10

Btrfs raid10 stripes across mirrored drives. The graph shows faster performance than raid1, but not quite as good as raid0.

Image:multi-benchmark-braid10.png
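As a rough illustration of the difference between striping and striping across mirrors, the toy mapping below shows which devices receive each stripe on four drives. It is a generic textbook layout with fixed mirror pairs, assumed here for simplicity; it is not the Btrfs chunk allocator, and the function names are hypothetical.

```python
def raid0_devices(stripe, num_devices=4):
    """raid0: each stripe goes to exactly one device, round-robin."""
    return [stripe % num_devices]


def raid10_devices(stripe, num_devices=4):
    """raid10: stripes rotate across mirror pairs; both drives in a pair get a copy.

    Assumes an even number of devices grouped into fixed pairs (0,1), (2,3), ...
    """
    pairs = num_devices // 2
    pair = stripe % pairs
    return [2 * pair, 2 * pair + 1]


if __name__ == "__main__":
    for stripe in range(4):
        print(stripe, raid0_devices(stripe), raid10_devices(stripe))
    # 0 [0] [0, 1]
    # 1 [1] [2, 3]
    # 2 [2] [0, 1]
    # 3 [3] [2, 3]
```

In this toy layout, four stripes of raid0 data occupy four device writes, while the same four stripes on raid10 occupy eight. This is consistent with raid10 writing more slowly than raid0 on the same drives.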

Raid10 Comparison

Btrfs raid10 was compared with MD raid10 and with hardware raid10 on the 3ware controller. It isn't listed in the graph below, but XFS was able to make better use of MD raid10, scoring about the same performance as Btrfs-raid10. XFS on HW raid10 was slightly faster than Btrfs on HW raid10.

Image:multi-benchmark-raid10-compare.png

The differences between the HW raid10 and the Btrfs raid10 deserve more detailed examination. Individually, each drive writes at roughly 80MB/s, and so peak throughput should be 4 * 80MB/s. However, this is a fairly boring desktop machine, and it is only able to write at 190MB/s to all four drives at once (tested via O_DIRECT to the block devices in parallel).

This partially explains why the write phase of the benchmark completes much faster on the HW raid10. The 3ware card is very fast and has fewer components between it and the drives. The read phase of the benchmark actually completes faster on Btrfs-raid10, which means the simple Btrfs read balancing code got lucky in this workload. The graph below compares Btrfs on HW raid10 and Btrfs-raid10.

Image:multi-benchmark-raid10-compare-2.png
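The arithmetic behind the write-phase difference can be worked through using the figures quoted above. This sketch rests on two assumptions: that the 190MB/s limit applies to the total bytes the host sends to the four drives, and that with Btrfs-raid10 the host issues both mirror copies itself while the 3ware card duplicates writes on its own side. The variable and function names are illustrative only.

```python
PER_DRIVE_WRITE_MBS = 80    # per-drive write speed quoted in the text
DRIVES = 4
MEASURED_WRITE_MBS = 190    # measured with parallel O_DIRECT writes to all four drives
COPIES = 2                  # raid10 keeps two copies of every block


def theoretical_peak(per_drive, drives):
    """Aggregate write rate if every drive could run at full speed at once."""
    return per_drive * drives


def mirrored_payload_ceiling(aggregate, copies):
    """Upper bound on user data rate when the host must send every copy itself."""
    return aggregate / copies


peak = theoretical_peak(PER_DRIVE_WRITE_MBS, DRIVES)
shortfall = peak - MEASURED_WRITE_MBS
ceiling = mirrored_payload_ceiling(MEASURED_WRITE_MBS, COPIES)

print(peak)       # 320
print(shortfall)  # 130
print(ceiling)    # 95.0
```

Under those assumptions, Btrfs-raid10 could push at most about 95MB/s of file data through this machine, whereas the hardware raid10 only needs the host to send each block once. That is one way to read the statement that the host-side limit partially explains the faster write phase on HW raid10.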
