RAID56

From btrfs Wiki
== Status ==
 
{{Warning|The parity RAID code has a specific issue with regard to data integrity: see "write hole", below. It should ''not'' be used for metadata. For data, it should be safe as long as a scrub is run immediately after any unclean shutdown.}}
  
From 3.19, the recovery and rebuild code was integrated. The one missing piece, from a reliability point of view, is that it is still vulnerable to the parity RAID "write hole", where a partial write as a result of a power failure may result in inconsistent parity data.
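
As a rough illustration only (a minimal sketch of single-parity XOR striping in Python, not the actual btrfs on-disk logic): if a crash lands after some data blocks of a stripe have been rewritten but before the parity block has, the stored parity no longer matches the data, and a later reconstruction of a failed device from that stale parity silently returns wrong data.

<pre>
# Minimal sketch of the RAID5 "write hole" with XOR parity.
# Illustrative only -- this is not how btrfs lays out stripes on disk.
from functools import reduce

def xor_blocks(blocks):
    """Parity of a stripe: byte-wise XOR of all data blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

d0, d1 = bytes([0xAA] * 4), bytes([0x55] * 4)
parity = xor_blocks([d0, d1])        # consistent: parity == d0 ^ d1

# Power fails mid-update: d0 is rewritten on disk, parity is not.
d0_new = bytes([0xFF] * 4)
stale_parity = parity                # still d0 ^ d1, now stale

# The device holding d1 then dies. Reconstruction computes
# d0_new ^ stale_parity, which is no longer d1: the write hole.
reconstructed_d1 = xor_blocks([d0_new, stale_parity])
assert reconstructed_d1 != d1        # checksums are what catch this
</pre>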
  
* Parity may be inconsistent after a crash (the "write hole"). The problem arises when a disk failure follows an unclean shutdown: these are ''two'' distinct failures, and together they break the btrfs raid5 redundancy. If you run a scrub after an unclean shutdown (with no disk failure in between), the data that matches its checksums can still be read out, while any mismatched data is lost forever.
* <del>Parity data is not checksummed</del> Checksumming for parity is not necessary. See [[Talk:Status#parity_not_checksummed_removed|Talk:Status]]
* No support for discard? (possibly -- needs confirmation with cmason)
* The algorithm uses as many devices as are available: No support for a fixed-width stripe (see note, below)

The first two of these problems mean that the parity RAID code is not suitable for any system which might encounter unplanned shutdowns (power failure, kernel lock-up), and it should not be considered production-ready.
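
The practical advice in the warning above is to scrub after every unclean shutdown. A minimal sketch of that (the mount point and the Python wrapper are assumptions for illustration; running the same btrfs-progs commands from a shell or a boot-time unit works equally well):

<pre>
# Sketch: start a btrfs scrub after an unclean shutdown.
# Requires root and btrfs-progs; /mnt/data is a hypothetical mount point.
import subprocess

MOUNTPOINT = "/mnt/data"  # adjust to your filesystem

def scrub_after_unclean_shutdown(mountpoint):
    # Kick off the scrub (runs in the background by default).
    subprocess.run(["btrfs", "scrub", "start", mountpoint], check=True)
    # Report progress and error counts so far.
    subprocess.run(["btrfs", "scrub", "status", mountpoint], check=True)

if __name__ == "__main__":
    scrub_after_unclean_shutdown(MOUNTPOINT)
</pre>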
  
 
If you'd like to learn btrfs raid5/6 and rebuilds by example (based on kernel 3.14), you can look at [http://marc.merlins.org/perso/btrfs/post_2014-03-23_Btrfs-Raid5-Status.html Marc MERLIN's page about btrfs raid 5/6].
 
On '''01 Aug 2017''' an RFC patch to fix the write hole was posted to the mailing list: [http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg66472.html Btrfs: Add journal for raid5/6 writes]
  
 
== Note ==
 

Using as many devices as are available means that there will be a performance issue for filesystems with large numbers of devices, since every full-width stripe has to touch every device. It also means that filesystems with different-sized devices will end up with differing-width stripes as the filesystem fills up, and some space may be wasted when the smaller devices are full.

Both of these issues could be addressed by specifying a fixed-width stripe, always running over exactly the same number of devices. This capability is not yet implemented, though.
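
As a rough worked example (the device sizes are made up, and the model below is a simplification: it assumes stripes are always allocated across every device that still has free space, which is approximately what the current allocator does):

<pre>
# Simplified model of variable-width RAID5 striping over unequal devices.
# Shows why stripe width shrinks as small devices fill, and why space on
# the larger devices can end up unusable.

def raid5_fill(sizes_tb):
    free = list(sizes_tb)
    usable = 0.0
    while True:
        members = [i for i, f in enumerate(free) if f > 0]
        if len(members) < 2:                  # raid5 needs at least 2 devices
            break
        step = min(free[i] for i in members)  # fill until one device is full
        for i in members:
            free[i] -= step
        usable += step * (len(members) - 1)   # one device's share is parity
        print(f"stripe width {len(members)}: +{step * (len(members) - 1):.1f} TB of data")
    print(f"usable {usable:.1f} TB, wasted {sum(free):.1f} TB")

raid5_fill([1, 2, 2, 2])  # width 4, then width 3: nothing wasted
raid5_fill([1, 1, 2])     # width 3 only: 1 TB of the large device is wasted
</pre>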
