From btrfs Wiki
Revision as of 02:55, 25 April 2013 by Bobpaul (Talk | contribs)

What are the crash guarantees of rename?

In the "What are the crash guarantees of rename?" section, what if we sync the file to the disk before renaming?

echo "content" > file.tmp

# make sure content is on disk

mv file.tmp file

# *crash*

--Flickmontana 02:17, 14 January 2011 (UTC)

I've just read the linked thread, so I think I can explain. The short answer is that when you add the sync, the guarantee is that at your crash point, file contains either the old contents of file, or "content".
The longer answer follows:
# Start a metadata operation to create file.tmp, and start a data operation to fill 
# file.tmp with "content".
echo "content" > file.tmp
# Wait for previous operations to complete. Outside shell, you could just 
# use fsync( file_tmp_fd ) to wait for the data operation to complete.
sync
# Wait for previous metadata operations on file.tmp to complete, then start a metadata 
# operation to rename file.tmp to file.
mv file.tmp file
# *crash*
Remember that you're building up two queues of operations, metadata and data. The sequence above creates the following metadata queue:
1. Create a file object, currently called "file.tmp".
2. Rename "file.tmp" to "file", losing the old "file".
It also creates the following data queue:
1. In the file object created in metadata operation 1, store the data "content".
The data operation queue can't be processed until metadata operation 1 has created the file object, as it depends on the results of that operation.
Without the sync, these three operations can reach the disk in three possible sequences:
First sequence:
1. Metadata operation 1 completed.
2. Data operation 1 completed.
3. Metadata operation 2 completed.
Second sequence:
1. Data operation 1 completed.
2. Metadata operation 1 completed.
3. Metadata operation 2 completed.
Third sequence:
1. Metadata operation 1 completed.
2. Metadata operation 2 completed.
3. Data operation 1 completed.
If the crash happens just before the final operation completes, the first sequence results in file.tmp containing content, and file being unaltered. The second results in file.tmp being present, with data in it, and file being unaltered, while the third results in file being empty, and the data lost.
With the sync, the third sequence is illegal; metadata operation 2 isn't even queued for btrfs to think about until metadata operation 1 and data operation 1 have completed. With the fsync, metadata operation 2 won't be queued until data operation 1 has completed, so again, the third option is impossible. -- Farnz
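For reference, the sync-then-rename sequence described above can be wrapped in a small shell function. This is only a sketch: atomic_write is a hypothetical name, and passing a file name to sync assumes GNU coreutils 8.24 or later, where it stands in for the fsync() call mentioned above.

```shell
# Hypothetical helper: replace TARGET with CONTENT so that, after a crash,
# TARGET holds either its old contents or the complete new contents.
atomic_write() {
    target=$1
    content=$2
    tmp="$target.tmp"

    # Metadata operation 1 and data operation 1: create the temp file and fill it.
    printf '%s\n' "$content" > "$tmp" || return 1

    # Wait for the temp file's data to reach the disk before renaming.
    # Assumption: sync accepts a file argument (GNU coreutils >= 8.24).
    sync "$tmp" || return 1

    # Metadata operation 2: rename the temp file over the target.
    mv -f -- "$tmp" "$target"
}
```

Calling atomic_write file "content" then corresponds to the echo, sync, mv sequence in the explanation, with the flush limited to the one temporary file rather than the whole system.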
Thank you for the explanation. The reason I asked is that I was concerned about what would happen if my computer crashed when I was renaming a file that already exists on a btrfs formatted partition (for example, photos that I've had for a while). I found this page while searching for the answer, and my initial thought was that renaming existing files on a btrfs partition would put them in mortal danger. I just wanted to make sure that this was not the case, and also to point out that some people might get the wrong impression from reading this page. Btrfs looks very promising, and I would hate for anyone to avoid using it due to a misconception. --Flickmontana 12:13, 15 January 2011 (UTC)

Question on the encryption Q/A

From the article page:

This pretty much forbids you to use btrfs' cool RAID features if you need encryption. Using a RAID implementation on top of several encrypted disks is much slower than using encryption on top of a RAID device. So the RAID implementation must be on a lower layer than the encryption, which is not possible using btrfs' RAID support.

I wonder if this is true, and if so, how true, especially with hardware AES support. In fact, I think I have seen a speedup (with ext4) from such a setup (striped LVM over dm-crypt, where the physical volumes were on encrypted partitions) on a computer without hardware AES support. This is because the dm-crypt implementation in Linux uses only one crypto thread per encrypted device, so single-threaded performance can be a bottleneck. However, such a setup is definitely more of a pain to maintain than a single unencrypted volume group containing encrypted logical volumes. --SLi 21:14, 30 May 2012 (UTC)

How much space do I get with unequal devices in RAID-1 mode?

The answer gives two possibilities: either the largest disk is smaller than the sum of the other disks or it's not. Two scenarios are given: 3TB+2TB+2TB resulting in 3.5TB of usable space (the 3TB is mirrored half to each of the 2TBs, then the remaining 500GB on each is mirrored to the other; 3.5TB of data, all duplicated to at least 1 spindle) and 3TB+1TB+1TB resulting in 2TB of usable space (1TB of the 3TB is mirrored to each of the smaller disks; no further operations are allowed once mirroring is no longer possible).

I feel like this rule is simplistic and doesn't cover all scenarios. What about the case where none of the disks are the same size? Say, 1TB, 750GB, 500GB. As I look at this, think of it first as the 1TB and 750GB. 250GB is lost per the rule above. Now add the 500GB. The 250GB that was left over can be mirrored to the 500GB, but there's still 250GB free on the 500GB that can't be mirrored anywhere else.

So really the answer is more complicated than the page describes. Does anyone have a good, succinct rule that covers my scenario as well as the two on the page? Or am I missing something obvious and the 1TB, 750GB, and 500GB yield 2.25TB/2=1.125TB of mirrorable space? Bobpaul 02:55, 25 April 2013 (UTC)
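One way to check the arithmetic is to write the two-case rule from the page as a small shell function. This is a sketch only: raid1_usable is a hypothetical name, all sizes must be integers in the same unit, and it ignores the fact that btrfs allocates space in chunks, so real numbers will differ slightly.

```shell
# Hypothetical helper: estimate usable RAID-1 space from device sizes.
# Rule from the page: if the largest device is bigger than all the others
# combined, only the others' combined size can be mirrored; otherwise half
# of the total is usable.
raid1_usable() {
    total=0
    largest=0
    for size in "$@"; do
        total=$((total + size))
        if [ "$size" -gt "$largest" ]; then
            largest=$size
        fi
    done
    rest=$((total - largest))
    if [ "$largest" -gt "$rest" ]; then
        echo "$rest"
    else
        # Integer division: an odd total is rounded down.
        echo $((total / 2))
    fi
}

raid1_usable 3000 2000 2000   # prints 3500
raid1_usable 3000 1000 1000   # prints 2000
raid1_usable 1000 750 500     # prints 1125
```

Under this rule, three disks of 1000, 750 and 500 units fall into the first case (the largest is smaller than the other two combined), so half of the total would be usable. This assumes copies can be spread across whichever devices have the most free space, rather than pairing disks up one at a time as in the reasoning above.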
