From btrfs Wiki
Revision as of 21:14, 30 May 2012 by SLi (Talk | contribs)


What are the crash guarantees of rename

In the "What are the crash guarantees of rename?" section, what if we sync the file to the disk before renaming?

echo "content" > file.tmp

sync  # make sure content is on disk

mv file.tmp file

# *crash*

--Flickmontana 02:17, 14 January 2011 (UTC)

I've just read the linked thread, so I think I can explain. The short answer is that when you add the sync, the guarantee is that at your crash point, file contains either the old contents of file, or "content".
The longer answer follows:
# Start a metadata operation to create file.tmp, and start a data operation to fill 
# file.tmp with "content".
echo "content" > file.tmp
# Wait for previous operations to complete. Outside the shell, you could just
# use fsync( file_tmp_fd ) to wait for the data operation to complete.
sync
# Wait for previous metadata operations on file.tmp to complete, then start a metadata 
# operation to rename file.tmp to file.
mv file.tmp file
# *crash*
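For the non-shell case mentioned in the comments above, a minimal Python sketch of the write, fsync, rename pattern might look like the following. The function name replace_file and the ".tmp" suffix are illustrative assumptions, not anything btrfs prescribes, and the sketch only covers the steps discussed here.

```python
import os


def replace_file(path, data):
    """Replace `path` with `data` using a temporary file, fsync and rename.

    `replace_file` and the ".tmp" suffix are hypothetical names chosen to
    mirror the file.tmp / file example in this discussion.
    """
    tmp_path = path + ".tmp"

    # Create the temporary file and write the new contents into it.
    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
        tmp.flush()             # push Python's userspace buffer to the kernel
        os.fsync(tmp.fileno())  # wait until the kernel reports the data as written

    # Only after fsync has returned is the rename requested.
    # On POSIX systems os.replace performs a rename() over the old name.
    os.replace(tmp_path, path)
```

Calling replace_file("file", b"content\n") corresponds to the shell sequence above, with the fsync taking the place of the sync.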
Remember that you're building up two queues of operations, metadata and data. The sequence above creates the following metadata queue:
1. Create a file object, currently called "file.tmp".
2. Rename "file.tmp" to "file", losing the old "file".
It also creates the following data queue:
1. In the file object created in metadata operation 1, store the data "content".
The data operation queue can't be processed until metadata operation 1 has created the file object, as it depends on the results of that operation.
Without the sync, these three operations give you three possible sequences to disk:
First sequence:
1. Metadata operation 1 completed
2. Data operation 1 completed
3. Metadata operation 2 completed.
Second sequence:
1. Data operation 1 completed
2. Metadata operation 1 completed
3. Metadata operation 2 completed.
Third sequence:
1. Metadata operation 1 completed
2. Metadata operation 2 completed
3. Data operation 1 completed.
If the crash happens just before the final operation completes, the first sequence leaves file.tmp containing "content" and file unaltered. The second likewise leaves file.tmp present, with its data in it, and file unaltered. The third leaves file empty, and the data is lost.
With the sync, the third sequence is illegal: metadata operation 2 isn't even queued for btrfs to consider until metadata operation 1 and data operation 1 have completed. With the fsync, metadata operation 2 won't be queued until data operation 1 has completed, so again the third sequence is impossible. -- Farnz
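The counting argument above can be checked mechanically. The sketch below takes the listed sequences at face value and treats them as orderings of three labelled operations. The labels M1, D1 and M2 and the helper legal_orders are shorthand introduced here for illustration. It is a toy model of the explanation, not of btrfs internals.

```python
from itertools import permutations

# Shorthand labels (assumed for this sketch):
#   M1 = metadata operation 1 (create file.tmp)
#   D1 = data operation 1     (store "content")
#   M2 = metadata operation 2 (rename file.tmp to file)
OPS = ("M1", "D1", "M2")


def legal_orders(constraints):
    """Return every ordering of OPS in which each (before, after) pair holds."""
    return [
        order
        for order in permutations(OPS)
        if all(order.index(a) < order.index(b) for a, b in constraints)
    ]


# Without the sync, only the metadata queue order is enforced: M1 before M2.
without_sync = legal_orders([("M1", "M2")])

# With the sync (or fsync), D1 must also have completed before M2 is queued.
with_sync = legal_orders([("M1", "M2"), ("D1", "M2")])

# An ordering is dangerous if the rename can reach disk before the data does.
dangerous = [o for o in without_sync if o.index("M2") < o.index("D1")]

print("without sync:", without_sync)
print("with sync:   ", with_sync)
print("dangerous:   ", dangerous)
```

In this model, the only ordering removed by the extra constraint is the one in which the rename completes before the data, which is the third sequence.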
Thank you for the explanation. The reason I asked is that I was concerned about what would happen if my computer crashed when I was renaming a file that already exists on a btrfs formatted partition (for example, photos that I've had for a while). I found this page while searching for the answer, and my initial thought was that renaming existing files on a btrfs partition would put them in mortal danger. I just wanted to make sure that this was not the case, and also to point out that some people might get the wrong impression from reading this page. Btrfs looks very promising, and I would hate for anyone to avoid using it due to a misconception. --Flickmontana 12:13, 15 January 2011 (UTC)

Question on the encryption Q/A

From the article page:

This pretty much forbids you to use btrfs' cool RAID features if you need encryption. Using a RAID implementation on top of several encrypted disks is much slower than using encryption on top of a RAID device. So the RAID implementation must be on a lower layer than the encryption, which is not possible using btrfs' RAID support.

I wonder if this is true, and if so, how true, especially with hardware AES support. In fact, I think I have seen a speedup (with ext4) from such a setup (striped LVM over dm-crypt, with the physical volumes on encrypted partitions) on a computer without hardware AES support. This is because the dm-crypt implementation in Linux uses only one crypto thread per encrypted device, so single-threaded performance can be a bottleneck. However, such a setup is definitely more of a pain to maintain than a single unencrypted volume group containing encrypted logical volumes. --SLi 21:14, 30 May 2012 (UTC)
