I see a warning in dmesg about barriers being disabled when mounting my filesystem. What does that mean?
Your hard drive has been detected as not supporting barriers. This is a severe condition, which can result in full file-system corruption, not just losing or corrupting data that was being written at the time of the power cut or crash. There is only one certain way to work around this:
Failure to perform this can result in massive and possibly irrecoverable corruption (especially in the case of encrypted filesystems).
Help! I ran out of disk space!
Help! Btrfs claims I'm out of space, but it looks like I should have lots left!
Free space is a tricky concept in Btrfs. This is especially apparent when running low on it. Read "Why are there so many ways to check the amount of free space?" below for the blow-by-blow.
if you're on 2.6.32 or older
You should consider upgrading. The error behaviour of Btrfs has significantly improved, such that you get a nice proper ENOSPC instead of an OOPS or worse. There may be backports of Btrfs eventually, but it currently relies on infrastructure and patches outside of the fs tree which make a backport trickier to manage without compromising the stability of your stable kernel.
if your device is small
For example, a 4GB flash card: your main problem is the large block allocation size, which doesn't allow for much breathing room. A btrfs fi balance may get you working again, but it's probably only a short-term fix, as the metadata to data ratio probably won't match the block allocations.
If you can afford to delete files, you can clobber a file via echo > /path/to/file, which will recover that space without requiring a new metadata allocation (which would otherwise ENOSPC again).
You might consider remounting with -o compress, and either rewriting particular files in place, or running btrfs fi defragment to recompress everything. This may take a while.
Next, depending on whether your metadata block group or the data block group filled up, you can recreate your filesystem and mount it with metadata_ratio=, setting the value up or down from the default of 8 (i.e., 4 if metadata ran out first, 12 if data ran out first). This can be changed at any time by remounting, but will only affect new block allocations.
Finally, the best solution is to upgrade to 2.6.37 and recreate the filesystem to take advantage of mixed block groups, which avoid effectively-fixed allocation sizes on small devices. Note that this incurs a fragmentation overhead, and currently cannot be converted back to normal split metadata/data groups without recreating the partition. Using mixed block groups is currently (kernel 2.6.37) only recommended for filesystems of 1GiB or smaller.
if your device is large (>16gb)
sudo btrfs fi show /dev/device should show no free space on any drive.
(It may show unallocated space if you're using raid1 with two drives of different sizes, and possibly in similar configurations with larger drives.) This is normal in itself, as Btrfs will not write both copies to the same device, but you still have an ENOSPC condition.
btrfs fi df /mountpoint will probably report available space in both metadata and data. The problem here is that one particular 256MB or 1GB block is full, and wants to allocate another whole block. The easy fix is to run btrfs fi balance /mountpoint. This will take a while (although the system is otherwise usable during this time), but when completed, you should be able to use most of the remaining space. We know this isn't ideal, and there are plans to improve the behavior. Running close to empty is rarely the ideal case, but we can get far closer to full than we do.
In a more time-critical situation, you can reclaim space by clobbering a file via echo > /path/to/file. This will delete the contents, allowing the space to be reclaimed, but without requiring a metadata allocation.
Get out of the tight spot, and then balance as above.
Performance vs Correctness
Does Btrfs have data=ordered mode like Ext3?
In v0.16, Btrfs waits until data extents are on disk before updating metadata. This ensures that stale data isn't exposed after a crash, and that file data is consistent with the checksums stored in the btree after a crash.
Note that you may get zero-length files after a crash, see the next questions for more info.
Btrfs does not force all dirty data to disk on every fsync or O_SYNC operation; fsync is designed to be fast.
What are the crash guarantees of overwrite-by-rename?
Overwriting an existing file using a rename is atomic. That means that either the old content of the file is there or the new content. A sequence like this:
echo "oldcontent" > file
# make sure oldcontent is on disk
sync
echo "newcontent" > file.tmp
mv -f file.tmp file
# *crash*
will give either
- file contains "newcontent"; file.tmp does not exist
- file contains "oldcontent"; file.tmp may contain "newcontent", be zero-length or not exist at all.
What are the crash guarantees of rename?
Renames NOT overwriting existing files do not give additional guarantees. This means, a sequence like
echo "content" > file.tmp
mv file.tmp file
# *crash*
will most likely give you a zero-length "file". The sequence can give you either
- Neither file nor file.tmp exists
- Either file.tmp or file exists and is 0-size or contains "content"
For more info see this thread: http://thread.gmane.org/gmane.comp.file-systems.btrfs/5599/focus=5623
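The two answers above suggest a cautious pattern for replacing a file: write the new content to a temporary name, flush it, then rename it over the target. What follows is a minimal sketch, assuming a POSIX shell. The function name replace_file is hypothetical, and the sync before the rename is a conservative extra step rather than something the overwrite-by-rename guarantee requires; it mainly helps in the case where the target does not exist yet.

```shell
# replace_file TARGET CONTENT
# Hypothetical helper: writes CONTENT to TARGET.tmp, flushes it, then
# renames it over TARGET.
replace_file() {
    target=$1
    content=$2
    tmp="$target.tmp"
    printf '%s\n' "$content" > "$tmp" || return 1
    # Flush pending writes so the temporary file's data is on disk
    # before the rename makes it visible under the final name.
    sync
    mv -f "$tmp" "$target"
}
```

For example, replace_file /path/to/file "newcontent" leaves /path/to/file holding the new content and no /path/to/file.tmp behind once it returns.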
Can the data=ordered mode be turned off in Btrfs?
No, it is an important part of keeping data and checksums consistent. The Btrfs data=ordered mode is very fast and turning it off is not required for good performance.
What checksum function does Btrfs use?
Currently Btrfs uses crc32c for data and metadata. The disk format has room for 256bits of checksum for metadata and up to a full leaf block (roughly 4k or more) for data blocks. Over time we'll add support for more checksum alternatives.
Can data checksumming be turned off?
Yes, you can disable it by mounting with -o nodatasum
Can copy-on-write be turned off for data blocks?
Yes, you can disable it by mounting with -o nodatacow. This implies -o nodatasum as well. COW may still happen if a snapshot is taken.
How do I do...?
See also the UseCases page.
I have converted my ext4 partition into Btrfs, how do I delete the ext2_saved folder?
Use "btrfs subvolume delete" or "btrfsctl -D", with btrfs-progs from Git.
When will Btrfs have a fsck-like tool?
Check back soon.
(2011-01-12) A "scanning fsck" (which will be able to do things like find and fix missing superblocks) is "finally almost ready", according to cmason on IRC.
Why does df show incorrect free space for my RAID volume?
Why are there so many ways to check the amount of free space?
Because there are so many ways to answer the question.
Free space in Btrfs is a tricky concept, owing partly to the features it provides, and owing partly to the difficulty in sorting out what exactly you want to know at the moment you ask. Eventually somebody will figure out a sane solution that doesn't grossly misrepresent the situation depending on the phase of the moon, but until then...
Currently, the system's df command will suffice for purposes that don't require precision. It's almost completely sufficient, as long as you're aware of the raid level in use, so that you can multiply or divide the numbers accordingly. (This will become oh so much fun once we support different raid levels per-subvolume). Anything more precise needs some background on how Btrfs manages space.
Space in Btrfs exists in a pool of the sum total of all the drives/partitions included in the fs. From this pool, large blocks are allocated to the metadata block group (in 256MB allocations, or the remainder) and the data block group (in ~1GB allocations, or the remainder) as necessary. When a file is written, space from two metadata groups (by default) and one data group is required (or more, depending straightforwardly on the raid level). So, any given block has an amount of free space not currently allocated to files or metadata, and in addition the pool itself has free space that hasn't been allocated to a block group. Nearly all of the variation between the tools comes from these distinctions, and each tool has its quirks.
du (os builtin)
Included for completeness, du is very roughly equivalent to the "used" size reported by the other tools. Specifically, it will report the space that would be required if the folder was written to an uncompressed tarball. It will not reflect metadata or compression, and will double-count snapshots (causing the total to potentially be massively high). As such, there is no straightforward way to take its total and convert it to the "used" bytes reported by the other tools.
df (os builtin)
user@machine:~$ df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             894G  311G  583G  35% /
df will return:
- The device name used to mount the filesystem
- The sum total of space available across drives in the filesystem (currently the only place to directly read this)
- The total bytes used to store files (including metadata).
- The total free bytes available to this mountpoint, which includes unused bytes within the relevant block groups as well as bytes that haven't yet been allocated to a block group.
btrfs filesystem df /mountpoint
user@machine:~$ btrfs fi df /
Metadata: total=18.00GB, used=6.10GB
Data: total=358.00GB, used=298.37GB
System: total=12.00MB, used=40.00KB
Btrfs' native df will display:
- The total number of bytes storable in metadata block group, taking raid level into account
- The amount of that space which is in use
- The raid level (2.6.37 or later) of that group
- The total number of bytes storable in data block group, taking raid level into account
- The amount of that space which is in use
- The raid level (2.6.37 or later) of that group
Notably absent is the total amount of space available and total in the pool; we should probably add this.
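The unused space inside each already-allocated block group is simply total minus used, which the output above does not print directly. Here is a rough sketch of that subtraction, assuming the "Name: total=..., used=..." layout shown in the sample and units of KB, MB or GB only. The function name group_free is hypothetical.

```shell
# group_free reads `btrfs fi df` output on stdin and prints, for each
# block group type, the allocated-but-unused space in GB.
group_free() {
    LC_ALL=C awk '
        # Convert a value such as "18.00GB," or "40.00KB" to GB.
        function to_gb(s,    n) {
            n = s + 0                         # leading numeric part
            if (s ~ /KB/) return n / 1024 / 1024
            if (s ~ /MB/) return n / 1024
            return n                          # assume GB otherwise
        }
        /total=/ && /used=/ {
            name = $0; sub(/:.*/, "", name)   # text before the first colon
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^total=/) { total = $i; sub(/^total=/, "", total) }
                if ($i ~ /^used=/)  { used  = $i; sub(/^used=/,  "", used) }
            }
            printf "%s free=%.2fGB\n", name, to_gb(total) - to_gb(used)
        }
    '
}

# Usage: btrfs fi df /mountpoint | group_free
```

Fed the sample above, this reports 11.90GB free in the metadata group and 59.63GB free in the data group. It says nothing about space in the pool that has not yet been allocated to any block group.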
btrfs filesystem show /dev/deviceName
sudo btrfs fi show /dev/sda1      #root required!
Label: none  uuid: 12345678-1234-5678-1234-1234567890ab
        Total devices 2 FS bytes used 304.48GB
        devid    1 size 427.24GB used 197.01GB path /dev/sda1
        devid    2 size 465.76GB used 197.01GB path /dev/sdc1
Btrfs' device summary will display:
- uuid of the filesystem
- Total number of devices
- Total bytes used
- A list of each device including
  - Device size
  - Bytes used on that device
  - Path to the device
Notably absent again is the total amount of space available and total in the pool, although those can be calculated in this case.
Also note that root is required.
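The pool totals that this output omits can be derived by summing the per-device lines. The sketch below assumes every devid line follows the "size <N>GB used <N>GB" layout of the sample and that all figures are in GB. The function name pool_summary is hypothetical.

```shell
# pool_summary reads `btrfs fi show` output on stdin and prints the summed
# device size, the summed per-device usage (space allocated to block
# groups), and the difference between the two, all in GB.
pool_summary() {
    LC_ALL=C awk '
        $1 == "devid" {
            for (i = 1; i < NF; i++) {
                if ($i == "size") size += $(i + 1) + 0   # "427.24GB" -> 427.24
                if ($i == "used") used += $(i + 1) + 0
            }
        }
        END {
            printf "size=%.2fGB allocated=%.2fGB unallocated=%.2fGB\n",
                   size, used, size - used
        }
    '
}

# Usage: sudo btrfs fi show /dev/sda1 | pool_summary
```

For the two-device sample this gives a pool of 893.00GB, of which 394.02GB is allocated to block groups and 498.98GB is not yet allocated.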
Two relations to note
user@machine:~$ df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             894G  311G  583G  35% /
                            ^^^^
user@machine:~$ btrfs fi df /
Metadata: total=18.00GB, >>used=6.10GB<<    *2=  12.20GB
Data: total=358.00GB, >>used=298.37GB<<     *1= 298.37GB
System: total=12.00MB, >>used=40.00KB<<     *1=   0.00GB
                                            == 310.57GB ~~ 311 GB
user@machine:~$ sudo btrfs fi show /dev/sda1
Label: none  uuid: 12345678-1234-5678-1234-1234567890ab
        Total devices 2 FS bytes used 304.48GB
        devid    1 size 427.24GB used >>197.01GB<< path /dev/sda1
        devid    2 size 465.76GB used >>197.01GB<< path /dev/sdc1
                                    == 394.02GB
user@machine:~$ btrfs fi df /
Metadata: >>total=18.00GB<<, used=6.10GB    *2=  36.00GB
Data: >>total=358.00GB<<, used=298.37GB     *1= 358.00GB
System: >>total=12.00MB<<, used=40.00KB     *2=   0.02GB
                                            == 394.02GB
Yes, this is more complicated than it needs to be. Yes, we'll fix it, as soon as we've sorted out how other features play into this.
Why is there so much space overhead?
There are several things meant by this. One is the out-of-space issues discussed above; this is a known deficiency, which can be worked around, and will eventually be worked around properly. The other meaning is the size of the metadata block group, compared to the data block group. Note that you should not compare the size of the allocations, but rather the used space in the allocations.
There are several considerations:
- The default raid level for the metadata group is dup on single drive systems, and raid1 on multi drive systems. The meaning is the same in both cases: there are two copies of everything in that group. This can be disabled at mkfs time, and it will eventually be possible to migrate raid levels online.
- There is an overhead to maintaining the checksums [XXX: percentage?]
- Small files are also written inline into the metadata group. If you have several gb of very small files, this will add up.
[incomplete; disabling features, etc]
What is a snapshot?
- A snapshot is a frozen image of all the files and directories. For example, if you have two files ("a" and "b"), you take a snapshot and you delete "b", the file you just deleted is still available in the snapshot you took. The great thing about Btrfs snapshots is that you can operate on any files or directories, whereas with LVM a snapshot covers the whole logical volume.
- Since backups from tape are a pain, here are the thoughts of a lazy sysadmin who creates the home directory as a Btrfs filesystem for their users. Let's try some fancy network-attached storage ideas.
- Then there could be a snapshot every 6 hours via cron
The logic would look something like this for a rolling 3-day rotation run from cron at midnight:
- delete /home_backday_3
- rename /home_backday_2 to /home_backday_3
- rename /home_backday_1 to /home_backday_2
- rename /home_today_00 to /home_backday_1
- create a symbolic link /home_backday_00 that points to the real directory /home_backday_1
- rename /home_today_06 to /home_backday_06, and likewise for the other hours (06..18)
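The daily part of this rotation could be sketched as below. It handles the oldest snapshot first so that every rename has a free target name, and it leaves out the symbolic link and the hourly renames. The function name rotate_daily and the SNAP_DELETE variable are hypothetical; SNAP_DELETE only exists so the delete command can be swapped out, and it defaults to btrfs subvolume delete.

```shell
# rotate_daily BASE
# Rotates BASE_today_00 and BASE_backday_1..3, oldest first.
rotate_daily() {
    base=$1
    # Command used to remove the oldest snapshot (word-split on purpose).
    delete=${SNAP_DELETE:-btrfs subvolume delete}

    # Drop the oldest day so the name is free for the next rename.
    if [ -e "${base}_backday_3" ]; then
        $delete "${base}_backday_3" || return 1
    fi

    # Shift the remaining days up by one: 2 -> 3, then 1 -> 2.
    for day in 2 1; do
        if [ -e "${base}_backday_${day}" ]; then
            mv "${base}_backday_${day}" "${base}_backday_$((day + 1))" || return 1
        fi
    done

    # The midnight snapshot becomes yesterday's.
    if [ -e "${base}_today_00" ]; then
        mv "${base}_today_00" "${base}_backday_1" || return 1
    fi
}

# Usage from a midnight cron job: rotate_daily /home
```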
What is a subvolume?
A subvolume is like a directory - it has a name, there's nothing on it when it is created, and it can hold files and other directories. There's at least one subvolume in every Btrfs filesystem, the "root" subvolume.
The equivalent in Ext4 would be a filesystem. Each subvolume behaves as an individual filesystem. The difference is that in Ext4 you create each filesystem in a partition. In Btrfs, however, all the storage is in the 'pool', and subvolumes are created from the pool, so you don't need to partition anything. You can create as many subvolumes as you want, as long as you have storage capacity.
Resizing partitions (shrink/grow)
Mount File system
mount -t btrfs /dev/xxx /mnt
Add 2GB to the FS
btrfs filesystem resize +2G /mnt
btrfsctl -r +2g /mnt
Shrink the FS by 4GB
btrfs filesystem resize -4g /mnt
btrfsctl -r -4g /mnt
Explicitly set the FS size
btrfsctl -r 20g /mnt
btrfs filesystem resize 20g /mnt
Use 'max' to grow the FS to the limit of the device
btrfs filesystem resize max /mnt
btrfsctl -r max /mnt
Can I use RAID[56] on my Btrfs filesystem?
Not yet. There are some patches from Dave Woodhouse on the mailing list, but they are unfinished and not yet committed to Git. Rumour has it 2.6.39 will be the magic number.
(The patches didn't make it to the 2.6.38 merge window after 2.6.37, so the earliest they'll make it is 2.6.39, now).
Is Btrfs optimized for SSD?
There are some optimizations for SSD drives, and you can enable them by mounting with -o ssd. As of 2.6.31-rc1, this mount option will be enabled if Btrfs is able to detect non-rotating storage. SSD is going to be a big part of future storage, and the Btrfs developers plan on tuning for it heavily.
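One way to see what the kernel reports for a disk is the queue/rotational attribute in sysfs, where 0 means non-rotating. The following is a minimal sketch. Whether Btrfs's autodetection consults exactly this flag is an assumption here, and the function name and its optional second argument are hypothetical.

```shell
# is_nonrotational DEVICE [SYSFS_ROOT]
# DEVICE is a whole-disk name such as sda (not a partition like sda1).
# SYSFS_ROOT defaults to /sys; it is a parameter only so the function
# can be pointed at a different tree.
# Returns success if the kernel marks the device as non-rotating.
is_nonrotational() {
    flag="${2:-/sys}/block/$1/queue/rotational"
    [ -r "$flag" ] && [ "$(cat "$flag")" = "0" ]
}

# Usage: is_nonrotational sda && echo "sda looks like an SSD"
```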
What is the difference between mount -o ssd and mount -o ssd_spread?
Mount -o ssd_spread is more strict about finding a large unused region of the disk for new allocations, which tends to fragment the free space more over time. Mount -o ssd_spread is often faster on the less expensive SSD devices. The default for autodetected SSD devices is mount -o ssd.
Will Btrfs be in the mainline Linux Kernel?
Btrfs is already in the mainline Linux kernel. It was merged on 9th January 2009, and was available in the Linux 2.6.29 release.
Does Btrfs run with older kernels?
v0.16 of Btrfs maintains compatibility with kernels back to 2.6.18. Kernels older than that will not work.
The current Btrfs unstable repositories only work against the mainline kernel. Once Btrfs is in mainline a backport repository will be created again.
How long will the Btrfs disk format keep changing?
The Btrfs disk format is not finalized, but it won't change unless a critical bug is found and no workarounds are possible. Not all the features have been implemented, but the current format is extensible enough to add those features later without requiring users to reformat.
How do I upgrade to the 2.6.31 format?
The 2.6.31 kernel can read and write Btrfs filesystems created by older kernels, but it writes a slightly different format for the extent allocation trees. Once you have mounted with 2.6.31, the stock Btrfs in 2.6.30 and older kernels will not be able to mount your filesystem.
We don't want to force people into 2.6.31 only, and so the newformat code is available against 2.6.30 as well. All fixes will also be maintained against 2.6.30. For details on downloading, see the Btrfs source repositories.
About the project
Does the Btrfs multi-device support make it a "rampant layering violation"?
Yes and no. Device management is a complex subject, and there are many different opinions about the best way to do it. Internally, the Btrfs code separates out components that deal with device management and maintains its own layers for them. The vast majority of filesystem metadata has no idea there are multiple devices involved.
Many advanced features such as checking alternate mirrors for good copies of a corrupted block are meant to be used with RAID implementations below the FS.
CRFS is a network file system protocol. It was designed at around the same time as Btrfs. Its wire format uses some Btrfs disk formats, and crfsd, a CRFS server implementation, uses Btrfs to store data on disk. More information can be found at http://oss.oracle.com/projects/crfs/ and http://en.wikipedia.org/wiki/CRFS
Will Btrfs become a clustered file system?
No. Btrfs' main goal right now is to be the best non-cluster file system.
If you want a cluster file system, there are many production choices listed in the Distributed file systems section on Wikipedia. Keep in mind that each file system has its own pluses and minuses, so find the best fit for your environment. Most have a set cluster maximum, and whether that would work in your environment is the question you have to answer.
The closest cluster file system that uses Btrfs as its underlying file system is Ceph.