From btrfs Wiki
Revision as of 19:12, 28 July 2012 by HugoMills



Important Questions

I have a problem with my btrfs filesystem!

See the Problem FAQ for commonly-encountered problems and solutions.

If that page doesn't help you, try asking on IRC or the Btrfs mailing list.

To be explicit: please report bugs and issues to the mailing list (you are not required to subscribe). Optionally, you can use the bugzilla on kernel.org; never use the bugzilla on the "original" Btrfs project page at Oracle.

I see a warning in dmesg about barriers being disabled when mounting my filesystem. What does that mean?

Your hard drive has been detected as not supporting barriers. This is a severe condition, which can result in full file-system corruption, not just losing or corrupting data that was being written at the time of the power cut or crash. There is only one certain way to work around this:

Note: Disable the write cache by running hdparm -W0 against each drive (e.g. hdparm -W0 /dev/sda) on every boot.

Failure to perform this can result in massive and possibly irrecoverable corruption (especially in the case of encrypted filesystems).
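One way to apply the workaround is a loop run from a boot-time script such as rc.local; the /dev/sd[a-z] glob is an assumption here, so adjust it to your system's drive naming:

```shell
# Disable the volatile write cache on every drive at boot.
# /dev/sd[a-z] is a guess at the device names on this system.
for disk in /dev/sd[a-z]; do
    hdparm -W0 "$disk"
done
```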

Help! I ran out of disk space!

Help! Btrfs claims I'm out of space, but it looks like I should have lots left!

Free space is a tricky concept in Btrfs. This is especially apparent when running low on it. Read "Why are there so many ways to check the amount of free space?" below for the blow-by-blow.

if you're on 2.6.32 or older

You should upgrade your kernel, right now. The error behaviour of Btrfs has significantly improved, such that you get a nice proper ENOSPC instead of an OOPS or worse. There may be backports of Btrfs eventually, but it currently relies on infrastructure and patches outside of the fs tree which make a backport trickier to manage without compromising the stability of your stable kernel.

if your device is small

i.e., a 4GiB flash card: your main problem is the large block-allocation size, which doesn't allow for much breathing room. A btrfs fi balance may get you working again, but it's probably only a short-term fix, as the metadata-to-data ratio probably won't match the block allocations.

If you can afford to delete files, you can clobber a file via echo > /path/to/file, which will recover that space without requiring a new metadata allocation (which would otherwise ENOSPC again).

You might consider remounting with -o compress, and either rewrite particular files in-place, or run btrfs fi defragment to recompress everything. This may take a while.
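The compression route can be sketched as follows; /mountpoint and the file path are placeholders for your own filesystem:

```shell
# Sketch of the compression route (paths are placeholders).
mount -o remount,compress /mountpoint

# Rewriting a file in place lets its extents be recompressed
# as they are written back:
btrfs fi defragment /mountpoint/path/to/big-file
```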

Next, depending on whether your metadata block group or the data block group is filled up, you can recreate your filesystem and mount it with metadata_ratio=, setting the value up or down from the default of 8 (i.e., 4 if metadata ran out first, 12 if data ran out first). This can be changed at any time by remounting, but will only affect new block allocations.

Finally, the best solution is to upgrade to at least 2.6.37 (or the latest stable kernel) and recreate the filesystem to take advantage of mixed block groups, which avoid effectively-fixed allocation sizes on small devices. Note that this incurs a fragmentation overhead, and currently cannot be converted back to normal split metadata/data groups without recreating the partition. Using mixed block groups is recommended for filesystems of 1GiB or smaller and mkfs.btrfs will force mixed block groups automatically in that case.
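On such a small device, recreating the filesystem with mixed block groups might look like this (note that this destroys the existing data, and /dev/sdX is a placeholder):

```shell
# WARNING: destroys all data on the device. /dev/sdX is a placeholder.
mkfs.btrfs --mixed /dev/sdX
```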

if your device is large (>16GiB)

sudo btrfs fi show /dev/device should show no free space on any drive.

It may show unallocated space if you're using RAID-1 with two drives of different sizes (and similarly with larger numbers of drives). This is normal in itself, as Btrfs will not write both copies of a block to the same device, but you still have an ENOSPC condition.

btrfs fi df /mountpoint will probably report available space in both metadata and data. The problem here is that one particular chunk (256MiB for metadata, or 1GiB for data) is full, and the filesystem wants to allocate another whole chunk. The easy fix is to run btrfs fi balance /mountpoint. This will take a while (although the system is otherwise usable during this time), but when completed, you should be able to use most of the remaining space. We know this isn't ideal, and there are plans to improve the behavior. Running close to full is rarely the ideal case, but we can get far closer to full than we do.

In a more time-critical situation, you can reclaim space by clobbering a file via true > /path/to/file. This will delete the contents, allowing the space to be reclaimed, but without requiring a metadata allocation. Get out of the tight spot, and then balance as above.
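The clobber-then-balance sequence can be sketched as follows; a scratch file stands in here for the large file you can afford to lose:

```shell
# Demonstrate the clobber on a scratch file: the contents are dropped
# without a metadata-allocating delete, but the file itself remains.
demo=$(mktemp)
echo "some large expendable contents" > "$demo"

true > "$demo"      # clobber: file now exists with zero length

# With breathing room regained, rebalance the real filesystem:
# btrfs fi balance /mountpoint
```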

If the echo does not work, mount with the 'nodatacow' option and try again (tested with the 3.2.20 kernel for Ubuntu Precise). The reason is that in some cases the file is already snapshotted in a non-obvious way (for example, a file from a converted ext4 filesystem). With 'nodatacow', you can be sure that no new metadata is allocated when the file is overwritten.

Significant improvements in the way that btrfs handles ENOSPC are incorporated in most new kernel releases, so you should also upgrade to the latest kernel if you are not already using it.

Performance vs Correctness

Does Btrfs have data=ordered mode like Ext3?

In v0.16, Btrfs waits until data extents are on disk before updating metadata. This ensures that stale data isn't exposed after a crash, and that file data is consistent with the checksums stored in the btree after a crash.

Note that you may get zero-length files after a crash, see the next questions for more info.

Btrfs does not force all dirty data to disk on every fsync or O_SYNC operation; fsync is designed to be fast.

What are the crash guarantees of overwrite-by-rename?

Overwriting an existing file using a rename is atomic. That means that either the old content of the file is there or the new content. A sequence like this:

echo "oldcontent" > file

# make sure oldcontent is on disk

echo "newcontent" > file.tmp
mv -f file.tmp file

# *crash*

Will give either

  1. file contains "newcontent"; file.tmp does not exist
  2. file contains "oldcontent"; file.tmp may contain "newcontent", be zero-length, or not exist at all.
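The sequence above, written out as a runnable sketch (using a temporary file rather than a fixed path):

```shell
# Crash-safe overwrite-by-rename: after the mv, a reader (or a crash)
# sees either the old content or the new, never a mix.
f=$(mktemp)
echo "oldcontent" > "$f"
sync                        # make sure oldcontent is on disk

echo "newcontent" > "$f.tmp"
mv -f "$f.tmp" "$f"         # atomic replace
```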

What are the crash guarantees of rename?

Renames NOT overwriting existing files do not give additional guarantees. This means, a sequence like

echo "content" > file.tmp
mv file.tmp file

# *crash*

will most likely give you a zero-length "file". The sequence can give you either

  1. Neither file nor file.tmp exists
  2. Either file.tmp or file exists and is 0-size or contains "content"

For more info see this thread: http://thread.gmane.org/gmane.comp.file-systems.btrfs/5599/focus=5623

Can the data=ordered mode be turned off in Btrfs?

No, it is an important part of keeping data and checksums consistent. The Btrfs data=ordered mode is very fast and turning it off is not required for good performance.

What checksum function does Btrfs use?

Currently Btrfs uses crc32c for data and metadata. The disk format has room for 256 bits of checksum for metadata, and up to a full leaf block (roughly 4k or more) for data blocks. Over time we'll add support for more checksum alternatives.

Can data checksumming be turned off?

Yes, you can disable it by mounting with -o nodatasum

Can copy-on-write be turned off for data blocks?

Yes, you can disable it by mounting with -o nodatacow. This implies -o nodatasum as well. COW may still happen if a snapshot is taken.
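As a hedged aside (beyond the mount option): on reasonably recent kernels, copy-on-write can also be disabled per file with the NOCOW file attribute, but only if the flag is set while the file is still empty:

```shell
# chattr +C only affects extents written after the flag is set,
# so apply it to a new/empty file. Paths are placeholders.
touch /mnt/vm.img
chattr +C /mnt/vm.img
```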


(See also the Project ideas page)

When will Btrfs have a fsck like tool?

It does!

The first detailed report on what comprises "btrfsck"

The btrfsck tool in the git master branch for btrfs-progs is now capable of repairing some types of filesystem breakage. It is not well-tested in real-life situations yet. If you have a broken filesystem, it is probably better to use btrfsck with advice from one of the btrfs developers, just in case something goes wrong. (But even if it does go badly wrong, you've still got your backups, right?)

Note that there is also a recovery tool in the btrfs-progs git repository which can often be used to copy essential files out of broken filesystems.

Can I use RAID[56] on my Btrfs filesystem?

(2012-07-28) Not yet. Patches are being integrated, and are expected to arrive in 3.7.

RAID-5 was due to arrive in 3.5, but didn't make it in time because of a serious bug. The feature also missed 3.6, because two other large and important features also had to go in, and there wasn't time to complete the full testing programme for all three features before the 3.6 merge window.

Is Btrfs optimized for SSD?

There are some optimizations for SSD drives, and you can enable them by mounting with -o ssd. As of 2.6.31-rc1, this mount option will be enabled if Btrfs is able to detect non-rotating storage. SSD is going to be a big part of future storage, and the Btrfs developers plan on tuning for it heavily. Note that -o ssd will not enable TRIM/discard.

Does Btrfs support TRIM/discard?

"-o discard" is supported, but it is not enabled by default, for several reasons: it can have negative consequences for performance on some SSDs (or at least, whether it adds worthwhile performance is up for debate, depending on who you ask); it makes undeletion/recovery near impossible; and it is a security problem if you use dm-crypt underneath (see http://asalor.blogspot.com/2011/08/trim-dm-crypt-problems.html). You are welcome to run your own benchmarks and post them here, with the caveat that they'll be very specific to the SSD firmware tested.

Does btrfs support encryption?

No. This is a hard task at the filesystem level, and easy to get wrong. Nobody is working on it; you may have heard that it was planned (in 2009), but treat that as meaning "not impossible" rather than "coming soon". Instead, you should use one of the available whole-disk encryption solutions, such as dm-crypt or LUKS.

This pretty much rules out using btrfs' cool RAID features if you need encryption. Running a RAID implementation on top of several separately encrypted disks is much slower than running encryption on top of a single RAID device, so the RAID implementation needs to sit on a lower layer than the encryption, which is not possible using btrfs' RAID support.

Note: there is an option to use a btrfs-internal raid1 for data and metadata: create the filesystem with the --mixed option (and DUP profiles).

Does Btrfs work on top of dm-crypt?

This is deemed safe since 3.2 kernels (corruption has been reported before that, so you want a recent kernel).

Does btrfs support deduplication?

Not yet. A preliminary patch was sent to the mailing list in early 2011, for discussion. Nobody has yet picked up on the work to develop it further.

Does btrfs support swap files?

No. Just making a file NOCOW does not help: swap file support relies on one function that btrfs intentionally does not implement, due to potential corruption. The swap implementation also relies on assumptions which may not hold in btrfs, such as block numbers in the swap file mapping directly to device blocks, while btrfs has a different block-number mapping in the case of multiple devices. There is a patchset (swap-over-NFS) which enhances the swapfile API and which btrfs could use, but it is not merged, and we don't know when it will be, if ever.


Does grub support btrfs?

Yes. GRUB 2.00 supports many btrfs configurations (including zlib and lzo compression, and RAID0/1/10 multi-dev filesystems). If your distribution only provides older versions of GRUB, you'll have to build it for yourself.


Common questions

How do I do...?

See also the UseCases page.

Is btrfs stable?

Short answer: No, it's still considered experimental.

Long answer: Nobody is going to magically stick a label on the btrfs code and say "yes, this is now stable and bug-free". Different people have different concepts of stability: a home user who wants to keep their ripped CDs on it will have a different requirement for stability than a large financial institution running their trading system on it. If you are concerned about stability in commercial production use, you should test btrfs on a testbed system under production workloads to see if it will do what you want of it. In any case, you should join the mailing list (and hang out in IRC) and read through problem reports and follow them to their conclusion to give yourself a good idea of the types of issues that come up, and the degree to which they can be dealt with. Whatever you do, we recommend keeping good, tested, off-system (and off-site) backups.

Pragmatic, personal and anecdotal answer: (HugoMills, 2011-08-21) In the last few months, the vast majority of the problems with broken and unmountable filesystems I've seen on IRC and the mailing list have been caused by power outages in the middle of a write to the FS, and have been fixable by use of the btrfs-zero-log tool (update, 2011-11-10: that particular bug has been fixed). We also have more filesystem-fixing tools coming along soon, which may make you happier about stability.

I have converted my ext4 partition into Btrfs, how do I delete the ext2_saved folder?


btrfs subvolume delete /path/to/subvolume

with btrfs-progs from Git.

Why does df show incorrect free space for my RAID volume?

Aaargh! My filesystem is full, and I've put almost nothing into it!

Why are there so many ways to check the amount of free space?

Because there are so many ways to answer the question.

Free space in Btrfs is a tricky concept, owing partly to the features it provides, and owing partly to the difficulty in sorting out what exactly you want to know at the moment you ask. Eventually somebody will figure out a sane solution that doesn't grossly misrepresent the situation depending on the phase of the moon, but until then...

To understand the different ways that btrfs's tools report filesystem usage and free space, you need to know how it allocates and uses space.

Btrfs starts with a pool of raw storage. This is what you see when you run btrfs fi show:

$ sudo btrfs fi show /dev/sda1
Label: none  uuid: 12345678-1234-5678-1234-1234567890ab
	Total devices 2 FS bytes used 304.48GB
	devid    1 size 427.24GB used 197.01GB path /dev/sda1
	devid    2 size 465.76GB used 197.01GB path /dev/sdc1

This shows the total number of bytes available and allocated on each disk. This is the "raw" data used: the bytes written to the various devices that contain real information, whether containing redundant data or not.

As the filesystem needs space for data or metadata, it allocates chunks of this pool of raw storage. Such an allocation is known as a block group. The way that it does this depends on the RAID level, and the type of information it is attempting to store. For "single" replication, a block group will consist of a single lump of data on a single device. For RAID-1, it will be two equal chunks of data on two different devices, each containing the same information. For RAID-0, it will be as many chunks of data as there are devices with free space. The unit of allocation per device is fixed: 1 GiB per device for a Data allocation, and 256 MiB per device for a Metadata allocation. (So a RAID-0 Data block group contains n GiB of actual data in n GiB of raw storage; a RAID-1 block group contains 1 GiB of actual data in 2 GiB of raw storage; and when we get it, a RAID-6 block group will use n+2 GiB of raw storage for each n GiB of data stored.)

So, as an example, as you start writing to an empty filesystem with RAID-1 data, it will allocate two chunks of 1GiB each, which between them have 1GiB of storage capacity. You will see 2GiB of raw space used in "btrfs fi show", 1GiB from each of two devices. You will also see 1GiB of free space appear in "btrfs fi df" as "Data, RAID1". As you write files to it, that 1GiB will get used up at the rate you'd expect (i.e., write 1MiB to it, and 1MiB gets used -- in "btrfs fi df" output). When that 1GiB is used up, another 1GiB is allocated and used.

The total space allocated from the raw pool is shown with btrfs fi show. If you want to see the types and quantities of space allocated, and what they can store, the command is btrfs fi df <mountpoint>:

$ btrfs fi df /
Metadata: total=18.00GB, used=6.10GB
Data: total=358.00GB, used=298.37GB
System: total=12.00MB, used=40.00KB

This shows how much data has been allocated for each data type and replication type, and how much has been used. The values shown are data rather than raw bytes, so if you're using RAID-1 or RAID-10, the amount of raw storage used is double the values you can see here.

Why is free space so complicated?

You might think, "My whole disk is RAID-1, so why can't you just divide everything by 2 and give me a sensible value in df?".

If everything is RAID-1 (or RAID-0, or in general all the same RAID level), then yes, we could give a sane and consistent value from df. However, we have plans to allow per-subvolume and per-file RAID levels. In this case, it becomes impossible to give a sensible estimate as to how much space there is left.

For example, if you have one subvolume as "single", and one as RAID-1, then the first subvolume will consume raw storage at the rate of one byte for each byte of data written. The second subvolume will take two bytes of raw data for each byte of data written. So, if we have 30GiB of raw space available, we could store 30GiB of data on the first subvolume, or 15GiB of data on the second, and there is no way of knowing which it will be until the user writes that data.

So, in general, it is impossible to give an accurate estimate of the amount of free space on any btrfs filesystem. Yes, this sucks. If you have a really good idea for how to make it simple for users to understand how much space they've got left, please do let us know, but also please be aware that the finest minds in btrfs development have been thinking about this problem for at least a couple of years, and we haven't found a simple solution yet.

Why is there so much space overhead?

There are several things meant by this. One is the out-of-space issues discussed above; this is a known deficiency, which can be worked around, and will eventually be worked around properly. The other meaning is the size of the metadata block group compared to the data block group. Note that you should not compare the size of the allocations, but rather the used space within the allocations.

There are several considerations:

  • The default raid level for the metadata group is dup on single-drive systems, and raid1 on multi-drive systems. The meaning is the same in both cases: there are two copies of everything in that group. This can be disabled at mkfs time, and it will eventually be possible to migrate raid levels online.
  • There is an overhead to maintaining the checksums (approximately 0.1%: 4 bytes for each 4KiB block).
  • Small files are also written inline into the metadata group. If you have several gigabytes of very small files, this will add up.

[incomplete; disabling features, etc]

How much space do I get with unequal devices in RAID-1 mode?

If your largest device is bigger than all of the others put together, then you will get as much space as all the smaller devices added together. Otherwise, you get half of the space of all of your devices added together.

For example, if you have disks of size 3TB, 1TB, 1TB, your largest disk is 3TB and the sum of the rest is 2TB. In this case, your largest disk is bigger than the sum of the rest, and you will get 2TB of usable space.

If you have disks of size 3TB, 2TB, 2TB, then your largest disk is 3TB and the sum of the rest is 4TB. In this case, your largest disk is smaller than the sum of the rest, and you will get (3+2+2)/2 = 3.5TB of usable space.
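The rule can be sketched as a small shell function (raid1_usable is a name made up for this FAQ; sizes here are given in GB, so 3TB is written 3000):

```shell
# Usable RAID-1 space for a set of device sizes: if the largest
# device exceeds the sum of the rest, only the rest can be mirrored;
# otherwise every byte can find a partner on another device, so the
# usable space is half the total.
raid1_usable() {
    total=0; largest=0
    for d in "$@"; do
        total=$((total + d))
        if [ "$d" -gt "$largest" ]; then largest=$d; fi
    done
    rest=$((total - largest))
    if [ "$largest" -gt "$rest" ]; then
        echo "$rest"
    else
        echo $((total / 2))
    fi
}

raid1_usable 3000 1000 1000   # -> 2000 (2TB)
raid1_usable 3000 2000 2000   # -> 3500 (3.5TB)
```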

What does "balance" do?

btrfs filesystem balance is an operation which simply takes all of the data and metadata on the filesystem, and re-writes it in a different place on the disks, passing it through the allocator algorithm on the way. It was originally designed for multi-device filesystems, to spread data more evenly across the devices (i.e. to "balance" their usage). This is particularly useful when adding new devices to a nearly-full filesystem.

Due to the way that balance works, it also has some useful side-effects:

  • If there is a lot of allocated but unused data or metadata chunks, a balance may reclaim some of that allocated space. This is the main reason for running a balance on a single-device filesystem.
  • On a filesystem with damaged replication (e.g. a RAID-1 FS with a dead and removed disk), it will force the FS to rebuild the missing copy of the data on one of the currently active devices, restoring the RAID-1 capability of the filesystem.

Does a balance operation make the internal B-trees better/faster?

No, balance has nothing at all to do with the B-trees used for storing all of btrfs's metadata. The B-tree implementation used in btrfs is effectively self-balancing, and won't lead to imbalanced trees. See the question above for what balance does (and why it's called "balance").

Do I need to run a balance regularly?

In general usage, no. A full unfiltered balance typically takes a long time, and will rewrite huge amounts of data unnecessarily. You may wish to run a balance on metadata only (see Balance_Filters) if you find you have very large amounts of metadata space allocated but unused, but this should be a last resort. At some point, this kind of clean-up will be made an automatic background process.
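With the balance filters in place (kernel 3.3 and later), a metadata-only balance might be invoked like this; the usage=5 threshold is just an example value:

```shell
# Balance only metadata chunks:
btrfs balance start -m /mountpoint

# Or only metadata chunks that are less than 5% used:
btrfs balance start -musage=5 /mountpoint
```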

What is a subvolume?

A subvolume is like a directory - it has a name, there's nothing on it when it is created, and it can hold files and other directories. There's at least one subvolume in every Btrfs filesystem, the top-level subvolume.

The equivalent in Ext4 would be a filesystem. Each subvolume behaves as an individual filesystem. The difference is that in Ext4 you create each filesystem in its own partition, whereas in Btrfs all the storage is in one pool, and subvolumes are created from the pool; you don't need to partition anything. You can create as many subvolumes as you want, as long as you have storage capacity.

How do I find out which subvolume is mounted?

A specific subvolume can be mounted with the -o subvol=/path/to/subvol option, but reading that path back directly from /proc/mounts is not currently implemented. If the filesystem is mounted via an /etc/fstab entry, the output of the mount command will show the subvol path, as it reads it from /etc/mtab.

A generally working way to read the path (as for bind mounts) is from /proc/self/mountinfo:

27 21 0:19 /subv1 /mnt/ker rw,relatime - btrfs /dev/loop0 rw,space_cache
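Field 4 of each mountinfo line is the root of the mount within the filesystem; here it is pulled out of the sample line above with awk:

```shell
# Field 4 of /proc/self/mountinfo is the root of the mount within
# the filesystem (the subvolume path, for btrfs).
line='27 21 0:19 /subv1 /mnt/ker rw,relatime - btrfs /dev/loop0 rw,space_cache'
subvol=$(echo "$line" | awk '{print $4}')

# On a live system, match on the mount point (field 5) instead:
#   awk '$5 == "/mnt/ker" { print $4 }' /proc/self/mountinfo
```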

What is a snapshot?

  • A snapshot is a frozen image of all the files and directories of a subvolume. For example, if you have two files ("a" and "b") in a subvolume, you take a snapshot, and then you delete "b", the file you just deleted is still available in the snapshot you took. The great thing about Btrfs snapshots is that they operate on individual subvolumes, down to particular sets of files and directories, whereas an LVM snapshot always covers the whole logical volume.

snapshot example

  • Since restores from tape are a pain, here are the thoughts of a lazy sysadmin who keeps the users' home directories on a Btrfs filesystem; let's try some fancy network-attached-storage ideas.
    • /home
      • Take a snapshot every 6 hours via cron:
        • /home_today_00, /home_today_06, /home_today_12, /home_today_18
The logic for a rolling 3-day rotation, run from cron at midnight, would look something like this:
  • delete /home_backday_3
  • rename /home_backday_2 to /home_backday_3
  • rename /home_backday_1 to /home_backday_2
  • rename /home_today_00 to /home_backday_1, and create a symbolic link /home_backday_00 pointing to the real directory /home_backday_1
  • rename /home_today_06 to /home_backday_06, and similarly for the other hours (06..18)

What is the difference between mount -o ssd and mount -o ssd_spread?

mount -o ssd_spread is more strict about finding a large unused region of the disk for new allocations, which tends to fragment the free space more over time. mount -o ssd_spread is often faster on the less expensive SSD devices. The default for auto-detected SSD devices is mount -o ssd.

Will Btrfs be in the mainline Linux Kernel?

Btrfs is already in the mainline Linux kernel. It was merged on 9th January 2009, and was available in the Linux 2.6.29 release.

Does Btrfs run with older kernels?

v0.16 of Btrfs maintains compatibility with kernels back to 2.6.18. Kernels older than that will not work.

The current Btrfs unstable repositories only work against the mainline kernel. Once Btrfs is in mainline a backport repository will be created again.

How long will the Btrfs disk format keep changing?

The Btrfs disk format is not finalized, but it won't change unless a critical bug is found and no workarounds are possible. Not all the features have been implemented, but the current format is extensible enough to add those features later without requiring users to reformat.

How do I upgrade to the 2.6.31 format?

The 2.6.31 kernel can read and write Btrfs filesystems created by older kernels, but it writes a slightly different format for the extent allocation trees. Once you have mounted with 2.6.31, the stock Btrfs in 2.6.30 and older kernels will not be able to mount your filesystem.

We don't want to force people into 2.6.31 only, and so the newformat code is available against 2.6.30 as well. All fixes will also be maintained against 2.6.30. For details on downloading, see the Btrfs source repositories.

Can I find out compression ratio of a file?

Currently, no. There's a patch (https://patchwork.kernel.org/patch/117782/) adding the kernel part (an ioctl). However, the size obtained by this ioctl is not exact: it is rounded up to the block size (4KB). The real number of compressed bytes is not reported or recorded by the filesystem in its structures (only the block count). It is saved in the disk blocks, but processed solely by the compression code.

I'm running btrfs 0.19...

This is, unfortunately, almost meaningless. Almost all of the "interesting" code in btrfs is in the kernel, so the main thing you should be reporting is the version of the kernel you're running.

Even if you want to report a problem with the btrfs userspace tools, the main version number (which is usually 0.19) is useless, because it hasn't been updated in at least 18 months. If you have installed from your distribution's package manager, then the version number of the package will usually include a date that will indicate when your btrfs tools were compiled; it is this package version that you should tell people about if you have a problem. If you have built your btrfs-progs tools from git, please tell us what git commit ID was the head when you built your tools. A recent version of the btrfs-progs tools should report the commit ID as part of the version number when you run them:

hrm@ruthven:~ $ btrfs --help
Btrfs v0.19-116-g13eced9
                ^^^^^^^^ this is the git commit ID

Can I mount subvolumes with different mount options?


Yes, for some mount options:

  • nodev, nosuid and probably all the generic ones
  • subvol=

Partially, for others:

  • compress/compress-force: possible, but not implemented
  • ro: via a bind mount

No for the rest, like space_cache, inode_cache, discard, autodefrag, ...

About the project

Does the Btrfs multi-device support make it a "rampant layering violation"?

Yes and no. Device management is a complex subject, and there are many different opinions about the best way to do it. Internally, the Btrfs code separates out components that deal with device management and maintains its own layers for them. The vast majority of filesystem metadata has no idea there are multiple devices involved.

Many advanced features, such as checking alternate mirrors for good copies of a corrupted block, are not practical when the RAID implementation sits below the FS.

What is CRFS? Is it related to Btrfs?

[CRFS] is a network file system protocol. It was designed at around the same time as Btrfs. Its wire format uses some Btrfs disk formats, and crfsd, a CRFS server implementation, uses Btrfs to store data on disk. More information can be found at http://oss.oracle.com/projects/crfs/ and http://en.wikipedia.org/wiki/CRFS

Will Btrfs become a clustered file system?

No. Btrfs's main goal right now is to be the best non-cluster file system.

If you want a cluster filesystem, there are many production choices, which can be found in the Distributed file systems section on Wikipedia. Keep in mind that each filesystem has its own pluses and minuses, so find the best fit for your environment. Most have a fixed maximum cluster size, and whether that would work in your environment is a question you have to answer for yourself.

The closest thing to a cluster filesystem that uses Btrfs as its underlying filesystem is Ceph.
