Compression

OBSOLETE CONTENT

This wiki has been archived and the content is no longer updated.

Btrfs supports transparent file compression. There are three algorithms available: ZLIB, LZO and ZSTD (since v4.14). Compression is applied on a per-file basis. A single btrfs mount point can have some files that are uncompressed, some compressed with LZO and some with ZLIB, for instance (though you may not want it that way, it is supported).

How do I enable compression?

Mount with -o compress or -o compress-force. Then write (or re-write) files, and they will be transparently compressed. Some files may not compress very well; these are typically detected and written uncompressed. See the What happens to incompressible files? section below.
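
For example, to enable ZSTD compression at mount time (the device and mount point are placeholders):

  mount -o compress=zstd /dev/sdX /mnt
  # or persistently, via a line like this in /etc/fstab:
  # /dev/sdX  /mnt  btrfs  compress=zstd  0  0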

What is the default compression method?

ZLIB. 'Default' means the algorithm chosen when compression is requested without specifying one, whether by the mount options compress or compress-force, via chattr +c, or by btrfs filesystem defrag -c.

Can I set the compression level?

Level support for ZLIB was added in v4.14 and for ZSTD in v5.1. LZO does not support levels (the kernel implementation provides only one).

There are 9 levels of ZLIB supported (1 to 9), mapping 1:1 from the mount option to the level defined by the algorithm.

The default is level 3, which provides a good compression ratio and is still reasonably fast. The compression gains of levels 7, 8 and 9 are comparable, but the higher levels take longer. The level can be specified in the mount option, e.g. "compress=zlib:1".

The ZSTD support includes levels 1 to 15. A larger value is slower with better compression; a lower value is faster with less compression. Level 0 maps to the default.
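
For example (the device and mount point are placeholders):

  mount -o compress=zlib:9 /dev/sdX /mnt     # strongest zlib level
  mount -o remount,compress=zstd:1 /mnt      # switch new writes to a fast zstd level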

What are the differences between compression methods?

There's a speed/ratio trade-off:

  • ZLIB -- slower, higher compression ratio (uses the zlib level 3 setting; see the zlib sources for the difference between levels 1 and 6)
  • LZO -- faster compression and decompression than ZLIB, worse compression ratio; designed to be fast
  • ZSTD -- (since v4.14) compression comparable to ZLIB with higher compression/decompression speeds and a range of ratio levels

The differences depend on the actual data set and cannot be expressed by a single number or recommendation. Do your own benchmarks. LZO seems to give satisfying results for general use.

Are there other compression methods supported?

Currently no, and with ZSTD merged there are no further plans to add more. The LZ4 algorithm was considered but did not bring significant gains.

Regarding LZ4: patches apparently as old as 2012 were submitted by DSterba, back when LZ4 claimed "only" 1 GB/s (as reported by Phoronix). The current ZSTD homepage recommends LZ4 for use cases that benefit from higher IO throughput than ZSTD can practically sustain.

Also of notable significance:

  • improvements in more recent code enable effective contiguous dictionaries of 128 KiB and larger
  • recent Asymmetric Numeral Systems developments enable even more efficient decompression
  • LZ4 is available in the kernel and in at least 2 other vanilla VFS options presently (2021)

Snappy support (compresses slower than LZO but decompresses much faster) has also been proposed.

Some work has been done toward adding LZMA (very slow, high compression) support as well. The current status is "not considered anymore".

Can a file's data be compressed with different methods?

Yes. The compression algorithm is stored per extent. Setting the compression method affects newly written data, so it's possible to have all types of compression within one file.
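
A small sketch of how this can happen (the file name is a placeholder; zeros are used only because they compress well):

  touch file.dat
  btrfs property set file.dat compression zlib
  dd if=/dev/zero of=file.dat bs=128K count=8 conv=notrunc         # zlib extents
  btrfs property set file.dat compression zstd
  dd if=/dev/zero of=file.dat bs=128K count=8 seek=8 conv=notrunc  # zstd extents appended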

How can I determine compressed size of a file?

compsize takes a list of files on a btrfs filesystem and measures used compression types and effective compression ratio: https://github.com/kilobyte/compsize
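
For example (the path is a placeholder):

  # show per-algorithm disk usage and the overall ratio for a subtree
  compsize /mnt/data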

There is a patch adding kernel support for querying the compressed size, but it is currently not merged. As a rough workaround, you can estimate a file's compressed size by comparing the output of the df command before and after writing the file, if that is available to you.

Why doesn't du report the compressed size?

Traditionally, UNIX/Linux filesystems did not support compression, so no field in the stat data structure was allocated for such a purpose. The file size denotes the nominal file size, independent of the size actually allocated on disk. For the allocated size, the stat.st_blocks field contains the number of blocks actually allocated, e.g. to account for sparse files. When compression is involved, however, the actually allocated size may be smaller than the nominal size even though the file is not sparse.

There are utilities that determine sparseness of a file by comparing the nominal and block-allocated sizes; this behaviour could cause bugs if st_blocks contained the post-compression amount.

Another backward-compatibility issue is that, up to now, st_blocks has always contained the uncompressed number of blocks, and it is unclear what would happen with files mixing both interpretations of the value. The proposed solution is to add a special call for that (via ioctl), but this may not be the ideal solution.
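
The two sizes are easy to observe with standard tools (the file name is a placeholder); on btrfs, the allocated size still reflects uncompressed blocks:

  du -h --apparent-size somefile   # nominal size (st_size)
  du -h somefile                   # allocated size (st_blocks * 512)
  stat -c 'size=%s blocks=%b' somefile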

Can I determine what compression method was used on a file?

Not directly, but this is possible from a userspace tool without any special kernel support (the code just has not been written).

Can I force compression on a file without using the compress mount option?

Yes. The chattr utility supports setting the file attribute c, which marks the inode to compress newly written data. Setting the compression property on a file using btrfs property set <file> compression <zlib|lzo|zstd> will force compression to be used on that file with the specified algorithm.
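
For example (the file name is a placeholder):

  chattr +c myfile                            # compress new writes with the default algorithm
  btrfs property set myfile compression zstd  # or force a specific algorithm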

Can I disable compression on a file?

It is possible to disable compression of new extents on a file using the btrfs property set <file> compression none command. This sets the "no compression" flag on the file, and newly written extents will not be compressed until the flag is cleared, either by chattr +c or by using the compression property to specify an algorithm. The flag can be removed with chattr -c. Already written extents will not be rewritten.
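
A short sketch (the file name is a placeholder):

  btrfs property set myfile compression none   # set the "no compression" flag
  chattr +c myfile                             # clear it by re-enabling compression
  # or: chattr -c myfile                       # clear it without enabling compression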

Note that in kernel versions before v5.14 you could disable compression by passing an empty string instead of explicitly specifying none or no. Since kernel version 5.14, an empty string resets the property to the default behaviour.

Can I set compression per-subvolume?

Currently no, though this is planned. You can simulate it by enabling compression on the subvolume's directory; the files/directories created inside will inherit the compression flag.
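
A sketch of the workaround (the subvolume path is a placeholder):

  chattr +c /mnt/subvol                            # new files inherit the compress flag
  btrfs property set /mnt/subvol compression zstd  # or inherit a specific algorithm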

What's the precedence of all the options affecting compression?

Compression of newly written data happens:

  1. always -- if the filesystem is mounted with -o compress-force
  2. never -- if the NOCOMPRESS flag is set per-file/-directory
  3. if possible -- if the COMPRESS per-file flag (aka chattr +c) is set, but it may get converted to NOCOMPRESS eventually
  4. if possible -- if the -o compress mount option is specified

Note that mounting with -o compress will not set the +c file attribute.

How can I recursively compress/uncompress a directory (including guessed/forced compression)?

Use the btrfs filesystem defrag command; the option -r will process files in a directory recursively. This is independent of the mount options compress or compress-force, and with the option -c you can set the compression algorithm.
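
For example (the directory is a placeholder):

  # recursively recompress all files under the directory with ZSTD
  btrfs filesystem defrag -r -czstd /mnt/data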

Currently (v4.14) it's not possible to select "no compression" using the defrag command. This may change in the future.


How does compression interact with direct IO or COW?

Compression does not work with direct IO (DIO); it does work with COW (the default) and does not work for NOCOW files. If a compressed file is opened in DIO mode, it will fall back to buffered IO.
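
The fallback is transparent; for instance, a write requesting O_DIRECT on a compressed file still succeeds (the path is a placeholder):

  # oflag=direct requests DIO, but the data goes through buffered IO instead
  dd if=/dev/zero of=/mnt/compressed.dat bs=128K count=4 oflag=direct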

Are there speed penalties when doing random access to a compressed file?

Yes. The compression processes ranges of a file of maximum size 128 KiB and compresses each 4 KiB (or page-sized) block separately. Accessing a byte in the middle of a given 128 KiB range requires decompressing the whole range. This is not optimal and is subject to optimizations and further development.

What happens to incompressible files?

There is a simple decision logic: if the first portion of data being compressed does not come out smaller than the original, compression of the file is disabled -- unless the filesystem is mounted with -o compress-force. In that case compression will always be attempted on the file, only to be discarded later if it does not help. This is not optimal and is subject to optimizations and further development.

This means that even with compression enabled, if the first portion of a file doesn't compress well, the rest will not be compressed either, even if it *would* compress well. Use -o compress-force if you really want compression applied throughout; if you have files of many differing types, plain -o compress might work well for you.
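
One way to check what actually happened to a particular file is the compsize tool mentioned above (file names are placeholders):

  cat random.bin text.txt > mixed.dat   # starts with incompressible data
  sync
  compsize mixed.dat                    # usage near 100% means it was stored uncompressed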
