From btrfs Wiki



Out of band / batch deduplication is deduplication done outside of the write path. We've sometimes called it offline deduplication, but that can confuse people: btrfs dedup involves the kernel and always happens on mounted filesystems. To use out-of-band deduplication, you run a tool which searches your filesystem for identical blocks, and then deduplicates them.


From the README for duperemove:

Duperemove is a simple tool for finding duplicated extents and submitting them for deduplication. When given a list of files it will hash their contents on a block by block basis and compare those hashes to each other, finding and categorizing extents that match each other. When given the -d option, duperemove will submit those extents for deduplication using the btrfs-extent-same ioctl.

This tool finds and lists duplicate extents, and optionally will submit the duplicates to the kernel for deduplication.
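The block-by-block hashing described above can be sketched as follows. This is only an illustration of the idea, not duperemove's actual implementation: the function name `find_duplicate_blocks`, the 128 KiB default block size, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 128 * 1024  # assumed block size for this sketch


def find_duplicate_blocks(paths, block_size=BLOCK_SIZE):
    """Map each block digest to the (path, offset) locations that share it."""
    seen = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(block_size)
                if not block:
                    break
                digest = hashlib.sha256(block).hexdigest()
                seen[digest].append((path, offset))
                offset += len(block)
    # Keep only digests that occur at more than one location.
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

A real tool would go on to merge adjacent matching blocks into extents and hand those extents to the kernel; this sketch stops at reporting candidate locations.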


bedup implements incremental whole-file batch deduplication for Btrfs.

bedup works on mainline kernels through the clone ioctl, which exposes Btrfs's copy-on-write functionality. Linux 3.3 is required by the file locking implementation, and Linux 3.6 is required for cross-subvolume operation.
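Whole-file deduplication first needs a list of files whose contents are identical. A minimal sketch of one way to find such candidates is shown here; it is not taken from bedup's source. The helper names `file_digest` and `find_identical_files` are hypothetical, and grouping by size before hashing is simply a common way to avoid reading files that cannot possibly match.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a whole file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(paths):
    """Return sorted groups of paths whose full contents hash identically."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Each returned group is only a candidate set. Before cloning, a tool still has to make sure the files do not change underneath it, which is what the file locking mentioned above is for.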

An ioctl dedicated to batch deduplication was merged in Linux 3.12. It brings in-kernel locking (btrfs guarantees that the deduplicated data is identical without the need for outside locks), better support for read-only snapshots, and retains the features and much of the implementation of the clone ioctl. There is an experimental bedup branch using the new ioctl, but it is not the default because it can trigger kernel crashes as of Linux 4.2.
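For illustration, a dedupe request to the kernel might be issued from Python roughly as follows. This sketch assumes the request number and structure layout of `FIDEDUPERANGE` from `linux/fs.h`, the generic name under which later kernels expose this interface; to the best of our understanding the btrfs-specific ioctl uses the same number and layout, but check the headers of your kernel. The helper names `build_request` and `dedupe_range` are hypothetical.

```python
import fcntl
import os
import struct

# Assumed from linux/fs.h: _IOWR(0x94, 54, struct file_dedupe_range)
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def build_request(src_offset, length, dest_fd, dest_offset):
    """Pack a request asking the kernel to dedupe one destination range."""
    header = HEADER.pack(src_offset, length, 1, 0, 0)
    info = INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return bytearray(header + info)


def dedupe_range(src_path, dest_path, src_offset, dest_offset, length):
    """Ask the kernel to share one range; returns (bytes_deduped, status)."""
    src = os.open(src_path, os.O_RDONLY)
    dest = os.open(dest_path, os.O_RDWR)
    try:
        buf = build_request(src_offset, length, dest, dest_offset)
        # The kernel compares both ranges itself and writes the result
        # back into the mutable buffer.
        fcntl.ioctl(src, FIDEDUPERANGE, buf)
        _, _, bytes_deduped, status, _ = INFO.unpack_from(buf, HEADER.size)
        return bytes_deduped, status
    finally:
        os.close(src)
        os.close(dest)
```

Because the kernel verifies that the two ranges are identical before sharing them, the caller does not need its own locking; a nonzero `status` reports that a range differed or that an error occurred.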


Inband / synchronous / inline deduplication is deduplication done in the write path, so it happens as data is written to the filesystem. This typically requires large amounts of RAM, because a lookup table of known block hashes must be consulted on every write. Patches implementing this are under development.
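The write-path lookup and its memory cost can be sketched with a toy in-memory store. Nothing here reflects the actual kernel patches: the class `InlineDedupStore`, the function `table_bytes`, and the assumed 40 bytes per table entry (a 32-byte SHA-256 digest plus an 8-byte block pointer) are illustrative only.

```python
import hashlib


class InlineDedupStore:
    """Toy block store that deduplicates at write time."""

    def __init__(self):
        self.blocks = []  # stands in for blocks written to disk
        self.index = {}   # digest -> block number, the in-RAM lookup table

    def write(self, block):
        """Store a block, reusing an existing copy if the hash is known."""
        digest = hashlib.sha256(block).digest()
        block_no = self.index.get(digest)
        if block_no is None:
            block_no = len(self.blocks)
            self.blocks.append(block)
            self.index[digest] = block_no
        return block_no


def table_bytes(data_bytes, block_size, entry_bytes=40):
    """Estimate lookup-table size if every block of data is unique."""
    return (data_bytes // block_size) * entry_bytes
```

Under these assumptions, `table_bytes(2**40, 4096)` gives 10 GiB of table for 1 TiB of unique data in 4 KiB blocks, which shows why the RAM requirement grows quickly with filesystem size.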
