Development notes

From btrfs Wiki
(Difference between revisions)
Jump to: navigation, search
(dev scheudle)
(Coding style preferences: updates)
 
(60 intermediate revisions by 3 users not shown)
Line 1: Line 1:
'''page under construction'''
 
 
 
Collection of various notes about development practices, how-to's or checklists.
 
Collection of various notes about development practices, how-to's or checklists.
 +
 +
= Misc notes =
  
 
== Adding a new ioctl, extending an existing one ==
 
== Adding a new ioctl, extending an existing one ==
  
* add code to ''strace'' so the ioctl calls are parsed into a human readable form, (reference: Jeff's patchset)
+
* add code to [https://github.com/strace/strace strace] so the ioctl calls are parsed into a human readable form. Most of the ioctls are already [https://github.com/strace/strace/blob/master/btrfs.c implemented] and can be used a reference.
 +
 
 +
== Tracepoints ==
 +
 
 +
The tracepoint message format should be compact and consistent, so please
 +
stick to the following format:
 +
 
 +
* ''key=value'' no spaces around ''=''
 +
* separated by spaces, not commas
 +
* named values: print value and string, like "%llu(%s)", no space between, string in parens
 +
* avoid abbreviating key values too much, (eg. use 'size' not 'sz')
 +
* hexadecimal values are always preceded by ''0x'' (use "0x%llx)
 +
* use ''struct btrfs_inode'' for inode types, not plain ''struct inode''
 +
* inode number type is ''u64'', use ''btrfs_ino'' if needed
 +
* key can be printed as ''[%llu,%u,%llu]''
 +
* enum types need to be exported as ''TRACE_DEFINE_ENUM''
 +
 
 +
'''Example:'''
 +
 
 +
event: ''btrfs__chunk''
 +
 
 +
string:
 +
 
 +
  "root=%llu(%s) offset=%llu size=%llu num_stripes=%d sub_stripes=%d type=%s"
 +
 
 +
== Error messages, verbosity ==
 +
 
 +
* use ''btrfs_*'' helpers (btrfs_err, btrfs_info, ...), they print a filesystem identification like
 +
BTRFS info (device sdb): ...
 +
* first letter in the string is lowercase
 +
* message contents
 +
** be descriptive
 +
** keep the text length reasonable (fits one line without wrapping)
 +
** no typos
 +
** print values that refer to what happend (inode number, subvolume id, path, offset, ...)
 +
** print error value from the last call
 +
** no ''"\n"'' at the end of the string
 +
** no ''".'''' at the end of text
 +
** un-indent the string so it fits under 80 columns
 +
** don't split long strings, overflow 80 is ok in this case (we want to be able to search for the error messages in the sources easily)
 +
* value representation
 +
** decimal: offsets, length, ordinary numbers
 +
** hexadecimal: checksums
 +
** hexadecimal + string: bitmasks (eg. raid profiles, flags)
 +
** intervals of integers:
 +
*** closed interval (end values inclusive): [0, 4096]
 +
*** half-open (right value excluded): [0, 4096)
 +
*** half-open (left value excluded): (0, 4096] -- that one may look strange but is used in several places
 +
 
 +
=== Message level ===
 +
 
 +
* btrfs_err -- such messages have high visibility so use them for serious problems that need user attention
 +
* btrfs_warn -- conditions that are not too serious but can point to potential problems, the system should be still in a good state
 +
* btrfs_info -- use for informative messages that are useful to see what's happening in the filesystem or might help debugging problems in the future and are worth keeping in the logs
 +
 
 +
== Error handling and transaction abort ==
 +
 
 +
Please keep all transaction abort exactly at the place where they happen and
 +
do not merge them to one. This pattern should be used everwhere and is
 +
important when debugging because we can pinpoint the line in the code from the
 +
syslog message and do not have to guess which way it got to the merged call.
  
== Development schedule ==
+
== Handling unexpected conditions ==
  
A short overview of the development phases of linux kernel and what this means for developers regarding sending patches etc.
+
This is different than error handling. An unexpected condtion happens when the code invariants/assumptions do not hold and there's no way to recover from the situation. This means that returning an error to the caller can't be done and continuing would only propagate the logic error further. The reasons for that bug can be two fold: internal (a genuine bug) or external (eg. memory biflip, memory corrupted by other subystem). In this case it is allowed to use the nuclear option and do BUG_ON, that is otherwise highly discouraged.
  
=== Major release ===
+
There are several ways how to react to the unexpected conditions:
  
''Overall:'' a major release is done by Linus, the version bumps in the 2nd position of the version, eg. it's ''4.6''. This usually means distributions start to adopt the sources, the stable kernels are going to be released.
+
* ASSERT -- conditionally compiled in and crashes when the condition is false, this is supposed to catch 'must never happen' at the time of development, code must not continue
 +
* WARN_ON -- light check that is visible in the log and allows the code to continue but the reasons must be investigated
 +
* BUG_ON -- last resort, checks condition that 'must never happen' and continuing would cause more harm than the instant crash; code should always try to avoid using it, but there are cases when sanity and invariant checks are done in advance
  
''Developers:'' expect bugreports based on this version, this usually does not have other significance regarding development of new features or bugfixes
+
== Error injection using eBPF ==
  
=== Merge window ===
+
Functions can be annotated to enable error injection using the eBPF scripts. See eg. <tt>disk-io.c:open_ctree</tt>. For btrfs-specific injection, the annotation is ALLOW_ERROR_INJECTION, but beware that this only overrides the return value and this can leak memory or other resources. For error injection to generic helpers (eg. memory allocation), you cau se something like <tt>bcc/tools/inject.py kmalloc btrfs_alloc_device() -P 0.001</tt>
  
''Overall:'' the time when pull requests from 1st level maintainers get sent to Linus, the merge window starts after the major release and usually takes two weeks
+
Resources:
 +
* eBPF
 +
* BCC tools
  
''Developers:'' get ready with any bugfixes that were not part of the patches in the pull requests but are still relevant for the upcoming kernel
+
= Coding style preferences =
  
There are usually one or two pull requests sent by the maintainer so it's OK to send the bugfixes to the mailinglist even during the merge window period. If the "deadline" is not met, the patches get merged in the next ''rc''.
+
Before applying recommendations from this sections, please make sure you're familiar with the [https://www.kernel.org/doc/html/latest/process/coding-style.html|generic kernel coding style guide].
  
Sending big patchsets during this period is not discouraged, but feedback may be delayed.
+
* comment new data structures
 +
* comment new enum/define values, brief description or pointers to the code that uses them
 +
* function comment goes to the .c file, not the header
 +
* exported functions have btrfs_ prefix
 +
* fix coding style in code that's only moved
 +
* fix spelling, grammar and formatting of comments that get moved or changed
 +
* leave one newline before #endif in headers
 +
* for patch subject use "btrfs:" followed by a lowercase
 +
* do not use <tt>spin_is_locked</tt> but <tt>lockdep_assert_held</tt>
 +
* do not use functions with double underscore, there's only a few valid uses of that, namely when ''__function'' is doing the core of the work with looser checks, no locks or more parameters than ''function''
 +
* default function return value is ''int ret'', temporary return values should be named like ''ret2'' etc
 +
* avoid prototypes for static functions, order them in new code in a way that does not need it
 +
* when dumping a lot of data after an error, consider what will remain visible last
 +
** in case of <tt>btrfs_print_leaf</tt>, print the specific error message after that
  
The amount of changes that go to ''master'' branch from the rest of the kernel is high, things can break due to reasons unrelated to btrfs changes. Testing is welcome, but the bugs could turn out not to be relevant to us.
+
= Kernel config options =
  
=== The rc1 ===
+
== Testing ==
  
''Overall:'' most of the kernel changes are now merged, no new features are allowed to be added, the following period until the major release is expected to fix only regressions
+
Compile-time config options for kernel that can help debugging, testing.
 +
They usually take a hit on performance or resources (memory) so they should be
 +
selected wisely. The options in '''bold''' should be safe to use by default
 +
for debugging builds.
  
''Developers:'' it's a good time to test extensively, changes in VFS, MM, scheduler, debugging features and other subsystems could introduce bugs or misbehaviour
+
Please refer to the option documentation for further details.
  
From now on until the late release candidates, it's a good time to post big patchsets that are supposed to land in the next kernel. There's time to let others to do review, discuss design goals, do patchset revisions based on feedback.
+
* devices for testing
 +
** '''CONFIG_BLK_DEV_LOOP''' - enable loop device
 +
** for fstests: '''DM_FLAKEY''', '''CONFIG_FAIL_MAKE_REQUEST'''
 +
** '''CONFIG_SCSI_DEBUG''' - fake scsi block device
 +
* memory
 +
** '''CONFIG_SLUB_DEBUG''' - boot with slub_debug
 +
** CONFIG_DEBUG_PAGEALLOC + CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT (on newer kernels)
 +
** CONFIG_SCHED_STACK_END_CHECK
 +
** CONFIG_PAGE_POISONING
 +
** CONFIG_HAVE_DEBUG_KMEMLEAK
 +
** CONFIG_FAILSLAB -- fault injection to kmalloc
 +
** CONFIG_DEBUG_LIST
 +
** CONFIG_BUG_ON_DATA_CORRUPTION
 +
* btrfs
 +
** '''CONFIG_BTRFS_DEBUG''', '''CONFIG_BTRFS_ASSERT''', '''CONFIG_BTRFS_FS_RUN_SANITY_TESTS'''
 +
** '''CONFIG_BTRFS_FS_CHECK_INTEGRITY'''
 +
* locking
 +
** CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_MUTEXES
 +
** CONFIG_DEBUG_LOCK_ALLOC
 +
** CONFIG_PROVE_LOCKING, CONFIG_LOCKDEP
 +
** CONFIG_LOCK_STAT
 +
** CONFIG_PROVE_RCU
 +
** CONFIG_DEBUG_ATOMIC_SLEEP
 +
* sanity checks
 +
** CONFIG_DEBUG_STACK_USAGE, CONFIG_HAVE_DEBUG_STACKOVERFLOW, CONFIG_DEBUG_STACKOVERFLOW
 +
** CONFIG_STACKTRACE
 +
** CONFIG_KASAN -- address sanitizer
 +
** CONFIG_UBSAN -- undefined behaviour sanitizer
 +
** CONFIG_KCSAN -- concurrency checker
 +
* verbose reporting
 +
** CONFIG_DEBUG_BUGVERBOSE
 +
* tracing
 +
** CONFIG_TRACING etc
  
Depending on the proposed changes, the patchset could be queued for the next release within that time. If the patchset is intrusive, it could stay in the ''for-next'' branches for some time.
+
= (x)fstests =
  
=== The late rcX (rc5 and up) ===
+
The fstests suite has very few "hard" requirements and will succeed without actually running many tests.  In order to ensure full test coverage, your test environment should provide the settings from the following sections. Please note that newly added tests silently add new dependencies, so you should always review results after an update.
 +
 +
== Kernel config options for complete test coverage ==
 +
* <code>CONFIG_FAULT_INJECTION=y</code>
 +
* <code>CONFIG_FAULT_INJECTION_DEBUG_FS=y</code>
 +
* <code>CONFIG_FAIL_MAKE_REQUEST=y</code>
 +
* <code>CONFIG_DM_FLAKEY=m</code> or <code>y</code>
 +
* <code>CONFIG_DM_THIN_PROVISIONING=m</code> or <code>y</code>
 +
* <code>CONFIG_DM_SNAPSHOT=m</code> or <code>y</code>
 +
* <code>CONFIG_DM_DELAY=m</code> or <code>y</code>
 +
* <code>CONFIG_DM_ERROR=m</code> or <code>y</code>
 +
* <code>CONFIG_DM_LOG_WRITES=m</code> or <code>y</code>
 +
* <code>CONFIG_DM_DUST=m</code> or <code>y</code>
 +
* <code>CONFIG_BLK_DEV_LOOP=m</code> or <code>y</code>
 +
* <code>CONFIG_EXT4_FS=m</code> or <code>y</code>
 +
* <code>CONFIG_SCSI_DEBUG=m</code>
  
''Overall:'' based on past experience, there are at least 5 release candidates, done on a weekly basis, so you can estimate the amount of time before the full release or merge window. The 5 seems like am minimum, usually there are 2 or 3 more release candidates.
+
== Kernel config options for better bug reports ==
  
''Developers:'' new code for the upcoming kernel is supposed to be reviewed and tested, can be found in the ''for-next'' branch
+
See the list in the section above for more options.
  
Sending intrusive changes at this point is not guaranteed to be reviewed or testd in time so it gets queued for the next kernel. This highly depends on the nature of the changes. Patch count should not be an issue if the patches are revieweable or do not do intrusive changes.
+
== User space utilities and development library dependencies ==
 +
* fio
 +
* dmsetup (device-mapper)
 +
* lvm
 +
* xfsprogs >= 4.3.1
 +
** <code>xfs_io -c reflink</code> is required.
 +
* btrfsprogs
 +
* dbench
 +
* openssl
 +
* libacl
 +
* libattr
 +
* libaio
 +
* libuuid
 +
* libcap-progs
 +
* duperemove
 +
----
 +
Note: This list may be incomplete.
  
=== Major release ===
+
== Storage environment ==
 +
* At least 4 identically sized partitions/disks/virtual disks, specified using <code>$SCRATCH_DEV_POOL</code>
 +
** some tests may require 6 such partitions
 +
* some tests need at least 10G of free space, as determined by <code>df</code>, ie. the size of the device may need to be larger
 +
* some tests require <code>$LOGWRITES_DEV</code>, yet another separate block device, for power fail testing
 +
* for testing trim and discard, the devices must be capable of that (physical or virtual)
  
<tt>goto 1;</tt>
+
== Other requirements ==
 +
* An <code>fsgqa</code> user and group must exist.
 +
* An <code>123456-fsgqa</code> user and group must exist.

Latest revision as of 16:16, 10 September 2020

Collection of various notes about development practices, how-to's or checklists.

Contents

[edit] Misc notes

[edit] Adding a new ioctl, extending an existing one

  • add code to strace so the ioctl calls are parsed into a human readable form. Most of the ioctls are already implemented and can be used a reference.

[edit] Tracepoints

The tracepoint message format should be compact and consistent, so please stick to the following format:

  • key=value no spaces around =
  • separated by spaces, not commas
  • named values: print value and string, like "%llu(%s)", no space between, string in parens
  • avoid abbreviating key values too much, (eg. use 'size' not 'sz')
  • hexadecimal values are always preceded by 0x (use "0x%llx)
  • use struct btrfs_inode for inode types, not plain struct inode
  • inode number type is u64, use btrfs_ino if needed
  • key can be printed as [%llu,%u,%llu]
  • enum types need to be exported as TRACE_DEFINE_ENUM

Example:

event: btrfs__chunk

string:

  "root=%llu(%s) offset=%llu size=%llu num_stripes=%d sub_stripes=%d type=%s"

[edit] Error messages, verbosity

  • use btrfs_* helpers (btrfs_err, btrfs_info, ...), they print a filesystem identification like
BTRFS info (device sdb): ...
  • first letter in the string is lowercase
  • message contents
    • be descriptive
    • keep the text length reasonable (fits one line without wrapping)
    • no typos
    • print values that refer to what happend (inode number, subvolume id, path, offset, ...)
    • print error value from the last call
    • no "\n" at the end of the string
    • no ".'' at the end of text
    • un-indent the string so it fits under 80 columns
    • don't split long strings, overflow 80 is ok in this case (we want to be able to search for the error messages in the sources easily)
  • value representation
    • decimal: offsets, length, ordinary numbers
    • hexadecimal: checksums
    • hexadecimal + string: bitmasks (eg. raid profiles, flags)
    • intervals of integers:
      • closed interval (end values inclusive): [0, 4096]
      • half-open (right value excluded): [0, 4096)
      • half-open (left value excluded): (0, 4096] -- that one may look strange but is used in several places

[edit] Message level

  • btrfs_err -- such messages have high visibility so use them for serious problems that need user attention
  • btrfs_warn -- conditions that are not too serious but can point to potential problems, the system should be still in a good state
  • btrfs_info -- use for informative messages that are useful to see what's happening in the filesystem or might help debugging problems in the future and are worth keeping in the logs

[edit] Error handling and transaction abort

Please keep all transaction abort exactly at the place where they happen and do not merge them to one. This pattern should be used everwhere and is important when debugging because we can pinpoint the line in the code from the syslog message and do not have to guess which way it got to the merged call.

[edit] Handling unexpected conditions

This is different than error handling. An unexpected condtion happens when the code invariants/assumptions do not hold and there's no way to recover from the situation. This means that returning an error to the caller can't be done and continuing would only propagate the logic error further. The reasons for that bug can be two fold: internal (a genuine bug) or external (eg. memory biflip, memory corrupted by other subystem). In this case it is allowed to use the nuclear option and do BUG_ON, that is otherwise highly discouraged.

There are several ways how to react to the unexpected conditions:

  • ASSERT -- conditionally compiled in and crashes when the condition is false, this is supposed to catch 'must never happen' at the time of development, code must not continue
  • WARN_ON -- light check that is visible in the log and allows the code to continue but the reasons must be investigated
  • BUG_ON -- last resort, checks condition that 'must never happen' and continuing would cause more harm than the instant crash; code should always try to avoid using it, but there are cases when sanity and invariant checks are done in advance

[edit] Error injection using eBPF

Functions can be annotated to enable error injection using the eBPF scripts. See eg. disk-io.c:open_ctree. For btrfs-specific injection, the annotation is ALLOW_ERROR_INJECTION, but beware that this only overrides the return value and this can leak memory or other resources. For error injection to generic helpers (eg. memory allocation), you cau se something like bcc/tools/inject.py kmalloc btrfs_alloc_device() -P 0.001

Resources:

  • eBPF
  • BCC tools

[edit] Coding style preferences

Before applying recommendations from this sections, please make sure you're familiar with the kernel coding style guide.

  • comment new data structures
  • comment new enum/define values, brief description or pointers to the code that uses them
  • function comment goes to the .c file, not the header
  • exported functions have btrfs_ prefix
  • fix coding style in code that's only moved
  • fix spelling, grammar and formatting of comments that get moved or changed
  • leave one newline before #endif in headers
  • for patch subject use "btrfs:" followed by a lowercase
  • do not use spin_is_locked but lockdep_assert_held
  • do not use functions with double underscore, there's only a few valid uses of that, namely when __function is doing the core of the work with looser checks, no locks or more parameters than function
  • default function return value is int ret, temporary return values should be named like ret2 etc
  • avoid prototypes for static functions, order them in new code in a way that does not need it
  • when dumping a lot of data after an error, consider what will remain visible last
    • in case of btrfs_print_leaf, print the specific error message after that

[edit] Kernel config options

[edit] Testing

Compile-time config options for kernel that can help debugging, testing. They usually take a hit on performance or resources (memory) so they should be selected wisely. The options in bold should be safe to use by default for debugging builds.

Please refer to the option documentation for further details.

  • devices for testing
    • CONFIG_BLK_DEV_LOOP - enable loop device
    • for fstests: DM_FLAKEY, CONFIG_FAIL_MAKE_REQUEST
    • CONFIG_SCSI_DEBUG - fake scsi block device
  • memory
    • CONFIG_SLUB_DEBUG - boot with slub_debug
    • CONFIG_DEBUG_PAGEALLOC + CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT (on newer kernels)
    • CONFIG_SCHED_STACK_END_CHECK
    • CONFIG_PAGE_POISONING
    • CONFIG_HAVE_DEBUG_KMEMLEAK
    • CONFIG_FAILSLAB -- fault injection to kmalloc
    • CONFIG_DEBUG_LIST
    • CONFIG_BUG_ON_DATA_CORRUPTION
  • btrfs
    • CONFIG_BTRFS_DEBUG, CONFIG_BTRFS_ASSERT, CONFIG_BTRFS_FS_RUN_SANITY_TESTS
    • CONFIG_BTRFS_FS_CHECK_INTEGRITY
  • locking
    • CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_MUTEXES
    • CONFIG_DEBUG_LOCK_ALLOC
    • CONFIG_PROVE_LOCKING, CONFIG_LOCKDEP
    • CONFIG_LOCK_STAT
    • CONFIG_PROVE_RCU
    • CONFIG_DEBUG_ATOMIC_SLEEP
  • sanity checks
    • CONFIG_DEBUG_STACK_USAGE, CONFIG_HAVE_DEBUG_STACKOVERFLOW, CONFIG_DEBUG_STACKOVERFLOW
    • CONFIG_STACKTRACE
    • CONFIG_KASAN -- address sanitizer
    • CONFIG_UBSAN -- undefined behaviour sanitizer
    • CONFIG_KCSAN -- concurrency checker
  • verbose reporting
    • CONFIG_DEBUG_BUGVERBOSE
  • tracing
    • CONFIG_TRACING etc

[edit] (x)fstests

The fstests suite has very few "hard" requirements and will succeed without actually running many tests. In order to ensure full test coverage, your test environment should provide the settings from the following sections. Please note that newly added tests silently add new dependencies, so you should always review results after an update.

[edit] Kernel config options for complete test coverage

  • CONFIG_FAULT_INJECTION=y
  • CONFIG_FAULT_INJECTION_DEBUG_FS=y
  • CONFIG_FAIL_MAKE_REQUEST=y
  • CONFIG_DM_FLAKEY=m or y
  • CONFIG_DM_THIN_PROVISIONING=m or y
  • CONFIG_DM_SNAPSHOT=m or y
  • CONFIG_DM_DELAY=m or y
  • CONFIG_DM_ERROR=m or y
  • CONFIG_DM_LOG_WRITES=m or y
  • CONFIG_DM_DUST=m or y
  • CONFIG_BLK_DEV_LOOP=m or y
  • CONFIG_EXT4_FS=m or y
  • CONFIG_SCSI_DEBUG=m

[edit] Kernel config options for better bug reports

See the list in the section above for more options.

[edit] User space utilities and development library dependencies

  • fio
  • dmsetup (device-mapper)
  • lvm
  • xfsprogs >= 4.3.1
    • xfs_io -c reflink is required.
  • btrfsprogs
  • dbench
  • openssl
  • libacl
  • libattr
  • libaio
  • libuuid
  • libcap-progs
  • duperemove

Note: This list may be incomplete.

[edit] Storage environment

  • At least 4 identically sized partitions/disks/virtual disks, specified using $SCRATCH_DEV_POOL
    • some tests may require 6 such partitions
  • some tests need at least 10G of free space, as determined by df, ie. the size of the device may need to be larger
  • some tests require $LOGWRITES_DEV, yet another separate block device, for power fail testing
  • for testing trim and discard, the devices must be capable of that (physical or virtual)

[edit] Other requirements

  • An fsgqa user and group must exist.
  • An 123456-fsgqa user and group must exist.
Personal tools