Developer's FAQ

From btrfs Wiki
Jump to: navigation, search

OBSOLETE CONTENT

This wiki has been archived and the content is no longer updated.

Contents

Contributor's FAQ

The term contributor is broader and does not only mean contribution of code. The documentation is a significant part and this is where non-technical people add value. The user's POV brings different questions than the developers' and explaining things in human language is a good thing.

Sending patches to documentation follows the same practices as for the code.

The documentation referred here is of the userspace tools (btrfs-progs), the manual pages or the documentation that's part of the tool help strings.

The fixes range from rewording unclear sections, fixing formatting, spelling, or adding more examples.

Documentation patches have high chance of getting merged and released quickly.

Developer's FAQ

By the term developer is meant somebody who's working on the code.

This section assumes basics of working with git, sending patches via mail and aims to cover the current practices.

Patch tags

The practice of tagging patches in linux kernel community is documented in https://www.kernel.org/doc/html/latest/process/submitting-patches.html, we'll highlight the most frequently used tags and their expected meaning. This only briefly mentions the commonly used tags. You're encouraged to read the whole document and get familiar with it.

Signed-off-by:

This tag may appear multiple times, the first one denotes the patch author. (The common abbreviation in free text is S-O-B or just sob line.) The patch author is also recorded in git log history.

Then, each maintainer that processed the patch adds his sob line.

Reference: Section 11 of SubmittingPatches

Do: Always send a patch with at least one such line with your name and email. If more people contributed to the patch, add their names and addresses too.

Don't: Add a sob line under patch that you have no authoring relation to, eg. as a reply to the mailinglist after you've reviewed a patch. See below.

Reviewed-by:

The patch has been reviewed and the singed person is putting his hand into fire. If there's a bug found in this patch, the person is usually a good candidate for a CC: of the bugreport.

Reference: Section 13 of SubmittingPatches

Do: talk to the maintainer if he forgot to add this tag to the final patch. Reviews do take time and the patches land in various branches early after they're sent to the mailingslist for testing, but the reviews are always welcome.

Do: collect the Reviewed-by tags for patches that get resent unchanged eg. within a larger patch series

Acked-by:

A more lightweight form of Reviewed-by, acknowledging that the patch is going the right direction, but that the person has not done a deeper examination of the patch. Asking for an ACK can be expressed by a CC: tag in the patch.

Tested-by:

Indicates that the patch has been successfully tested in some environment, usually follows a proposed fix and closes the feedback loop.

Reference: Section 13 of SubmittingPatches

Do: or rather you're encouraged to add this tag to a patch that you've tested.

CC:

Add this tag to the patch if you feel that the person should be aware of the patch.

Ordering

The order of the tags can track the flow of the patches through various trees, namely the Signed-off-by tag. Ordering of the other tags is not strict so you can find patches with randomly mixed tags. A common practice we find kind of useful is to sort them how things happened. It would be good to use that, namely the references to stable trees and original reports.

  • how it happened
    1. (optional) Bugzilla:
    2. (optional) Link:
    3. Reported-by:
  • where it should be backported, relevant references
    1. Fixes:
    2. CC: stable@vger.kernel.org # 4.4+
  • other tags
    1. CC:
    2. Suggested-by:
  • quality control by non-authors
    1. Reviewed-by:
    2. Tested-by:
  • author(s)
    1. Signed-off-by:
  • maintainer(s)
    1. Reviewed-by:
    2. Signed-off-by:

Patch flow

Simple patch

  1. developer works on the patch, self-reviews, tests, adds the formal tags, writes changelog
  2. patch lands in the mailinglist
  3. patch is commented, reviewed
    • several iterations of updates may follow
  4. maintainer adds the patch into a branch
  5. when the right time comes, a branch with selected patches is pushed up the merge chain
  6. a release milestone that contains the patch is released, everybody is happy

Controversial changes

This happens, not every patch gets merged. In the worst case there are not even any comments under the patch and it's silently ignored. This depends on many factors, most notably *cough*time*cough*. Examining potential drawbacks or forseeing disasters is not an easy job.

Let's be more positive, you manage to attract the attention of some developer and he says, he does not like the approach of the patch(es). Better than nothing, isn't it? Depending on the feedback, try to understand the objections and try to find a solution or insist on your approach but possibly back it by good arguments (performance gain, expected usecase) or a better explanation why the change is needed.

Repeated submissions

If you got feedback for a patch that pointed out changes that should be done before the patch can be merged, please do apply the changes or give a reason why they're wrong or not needed. (You can try to pinkie-swear to implement them later, but do not try this too often.)

For the next iteration, add a short description of the changes made, under the first --- (tripple dash) marker in the patch. For example (see also Example 3):

---
V3: renamed variable
V2: fixed typo

Keep all previous changelogs. Larger patchsets should contain the incremental changelogs in the cover letter.

Patch completeness, RFC

A patch does not necessarily have to implement the whole feature or idea. You can send an early version, use a [RFC] string somewhere in the subject. This means request for comments. Be prepared to get comments.

Please describe the level of completeness, eg. what tests it does or does not pass or what type of usecases is not yet implemented. The purpose is to get feedback from other developers about the direction or implementation approach. This may save you hours of coding.

Patchsets

Related patches, patch dependency

Group the patches by feature or by topic. Implementing a particular feature may need to prepare other parts of the code for the main patch. Applying the patches out of order will not succeed, so it's pointless to send them as unrelated and separate mails. The git tool is helpful here, see git-format-patch.

An example of grouping by topic is cleanups, or small bugfixes that are quite independent but it would be better to processes them in one go.

Sometimes a patch from a series is self-contained enough that it might get applied ahead of the whole series. You may also submit it separately as this will decrease the work needed to keep the patch series up to date with the moving development base.

Do: make sure that each patch compiles and does not deliberately introduce a bug, this is a good practice that makes bisecting go smooth

Do: send the cover letter (ie. the non-patch mail) with brief description of the series, this is a place where feedback to the whole patchset will be sent rather than comments to the individual patches. To generate the series with cover letter use git format-patch --cover-letter --thread.

Good practices, contribution hints

  • if you feel that you understand some area of code enough to stick your Reviewed-by to submitted patches, please do. Even for small patches.
  • don't hesitate to be vocal if you see that a wrong patch has been committed
  • be patient if your patch is not accepted immediatelly, try to send a gentle ping if there's a significant time without any action
  • if you want to start contributing but are not sure about how to do that, lurk in the mailingist or on the IRC channel
  • every patch should implement one thing -- this is vaguely defined, you may receive comments about patch splitting or merging with other
  • every patch must be compilable when applied, possibly with all related CONFIG_ variable values
  • send a new patch as a new mail, not within another thread, it might get missed
  • use git-format-patch and git-send-email

Sample patches

There are some formalities expected, like subject line formatting, or the tags. Although you may find them annoying at first, they help to classify the patches among the rest of mails.

Subject:

For kernel patches add the prefix btrfs:
for userspace tools add btrfs-progs:

Example 1

From: John Doe <john@doe.org>
Subject: [PATCH] btrfs: merge common code into a helper

The code for creating a new tree is open-coded in a few places, add a helper
and remove the duplicate code.

Signed-off-by: John Doe <john@doe.org>
---
fs/btrfs/volumes.c |    5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e138af710de2..3f0cc12ec488 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
(rest of the patch)

Example 2

From: Jane Doe <jane@doe.org>
Subject: [PATCH] btrfs-progs: enhance documentation of balance

Add examples of typical balance use, common problems and how to resolve them.

Signed-off-by: Jane Doe <jane@doe.org>
---
Documentation/btrfs-balance.txt |    20 +++++++++++++++++++-
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/btrfs-balance.txt b/Documentation/btrfs-balance.txt
index e138af710de2..3f0cc12ec488 100644
--- a/Documentation/btrfs-balance.txt
+++ b/Documentation/btrfs-balance.txt
(rest of the patch)

Example 3

From: John Doe <john@doe.org>
Subject: [PATCH v3] btrfs: merge common code into a helper

The code for creating a new tree is open-coded in a few places, add a helper
and remove the duplicate code.

Signed-off-by: John Doe <john@doe.org>
---
V3: add helper prototype into header file
V2: found one more open-coded instance

fs/btrfs/ctree.h   |    1 +
fs/btrfs/volumes.c |    5 +++--
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e138af710de2..3f0cc12ec488 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
(rest of the patch)

Pull requests vs patches to mailinglist

By default, all patches should be sent as mails to the mailinglist. The discussions or reviews happen there. Putting a patch series to a git branch may be convenient, but does not mean the exact unchanged branch will be pulled.

There are some criteria that have to be met before this happens. The patches should meet/have:

  • no coding style violations
  • good quality of implementation, should not exhibit trivial mistakes, lack of comments
  • unspecified number of other things that usually get poitned out in review comments
    • this knowledge can be demonstrated by doing reviews of other developers' patches
    • doing reviews of other developers' patches is strongly recommended
  • good changelogs and subject lines
  • the base point of the git branch is well-defined (ie. a stable release point or last development point, that will not get rebased)

The third point is vague, mostly refers to preferred coding patterns that we discover and use over time. This may also explain why the pull-based workflow is not used often. Both parties, developers and maintainers, need to know that the code to be pulled will be satisfactory in this regard.

It should be considered normal to send more than one round of a patchset, based on review comments that hopefully do not need to point out issues in anything of the above. Rather focus on design or potential uses and other impact.

Kernel patches

If you think you're able to provide the expected quality of patches and are familiar with a bit more advanced git use, ask the maintainers. If you're a long-term developer, the maintainers can also ask you to start the pull-based workflow. The main point for the workflow is to make maintainers' life easier.

Suggested branch names for patchsets for current development cycle:

  • base -- the last commit of the last pull (could be in branch named something like for-linus or integration), or some older head of pull request
  • for-current -- patches that have been reviewed and qualify for the upcoming rc

Patches for next development cycle:

  • base -- the last release candidate tag in Linus' tree, be sure not to be ahead of the integration branches that will become the pull requests for the next dev cycle.
  • for-next -- patches should be in a good state, ie. don't complicate testing too much, workarounds or known problems should be documented (eg. in the patchset cover letter)
  • other names, for example a patchset for a given feature as a topic branch: feature-live-repair

Experimental, unsafe or unreviewed patchsets are good candidates for topic branches as they could be added or removed from the for-next branches easily (compared to manually removing the patches from a long series).

Btrfs-progs patches

The first paragraph from previous section applies here as well.

Unlike the kernel, there are no release candidates during development. If a patchset is independent, the master branch is a suitable point. In case there are other patches in devel, a non-rebased development branch needs to be created. As this is not needed most of the time, this will happen only on-demand.

Copyright notices in files, SPDX

License information in files is using the SPDX standard.

Quoting https://spdx.dev/about/:

SPDX is an open standard for communicating software bill of material information, including provenance, license, security, and other related information. SPDX reduces redundant work by providing common formats for organizations and communities to share important data, thereby streamlining and improving compliance, security, and dependability. The SPDX specification is recognized as the international open standard for security, license compliance, and other software supply chain artifacts as ISO/IEC 5962:2021.

The initiative started in 2017 [1] aims to unify licensing information in all files using SPDX tags, this is driven by the Linux Foundation. Therefore it's not necessary to repeat the license header (GPL) in each file, the licensing rules are documented in https://docs.kernel.org/process/license-rules.html (also available in linux git tree in Documentation/process/license-rules.rst which is the source file of the linked document).

Complete information with history about contributors of changes in a particular file is recorded in git using Signed-off-by tags that are documented and widely understood (Developer's certificate of origin). For more information please see https://docs.kernel.org/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin .

Copyright notices in files. This delves into the legal territory and there's no easy answer about the recommended practice. This paragraph could help you to answer some questions regarding that but is not by any means complete and refers to known and documented practices:

  • All code merged into the mainline kernel retains its original ownership (https://docs.kernel.org/process/1.Intro.html#licensing)
  • Removing the copyright from existing files is not trivial and would require asking the original authors or current copyright holders.
  • In btrfs code, adding copyright notices is not mandatory
  • In btrfs code, there are some copyright notices present from years before 2017
  • In btrfs code, the copyright notices are inconsistent and incomplete (please refer to git history to look up the cod change authors)
  • If not sure, please consult your lawyers

Btrfs developer community perspective, not legally binding.

The copyright notices are not required and are discouraged for reasons that are practical rather than legal. The files do not track all individual contributors nor companies (this can be found in git), so the inaccurate and incomplete information gives a very skewed if not completely wrong idea about the copyright holders of changes in a given file. The code is usually heavily changed over time in smaller portions, slowly morphing into something that does not resemble the original code anymore though it shares a lot of the core ideas and implemented logic. A copyright notice by a company that does not exist anymore from 10 years ago is a clear example of uselessnes for the developers.

When code is moved verbatim from a file to another file, in the new file it appears to be contributed by a single author while it is in most cases code resulting from many previous contributions from other people. This is most obvious when splitting code from big files to new ones, because this is considered a good development practice, but somehow goes against the meaning of the copyright notices, unless a complete list of original code authors and copyright holders is also copied.

The current copyright notices will not be removed but at least new contributions won't continue adding new ones. The change history is recorded in git using Signed-off-by tags that are documented (https://docs.kernel.org/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin) and widely understood. Unless there's a blessed practice regarding the copyright notices that significantly improves the situation, the current way is considered sufficient for practical development purposes.

Given all the above, we don't want the copyright notices in individual files that are new, renamed or split. This applies to all new changes in all btrfs related code changes. Obviously lawyers may disagree and/or may require additional refinements to the process, which is fine but beyond the scope of this section.

You may also read a perspective from Linux Foundation that shares a similar view: https://www.linuxfoundation.org/blog/copyright-notices-in-open-source-software-projects/

Development schedule

A short overview of the development phases of linux kernel and what this means for developers regarding sending patches etc.

Major release

Overall: a major release is done by Linus, the version bumps in the 2nd position of the version, eg. it's 4.6. This usually means distributions start to adopt the sources, the stable kernels are going to be released.

Developers: expect bugreports based on this version, this usually does not have other significance regarding development of new features or bugfixes

Merge window

Overall: the time when pull requests from 1st level maintainers get sent to Linus, the merge window starts after the major release and usually takes two weeks

Developers: get ready with any bugfixes that were not part of the patches in the pull requests but are still relevant for the upcoming kernel

There are usually one or two pull requests sent by the maintainer so it's OK to send the bugfixes to the mailinglist even during the merge window period. If the "deadline" is not met, the patches get merged in the next rc.

Sending big patchsets during this period is not discouraged, but feedback may be delayed.

The amount of changes that go to master branch from the rest of the kernel is high, things can break due to reasons unrelated to btrfs changes. Testing is welcome, but the bugs could turn out not to be relevant to us.

The rc1

Overall: most of the kernel changes are now merged, no new features are allowed to be added, the following period until the major release is expected to fix only regressions

Developers: it's a good time to test extensively, changes in VFS, MM, scheduler, debugging features and other subsystems could introduce bugs or misbehaviour

From now on until the late release candidates, it's a good time to post big patchsets that are supposed to land in the next kernel. There's time to let others to do review, discuss design goals, do patchset revisions based on feedback.

Depending on the proposed changes, the patchset could be queued for the next release within that time. If the patchset is intrusive, it could stay in the for-next branches for some time.

The late rcX (rc5 and up)

Overall: based on past experience, there are at least 5 release candidates, done on a weekly basis, so you can estimate the amount of time before the full release or merge window. The 5 seems like am minimum, usually there are 2 or 3 more release candidates.

Developers: new code for the upcoming kernel is supposed to be reviewed and tested, can be found in the for-next branch

Sending intrusive changes at this point is not guaranteed to be reviewed or testd in time so it gets queued for the next kernel. This highly depends on the nature of the changes. Patch count should not be an issue if the patches are revieweable or do not do intrusive changes.

Major release

goto 1;

Development phase, linux-next, for-next

Patches and patchsets that are supposed to be merged in the next merge cycle are usually collected in the linux-next git tree. This gives an overview about potential conflicts and provides a central point for testing various patches. The btrfs patches for linux-next tree are hosted at https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git in branch for-next. The update period is irregular, usually a few times per week.

Patches are added to for-next when they get a basic review and do not seriously decrease stability. Some level of breakage is allowed and inevitable so there's a possibility to get a tree for early testing. Also there are external services that provide compilation coverage for various arches and configurations.

The for-next branch is rebased and rebuilt from scratch and cannot be used as base for patch development. Independent patches should use last -rc tag.

Misc information and hints

This section collects random pieces of advice from mailinglist that are given to newcomers.

How to get started - development

  • Build and install the latest kernel from Linus's git repo.
  • Read and understand the user documentation (Main_Page#Guides_and_usage_information).
  • Create one or several btrfs filesystems with different configurations and learn how they work in userspace -- what are the features, what are the problems you see? Actually use at least one of the filesystems you created for real data in daily use (with backups)
  • Build the userspace tools from git
  • Pick up one of the userspace projects from Project_ideas#Userspace_tools_projects and implement that. If you pick the right one(s), you'll have to learn about some of the internal structures of the FS anyway. Compile and test your patch. If you're adding a new feature, write an automated xfstest for it as well.
  • Get that patch accepted. This will probably involve a sequence of revisions to it, multiple versions over a period of several weeks or more, with a review process. You should also send your test to xfstests and get that accepted.
  • Do the above again, until you get used to the processes involved, and have demonstrated that you can work well with the other people in the subsystem, and are generally producing useful and sane code. It's all about trust -- can you be trusted to mostly do the right thing?
  • Use the documentation at Main_Page#Developer_documentation, and the output of btrfs-debug-tree to understand the internal structure of the FS
  • Pick up one of the smaller, more self-contained ideas from the projects page Project_ideas (say, Project_ideas#Cancellable_operations or Project_ideas#Implement_new_FALLOC_FL_.2A_modes) and try to implement it. Again: build, write test code, test thoroughly, submit patch for review, modify as suggested by reviewers, and repeat as often as necessary.

How not to start

It might be tempting to look for coding style violations and send patches to fix them. This happens from time to time and the community does not welcome that. The following text reflects our stance:

(copied from http://article.gmane.org/gmane.comp.file-systems.btrfs/53421)

If you want to contribute and do something useful for others and yourself, just don't keep sending these patches to fix whitespace/style issues reported by checkpatch. Think about it:

  1. You don't learn anything by doing them. You don't learn nothing about btrfs internals, filesystems in general, kernel programming in general, general programming in C, etc. It ends up being only a waste of time for you;
  2. You're not offering any benefit to users - not fixing a bug, not adding a new feature, not doing any performance/efficiency improvement, not making the code more reliable, etc;
  3. You're not offering a benefit for other developers either, like doing a cleanup that simplifies a complex algorithm for example.

If you care so much about the whitespace/style issues, just fix them while doing a useful change as mentioned above that happens to touch the same code. It takes time to read and understand code, it can be a big investment of time, but it ends up being worth it. There's plenty of bug reports and performance issues in the mailing list or bugzilla, so there's no shortage of things to do.

Same advice applies to any other kernel subsystem or open source project in general. Also before jumping into such a storm of useless patches, observe first what a community does for at least a month, and learn from other contributors - what they do, how they do it, the flow of development and releases, etc. Don't rush into a sending patch just for the sake of sending it and having your name in the git history.

References

Personal tools