Automatically format and lint code with pre-commit | Interrupt

I love clean and tidy codebases. I love them even more if I don’t have to manually spend hours doing it myself. The problem is not everyone shares the same love and passion for tidy codebases, especially when we’re pressed for time trying to get a new firmware build released.


This is a companion discussion topic for the original entry at https://interrupt.memfault.com/blog/pre-commit

Great post. Very clear and a great sample of starting points to up your CI game! Great to see attention paid to the idiosyncrasies of firmware.

I realise that auto-format is so prevalent in the web-dev space as to be largely beyond why and largely now concerned with how. But it’s always seemed so fundamentally concerning and I’ve never got any traction when I raise those concerns. This seems like a great forum to discuss. I figure web-dev is different enough from embedded-dev that the concerns are outweighed by the “stop the PR style-war pain” north star that seems to drive much of web-dev, and for that matter, distributed, open dev in general.

But posts like these show that we’re going to see these ideas merge with embedded-dev eventually, and I wonder if we can be judicious in how that merge occurs? Generally, I’m extremely excited about bring web-dev innovation to embedded-dev, but much more excited about cherry-picking and doing it empathetically, lest we bog things down instead of accelerating them.

With that framing out of the way, my concern is that formatting (by which I mean use of whitespace) is a significant factor in the maintainability of code. We really only have three levers when it comes to writing readable code: structure, naming and formatting. Are we really prepared to give up one of those?

In firmware, perhaps uniquely, whitespace is an extremely powerful tool. Let me demonstrate with some random samples from code I have handy.

Screen Shot 2021-10-28 at 3.29.03 pm

Now imagine those code snippets after auto-formatting, with all the programmer-supplied whitespace removed.

The point of including so many examples is to demonstrate that this is very much not an auto-thing, it’s a programmer-thinking thing. Many of these examples highlight cases where the “usual” convention was broken to aid readability instead. Using robots to format code seems to make as much sense as asking humans to pick passwords with lots of symbols and mixed case. Great for robots, not so good for humans.

And just like passwords, the problem to be solved remains (encouraging hard to hack passwords), but the right solution might not be the obvious one (use a memorable phrase instead).

So you never see final state of your code with changes your pre-commit tool made before it commited into repo?

No, by default if pre-commit makes changes it will reject the commit so you can review the changes. The well-trained monkey at the keyboard then learns to commit again.

The idea of pre-commit is to ensure you’ve formatted and linted as per the project’s requirements before committing. It doesn’t prevent you running those tools as often as you like before committing, but it prevents those “forgetful” developers from committing non-compliant code.

See this for a discussion on how humans tend to learn to work around these obstacles.

While you are right that there is some formatting that is best done by humans, the vast majority of it is handled perfectly well by computers.

For example, making sure there’s the right amount of spaces to indent a block.

So how do you apply it selectively? The flags that turns these things off and on work on a line by line basis, which is not the level of resolution you’re referring to. The article mentions clang-format. It doesn’t allow mixed formatting, does it?

Indenting a block correctly seems like a trivial reason to auto-format a source file. Between text editors and the person at the keyboard responsible for creating logical things, committing incorrect indent would seem easy to avoid.

Excellent!
I suppose GUI git clients do the same (commit fails and changes shown?)

Excellent article.

  1. get your C/C++ sources compiling with Clang

Easier said than done! :slight_smile:
That said, this is another good argument for a command-line-centric build.

1 Like

This is a good point! What I’ve done in the past is insert the // clang-format off and on fences around blocks that I want to preserve from clang-format (your examples are great BTW!):

https://clang.llvm.org/docs/ClangFormatStyleOptions.html#disabling-formatting-on-a-piece-of-code

Bit ugly but usually rare when it’s needed.

It seems awesome, thanks for sharing.
I’m a windows user, and this is my .pre-commit-config.yaml file:

files: "^\
  (Src/.*)|(Inc/.*)\
  "

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: check-added-large-files # prevents giant files from being committed.
      - id: check-case-conflict # checks for files that would conflict in case-insensitive filesystems.
      - id: check-merge-conflict # checks for files that contain merge conflict strings.
      - id: check-yaml # checks yaml files for parseable syntax.
      - id: detect-private-key # detects the presence of private keys.
      - id: end-of-file-fixer # ensures that a file is either empty, or ends with one newline.
      - id: fix-byte-order-marker # removes utf-8 byte order marker.
      - id: mixed-line-ending # replaces or checks mixed line ending.
      - id: requirements-txt-fixer # sorts entries in requirements.txt.
      - id: trailing-whitespace # trims trailing whitespace.

  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v13.0.0
    hooks:
      - id: clang-format

I need to just check inside Src and Inc folders. but it seems that the files: part is not valid.
would you please help me to fix it🙏
I also want to make clang-format use my customized C style. should I do sth here or I have to create the .clang-format file inside the project folder?

Looks like you need to have a pre-commit version 1.21.0+ for that files: key to work:
https://pre-commit.com/#top_level-files

I tested it here and it seems to be operating correctly (I added some files with violations, so the action correctly is failing): add readme · noahp/test-precommit-file-key@891ba35 · GitHub

For clang-format, I’ve updated the article with some hints (see PR here), but it’s probably best to add a .clang-format file to the repo base- that way running clang-format outside of pre-commit will have the same behavior (the alternative would be to list out the options in the hook, but then it would only impact pre-commit's invocation of clang-format).

You can generate a config with clang-format --style=Google --dump-config > .clang-format, but note that that will output ALL the options set by that config; you might want to start with a BasedOnStyle and override only the options you need, to keep the .clang-format file simpler (Clang-Format Style Options — Clang 13 documentation)

I know I’m super late to this party, but I recommend taking a look at the clang-format style options, because for all of your examples there are options that can get you a) exactly what you want (with some manual formatting required, e.g. for internal spacing alignment in some of the examples - but it won’t be overwritten), or b) a suitably close output. The web documentation is nice because it gives output examples for the various options, making it much easier to grok.

Going in order of the images provided, here are the appropriate options that enable the desired output. Note that none require you to disable formatting to get the same or a suitably close output!

  • AlignConsecutiveAssignments + AlignTrailingComments (the separation between elements in the enum is fine)
  • AlignTrailingComments (extra newline before max is fine)
  • AlignConsecutiveDeclarations (spacing is fine, you have the option for whether the broken spacing causes the alignment to reset or not)
  • AlignConsecutiveMacros (can work across empty lines and comments)
  • AlignEscapedNewlines
  • AlignOperands
  • AlignTrailingComments (the two values per line doesn’t cause trouble)
  • AlignConsecutiveAssignments
  • AlignArrayOfStructures
  • AlignOperands + AlignTrailingComments
    • The if parentheses locations might be auto-formatted differently depending on other options, but I don’t really think that impacts readability at all.
  • AlignConsecutiveAssignments
  • AlignConsecutiveAssignments, and then the spacing following that is able to be manually specified and not overridden (unless you turn on some option that changes that).
  • AlignOperands, though they may be moved closer in by the formatter. The first line may be moved up to the opening paren - personally, a non-issue to me. Might be an option to change that behavior though.
  • AllowShortCaseLabelsOnASingleLine + AlignTrailingComments
  • AlignConsecutiveAssignments, AlignTrailingComments, AlignOperands - the internal spacing is fine and won’t be formatted (unless you turn on some rule that changes that)

Now, are there cases where you might specify an option that causes conflicts with the above? Yeah, totally. but there are always sane options to take, such as a) locally disabling the formatting check, b) just being ok with it, or c) just adjusting how you specify things. For example, in the image where UINT16_MAX is doubled up on some lines - if there is an option that forces all values on different lines, you could simply change how the array is commented, e.g. where combined values are not two per line but instead commented as mbir_JP3_Fault_Code_MSB and mbir_JP3_Fault_Code_LSB (my own preference since the interpretation is clearer).

1 Like

Fascinating, thank you for taking the time so carefully!

Seems like the modern software practice of introducing technology that removes capabilities, and then gradually adding complexity to restore some of that functionality and calling it progress. I’m really impressed these options exist, but it does look like a nightmare to configure them.

Fundamentally I’m concerned that we’re being asked to satisfy a robot when our audience is actually humans (the machines don’t care what it looks like). Satisfying a robot is always going to require expending effort that only results in a compromise. This very impressive set of options only seems to ratchet that up a little.

But you’ve definitely given me a big push to give it a go. I’m very encouraged that clearly a lot of thought has been given to the value of whitespace. That alone gives me some faith that we haven’t entirely thrown the baby out with the bath water, and maybe the advantages that auto-formatting bring could outweigh the drawbacks it brings. On the other hand I’m concerned that few people seem to care, so thanks for your interest at least!