GNU Build IDs for Firmware

In this post, we demonstrate how to use the GNU Build ID to uniquely identify a build. We explain what the GNU build ID is, how it is enabled, and how it is used in a firmware context.


This is a companion discussion topic for the original entry at https://interrupt.memfault.com/blog/gnu-build-id-for-firmware

The calculation of the descriptor offset (code above) is somewhat naive. It happens to work because the name used by GNU ("GNU\0") occupies exactly 4 bytes and needs no padding, but that is a bit of luck.
The spec says that the name field may be padded up to a 4-byte boundary, and that this padding IS NOT included in the namesz field; see the examples in the spec.
A slightly more generic and change-resistant calculation looks like this:

/* namesz excludes the padding, so round it up to the next 4-byte boundary */
const size_t name_padding_bits = sizeof(uint32_t) - 1U; /* alignment mask, 0x3 */
size_t desc_offset = (g_note_build_id.namesz + name_padding_bits) & ~name_padding_bits;
const uint8_t *build_id_desc = &g_note_build_id.data[desc_offset];
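To make the padding rule concrete, here is a self-contained sketch that wraps the same rounding in small helpers and can be exercised on a note assembled in memory. The struct layout follows the ELF note format (namesz, descsz, type, then name and descriptor); the names `ElfNoteSection_t`, `note_desc_offset()`, `note_build_id()` and `print_build_id()` are assumptions for this example rather than code from the post.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ELF note layout assumed by this sketch: a 12-byte header followed by
 * the name (padded to a 4-byte boundary) and then the descriptor. */
typedef struct {
  uint32_t namesz; /* name length in bytes, including the NUL, excluding padding */
  uint32_t descsz; /* descriptor length in bytes (20 for a SHA-1 build ID) */
  uint32_t type;   /* 3 == NT_GNU_BUILD_ID */
  uint8_t data[];  /* name bytes + padding, then descriptor bytes */
} ElfNoteSection_t;

/* Round namesz up to the next multiple of 4: that is where the descriptor starts. */
static size_t note_desc_offset(uint32_t namesz) {
  const size_t align_mask = sizeof(uint32_t) - 1U;
  return ((size_t)namesz + align_mask) & ~align_mask;
}

/* Return a pointer to the first byte of the note's descriptor (the build ID). */
static const uint8_t *note_build_id(const ElfNoteSection_t *note) {
  return &note->data[note_desc_offset(note->namesz)];
}

/* Print the descriptor as lowercase hex, e.g. for logging over a UART. */
static void print_build_id(const ElfNoteSection_t *note) {
  const uint8_t *id = note_build_id(note);
  for (uint32_t i = 0; i < note->descsz; i++) {
    printf("%02x", id[i]);
  }
  printf("\n");
}
```

For example, a note whose namesz is 3 (a 2-character name plus its NUL) still has its descriptor at offset 4, because the name is padded up to the 4-byte boundary; a namesz of 5 would put it at offset 8.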

Thanks for the note! You are correct, I’ll add a note to the post.

Great write up, just what I was looking for in an upcoming project that I am setting up!
I noticed that in my project, even though I placed the .gnu_build_id section after the .text section in the linker script, the build ID was being allocated at address 0x0.

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .note.gnu.build-id 00000024  00000000  00000000  00010000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         0000cec4  00000030  00000030  00010030  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .data         0000012c  20000000  0000cef4  00020000  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          000037c5  2000012c  0000d020  0002012c  2**2
                  ALLOC

I had to add KEEP(*(.note.gnu.build-id)) before PROVIDE(g_note_build_id = .);, so my final linker section looked like this:

 .gnu_build_id :
 {
    KEEP(*(.note.gnu.build-id))
    PROVIDE(g_note_build_id = .);
    *(.note.gnu_build_id)
    _etext = .;
 } > FLASH

Once I did this, my allocation was as expected.

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000cec4  00000000  00000000  00010000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .gnu_build_id 00000024  0000cec4  0000cec4  0001cec4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .data         0000012c  20000000  0000cee8  00020000  2**2
                  CONTENTS, ALLOC, LOAD, DATA

Hope this helps others.

Software Versions:

arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]

GNU ld (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 2.33.1.20191025
Copyright (C) 2019 Free Software Foundation, Inc.


Awesome, thanks for the feedback, glad to hear you found the article useful!

I think you bumped into this issue because of a tiny typo above: the input section name is misspelled. After making this change, you should be able to remove the KEEP():

-*(.note.gnu_build_id)
+*(.note.gnu.build-id)

Duh! Good catch, Chris. Works now.

Thanks for the article. I am curious how to integrate this into my workflow. Suppose I am debugging in the field and can read the GNU Build ID over UART: how would I look up which commit that build ID corresponds to, assuming it came from one of my CI builds? I suppose I need some way to tag each CI build with its GNU Build ID, and a system where I can do a reverse lookup?

If you are only after the mapping, you could simply store the commit hash and related version information in the binary itself. That might make things easier and eliminate the need for the translation.
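A minimal sketch of that idea, assuming the build system injects the values with -D compiler flags (for example, a SHA taken from `git rev-parse HEAD`). `GIT_COMMIT_SHA`, `FW_VERSION_STR`, `FwVersionInfo_t` and `print_version_info()` are hypothetical names; the fallback values only keep the file compiling when the flags are absent.

```c
#include <stdio.h>

/* Hypothetical macros, normally injected by the build system via -D. */
#ifndef GIT_COMMIT_SHA
/* 40-character placeholder used when no SHA is supplied */
#define GIT_COMMIT_SHA "0000000000" "0000000000" "0000000000" "0000000000"
#endif
#ifndef FW_VERSION_STR
#define FW_VERSION_STR "0.0.0-dev"
#endif

/* Hypothetical version record that ends up in flash next to the build ID. */
typedef struct {
  char git_sha[41]; /* 40 hex characters + NUL */
  char version[32]; /* semantic version string */
} FwVersionInfo_t;

static const FwVersionInfo_t g_fw_version_info = {
  .git_sha = GIT_COMMIT_SHA,
  .version = FW_VERSION_STR,
};

/* Report the embedded version info, e.g. in response to a UART command. */
static void print_version_info(void) {
  printf("version=%s sha=%s\n", g_fw_version_info.version,
         g_fw_version_info.git_sha);
}
```

The trade-off is a few dozen bytes of flash in exchange for not needing an external lookup service to go from a device back to its commit.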

If you'd rather not store the commit SHA in the binary, then what you describe is pretty much what you need. What I've done in the past at previous companies is create a very simple data store (MongoDB, PostgreSQL, Artifactory tags, anything really) that records:

  • GNU Build ID
  • Git commit SHA
  • Build variation (release, debug, factory, etc.)
  • Semantic version of the build (1.2.3-alpha2)
  • Build Date
  • Jenkins build number

If all of this is stored in a central location with a basic API around it, it should be pretty easy to push and pull any data you need.
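The record described above could be modeled roughly as follows. This C sketch keeps the rows in an array and does the reverse lookup with a linear scan; it only illustrates the shape of the data, since in practice the rows would live in whichever database you already use. `BuildRecord_t` and `find_build()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* One row in a hypothetical build-metadata store, keyed by GNU build ID. */
typedef struct {
  const char *build_id;   /* GNU build ID as a hex string */
  const char *git_sha;    /* commit the build was produced from */
  const char *variant;    /* "release", "debug", "factory", ... */
  const char *version;    /* semantic version, e.g. "1.2.3-alpha2" */
  const char *build_date; /* build date as a string */
  unsigned jenkins_build; /* CI build number, used to fetch the ELF artifact */
} BuildRecord_t;

/* Reverse lookup: find the row whose build ID matches the one read from a device.
 * Linear scan for clarity; a real store would index on build_id. */
static const BuildRecord_t *find_build(const BuildRecord_t *records, size_t count,
                                       const char *build_id) {
  for (size_t i = 0; i < count; i++) {
    if (strcmp(records[i].build_id, build_id) == 0) {
      return &records[i];
    }
  }
  return NULL; /* build was never recorded */
}
```

Given a build ID read over UART as a hex string, find_build() returns the matching row, and from it the Jenkins build number needed to fetch the ELF, or NULL if the build is unknown.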

Lastly, if you are doing this translation primarily to get the ELF file for address lookups or debugging, I suggest going with the latter approach so that you can easily script the GNU Build ID → Jenkins build → ELF artifact process.

Quick question as I think about ways to incorporate this feature: is this primarily used for debugging, or can it also be used to verify that a file came from a trusted source? Or is image signing a totally different ball game?

The GNU build ID can really only be used to match a binary to a build. It is not a secure signature, and should not be used to check for authenticity. A malicious actor could trivially generate a valid build ID for a firmware image they created.

We will write about firmware signing in a future post. What you’ll typically need there is asymmetric crypto, where you’ll have a private key kept secret by your company, and a public key baked into your bootloader. During builds, you sign the firmware with the private key, and when the device starts, the bootloader verifies the firmware with the public key.


Hi!

I wonder how to get the reproducible build ID and at the same time have the timestamp in our binaries.
The timestamp has proven its usefulness for checking whether the firmware currently being loaded and debugged really is the latest build; SemVer, VCS revision, and other versioning info do not provide this.

So I am looking for a way NOT to eliminate the timestamp from our workflow and binaries.

On the other hand, I DO want the build IDs to be the same for two binaries compiled from the same sources on different machines, or on the same machine at different times. I have managed to get reproducible builds across different machines and even different OSes (the toolchain is the same and all the paths for ASSERT, etc. are relative). The only part that differs is the timestamp.

The timestamp (and some other identification info) is stored in variables that the linker script places in the .text section, so it is included in the build ID hash and changes the result.

Is there a way to put some variables into a section that the build ID ignores, so that two binaries with different timestamps get the same hash?
Our timestamp is not one inserted by the linker via a flag.
It comes from the TIMESTAMP := $(shell echo $$(date +"%s")) command in our makefile.

Just for the sake of argument, maybe the timestamp itself could be used as an identifier? Since it changes every second, it will most probably be unique for each and every binary deployed in the field, which is roughly what build-id strives to offer.


@alvangee sorry for the delay on the response. This depends a bit on your use case, but the simplest way to accomplish what you want is to add the timestamp as a post-processing step on the BIN file (i.e. after it’s been linked and objcopy-ed). We used a similar technique to add a build signature in this article: Secure firmware updates with code signing | Interrupt.
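A rough sketch of that post-processing step, written as host-side C: it patches a 32-bit Unix timestamp into a 4-byte placeholder that the firmware is assumed to reserve at a known offset in the BIN. The placeholder, its offset, the little-endian byte order (typical for Cortex-M) and the function names are all assumptions. Because the ELF, and therefore its build ID, is produced before this step, identical sources still yield the same build ID (as long as the placeholder holds a fixed value at link time), while each BIN carries its own timestamp.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Write `value` as 4 little-endian bytes (assumed target byte order). */
static void put_le32(uint8_t *dst, uint32_t value) {
  dst[0] = (uint8_t)(value & 0xFFu);
  dst[1] = (uint8_t)((value >> 8) & 0xFFu);
  dst[2] = (uint8_t)((value >> 16) & 0xFFu);
  dst[3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Patch a timestamp into an already-linked BIN image held in memory.
 * `slot_offset` is where the firmware reserved a 4-byte placeholder (hypothetical).
 * Returns 0 on success, -1 if the slot does not fit inside the image. */
static int patch_bin_timestamp(uint8_t *image, size_t image_len,
                               size_t slot_offset, uint32_t timestamp) {
  if (slot_offset > image_len || image_len - slot_offset < 4) {
    return -1;
  }
  put_le32(&image[slot_offset], timestamp);
  return 0;
}

/* Convenience wrapper: stamp the current time, like `date +%s` in the makefile. */
static int patch_bin_timestamp_now(uint8_t *image, size_t image_len,
                                   size_t slot_offset) {
  return patch_bin_timestamp(image, image_len, slot_offset, (uint32_t)time(NULL));
}
```

A real tool would read the BIN produced by objcopy from disk, call one of these helpers, and write the file back out; the device can then report the timestamp alongside its build ID.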