Zero to main(): How to Write a Bootloader from Scratch | Interrupt

Thanks a lot for this insightful article. It helped me a lot to feel comfortable with programming code that is executed from RAM.
My (fairly short) programming experience is closely related to the subject of this article. I wrote a bootloader for an STM32 MCU that supports FOTA (Firmware Update Over The Air).
My bootloader received a relocatable binary as input and had to patch its contents according to the destination address.
The destination address is toggled by the bootloader. This architecture allows the previous application to keep running in case the FOTA procedure for a new application fails (e.g. due to a sudden communication disconnection).
Two points at the core of my bootloader are worth mentioning:

  1. In order to really relocate relocatable code, the bootloader has to adjust the vector table (NVIC) entries by an offset equal to the difference between the address the binary was compiled for and the destination address.
  2. The reset handler can and must be written in C rather than in assembly, otherwise it can’t be compiled to relocatable code.
    Both points apply only when the run address is not known at compile time.
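
As a rough illustration of point 1, here is a minimal host-runnable sketch of patching vector table entries by the link-to-load offset. The helper name relocate_vectors, its parameters, and the rule of only touching entries that point into the image's linked address range are assumptions made for this example, not the actual code of the bootloader described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: shift every vector that points into the image's
 * linked range [link_base, link_base + image_size) by the offset between
 * the address the binary was linked for and the address it is stored at.
 * Entry 0 (initial stack pointer, in RAM) falls outside that range and
 * reserved entries are zero, so both are left untouched. */
static void relocate_vectors(uint32_t *table, size_t count,
                             uint32_t link_base, uint32_t image_size,
                             uint32_t load_base)
{
    /* Unsigned arithmetic wraps, so this also works when moving down. */
    uint32_t offset = load_base - link_base;

    for (size_t i = 0; i < count; i++) {
        if (table[i] == 0u) {
            continue; /* reserved entry */
        }
        /* Ignore the Thumb bit (bit 0) for the range check only. */
        uint32_t target = table[i] & ~1u;
        if (target >= link_base && target - link_base < image_size) {
            /* Bit 0 survives as long as the offset is even. */
            table[i] += offset;
        }
    }
}
```

For example, a handler linked at 0x00000101 in an image moved to 0x00010000 ends up at 0x00010101, while a stack pointer such as 0x20008000 is not modified.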

I found that at high optimization levels (e.g. -O2 or -O3) the jump function can behave incorrectly,
so we explicitly reference the sp and pc parameters by name in the inline asm:

void BOOT_jump(uint32_t sp, uint32_t pc)
{
  /* Named operands tell the compiler that sp and pc are really used */
  __asm
        (
         "msr msp, %[sp]\n"
         "msr psp, %[sp]\n"
         "mov pc, %[pc]" : : [sp] "r" (sp), [pc] "r" (pc)
         );
}

As far as I could tell, and this may have been with an older arm-none-eabi-gcc toolchain (6.3.1?), the compiler inferred that the function didn’t USE its parameters by name (the way you have it coded), so didn’t populate r0 and r1 at the call site! Hence when the function was called, some random r0 and r1 were written into msp and pc, and chaos ensued.
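
For context, the two values handed to a jump function like this are normally the first two words of the application's vector table. Below is a minimal sketch of reading and sanity-checking them before jumping. The names boot_entry_t and boot_read_entry, and the idea of passing the RAM bounds as parameters, are hypothetical and only there to keep the example runnable on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two values handed to the jump routine. */
typedef struct {
    uint32_t sp; /* word 0 of the vector table: initial stack pointer */
    uint32_t pc; /* word 1 of the vector table: Reset_Handler address */
} boot_entry_t;

/* Returns true and fills *out only if the table looks like a real image.
 * ram_start/ram_end describe the RAM range; the stack may start at ram_end. */
static bool boot_read_entry(const uint32_t *vectors,
                            uint32_t ram_start, uint32_t ram_end,
                            boot_entry_t *out)
{
    uint32_t sp = vectors[0];
    uint32_t pc = vectors[1];

    /* The stack pointer must lie in RAM and be word aligned. */
    if (sp <= ram_start || sp > ram_end || (sp & 3u) != 0u) {
        return false;
    }
    /* Cortex-M executes Thumb code only, so handler addresses have bit 0 set. */
    if ((pc & 1u) == 0u) {
        return false;
    }
    out->sp = sp;
    out->pc = pc;
    return true;
}
```

A check like this rejects an erased flash region (all 0xFF), which is a cheap way to avoid jumping into an application that was never written.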

François, thanks for the great post!
I have a few questions. Sorry if some are obvious; I’m not that experienced in embedded programming yet.

  1. startup_samd21.c contains a branch to main. When does that call to main happen, and which main is called (there are two of them, one in the app and one in the bootloader)? I understand that this file contains functions that are compiled and linked in wherever they are referenced, but how does this file end up calling main?

  2. The same file contains exception_table, with one field for each exception/interrupt:
    .pfnReset_Handler
    .pfnNMI_Handler
    .pfnHardFault_Handler
    .pvReservedM12
    .pvReservedM11
    …
    When (and from where) are these called? Is the syntax of the field names governed by some standard?

  3. This question might be naive, sorry for that. The same file has
    const DeviceVectors exception_table =
    As I understand it, there must be a typedef struct DeviceVectors declaration somewhere, but where exactly is it?
  4. Reset first happens at device start-up, and Reset_Handler should be called and transfer execution to the app (because it remaps the NVIC and rewrites the PC). How, then, does the bootloader get loaded? Why is the Reset_Handler from startup_samd21.c not called at start-up, while the bootloader is still loaded?
  5. Why do we remap the NVIC for the app? Why not use the original one?

Thanks,
Maxim

Hi @Maxim, and welcome to Interrupt!

  startup_samd21.c contains a branch to main. When does that call to main happen, and which main is called (there are two of them, one in the app and one in the bootloader)? I understand that this file contains functions that are compiled and linked in wherever they are referenced, but how does this file end up calling main?

Each of the programs we compile (i.e. the app and the bootloader) contains only a single instance of main. Otherwise, the linker would complain! Which main is compiled in a given program is specified in the Makefile.

When (and from where) are these called? Is the syntax of the field names governed by some standard?

The functions in the vector table are called by the hardware! You won’t find a single call to them in software. The Arm architecture spec (ARMv6-M for the Cortex-M0+ in the SAMD21, ARMv7-M for the Cortex-M3/M4) defines where the exception table should be found, and the hardware will jump to the addresses stored there when an exception/interrupt happens.
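
To make the lookup a bit more concrete: the hardware treats the vector table as a plain array of 32-bit addresses, and the entry for exception number N sits 4 * N bytes from the start of the table. The struct below is an abbreviated, hypothetical stand-in for a DeviceVectors-style struct. The handler field names are the ones quoted in the question, pvStack as the name of the first field is an assumption, and the real definition (with many more fields) comes with the vendor's device support files. The field names are a vendor naming choice; only the order of the entries is fixed by the architecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Abbreviated stand-in for a DeviceVectors-style struct. The type name
 * sketch_vectors_t is hypothetical. uint32_t fields are used instead of
 * pointers so the layout is the same on a 64-bit host as on the target. */
typedef struct {
    uint32_t pvStack;              /* entry 0: initial main stack pointer */
    uint32_t pfnReset_Handler;     /* entry 1: reset */
    uint32_t pfnNMI_Handler;       /* entry 2: NMI */
    uint32_t pfnHardFault_Handler; /* entry 3: HardFault */
} sketch_vectors_t;

/* Byte offset, from the start of the table, of the entry the core reads
 * when the exception with the given number is taken. */
static uint32_t vector_slot_offset(uint32_t exception_number)
{
    return 4u * exception_number;
}
```

So the struct is only a convenient way to get named slots at the right offsets: HardFault is exception number 3, and its field lands 12 bytes into the table.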

  Why do we remap the NVIC for the app? Why not use the original one?

When we boot, the exception table points to the bootloader’s exception handlers. When we start the app, we want to use the app’s exception handlers instead, so we have to remap it using the VTOR register.

  Reset first happens at device start-up, and Reset_Handler should be called and transfer execution to the app (because it remaps the NVIC and rewrites the PC). How, then, does the bootloader get loaded? Why is the Reset_Handler from startup_samd21.c not called at start-up, while the bootloader is still loaded?

Both the app and the bootloader have an exception table and a Reset_Handler. We write the bootloader at the start of flash, so by default its exception table is used (until we remap it). Which binary is found where is defined by the Makefile and the linker scripts.

Francois, thanks for your prompt reply. I still have some confusion and would appreciate it if you could clarify. I googled all these topics before posting, so I will try to be precise.

  1. file line 214. In this project you compile both the app and the bootloader (at the same time, right?). Both have a main function. Why does line 214 call the app’s main, not the bootloader’s main?

  2. Tiny question.
    You and many other projects put __attribute__((section(".vectors"))) before the table definition, though the documentation places it after the variable name, e.g. struct duart a __attribute__ ((section ("DUART_A"))) = { 0 };.
    Is there any new spec which covers such usage?

  3. I mostly understand, but it is still not clear to me why the app and the bootloader each need their own vector table (not two, right? I mean, we have all the addresses for the bootloader and app interrupts, but the controller has only a single vector table). Maybe you can give a simple example? The post you mentioned somewhere in your blog provides some explanation:

So long as the user program doesn’t go and mess with the VTOR, any interrupts that occur after the user program re-enables interrupts will cause the NVIC to use the user program’s table to determine where the handlers are. Isn’t that awesome?

but I am not sure I understand it correctly.

They are built at the same time, but into two different programs! Two different .elf files are compiled, each with a single main function.

They both work.

We have two vector tables. One vector table for the bootloader at the default address, and one vector table for the app at a different address. This is because the bootloader and the app might handle exceptions and interrupts differently! When we start the app from the bootloader we write VTOR to change which vector table is in use. When the chip resets it goes back to the default address, i.e. the bootloader.
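
Here is a toy model of that behavior that runs on a host. It is only a sketch: sim_core_t, sim_reset and sim_handler_for are made-up names, flash is modeled as an array of words, and the app table position used in the usage note (byte offset 0x80) is an arbitrary choice for the example. On a real Cortex-M part the register is SCB->VTOR.

```c
#include <stdint.h>

/* Toy model of the one piece of core state that matters here. */
typedef struct {
    uint32_t vtor; /* byte offset of the vector table currently in use */
} sim_core_t;

/* A chip reset puts VTOR back to 0, i.e. the table at the start of flash. */
static void sim_reset(sim_core_t *core)
{
    core->vtor = 0u;
}

/* What the hardware does on an exception: read the handler address from
 * the active table. `flash` models flash memory as an array of words. */
static uint32_t sim_handler_for(const sim_core_t *core,
                                const uint32_t *flash,
                                uint32_t exception_number)
{
    return flash[core->vtor / 4u + exception_number];
}
```

With the bootloader's table at offset 0 and the app's table at offset 0x80, a HardFault (exception 3) resolves to the bootloader's handler after sim_reset, to the app's handler once vtor is set to 0x80, and to the bootloader's handler again after the next sim_reset.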

Francois, thanks a lot for your replies!

Thank you for your helpful article.

I have a simple question about developing an OTA procedure using a telecommunication module.

I’m thinking of a bootloader that can copy the received application code over the current application code.

But in a failure scenario, the complete binary image might not be received. So the idea is similar to the ESP8266 OTA procedure: store the incoming binary in a separate area of flash memory (not the active app_code), and once it is complete, the next start replaces the old image with the newer one.

How hard is this procedure? What should I consider?
Do you have any recommendations for such a thing?
And as the project hasn’t launched yet, is there a low-power MCU you would recommend from a bootloader point of view?

Thank you very much

Hi @HamzaHajeir! First, I suggest you read Device Firmware Update Cookbook | Interrupt. In that article, I propose a design which is robust to OTA failures but does not require a second full flash region.

If you have plenty of flash to spare, you can design a system that uses a full flash region to stage an update. It isn’t strictly necessary however.
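
For illustration, a minimal host-runnable sketch of the staging approach: the new image lands in a staging slot, and the bootloader only copies it over the active slot if it arrived completely. Everything here (slot_t, SLOT_SIZE, install_if_staged, the plain byte-sum check) is hypothetical. A real design would use a CRC or a cryptographic hash/signature and real flash erase/write routines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE 64u /* hypothetical, tiny so the sketch is easy to follow */

typedef struct {
    uint32_t length;   /* number of valid payload bytes */
    uint32_t checksum; /* checksum of the payload, sent along with the image */
    uint8_t  payload[SLOT_SIZE];
} slot_t;

/* Placeholder integrity check: a plain byte sum. */
static uint32_t simple_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0u;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

static bool slot_is_valid(const slot_t *slot)
{
    return slot->length > 0u && slot->length <= SLOT_SIZE &&
           simple_checksum(slot->payload, slot->length) == slot->checksum;
}

/* Called by the bootloader at startup: install the staged image only if it
 * is complete, otherwise leave the active image untouched. */
static bool install_if_staged(slot_t *active, slot_t *staging)
{
    if (!slot_is_valid(staging)) {
        return false; /* nothing staged, or the download was cut short */
    }
    memcpy(active, staging, sizeof *active);
    staging->length = 0u; /* mark the staging slot as consumed */
    return true;
}
```

Note that the staging slot is only invalidated after the copy. If power is lost mid-copy, the staged image is still valid on the next boot and the copy simply runs again.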

If you’re looking for an off-the-shelf bootloader, I would recommend https://www.mcuboot.com/. While I don’t think it is perfect, it is a robust open source solution.

Thank you for your guidance. I’ll read the suggested topics, along with the first two parts of this series (I landed here via this link). Then I’ll get back with further questions, if any.

Thank you very much @francois.

Francois,

Thanks for the great post.

Question: I see in your post that the bootloader jumps to the application by calling the reset handler of the application and then it is the responsibility of the application reset handler to update the VTOR register. In other replies to this post it seems that some solutions update the VTOR register prior to jumping to the application. Is there a reason to do it one way or the other? Does the VTOR register not get cleared upon reset?

Thanks,
Robert

You are correct that a chip reset would clear VTOR, but calling the Reset_Handler of the application does not cause the chip to reset! In other words, when you go from bootloader to app, VTOR does not change unless you explicitly set it.

Whether you set VTOR before jumping to the app (i.e. in the bootloader code), or after (i.e. in the app code) is really up to you.

This is a great post. I wrote my first bootloader with the help of the “Zero to main()” series.

@francois

*vtor = ((uint32_t) vector_table & 0xFFFFFFF8);

vector_table is already aligned! Why do this?
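
One way to look at it: for a table that is already aligned, the mask changes nothing, so it only acts as a guard against a misplaced table. A sketch of an explicit check instead of a silent mask is below. It assumes the table offset field of VTOR occupies bits [31:7] (128-byte alignment), which is my understanding for the Cortex-M0+; the names VTOR_TBLOFF_MASK and vtor_address_ok are made up for this example, so please check the reference manual for your core.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the table offset field of VTOR is bits [31:7]. */
#define VTOR_TBLOFF_MASK 0xFFFFFF80u

/* True if the address can be written to VTOR without losing low bits. */
static bool vtor_address_ok(uint32_t table_addr)
{
    return (table_addr & ~VTOR_TBLOFF_MASK) == 0u;
}

/* The mask from the snippet above, kept as a function for comparison. */
static uint32_t vtor_masked(uint32_t table_addr)
{
    return table_addr & 0xFFFFFFF8u;
}
```

For an aligned address such as 0x2000, vtor_masked returns the address unchanged. For a misaligned one, the check fails loudly instead of quietly pointing the core at the wrong place.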

Locking the bootloader with the MPU is a great idea, but I don’t think the SAMD21 (used for the examples above) has an MPU. The MPU is optional on the Cortex-M0+, and based on the datasheet it looks like Microchip didn’t include it.
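
If you want to check at runtime rather than from the datasheet, the MPU_TYPE register (at 0xE000ED90 on Cortex-M parts) reports the number of supported regions in its DREGION field, bits [15:8], and a value of zero means no MPU is implemented. A small sketch of decoding it follows. The function name mpu_region_count is made up, and the register value is passed in as a parameter so the example also runs on a host.

```c
#include <stdint.h>

#define MPU_TYPE_DREGION_SHIFT 8u    /* DREGION is bits [15:8] of MPU_TYPE */
#define MPU_TYPE_DREGION_MASK  0xFFu

/* Number of MPU regions reported by an MPU_TYPE value; 0 means no MPU. */
static uint32_t mpu_region_count(uint32_t mpu_type)
{
    return (mpu_type >> MPU_TYPE_DREGION_SHIFT) & MPU_TYPE_DREGION_MASK;
}
```

On target, you would pass in the value read from the register, e.g. *(volatile uint32_t *)0xE000ED90.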