Great post, François. I used to dread writing bootloaders until I made the effort to really understand the ARM early boot process, and now they’re pretty straightforward.
If you’re using CMSIS you (technically) don’t need assembler to branch to application code. Here’s a snippet of some code showing how my bootloaders launch application code, modified to use your linker variables.
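// Reset stack and vector table
// The start of the app is the address of the linker symbol, so take a reference to it
uint32_t app_start = (uint32_t)&__approm_start__;
uint32_t *stack_ptr = (uint32_t *)app_start;
__disable_irq(); // Good idea when messing with MSP/VTOR
__set_CONTROL(0); // Privileged thread mode, using MSP
__set_MSP(*stack_ptr); // First word of the vector table is the initial stack pointer
SCB->VTOR = app_start;
// Call the reset handler (the construct below is 'pointer to a pointer to a function that takes no arguments and returns void')
void (**reset_handler)(void) = (void(**)(void))(app_start + 4); // Second word is the reset vector
(*reset_handler)(); // Call it as if it were a C function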
Although your ASM version may be smaller, this is an alternative that boils down to pretty much the same instructions.
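Before branching, it can be worth sanity-checking the two words the bootloader is about to load into MSP and PC. Here is a minimal sketch in plain C; the RAM_* and APP_* bounds are hypothetical values made up for illustration, not taken from the post or from any particular chip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical memory layout, for illustration only.
#define RAM_START 0x20000000u
#define RAM_END   0x20008000u  // one past the last RAM byte
#define APP_START 0x00004000u
#define APP_END   0x00040000u  // one past the last app flash byte

// vectors[0] is the initial MSP, vectors[1] is the reset handler address.
static bool app_vectors_look_sane(const uint32_t *vectors) {
  assert(vectors != NULL);
  uint32_t initial_sp = vectors[0];
  uint32_t reset_handler = vectors[1];

  // Full-descending stack: the initial SP may equal RAM_END and must be
  // word-aligned.
  bool sp_ok = initial_sp > RAM_START && initial_sp <= RAM_END &&
               (initial_sp & 0x3u) == 0;

  // Cortex-M only executes Thumb code, so bit 0 of the handler must be set.
  uint32_t handler_addr = reset_handler & ~1u;
  bool pc_ok = (reset_handler & 1u) != 0 &&
               handler_addr >= APP_START && handler_addr < APP_END;

  return sp_ok && pc_ok;
}
```

An erased flash region reads back as 0xFFFFFFFF on most parts, which this check rejects, so a missing app is caught before the jump rather than after it.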
Some advice for others from working with bootloaders in the field: always have a last-resort way to ‘break into’ the bootloader, for example by holding a button down or checking for a voltage on a pin. My bootloaders typically checksum the application and, if it validates, immediately execute the code. If you don’t have a way to bypass this and force the bootloader to remain active so that new code can be loaded, you may need to attach a debugger to recover from buggy firmware.
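The "override first, then checksum" decision described above could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, and CRC-32 is an assumed choice of checksum (the comment does not say which one is used).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Standard CRC-32 (reflected, polynomial 0xEDB88320), computed bit by bit.
static uint32_t crc32_compute(const uint8_t *data, size_t len) {
  assert(data != NULL || len == 0);
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++) {
      crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Launch the app only if nobody is asking to stay in the bootloader
// and the image matches its expected checksum.
static bool should_launch_app(bool override_button_held, const uint8_t *image,
                              size_t len, uint32_t expected_crc) {
  if (override_button_held) {
    return false;  // the last-resort manual override always wins
  }
  return crc32_compute(image, len) == expected_crc;
}
```

Checking the button before the checksum matters: a buggy image with a valid checksum is exactly the case the override exists for.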
As well as a manual override to remain in the bootloader for loading new code, you also want the bootloader to automatically detect that it should not launch any existing code but instead wait for new code to be loaded. For a while I used a technique similar to your ‘Message passing to catch reboot loops’ code, putting magic numbers into RAM that is not cleared during a restart. These days most ARM vendors provide a ‘reset control’ register that reports the reason for the last reset. One of the MCUs I use, for example, reports whether the system was reset because of low voltage, watchdog, lockup, reset pin (debugger attached), or software (request via the NVIC). If the bootloader detects a software reset, it will not launch any existing app and instead wait for new code. This means no more messing around with application linker scripts to reserve memory for communicating with the bootloader. It could also perform some specific actions based on continued watchdog resets, as suggested in your article.
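As a rough sketch of the reset-reason approach: the flag names and bit positions below are hypothetical, since every vendor lays out its reset-cause register differently, and on real hardware the value would be read from that register rather than passed in.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical reset-cause flags; real bit positions are vendor-specific.
#define RESET_CAUSE_LOW_VOLTAGE (1u << 0)
#define RESET_CAUSE_WATCHDOG    (1u << 1)
#define RESET_CAUSE_LOCKUP      (1u << 2)
#define RESET_CAUSE_PIN         (1u << 3)
#define RESET_CAUSE_SOFTWARE    (1u << 4)
#define RESET_CAUSE_ALL         0x1Fu

typedef enum {
  BOOT_LAUNCH_APP,
  BOOT_WAIT_FOR_IMAGE,
} boot_action_t;

// A software reset means the app asked for an update, so stay in the
// bootloader; any other reset cause launches the existing app.
static boot_action_t boot_action_for_reset(uint32_t reset_cause) {
  assert((reset_cause & ~RESET_CAUSE_ALL) == 0);
  if (reset_cause & RESET_CAUSE_SOFTWARE) {
    return BOOT_WAIT_FOR_IMAGE;
  }
  return BOOT_LAUNCH_APP;
}
```

In a real bootloader this would be combined with the image validation and manual override, and many parts require the reset-cause flags to be cleared after reading so that they do not persist into the next boot.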
Lastly, my favourite user interface to a bootloader has to be an emulated USB mass-storage device. When the bootloader is waiting for new code, it brings up a USB stack and emulates a hard drive that you simply drag and drop the new application image onto, and the bootloader validates the image and programs it into flash. This is a serious undertaking! USB stacks are usually pretty large and MSD emulation is complex, so only recommended for fearless developers targeting chips with lots of flash. But when it works, it is pure magic!
Thanks for the comment, @simonhaines. It adds a ton to the conversation.
You make a great point that the bootloader may want to do something different based on the reset type. Some implementations also provide some RTC scratch registers (e.g. STM32) which can be used to communicate between the app & the bootloader without the linker complexity.
On the topic of setting the MSP, Keil makes this even nicer using named register variables. I’ll reproduce their example below:
register unsigned int _control __asm("control");
register unsigned int _msp __asm("msp");
register unsigned int _psp __asm("psp");
void init(void)
{
_msp = 0x30000000; // set up Main Stack Pointer
_control = _control | 3; // switch to User Mode with Process Stack
_psp = 0x40000000; // set up Process Stack Pointer
}
Thank you François for the great work! I am just starting out with programming and STM32, and I think it will really help me build my own bootloader. Could you also help me build a secure bootloader for the stm32f103x8 from the repository linked below (I get many errors)? Thank you: https://github.com/dmitrystu/sboot_stm32
Good question! Yes, it is deterministic. The way arguments are mapped to registers is part of the ARM Procedure Call Standard that a compiler needs to follow for ARM. You can find all the details in Section 5.5 “Parameter Passing” but the gist is
The base standard provides for passing arguments in core registers (r0-r3) and on the stack. For subroutines that take a small number of parameters, only registers are used, greatly reducing the overhead of a call
GCC does support an assembly variant, referred to as extended-asm which lets you mix C variables with assembly. While it’s not necessary to use in this case, it can be helpful when you want to reference something that isn’t in a known register from some assembly code (such as a local variable)
Hey, I’ve really enjoyed the Zero to main() series thus far. Unfortunately I think they are a bit too unspecific at times, so it can kind of be hard to follow along. This is especially true when trying to implement the examples locally. So my question is, is there any external reading material or lectures that kind of explains everything in this series more in depth?
Not a boot-strapper but a boot-loader to update application code via UART, USB, SD card, memory stick, Ethernet web server, Modbus, I2C (and so on). It supports optional encryption (including AES256) and a two-step boot loader that allows both the serial loader and the application to be updated. Available for various processors (including NXP Kinetis, LPC, STM32, and others), also in open source form on GitHub (search for uTasker there).
Docs:
AES256 encrypted UART and USB-HID loading: *********************
Building and checking bare-minimum loader and application with simulation: **************
The project has custom tools to do typical things like combining loader and application, encrypting the application for loading, etc.- *********************
Uses about all the techniques discussed in the blog and works with many IDEs so can be freely used by anyone with interest in loader capability. So enjoy.
Also supported for professional users, with a history of 15 years use in many real products (millions of units with no reported issues).
Regards
Mark (developer of uTasker project since 2004)
P.S. **************** are links that I removed since I can’t use more than two in my first post (I can add them in other posts on request)
Hello, I have read your post on bootloaders and it was really great. I am new to firmware development and bootloaders. As part of my project I have to develop a bootloader for firmware updates on an Atmel samd21 19a. The update is done over USB, so first I have to check whether there is data in the USB buffer; if so, I should store it in a buffer which is then given to an XML parser and then encrypted to base64.
But I am stuck on the first part: I am not getting data from my USB buffer.
if(usb_rx_buf != 0) //checking if data is available
memcpy(xmlBuf,usb_rx_buf,usb_rx_buf+1); //copying Usb data to xml Buffer
if(usb_rv_index >= sizeof(usb_rx_buf) )
usb_rv_index ++;
xmlReq = XMLparser_parse_str((uint8_t*)xmlBuf, &XMLParameter);
clearXmlBuffer = 1;
I am not sure whether the problem is with the code. Please help me find the fault so I can proceed.
Sorry @ryan1, there’s not quite enough info in your post to help. Also, note that base64 is not encryption but encoding; please don’t use it to hide sensitive information!
First, thanks for the great post. I have the questions below:
In the section “Relocating our app from flash to RAM”, I believe this is not a mandatory step. We can execute code from approm, something like below. Correct?
Regarding the “shared.h” text code, I believe we will have 2 copies of the same code (.text), one in bootrom and one in approm. Is there any way to keep only 1 copy? Is this possible?
In the “start_app” definition you have added the keyword “__attribute__((naked))”. Can you please explain why? If we don’t add it, can it create an issue?
I have one suggestion: please write a new blog post on updating firmware with a bootloader (maybe a dual bank application). As a hobby project I built my own bootloader to update an application, so I am looking to learn best practices. Thanks
I am also facing one strange issue: I am getting completely wrong values for the linker script variables. For example, for the variable _shared_data_start I am getting the value “0x4b04d002”, but this is not the correct address. Any suggestions? I am using the STM32 workbench IDE.
You are correct. I wanted to cover this technique, but it is not required or even appropriate for every project. Reasons you might want to relocate to RAM are:
It might be faster than ROM on some chips
Not every MCU supports executing from flash
You are correct, you end up with two copies of that code. It is technically possible to use a single copy of the code shared between the app and the bootloader, but it is non-trivial. In the past I have done this to avoid having two copies of libc (in bootloader & app). The trick is to link your app against your bootloader. Perhaps we will cover this in a future post.
You can find documentation for the naked attribute here. This attribute is used to tell the compiler it should not add a function prologue / epilogue. It is recommended when a function contains only inline assembly. In this case, omitting it likely would not cause problems.
I am not familiar with the STM32 workbench, but remember that you should be taking a reference to the variable. I would expect &_shared_data_start to be 0x20000300, but if you remove the & you won’t get the expected value.
That’s correct, you could do this just as well with a section attribute (in fact, this is what I do in my latest post). Here, I chose to use a linker script variable instead:
Thanks a lot for this insightful article. It helped me a lot to feel comfortable with programming code that is executed from RAM.
My very short programming experience is very related to the subject of this article. I wrote a bootloader for STM32 MCU that supports FOTA (Firmware Update Over The Air).
My bootloader received a relocatable binary as input, and had to change its content according to the destination address.
The destination address is toggled by the bootloader. This architecture allows the previous application to keep running in case the FOTA procedure for a new application fails (e.g. due to a sudden communication disconnection).
In the essence of my bootloader there were two points worth mentioning:
In order to really relocate a relocatable code, the Bootloader has to modify the NVIC values by an offset which is equal to the offset between the compiled binary address and the destination address.
The reset handler can and must be written in C rather than in assembly, otherwise it can’t be compiled to relocatable code.
Both points apply only in the case where the run address is not known at compile time.
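To illustrate the first point, here is a rough sketch of shifting vector table entries by the offset between the link address and the destination address. It is not the commenter's actual code: the function name and parameters are hypothetical, and it assumes every vector that needs fixing points inside the image itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Shift every vector that points into the image by `offset` bytes.
// link_base and image_size describe where the image was linked to run;
// offset is (destination address - link_base), in modular 32-bit arithmetic.
static void relocate_vectors(uint32_t *vectors, size_t count,
                             uint32_t link_base, uint32_t image_size,
                             uint32_t offset) {
  assert(vectors != NULL && count >= 2);
  // Entry 0 is the initial stack pointer (a RAM address), so skip it.
  for (size_t i = 1; i < count; i++) {
    uint32_t addr = vectors[i] & ~1u;  // ignore the Thumb bit for the range check
    // Reserved entries are zero and stay zero. The unsigned subtraction
    // wraps for addresses below link_base, so they fail the range check.
    if (vectors[i] != 0 && addr - link_base < image_size) {
      vectors[i] += offset;  // unsigned wraparound also covers moving down
    }
  }
}
```

For example, an image linked at 0x00004000 and placed at 0x00024000 would use an offset of 0x00020000, turning a reset vector of 0x00004101 into 0x00024101 while leaving the stack pointer entry untouched.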
I found that with high optimization levels, e.g. -O2 or -O3, it can lead to incorrect behavior, so we explicitly reference the sp and pc parameter names in the inline asm:
As far as I could tell, and this may have been with an older arm-none-eabi-gcc toolchain (6.3.1?), the compiler inferred that the function didn’t USE its parameters by name (the way you have it coded), so didn’t populate r0 and r1 at the call site! Hence when the function was called, some random r0 and r1 were written into msp and pc, and chaos ensued.
François, thanks for the great post!
I have a few questions; sorry if some are obvious, I’m not that experienced in embedded programming yet.
1. The startup_samd21.c file contains a branch to main. When does the call to main happen, and which main is called (there are 2 of them, in the app and the bootloader)? I understand this file contains functions which are used, at the compile and link stage, in the places they are called. But how does this file then call main?
2. The same file contains exception_table, with a number of fields for each interrupt:
.pfnReset_Handler
.pfnNMI_Handler
.pfnHardFault_Handler
.pvReservedM12
.pvReservedM11
…
When (and from where) are they then called? Is the syntax for the field names governed by some standard?
3. This question might be naive, sorry for that. The same file has const DeviceVectors exception_table =
To my understanding there must be a typedef struct DeviceVectors declaration somewhere. But where exactly is it?
4. Reset first happens at the startup of the device, and Reset_Handler should be called and transfer execution to the app (because it remaps the NVIC and rewrites PC). How can the bootloader then be loaded? Why is the Reset_Handler from the startup_samd21.c file not called at startup, and the bootloader still loaded?
5. Why do we remap the NVIC for the app? Why not use the original one?
The startup_samd21.c file contains a branch to main. When does the call to main happen, and which main is called (there are 2 of them, in the app and the bootloader)? I understand this file contains functions which are used, at the compile and link stage, in the places they are called. But how does this file then call main?
Each of the programs we compile (i.e. the app and the bootloader) contains only a single instance of main. Otherwise, the linker would complain! Which main is compiled in a given program is specified in the Makefile.
When (and from where) are they then called? Is the syntax for the field names governed by some standard?
The functions in the vector table are called by the hardware! You won’t find a single call to them in software. The ARMv7-M spec defines where the exception table should be found, and the hardware will jump to those addresses when an exception/interrupt happens.
Why do we remap the NVIC for the app? Why not use the original one?
When we boot, the exception table points to the bootloader’s exception handlers. When we start the app, we want to use the app’s exception handlers instead, so we have to remap it using the VTOR register.
Reset first happens at the startup of the device, and Reset_Handler should be called and transfer execution to the app (because it remaps the NVIC and rewrites PC). How can the bootloader then be loaded? Why is the Reset_Handler from the startup_samd21.c file not called at startup, and the bootloader still loaded?
Both the app and the bootloader have an exception table and a Reset_Handler. We write the bootloader at start of flash, so by default its exception table is used (until we remap it). Which binary is found where is defined by the Makefile and the linker scripts.
Francois thanks for your prompt reply. I still have some confusion, would appreciate if you can clarify. Before posting I google all the topics so will be precise.
file line 214. In this project you compile both the app and the bootloader (at the same time, right?). Both have a main function. Why does line 214 call the app’s main, not the bootloader’s main?
Tiny question.
You and many other projects use __attribute__ ((section(".vectors"))) before the table assignment, though the documentation states to use it in between, e.g. struct duart a __attribute__ ((section ("DUART_A"))) = { 0 };.
Is there any new spec which covers such usage?
I mostly understand, but it is still not clear to me why we need separate vector tables for the app and the bootloader (not 2, right? I mean we have all the addresses for the bootloader and app interrupts, but only a single vector table exists for the controller). Maybe you can give a simple example? The post you mentioned somewhere in your blog provides some explanation:
So long as the user program doesn’t go and mess with the VTOR, any interrupts that occur after the user program re-enables interrupts will cause the NVIC to use the user program’s table to determine where the handlers are. Isn’t that awesome?
They are built at the same time, but into two different programs! Two different .elf files are compiled, each with a single main function.
They both work.
We have two vector tables. One vector table for the bootloader at the default address, and one vector table for the app at a different address. This is because the bootloader and the app might handle exceptions and interrupts differently! When we start the app from the bootloader, we write VTOR to change which vector table is in use. When the chip resets it goes back to the default address, i.e. the bootloader.
I’ve a simple question regarding a potential development of OTA procedure using a telecommunication module.
I’m thinking of a bootloader that can copy the received application code into the current application code.
But as a failure scenario, the complete binary image might not be received. So the idea is similar to the ESP8266 OTA procedure: store the incoming binary in a certain area of flash memory (not the active app_code), and once it is complete, the next start replaces the old image with the newer one.
How hard is this procedure? What should I consider?
Any recommendation for such a thing?
And as the project hasn’t launched yet, Is there a bootloader-based recommendation for a Low power MCU(s)?
If you have plenty of flash to spare, you can design a system that uses a full flash region to stage an update. It isn’t strictly necessary however.
If you’re looking for an off-the-shelf bootloader, I would recommend https://www.mcuboot.com/. While I don’t think it is perfect, it is a robust open source solution.
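For the staging approach discussed above, the core "verify, then replace" step might look roughly like this. It is a host-side sketch under stated assumptions: plain arrays stand in for flash slots, SLOT_SIZE is a made-up value, and simple_checksum is a stand-in for whatever CRC or hash a real design would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE 64u  // tiny hypothetical slot size, for illustration

// Stand-in integrity check; a real design would use a CRC or a hash.
static uint32_t simple_checksum(const uint8_t *data, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i < len; i++) {
    sum = sum * 31u + data[i];
  }
  return sum;
}

// Copy the staged image over the active one only if it arrived intact.
// Returns true if the active slot was replaced.
static bool apply_staged_update(uint8_t *active, const uint8_t *staging,
                                size_t image_len, uint32_t expected_checksum) {
  assert(active != NULL && staging != NULL);
  if (image_len == 0 || image_len > SLOT_SIZE) {
    return false;  // nothing staged, or it would not fit in the slot
  }
  if (simple_checksum(staging, image_len) != expected_checksum) {
    return false;  // incomplete or corrupted download: keep the old app
  }
  memcpy(active, staging, image_len);  // stands in for flash erase + program
  return true;
}
```

The case this sketch does not handle is power loss in the middle of the copy. Keeping the staged image intact until the active slot has been verified lets the bootloader retry the copy on the next boot.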
Thank you for your guidance, I’ll read the suggested topics, beside the first two parts of this (I’ve landed here via this link). Then I’ll get back with further questions if so.
Question: I see in your post that the bootloader jumps to the application by calling the reset handler of the application and then it is the responsibility of the application reset handler to update the VTOR register. In other replies to this post it seems that some solutions update the VTOR register prior to jumping to the application. Is there a reason to do it one way or the other? Does the VTOR register not get cleared upon reset?
You are correct that a chip reset would clear VTOR, but calling the Reset_Handler of the application does not cause the chip to reset! In other words, when you go from bootloader to app, VTOR does not change unless you explicitly set it.
Whether you set VTOR before jumping to the app (i.e. in the bootloader code), or after (i.e. in the app code) is really up to you.
Locking the bootloader with the MPU is a great idea, but I don’t think the SAMD21 (used for the examples above) has an MPU. The MPU is optional on the Cortex-M0+, and based on the datasheet it looks like Microchip didn’t include it.