Using Asserts in Embedded Systems | Interrupt

Asserts are one of the best ways to find bugs, unintended behavior, and programmatic errors, and to catch when a system is no longer 100% functional and needs to be reset to recover. If instrumented correctly, an assert gives a developer context about when and where in the code an issue took place. Despite these benefits, the practice of using asserts in firmware is neither common nor agreed upon.


This is a companion discussion topic for the original entry at https://interrupt.memfault.com/blog/asserts-in-embedded-systems

I made some similar observations some years ago (especially related to C++) and came to the conclusion that using assert() can come with very little overhead if done right (as you do, by just using the PC): http://robitzki.de/blog/Assert_Hash

Yes! I loved your article. I stumbled across it while researching for this blog post, and I especially loved the code size table (I hope you don’t mind that I was inspired by it).

I did come across another post, https://barrgroup.com/Embedded-Systems/How-To/Define-Assert-Macro, which goes to great lengths to make an even smaller assert, encoding the file, task, line number, and version all into a single 32-bit address. I felt it was overkill and the tradeoffs weren’t worth it, especially if a system already had some logging or coredumps set up.

With the strategy of hashing the file and line number, did you ever run into collisions?

With the strategy of hashing the file and line number, did you ever run into collisions?

I’ve actually never used it. Meanwhile, I came to the conclusion that just using the program counter is sufficient to identify the code location. In the case of a hardware exception, that is the only information you have anyway, so using just the PC unifies the crash handling code. Of course, you have to have the binaries and you have to know what version of your firmware failed, but I think those are requirements that are not hard to fulfill.

I’ve actually never used it.

Cool, got it. Awesome that the optimal solution worked for you!

that just using the program counter is sufficient to identify the code location

It’s enough to know where it crashed, but having the previous frame as well, or multiple frames from a coredump, is even better.

you have to know what version of your firmware failed

We’ve used GNU Build ID to mark our builds in the past, stored those in a simple key-value style blob storage, and were able to retrieve them on demand when we needed to. Quite simple, and far better than trying to identify builds by semantic versioning and build flags.
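As a rough sketch of what reading a GNU Build ID on the device could look like: when the linker is invoked with `--build-id`, it emits an ELF note whose payload is the ID. The parsing below follows the standard ELF note layout. How firmware locates the note is an assumption here: typically the linker script would export a symbol at the start of the `.note.gnu.build-id` section, and the function name `build_id_from_note` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_BUILD_ID 3u

/* Standard ELF note header, followed by the name and then the descriptor. */
typedef struct {
    uint32_t namesz; /* length of the name including its NUL ("GNU\0" is 4) */
    uint32_t descsz; /* length of the build ID in bytes */
    uint32_t type;   /* NT_GNU_BUILD_ID for a build ID note */
    uint8_t data[];  /* name padded to 4 bytes, then the build ID bytes */
} elf_note_t;

/* Hypothetical helper: returns a pointer to the build ID bytes and stores
 * their length in *len, or returns NULL if this is not a GNU build ID note. */
static const uint8_t *build_id_from_note(const elf_note_t *note, size_t *len)
{
    assert(note != NULL && len != NULL);
    if (note->type != NT_GNU_BUILD_ID || note->namesz != 4 ||
        memcmp(note->data, "GNU", 4) != 0) {
        return NULL;
    }
    *len = note->descsz;
    /* The descriptor starts after the name, rounded up to a 4-byte boundary. */
    return &note->data[(note->namesz + 3u) & ~3u];
}
```

The returned bytes can then be attached to every crash report, so the matching binary can be looked up later.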

Making the Most of Asserts

I record the backtrace and the precise program counter. That’s it. Nothing more. OK, a timestamp can also be useful, and maybe a couple of uintptr_t words that the programmer can add to help debugging.

Care needs to be taken, as the optimizer will merge all the common code into one call to the assert utility, and then you don’t know which of several asserts in a function fired! We ended up going with a GCC asm one-liner to get the precise PC.

https://www.gnu.org/software/libc/manual/html_node/Backtraces.html
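A minimal sketch of such an assert, assuming GCC or Clang. All names (`MY_ASSERT`, `assert_record_t`, `assert_record`) are hypothetical, and the Arm one-liner is one plausible form of the asm mentioned above, not necessarily the one the poster used. On other targets the sketch falls back to a non-inlined helper that returns its own return address, which lies inside the asserting code. A real handler would save the record somewhere that survives a reset and then reboot, rather than return.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record layout; names are illustrative only. */
typedef struct {
    uintptr_t pc;       /* address inside the code that asserted */
    uintptr_t lr;       /* return address of the asserting function */
    uintptr_t extra[2]; /* optional words supplied by the programmer */
} assert_record_t;

static_assert(sizeof(assert_record_t) == 4 * sizeof(uintptr_t),
              "record should stay small");

static volatile assert_record_t g_last_assert;
static volatile unsigned g_assert_count;

#if defined(__arm__)
/* Arm: read the program counter directly at the assert site. */
#define GET_PC(dst) __asm volatile("mov %0, pc" : "=r"(dst))
#else
/* Fallback: the helper's return address points into the calling code. */
__attribute__((noinline)) static uintptr_t pc_of_caller(void)
{
    return (uintptr_t)__builtin_return_address(0);
}
#define GET_PC(dst) ((dst) = pc_of_caller())
#endif

/* Stores the failure context. In this sketch it returns to the caller. */
__attribute__((noinline)) static void assert_record(uintptr_t pc, uintptr_t lr,
                                                    uintptr_t e0, uintptr_t e1)
{
    g_last_assert.pc = pc;
    g_last_assert.lr = lr;
    g_last_assert.extra[0] = e0;
    g_last_assert.extra[1] = e1;
    g_assert_count++;
}

/* No __FILE__ or __LINE__ strings: only addresses and two debug words. */
#define MY_ASSERT(expr, e0, e1)                                        \
    do {                                                               \
        if (!(expr)) {                                                 \
            uintptr_t pc_;                                             \
            GET_PC(pc_);                                               \
            assert_record(pc_, (uintptr_t)__builtin_return_address(0), \
                          (uintptr_t)(e0), (uintptr_t)(e1));           \
        }                                                              \
    } while (0)

/* Example use: record both operands if the divisor is zero. */
static int checked_div(int a, int b)
{
    MY_ASSERT(b != 0, a, b);
    return b != 0 ? a / b : 0;
}
```

Because the PC is captured at each assert site before the shared handler is called, the recorded address can be mapped back to a source line with the matching binary.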

My biggest problem with code that checks for malloc returning null and attempting to handle it…

…usually it is untested, buggy, and somewhere along the line uses malloc to do its job! (Guess what lurks in the depths of a printf?)

The next problem, on a system with swap… these days your system is effectively dead/totally dysfunctional long before malloc returns NULL!

The lightweight IP stack (lwIP) uses pool allocators with quite small pools for resources that may have (potentially malicious) spikes in usage. But then you will find all over it the attitude “this is an IP packet, something’s wrong / I can’t handle it / I don’t know enough / I don’t have enough resources / … I’ll just drop the packet. If it matters, the higher layers will retry.”
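The pool-and-drop pattern can be sketched in a few lines. This is not lwIP's actual API: the pool size, block size, and every function name below are assumptions chosen for illustration. The point is that exhaustion is an expected condition handled by dropping the packet, not by asserting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_BLOCKS 4  /* deliberately small, illustrative value */
#define BLOCK_SIZE 64  /* illustrative value */

/* A free block stores the link to the next free block in its own memory. */
typedef union block {
    union block *next;
    uint8_t payload[BLOCK_SIZE];
} block_t;

static block_t pool_storage[POOL_BLOCKS];
static block_t *free_list;
static unsigned dropped_packets;

static void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        pool_storage[i].next = free_list;
        free_list = &pool_storage[i];
    }
}

/* Returns a block, or NULL when the pool is exhausted. */
static void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;
}

static void pool_free(void *p)
{
    assert(p != NULL); /* freeing NULL would be a programmer error */
    block_t *b = p;
    b->next = free_list;
    free_list = b;
}

/* Hypothetical receive hook: copies the packet into a pool block, or drops
 * it (returning NULL) if it is too large or no block is available. */
static void *on_packet_received(const uint8_t *data, size_t len)
{
    if (len > BLOCK_SIZE) {
        dropped_packets++;
        return NULL;
    }
    void *buf = pool_alloc();
    if (buf == NULL) {
        dropped_packets++; /* higher layers retry if it matters */
        return NULL;
    }
    memcpy(buf, data, len);
    return buf;
}
```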

Another good pattern is to malloc everything you need for a given configuration at initialization time… at least then you know then and there that the configuration will work. If you can’t, you reboot to a safe configuration.
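A small sketch of the allocate-at-init pattern, with hypothetical types and names (`config_t`, `resources_t`, `resources_init`). The function reports failure to the caller, which under this pattern would reboot into a known-safe configuration, so nothing is freed on the failure path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical configuration: how many receive buffers, and how large. */
typedef struct {
    size_t rx_buffers;
    size_t rx_buffer_size;
} config_t;

typedef struct {
    void **rx;    /* table of buffers, allocated once at startup */
    size_t count; /* number of buffers successfully allocated */
} resources_t;

/* Allocates everything the configuration needs up front. Returns false if
 * any allocation fails; the caller is then expected to reboot into a safe
 * configuration, so partial allocations are not released here. */
static bool resources_init(resources_t *r, const config_t *cfg)
{
    assert(r != NULL && cfg != NULL); /* programmer error, not a runtime one */
    r->count = 0;
    r->rx = calloc(cfg->rx_buffers, sizeof *r->rx);
    if (r->rx == NULL) {
        return false;
    }
    for (size_t i = 0; i < cfg->rx_buffers; i++) {
        r->rx[i] = malloc(cfg->rx_buffer_size);
        if (r->rx[i] == NULL) {
            return false;
        }
        r->count++;
    }
    return true;
}
```

After a successful call, the steady-state code never needs to call malloc again, so there is no allocation failure path left to test.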

When Not to Assert

Never assert on invalid input from users or external untrusted systems. If you do, you open yourself to denial of service attacks (and pissed off users).
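To make the distinction concrete, here is a sketch of a parser for a made-up frame format (one length byte followed by the payload). The frame layout, status codes, and function name are all assumptions. Malformed external data produces an error code; only misuse by internal callers trips an assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    CMD_OK,
    CMD_ERR_TOO_SHORT,
    CMD_ERR_BAD_LENGTH
} cmd_status_t;

/* Hypothetical frame layout: [length byte][payload bytes...] */
static cmd_status_t parse_command(const uint8_t *frame, size_t frame_len,
                                  const uint8_t **payload, size_t *payload_len)
{
    /* Programmer errors: our own code must never pass NULL here. */
    assert(frame != NULL && payload != NULL && payload_len != NULL);

    /* Untrusted input: reject it with an error instead of asserting. */
    if (frame_len < 1) {
        return CMD_ERR_TOO_SHORT;
    }
    if ((size_t)frame[0] != frame_len - 1) {
        return CMD_ERR_BAD_LENGTH;
    }
    *payload = frame + 1;
    *payload_len = frame[0];
    return CMD_OK;
}
```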

Design by Contract

Please read and understand https://en.wikipedia.org/wiki/Design_by_contract

I regard DbC as one of the most important concepts in producing correct software, and it has a lot to say about asserts.
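As one possible illustration of how DbC maps onto asserts in C, the sketch below names preconditions, postconditions, and an invariant explicitly. The `REQUIRE`/`ENSURE` macros and the tiny stack type are invented for this example; they simply forward to the standard assert().

```c
#include <assert.h>
#include <stddef.h>

#define REQUIRE(cond) assert(cond) /* precondition: the caller's obligation */
#define ENSURE(cond) assert(cond)  /* postcondition: the callee's promise */

#define STACK_CAP 8

typedef struct {
    int items[STACK_CAP];
    size_t count;
} int_stack_t;

/* Invariant that must hold before and after every operation. */
static int stack_invariant(const int_stack_t *s)
{
    return s->count <= STACK_CAP;
}

static void stack_push(int_stack_t *s, int value)
{
    REQUIRE(s != NULL);
    REQUIRE(stack_invariant(s));
    REQUIRE(s->count < STACK_CAP); /* caller must check for space first */

    size_t old_count = s->count;
    s->items[s->count++] = value;

    ENSURE(s->count == old_count + 1);
    ENSURE(s->items[s->count - 1] == value);
    ENSURE(stack_invariant(s));
    (void)old_count; /* avoids an unused warning when asserts are disabled */
}

static int stack_pop(int_stack_t *s)
{
    REQUIRE(s != NULL);
    REQUIRE(s->count > 0); /* popping an empty stack breaks the contract */
    int value = s->items[--s->count];
    ENSURE(stack_invariant(s));
    return value;
}
```

A failed REQUIRE points at a bug in the caller, while a failed ENSURE points at a bug in the function itself, which narrows down where to look.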

Hey,
Nice article Tyler! Always clear and useful :)

Regarding the asserts during the boot-up sequence, I added a flag that is cleared when the system is fully running. If an assert fails while booting, I reboot into bootloader mode.
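A sketch of how such a flag might be wired in, with hypothetical names throughout. The assert handler asks where to reboot: while the flag is still set, a failure sends the device to the bootloader instead of into a possible reboot loop.

```c
#include <stdbool.h>

typedef enum {
    REBOOT_NORMAL,
    REBOOT_TO_BOOTLOADER
} reboot_target_t;

/* True from reset until the application reports that it is fully running. */
static bool s_booting = true;

/* Called once, at the end of the boot-up sequence. */
static void system_mark_running(void)
{
    s_booting = false;
}

/* Called by the assert handler to decide where to reboot. */
static reboot_target_t assert_reboot_target(void)
{
    return s_booting ? REBOOT_TO_BOOTLOADER : REBOOT_NORMAL;
}
```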

Thanks for the note Cyril, it’s a smart way to make sure you don’t shoot yourself in the foot. We’ll add a note about it on the post (cc @tyler).