Profiling Firmware on Cortex-M | Interrupt

Although microcontrollers in 2020 sometimes come with GHz clock frequencies and even multiple cores, performance is a major concern when developing embedded applications. Indeed, most firmware projects face some real-time constraints which must be carefully managed.

This is a companion discussion topic for the original entry at

Wish this was out a while back; half of the stuff described here I had to learn by reading through the programmer's reference manual and Arm docs. I bet it will save some time for others down the road. Nice article!

Thanks for the kind words @Silviu! I also wish this had existed before ;-).

An article I had been waiting for!!! I am the poor man in the poor man’s profiler. I have been using the sampling method for all my profiling so far. I happen to have the exact STM32 dev kit, and I cannot wait to try the example project. Thanks very much for pulling all the information into a single post.


I followed the steps in the example and found that the code is flashed but the LCD does not update. I tried the STM32 demo for the dev board and everything worked fine, so I can confirm that my board is working. I cleaned the libopencm3 repo and rebuilt (you have to make a full build first, which creates the library, before you can build just the example that you want). I followed the steps to flash the code and there is still nothing on the screen. Upon checking gdb, it looks like the micro is in la la land, possibly due to a vector table issue. Is anyone else seeing a similar issue?

Program received signal SIGINT, Interrupt.
reset_handler () at ../../cm3/vector.c:67
67		for (src = &_data_loadaddr, dest = &_data;


Hey Rajah,

Sorry you’re having issues reproducing the results. I double-checked and did not have to modify anything in libopencm3 or the example to get it to work on my STM32F429I-DISC1 board (is that the same model you have?).

One thing worth checking: I’m on a slightly older commit of the library. libopencm3-examples commit 9830486509cb93b4504aa3a0207e9a43f3308d28, and libopencm3 submodule commit 7be50a5e75ed2d163d38a6759347c5e778ac02ab. Perhaps a regression was introduced recently?

Hi Francois, thank you for the quick response. I just noticed that my eval kit is STM32F429I-DISCO and not STM32F429I-DISC1. The main difference b/w them is the default state of Rx and Tx jumpers for VCOM over USB. STM32F429I-DISCO has open SB11 and SB15, while STM32F429I-DISC1 has them closed by default. I will make this change and try your suggestion about the commit id on git.


Hi there, thank you for this article!

As you pointed out, the last method doesn’t really work for multi-threaded applications on an MCU. Most of my projects are running FreeRTOS and I was wondering if there is a way in OpenOCD to profile individual threads (or functions) without taking into account task switches and IRQs. Do you know if it is possible?

AFAIK OpenOCD supports thread-aware debugging so normally it should be able to “detect” kernel events.

PS: Thank you again, this blog is like heaven for embedded software engineers :-)

Check out the section in the FreeRTOS docs about tracing:

You could define traceTASK_SWITCHED_IN() and traceTASK_SWITCHED_OUT() to start and stop your profiling counters as tasks are switched in and out, or even keep separate counters for different tasks based on their task tags.
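Here is a rough sketch of the per-task counter idea in C. It assumes `configUSE_APPLICATION_TASK_TAG` is set to 1 and that each task was given a small integer tag with `vTaskSetApplicationTaskTag()`. The helper names (`profile_task_in`, `profile_task_out`, `task_cycles`, `PROFILE_MAX_TASKS`, `PROFILE_CYCCNT`) are made up for illustration, and the DWT cycle counter is assumed to be enabled elsewhere.

```c
#include <stdint.h>

/* Hypothetical limit: tags 0..7 each get their own counter. */
#define PROFILE_MAX_TASKS 8u

/* DWT cycle counter register on ARMv7-M cores; must already be enabled. */
#define PROFILE_CYCCNT (*(volatile uint32_t *)0xE0001004u)

static uint32_t task_cycles[PROFILE_MAX_TASKS]; /* cycles accumulated per tag */
static uint32_t switched_in_at;                 /* count when current task started */

/* Remember when the incoming task started running. */
static inline void profile_task_in(uint32_t now)
{
    switched_in_at = now;
}

/* Charge the elapsed cycles to the outgoing task's tag.
 * Unsigned subtraction stays correct across one counter wraparound. */
static inline void profile_task_out(uint32_t tag, uint32_t now)
{
    if (tag < PROFILE_MAX_TASKS) {
        task_cycles[tag] += now - switched_in_at;
    }
}

/* These would go in FreeRTOSConfig.h. The macros expand inside tasks.c,
 * where pxCurrentTCB is visible. */
#define traceTASK_SWITCHED_IN() \
    profile_task_in(PROFILE_CYCCNT)
#define traceTASK_SWITCHED_OUT() \
    profile_task_out((uint32_t)(uintptr_t)pxCurrentTCB->pxTaskTag, PROFILE_CYCCNT)
```

Note that cycles spent in interrupts that preempt a task are still charged to that task here; excluding them would need extra bookkeeping in the ISRs.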

Thanks for this excellent post. Quite fascinating what can be achieved with free software!

In the last part you are describing how to measure the runtime of functions.
This is something we tried to make more accessible using a web-frontend:
It is a free service that lets you define functions and their benchmark input data.
Using the techniques you described in your blog (DWT/CycleCounter), the precise runtime is measured on the MCU and shown.
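For anyone reading along, the DWT cycle counter measurement boils down to something like the sketch below. The register addresses and bit positions are the standard ARMv7-M ones; the function names (`cycle_counter_enable`, `cycles_elapsed`, `measure_cycles`) are my own, and the host fallback exists only so the snippet compiles off-target, where the counter never advances.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Debug registers on Cortex-M3/M4/M7 (ARMv7-M). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)
#else
/* Host stand-ins so the sketch builds off-target. */
static volatile uint32_t DEMCR, DWT_CTRL, DWT_CYCCNT;
#endif

#define DEMCR_TRCENA       (1u << 24) /* enables the DWT and ITM blocks */
#define DWT_CTRL_CYCCNTENA (1u << 0)  /* starts the cycle counter */

/* Turn on the trace blocks, zero the counter and start it. */
static inline void cycle_counter_enable(void)
{
    DEMCR |= DEMCR_TRCENA;
    DWT_CYCCNT = 0u;
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;
}

/* Unsigned subtraction handles a single 32-bit wraparound. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Run fn once and return how many cycles it took,
 * including the small call and read overhead. */
static inline uint32_t measure_cycles(void (*fn)(void))
{
    uint32_t start = DWT_CYCCNT;
    fn();
    return cycles_elapsed(start, DWT_CYCCNT);
}
```

Call `cycle_counter_enable()` once at startup, then wrap the function of interest with `measure_cycles()`. The 32-bit counter wraps after about 2^32 cycles, so longer measurements need a different approach.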

I think the easiest way to profile your RTOS is to dynamically enable / disable ITM tracing when you switch threads. Many RTOS provide hooks for this (e.g. @kisielk points out traceTASK_SWITCHED_IN() in FreeRTOS), but if not it’s relatively easy to implement yourself by modifying your scheduler code.
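As a rough sketch of what that could look like, the hook below gates the ITM on the tag of the task being switched in. It assumes the ITM is already configured and unlocked (for example by the debugger), and that the task of interest carries a known tag; `PROFILED_TASK_TAG` and `itm_gate_on_switch_in` are hypothetical names, and the host fallback is there only so the snippet builds off-target.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* ITM Trace Control Register on ARMv7-M. */
#define ITM_TCR (*(volatile uint32_t *)0xE0000E80u)
#else
/* Host stand-in so the sketch builds off-target. */
static volatile uint32_t ITM_TCR;
#endif

#define ITM_TCR_ITMENA (1u << 0) /* global ITM enable bit */

/* Hypothetical: tag assigned to the one task we want to trace. */
#define PROFILED_TASK_TAG 1u

/* Call from the scheduler's switch-in hook with the incoming task's tag.
 * Only the enable bit is touched; the rest of the configuration is kept. */
static inline void itm_gate_on_switch_in(uint32_t tag)
{
    if (tag == PROFILED_TASK_TAG) {
        ITM_TCR |= ITM_TCR_ITMENA;
    } else {
        ITM_TCR &= ~ITM_TCR_ITMENA;
    }
}
```

Interrupts that fire while the profiled task is running would still be traced with this approach.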

You could also use GDB scripting to set up a breakpoint in the SysTick IRQ and automatically enable / disable ITM as needed. This would work but would introduce overhead, as the Poor Man’s Profiler does.

Interesting. Shoot me an email francois - at - memfault if you ever want to chat about BareBench!

I’m trying to get your ITM example to run but I’m having difficulty. In particular, it breaks at this point, since that would write 0x1207 to many memory addresses from what I can see. Is there a part of this that is missing? Just writing 0x1207 by itself along with the other commands doesn’t produce any output for me.