Stuck Debugging a Hardfault on a Cortex-M7

AJLumsden · March 23, 2022, 2:41pm

I am currently trying to debug a hardfault on a Cortex-M7 running FreeRTOS. I have been using your great blog post and webinar on this subject and I have learnt a lot from them, so thank you very much for that. Although they have been really helpful and have allowed me to improve my processes, I have not been able to make much progress in finding the cause of my hardfault.

Context:
I have a monitoring system that uses asynchronous I2C communications. It was running stably until I changed the optimization level from none to ‘optimize for debugging’, and now it triggers a hardfault. There are a couple of things I can do to stop the hardfault from occurring, such as carrying out the I2C transactions sequentially or putting a delay between the read_start and read_end calls. Also, if I simplify the logic for sensor selection, the hardfault takes a lot longer to occur.

When using the debugger, I found that putting breakpoints in certain places, or stepping through the code in certain ways, would stop the hardfault from being triggered at the expected point and push it back so that it occurred later when the program was left to run freely.

When using the methods described in the blog post and webinar, I found that the state of the fault registers and the backtrace are not consistent each time I trigger the fault, and the backtrace contains functions that should not be present.
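One way to make run-to-run differences in the fault registers easier to compare is to log the Configurable Fault Status Register (CFSR) as a list of named flags. The sketch below is a hypothetical host-side decoder (`print_cfsr` and `cfsr_bits` are illustrative names, not taken from the blog post); on target, the raw value would be read from address 0xE000ED28 inside the fault handler and passed in. The bit positions follow the ARMv7-M architecture and are worth double-checking against the Cortex-M7 documentation.

```c
#include <stdint.h>
#include <stdio.h>

/* Selected CFSR bit positions (ARMv7-M). On target, the raw register is
   read from address 0xE000ED28 in the fault handler and passed in here. */
typedef struct {
    uint32_t mask;
    const char *name;
} fault_bit_t;

static const fault_bit_t cfsr_bits[] = {
    /* MemManage Fault Status (bits 0-7) */
    {1u << 0,  "IACCVIOL"},    /* instruction access violation */
    {1u << 1,  "DACCVIOL"},    /* data access violation */
    {1u << 3,  "MUNSTKERR"},   /* fault while unstacking on exception return */
    {1u << 4,  "MSTKERR"},     /* fault while stacking on exception entry */
    {1u << 5,  "MLSPERR"},     /* fault during lazy FP state preservation */
    {1u << 7,  "MMARVALID"},   /* MMFAR holds the faulting address */
    /* BusFault Status (bits 8-15) */
    {1u << 8,  "IBUSERR"},
    {1u << 9,  "PRECISERR"},   /* stacked PC points at the faulting instruction */
    {1u << 10, "IMPRECISERR"}, /* reported late; stacked PC is not the faulting instruction */
    {1u << 11, "UNSTKERR"},
    {1u << 12, "STKERR"},
    {1u << 13, "LSPERR"},
    {1u << 15, "BFARVALID"},   /* BFAR holds the faulting address */
    /* UsageFault Status (bits 16-31) */
    {1u << 16, "UNDEFINSTR"},
    {1u << 17, "INVSTATE"},
    {1u << 18, "INVPC"},
    {1u << 19, "NOCP"},
    {1u << 24, "UNALIGNED"},
    {1u << 25, "DIVBYZERO"},
};

/* Print every set flag on one line so logs from several crashes can be
   diffed; returns the number of flags found. */
static size_t print_cfsr(uint32_t cfsr, FILE *out) {
    size_t found = 0;
    fprintf(out, "CFSR=0x%08lX:", (unsigned long)cfsr);
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            fprintf(out, " %s", cfsr_bits[i].name);
            found++;
        }
    }
    fputc('\n', out);
    return found;
}
```

For example, `print_cfsr(0x00000400u, stdout)` prints `CFSR=0x00000400: IMPRECISERR`. That flag is worth watching for here: an imprecise bus fault is reported some time after the offending access, so the stacked PC and the backtrace need not point at the code that caused it.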

Questions:
Basically, because of these inconsistencies, and because I’m not very experienced at debugging programs at such a low level, I am not sure how to proceed from here. Is there anything that can be deduced from the behaviour I have seen so far that may help me progress, or are there any other methods I can try?

Thanks for providing this platform and the great content.

francois · March 23, 2022, 3:32pm

It sounds like you have one of the following:

A race condition
Memory corruption

These are tricky issues to debug because the bug triggers silently, and the HardFault isn’t raised until later on in unrelated code.

I would check the following:

Has the stack overflowed? Check that your stack pointer is in bounds, and enable stack overflow protection if you have not already done so.
Are non-IRQ-safe APIs being called from an interrupt context? FreeRTOS has a config flag you can toggle to catch those instances (if you are using FreeRTOS).
Are pointers being used after they’ve been freed? Some heap implementations have debug toggles you can enable to catch this.
Are you forgetting to lock a mutex somewhere? Every function that expects a mutex to be held should assert if not.
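The last point can be made concrete with a small lock wrapper that lets a function assert that its caller holds the lock. This is a host-side sketch using POSIX threads as a stand-in for the RTOS; `tracked_lock`, `tracked_unlock`, `lock_held_by_me`, `bus_lock` and `bus_update_locked` are hypothetical names. On FreeRTOS, a similar check could compare `xSemaphoreGetMutexHolder()` against `xTaskGetCurrentTaskHandle()` inside `configASSERT()`, assuming the corresponding INCLUDE_ options are enabled in FreeRTOSConfig.h.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD_LOCKS 8 /* arbitrary limit for this sketch */

/* Per-thread record of the locks this thread currently holds.
   Thread-local, so querying it needs no extra synchronization. */
static _Thread_local pthread_mutex_t *held_locks[MAX_HELD_LOCKS];
static _Thread_local size_t held_count;

static void tracked_lock(pthread_mutex_t *m) {
    pthread_mutex_lock(m);
    assert(held_count < MAX_HELD_LOCKS);
    held_locks[held_count++] = m;
}

static void tracked_unlock(pthread_mutex_t *m) {
    /* Remove m from this thread's list (swap with the last entry). */
    for (size_t i = 0; i < held_count; i++) {
        if (held_locks[i] == m) {
            held_locks[i] = held_locks[--held_count];
            pthread_mutex_unlock(m);
            return;
        }
    }
    assert(!"unlocking a mutex this thread does not hold");
}

static bool lock_held_by_me(const pthread_mutex_t *m) {
    for (size_t i = 0; i < held_count; i++) {
        if (held_locks[i] == m) {
            return true;
        }
    }
    return false;
}

/* Example: a function whose contract says the caller holds bus_lock. */
static pthread_mutex_t bus_lock = PTHREAD_MUTEX_INITIALIZER;
static int bus_state;

static void bus_update_locked(int value) {
    assert(lock_held_by_me(&bus_lock)); /* catch callers that forgot to lock */
    bus_state = value;
}
```

Keeping the held-lock list thread-local means the check itself cannot race with other threads; a caller that forgot to lock trips the assert at the point of the mistake instead of corrupting shared state silently.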

These are just a few ideas.

AJLumsden · March 24, 2022, 9:06am

Hi Francois,

Thanks for the reply.

I have used the stack overflow hooks and high water mark tools that FreeRTOS provides, and they have not shown any issues. I may double-check the hooks again, though, to prove they are working.
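Part of proving the tools work is knowing what the high water mark actually measures. The sketch below shows the stack-painting idea on an ordinary byte array: fill the region with a known byte, let it be used, then count how many bytes at the deep end were never overwritten. `paint_stack` and `unused_stack_bytes` are hypothetical names; FreeRTOS fills task stacks with 0xa5 and `uxTaskGetStackHighWaterMark()` works on the same general principle, but this is not its implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u /* same fill value the FreeRTOS kernel uses */

/* Fill an (emulated) stack region with the known pattern before use. */
static void paint_stack(uint8_t *stack, size_t size) {
    memset(stack, STACK_FILL_BYTE, size);
}

/* On a descending stack (as on Cortex-M) the lowest addresses are reached
   last. Count the bytes from the bottom that still hold the pattern: this is
   the smallest amount of free stack ever observed (the high water mark). */
static size_t unused_stack_bytes(const uint8_t *stack, size_t size) {
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

Because the count only looks for the pattern, a stray write from other code into the unused part of a stack also shows up as stack usage, while corruption elsewhere in RAM is not detected at all.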

I will double-check the other suggestions too, but I am pretty certain that these things are not occurring.

Another thing I thought of trying is turning optimization off for certain functions until the hardfault stops, seeing as the program seemed stable with no optimization.
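Assuming a GCC-based toolchain (where ‘optimize for debugging’ usually means -Og), one function at a time can be excluded from optimization with a function attribute, which keeps this bisection fairly mechanical. `NO_OPTIMIZE` and `select_next_sensor` are hypothetical names; Clang does not support GCC's `optimize` attribute, so the sketch falls back to Clang's `optnone` attribute there.

```c
#include <stdint.h>

/* Disable optimization for a single function while the rest of the build
   keeps its normal optimization level. */
#if defined(__clang__)
#define NO_OPTIMIZE __attribute__((optnone))
#elif defined(__GNUC__)
#define NO_OPTIMIZE __attribute__((optimize("O0")))
#else
#define NO_OPTIMIZE /* unknown compiler: no effect */
#endif

/* Hypothetical candidate: round-robin sensor selection. count must be > 0. */
static NO_OPTIMIZE uint8_t select_next_sensor(uint8_t current, uint8_t count) {
    return (uint8_t)((current + 1u) % count);
}
```

GCC's documentation describes the `optimize` attribute as intended for debugging rather than production code. Since breakpoints already move the fault around, a function that "fixes" the crash when left unoptimized may only be shifting the timing, so a result is a lead rather than proof.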

P.S. Is this the correct section of the forum to post this in?

Thomas · April 5, 2022, 4:50am

Do you have the chance to try it on a second device to rule out a hardware error?
Could it be an alignment problem in dynamically allocated memory?
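Alignment of heap blocks can be checked cheaply at allocation time. A minimal sketch, assuming a debug wrapper around the allocator is acceptable: `is_aligned` and `checked_malloc` are hypothetical names, and here they wrap the C library's `malloc`. On target, the same check would wrap the RTOS allocator instead (FreeRTOS's heap schemes align blocks to `portBYTE_ALIGNMENT`). Pointers derived from a correctly aligned block, for example casts into the middle of a byte buffer, can still be misaligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p is a multiple of `align` bytes; `align` must be a power of two. */
static bool is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p & (align - 1u)) == 0;
}

/* Hypothetical debug wrapper: check each allocation against the strictest
   fundamental alignment (alignof(max_align_t)) before handing it out. */
static void *checked_malloc(size_t size) {
    void *p = malloc(size);
    assert(p == NULL || is_aligned(p, alignof(max_align_t)));
    return p;
}
```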
