A Practical Guide to Watchdogs for Embedded Systems | Interrupt

At some point you’ve probably had to unplug and plug back in an electronic device to get it to work again! System freezes and hangs are not only frustrating to an end user, but they can also be quite challenging to debug and fix.


This is a companion discussion topic for the original entry at https://interrupt.memfault.com/blog/firmware-watchdog-best-practices

Thanks so much for a good technical look at watchdogs.
One of the issues I’ve always encountered is how to leave some “breadcrumbs” when the watchdog fires.
For integration and stability testing, it’s exceedingly valuable to have some way of debugging it.
The scenario is: Quality Assurance testing finds that a watchdog reset the system at 3:30am, with no visible activity at the time. What could have happened? It’s handed to one of the software programmers, who has to go back in and identify likely causes.
For the customer field scenario, having the watchdog go off and recover from a potential system failure is fantastic, and needs to just work. Again, though, when visiting a field system, it’s valuable to be able to get some idea of whether anything has happened.
For debugging after the fact, I think the context that matters is whether there is any persistent store (a file system on an SD card), and then of course whether there is a way to write to it and flush any file system buffers, given that the whole system is about to be reset.
So just throwing out the question. I have some of my own answers, but I’m interested in what others’ thoughts are.

Hey @neilh20, this is the kind of problem we solve at Memfault. DM me if you want to hear more!

Hi @neilh20, this is an interesting question and one I’m considering currently. My thinking at the minute is to wait until reboot and log the Reset Reason Register contents to the SD card. Hopefully over time this would give a picture of where the issues are. We could perhaps also write some stack variables to flash so we can read them back on reboot; this way we’re not racing to get everything written to the SD card. Would love to hear your opinions on what the best practices are, though!
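As a rough illustration of the “log the reset reason after reboot” idea, here is a minimal sketch in C. The bit layout (`RESET_FLAG_*`), the function names, and the log line format are all assumptions made up for this example; a real MCU has its own reset status register, so check the reference manual for the actual bits and for how to clear them after reading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit layout of a reset reason register (not from any real part). */
#define RESET_FLAG_POWER_ON (1u << 0)
#define RESET_FLAG_PIN      (1u << 1)
#define RESET_FLAG_SOFTWARE (1u << 2)
#define RESET_FLAG_WATCHDOG (1u << 3)

typedef enum {
    RESET_UNKNOWN,
    RESET_POWER_ON,
    RESET_PIN,
    RESET_SOFTWARE,
    RESET_WATCHDOG
} reset_reason_t;

/* Several flags may be set at once; report the most interesting one first. */
reset_reason_t reset_reason_decode(uint32_t flags)
{
    if (flags & RESET_FLAG_WATCHDOG) return RESET_WATCHDOG;
    if (flags & RESET_FLAG_SOFTWARE) return RESET_SOFTWARE;
    if (flags & RESET_FLAG_PIN)      return RESET_PIN;
    if (flags & RESET_FLAG_POWER_ON) return RESET_POWER_ON;
    return RESET_UNKNOWN;
}

const char *reset_reason_name(reset_reason_t reason)
{
    switch (reason) {
    case RESET_POWER_ON: return "power-on";
    case RESET_PIN:      return "pin";
    case RESET_SOFTWARE: return "software";
    case RESET_WATCHDOG: return "watchdog";
    default:             return "unknown";
    }
}

/* Formats one log line for the boot log; returns snprintf's result. */
int reset_reason_format(char *buf, size_t len, uint32_t boot_count, uint32_t flags)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "boot=%lu reset=%s raw=0x%08lx\n",
                    (unsigned long)boot_count,
                    reset_reason_name(reset_reason_decode(flags)),
                    (unsigned long)flags);
}
```

On a real board, the startup code would read the register once, pass the raw value to `reset_reason_format`, and append the line to a file on the SD card once the file system is mounted. Keeping the raw value in the line means nothing is lost if the decoding turns out to be incomplete.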

Great to get some discussion - I should say that I read this because https://platformio.org/ featured it in their opening page.
I was thinking, since the processor is going to be reset, to log the details to the uSD before reset…
Depending on the system, logging something after reset is also a good idea, but possibly much more limited. Another way could be to have some private RAM that isn’t initialized at startup, and if that holds a valid string (checksummed, of course) then to log it on startup.
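The uninitialized-RAM idea could be sketched roughly like this. Everything here is an assumption for illustration: the `breadcrumb_t` layout, the magic value, and the simple rotate-and-XOR checksum (a CRC32 would be a stronger choice). The record is a plain static so the sketch runs on a host; in firmware it would need to live in a region the startup code does not zero, which usually means a dedicated linker section and matching linker script support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BREADCRUMB_MAGIC   0xB1EADC0Du  /* arbitrary marker chosen for this sketch */
#define BREADCRUMB_MSG_LEN 32

typedef struct {
    uint32_t magic;
    char     msg[BREADCRUMB_MSG_LEN];
    uint32_t checksum;
} breadcrumb_t;

/* Host stand-in. In firmware this would be placed in no-init RAM, e.g. with
 * GCC something like __attribute__((section(".noinit"))), assuming the
 * linker script defines such a section. */
static breadcrumb_t g_breadcrumb;

/* Rotate-and-XOR over magic and message bytes; simple, not cryptographic. */
static uint32_t breadcrumb_checksum(const breadcrumb_t *b)
{
    uint32_t sum = b->magic;
    for (size_t i = 0; i < sizeof b->msg; i++) {
        sum = ((sum << 1) | (sum >> 31)) ^ (uint8_t)b->msg[i];
    }
    return sum;
}

/* Call just before the reset (e.g. from a watchdog pre-timeout interrupt). */
void breadcrumb_store(const char *msg)
{
    assert(msg != NULL);
    g_breadcrumb.magic = BREADCRUMB_MAGIC;
    /* strncpy zero-pads the rest of the buffer, so the checksum is stable. */
    strncpy(g_breadcrumb.msg, msg, sizeof g_breadcrumb.msg - 1);
    g_breadcrumb.msg[sizeof g_breadcrumb.msg - 1] = '\0';
    g_breadcrumb.checksum = breadcrumb_checksum(&g_breadcrumb);
}

/* Call early after boot. Returns 1 and copies the message if the record is
 * valid, 0 otherwise. The record is invalidated so it is reported only once. */
int breadcrumb_take(char *out, size_t out_len)
{
    if (out == NULL || out_len == 0) return 0;
    if (g_breadcrumb.magic != BREADCRUMB_MAGIC) return 0;
    if (g_breadcrumb.checksum != breadcrumb_checksum(&g_breadcrumb)) return 0;
    strncpy(out, g_breadcrumb.msg, out_len - 1);
    out[out_len - 1] = '\0';
    g_breadcrumb.magic = 0;
    return 1;
}
```

The magic value guards against reading random power-on RAM contents as a message, and the checksum catches a record that was only partly written when the reset hit. Whether RAM survives a given kind of reset depends on the part, so that needs checking per MCU.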
The question is what details are available. In the past, on larger systems, I have written some form of stack dump, but it can be a fair amount of work to decode the stack.
Another way might be to give each call that feeds the watchdog a unique number, and then log that number.
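A minimal sketch of the “unique number per watchdog call” idea, under stated assumptions: the `wdt_site_t` values, `watchdog_feed_from`, and the `hw_watchdog_feed` stand-in are hypothetical names for this example, and the real feed would be whatever the vendor HAL or register write is on the target. The last site would be kept in RAM that survives reset so it can be logged on the next boot.

```c
#include <assert.h>
#include <stdint.h>

/* One ID per place in the code that feeds the watchdog (illustrative values). */
typedef enum {
    WDT_SITE_NONE        = 0,
    WDT_SITE_MAIN_LOOP   = 1,
    WDT_SITE_SENSOR_READ = 2,
    WDT_SITE_SD_WRITE    = 3
} wdt_site_t;

/* On a target these would sit in reset-surviving RAM; plain statics here. */
static volatile uint32_t g_last_feed_site;
static uint32_t g_feed_count;

/* Stand-in for the hardware feed (a register write or HAL call on a target). */
static void hw_watchdog_feed(void)
{
    g_feed_count++;
}

/* Record who is feeding, then feed. If the watchdog later fires, the stored
 * ID names the last checkpoint the code reached before it got stuck. */
void watchdog_feed_from(wdt_site_t site)
{
    assert(site != WDT_SITE_NONE);
    g_last_feed_site = (uint32_t)site;
    hw_watchdog_feed();
}

wdt_site_t watchdog_last_feed_site(void)
{
    return (wdt_site_t)g_last_feed_site;
}
```

After a watchdog reset, logging `watchdog_last_feed_site()` narrows the hang down to the code between that checkpoint and the next one, which is much cheaper to decode than a stack dump.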

Thanks for this great article as usual.
I have a question about the adjacent notion of reset and how it is implemented in microcontrollers.
When the watchdog resets the system, what happens to the peripherals? Is it the same as resetting the MCU from the hardware reset line?

The notion of reset in general is pretty mystical. It seems there is soft reset, system reset, warm reset, and watchdog reset. Does anyone have a clear idea of what each of those means?