Tracing & Logging with the TRICE Library (Interrupts too!) | Interrupt

discobot · January 12, 2022, 5:45pm

If you develop software for an embedded system, you need some kind of system feedback. Debuggers are awesome tools, but when it comes to analyzing dynamic behavior in the field, they cannot be used.

This is a companion discussion topic for the original entry at https://interrupt.memfault.com/blog/trice

miro · January 23, 2022, 5:25pm

I’m really glad to see that other fellow embedded software developers reach similar conclusions about software tracing (a.k.a. logging).

At Quantum Leaps (state-machine.com), we’ve been doing software tracing of embedded code for over 17 years. We call it QP/Spy:

https://www.state-machine.com/qtools/qpspy.html

you can also see how our software tracing works in a YouTube video:

It’s interesting to see and compare other approaches to the problem. Also, because we’ve been doing this for so long, perhaps it will be useful to share some lessons we’ve learned along the way.

But first, you’re absolutely right that clogging the embedded code with printf’s is extremely inefficient. It puts expensive ASCII formatting in time-critical paths through the code – exactly where you can afford it the least. Also, in the formatting process you significantly increase the amount of data. For example, a single byte sent as a hex string (e.g., “0xXX”) takes 4 bytes.

So, the obvious solution is to send the tracing data out of the embedded target in binary and do the formatting on the powerful host computer, where the overhead doesn’t matter.

Now, of course the devil is in the detail of HOW you do this.

Your approach of eliminating the format strings and replacing them with binary IDs (apparently some form of hash function) seems novel and interesting. Although it seems to require a pre-processing step for the source code to generate the IDs. I can imagine a whole set of problems with managing the additional build step and with versioning the various harvested format strings on the host. Of course, it also requires access to the source code when tracing an already deployed target systems.

In QP/Spy we handle this differently. We don’t have format strings. Instead, we have application-specific trace records (packets), see:

https://www.state-machine.com/qtools/qs.html#qs_app

Here is an example of code inside the target:

QS_BEGIN_ID(MY_STATUS, local_filter) /* app-specific record begin */
    QS_U8(1, n);    /* app-specific uint8_t element */
    QS_STR("fine"); /* app-specific string element */
QS_END()            /* app-specific record end */

The QS_xxx() statements are macros that generate a tracing packet in the buffer when software tracing is enabled. They compile to nothing when software tracing is disabled.

In our approach each data element within a packet is generated with one additional byte of format, which describes both the type (u8 vs string vs u32, etc.) and the number of characters to format the quantity (for numerical values). For example, in the trace record above, the format overhead will be 2 bytes (for the two data elements).

The app-specific trace record wound produce the following (ASCII-formatted) output on the host:

3136059740 MY_STATUS 3 fine

All these binary logging schemes have the problem of preserving the symbolic information about the tracing data. In your case it seems that you use the format-IDs and the mapping to the actual strings harvested from the target code.

In our case, we use “dictionaries”, which are generated by the target:

https://www.state-machine.com/qtools/qs.html#qs_dict

Our “dictionaries” are mappings between the binary addresses within the embedded target (which are unique) and symbolic names of the objects or functions. This information is analogous to the symbolic information added by the compiler when building for debugging.

The “dictionaries” are produced by the target only during the initialization, so there is no overhead during normal execution. The advantage is that the “dictionaries” are always in synch with the specific target code.

Now, regarding the binary transport protocol, we’ve selected HDLC-type encoding:

https://www.state-machine.com/qtools/qpspy_proto.html

Similar to your COBS encoding, HDLC also has a “flag” byte and transparency (a.k.a. escaping), such that the “flag” byte does not occur in the byte stream. However, the HDLC “flag” byte is 0x7E, in contrast to COBS “flag” being 0x00. From our experience, 0x00 occurs in natural data much more frequently than 0x7E, so much more “escaping” is needed with COBS.

Our QP/Spy protocol has been specifically designed to simplify the data management overhead in the target, yet to allow detection of any data dropouts due to the trace buffer overruns. The protocol has not only provisions for detecting gaps in the data and other errors but allows for instantaneous re-synchronization after any buffering or transmission error to minimize loss of useful data.

We also wanted to easily implement the “last is best” policy, so our trace ring-buffer (typically of 1K bytes) wraps around naturally when the data production outpaces data sending. This is fine, because our protocol detects any gaps in the data (sequence byte).

Regarding other features, we don’t use “tracing levels”. Instead, we have flexible data filtering based on the data-ID (“global filter”) and on the specific object (“local filter”) that produced the data record.

Of course, I cannot describe all the features of QP/Spy and directions in which we’ve taken the software tracing (e.g., testing). As I said, we’ve been doing this and refining it for a very long time…

Thomas · January 26, 2022, 9:42pm

Thanks for pointing to QSPY. Comparing different solutions for similar tasks is exciting and opens eyes for improvement. It would be interesting to examine, try out and compare different trace/log techniques in an open source example project.

A bit more information about Trice for better understanding:

Unfortunately I forgot to mention, that avoiding source code modification prior compiling was one of the major Trice design goals, because it eases debugging. Writing about “an automatic pre-compile step” is misleading and I apologize. The automatic ID insertion/modification takes part only and only when Trice macros were added/changed by the user. Yes, trice update modifies the source code and yes it is a good idea to do that automatically in a pre-compile step, but once the IDs are inserted/modified, the source code is not changed anymore. It goes in this form to the repository and is compiled as usual.

The only thing needed after deploying a device (firmware) is the ID list together with the trice tool. Both could be hosted on the internet. One could also place the printing functionality in an external MCU or mobile device and let this do the converting to an ASCII stream.

The Trice ID list does not need versioning. It accumulates all ID-string pairs as long as it does not reach the ~65000 IDs limit. With the same latest til.json all legacy firmware images and the newest one live happily together. In one of the former Trice attempts each ID got a created and a removed timestamp to be able to clean up the JSON list based on this, but then it got dropped to keep things easy. This could get interesting again, if tailoring a Trice variant for smallest MCUs using only 8-bit IDs.

Trice is target buffer agnostic. The implemented double buffer in deferred Trice mode was selected as the fastest execution option.

COBS or Escape Control Bytes?:

Using a stream with escape byte technique could double its size in the worst case. Even if a “seldom” byte is used, the max possible buffer size may be needed and when a data disruption occurs exactly between an escaped delimiter like ...0x7f ^bang^ 0x7F... - what then? OK, one could add a CRC. The COBS technique is iron hard, micro and a package delimiter byte can be any byte but 0 is common praxis.

DickButt · March 6, 2022, 3:08pm

What I’m doing is placing the strings into an info section in the .elf (not included in the binary), and the log contains only the address of the string and a binary representation of the log arguments (packed struct generated using the log macro at compile time).

Using some tricks I can add more information to the string like file/line information.

Then the host software (written in rust) finds the strings in the .elf, and uses information that’s embedded in the string and debug information (DWARF) to parse the log’s arguments and pretty print everything.

The logger supports native types and structs with all of rust’s formatting options and some extra ones, like finding a variable name from a pointer address.

Originally I got the idea from here, but I had to make many adjustments to what’s possible to do in C and GCC, so the idea is similar but the mechanisms are quite different.

A log looks something like:

LOG_INFO("Hi, this is a log with multiple arguments. positional: {0} {2} {1:#02x}, and non positional: {} {:p} {:?}", var1, var2, var3);

It’s expanded into something like: (N is unique in the compilation unit, generated using __COUNTER__)

do{
    static char const log_message_N[] __attribute__((used, section(".logs"))) =
        "N"
        ":"
        __FILE__
        ":"
        __LINE__
        ":"
        "INF"
        ":"
        "Hi, this is a log with multiple arguments. positional: {0} {2} {1:#02x}, and non positional: {} {:p} {:?}"
        ";"
    ;

    struct __PACKED {
        __typeof__(var1) var1;
        __typeof__(var2) var2;
        __typeof__(var3) var3;
    } log_args_N; // The host parser finds this variable by name and parses it's DWARF to know how to correctly parse the arguments from the binary data.

    // This is usually optimized away:
    memcpy(&log_args_N.var1, &var1, sizeof(var1));
    memcpy(&log_args_N.var2, &var2, sizeof(var1));
    memcpy(&log_args_N.var3, &var3, sizeof(var1));

    static_assert(
        sizeof(log_args_N) <= (MAX_ARGS_LEN),
        "Log arguments don't fit into log buffer!"
    );

    _logger_send((size_t)&(log_message_N), &(log_args_N), sizeof(log_args_N));
} while(0);

and the binary data that’s logged is only:

// log address    
sizeof(uint32_t) + sizeof(var1) + sizeof(var2) + sizeof(var3)

Then _logger_send can be implemented however you like, add a timestamp, add cobs, whatever.

The Host side parser needs only the information that’s passed to _logger_send.

The best part about this is that it doesn’t require any extra tools at compilation, except the compiler itself (GCC).

Thomas · March 15, 2022, 9:44am

What I like with this approach, Dick, that is does not need ID’s inside the code, as Trice does. It maybe possible to build a similar solution in C. On the other hand it works only with the exact version match and is compiler specific.

Thomas · March 15, 2022, 9:49am

The Trice v1.0 Specification document contains some ideas to improve Trice in the future. Critical comments are welcome.

DickButt · March 15, 2022, 12:32pm

All target side code is written in C (and could possibly be ported to other languages relatively easily), only the parser that runs on PC side is written in Rust.

Yes you have to have access to the exact .elf, but that’s not a problem as all releases are stored in some kind of artifact repository.

I added a unique log with ID 0 that always has the same structure, it sends the GNU build ID (article by you guys), the build id can be used to look up the repository to get the relevant elf.

bora · June 21, 2022, 2:58pm

Thanks @Thomas for the article. I used TRICE for a project where I needed to log the interrupts without disturbing the millisecond level timing, which worked out pretty well!

I think the modification of the source code is kind of a dealbreaker. Integrating the IDs into the ELF file would be great. @DickButt do you plan to share your work?

Thomas · July 4, 2022, 10:05pm

Just thinking: Creating / extending the ID list could be done without inserting the IDs into the source code. The ID list is enough as reference. The source code would go into the version control unaffected, allowing the same files to be used in several projects with different IDs. For the compilation the sources need then a preprocess step, which inserts the IDs. Seams to be possible with makefiles, but maybe tricky in a more general way to work with any IDE. Also debugging could get difficult. Any ideas?

Thomas · July 9, 2022, 11:31am

A trice clean could be a good solution. It removes all Id’s before a check in.

Thomas · July 24, 2023, 11:48am

In v0.61.0 trice supports now trice insert (the trice update successor) and trice zero|clean to get optionally rid of the IDs inside the source code. The user can edit and check-in the code without IDs. The IDs get into the code with trice insert as pre-compile step and are removed with trice clean in a post-compile step. That is possible with the macros trice*, Trice* and TRice*. The not recommended TRICE* macros can get set their IDs only to 0. The location information is used for exactly repositioning the IDs even with several identical trice messages.

Thomas · April 29, 2025, 8:51am

Trice v1.0 is released. Has a -cache CLI switch.

Topic		Replies	Views
ARM Cortex-M33 Instruction Tracing Without a Debugger \| Interrupt Blog	8	1096	May 2, 2024
Exploring printf on Cortex-M \| Interrupt Blog	3	507	December 4, 2023
Profiling Firmware on Cortex-M \| Interrupt Blog	15	1930	August 21, 2025
A Deep Dive into ARM Cortex-M Debug Interfaces \| Interrupt Blog	2	1334	November 23, 2020
Parsing Logs Messages for Instant Crash Analysis \| Interrupt Blog	4	744	May 31, 2024

Tracing & Logging with the TRICE Library (Interrupts too!) | Interrupt

Related topics