Simple debug tracing for embedded systems

From Electriki

NOTE: this page is copied here because this is (sort of) electronics-related - the original page is here.

This page demonstrates a bit of software for emitting debug-info from an embedded platform.

After testing it on a PC, it is used to send debug-info over SPI from an Atmel AVR microcontroller, which is then decoded on the host-side using a Bus Pirate tool.

Background and carrot-on-a-stick


(Donkey-pic was borrowed from this site.)

A common risk around here is the following scenario:

  1. I see a nice and "useful" gadget
  2. I buy it
  3. it sits on a shelf, being both useful and a mental burden, without being actually used
  4. I continue doing the things for which this gadget would be useful in the same old way as before

The Bus Pirate from Dangerous Prototypes was such a gadget; it has been lying idle on a shelf for about half a year, without ever being used.

A solution to this which seems to work most of the time:

  1. get an idea containing a problem/challenge
  2. pretend that this problem needs to be solved NOW
  3. use aforementioned gadget to solve the problem

There. New skills gained, actual work postponed, shelf-space cleared.

The "problem" in this case was to have debug-/trace-info emitted from an embedded device, fast, simple, and fool-proof. This debug-info would then be decoded on a host through a serial connection.

SPI as debug-interface

I chose SPI as debug-interface. It offers a full-duplex, synchronous serial bus.

In a nutshell

SPI is very easy to implement in software, is already implemented in hardware by a lot of microcontrollers and COMs, and uses standardised clock-polarity and -phase or mode-number conventions to describe clock- and data-line behaviour.

What I want, is an interface that...

  • is simple to implement in software
  • is not time-critical
  • can achieve high speeds, so that bitbanging debug-info out of a device has little impact on performance
  • can be decoded/viewed with not too much trouble

Why not I2C?

Compared to I2C, SPI is simpler and has fewer restrictions:

  • no addressing
  • does not have a word-size
  • does not require start- and stop-conditions to start/stop transfer
  • does not have a concept of ACK/NAK

I think I2C is nice for an actual multi-node bus, but overkill in my case.

Why not use an UART?

Emitting debug-info using an UART is pretty simple. However, ...

  • the clock-source to an UART has some restrictions w.r.t. accuracy. An internal RC-oscillator may or may not be accurate enough under all circumstances.
  • if you run out of hardware-UARTs, making a proper software-UART is not trivial, and takes up a fair amount of processing at higher speeds.

    Making a bitbanging UART-implementation is impossible in most cases, because of timing-requirements.
  • the greatest advantage (to me) of using an UART over I2C/SPI is that you can simply connect a terminal-emulator to read emitted data.

    This advantage may be a bit silly: to allow for easy configuration on the host-side, a burden is placed on every embedded device.

    Wouldn't it make more sense to use the simplest interface possible on the embedded side, and make the host adapt to this interface?

So, in short, I think an UART is cheap to use on the host-side, and costly to use on the embedded side.

Methods of decoding SPI

Oscilloscopes with protocol-/serial decoding

One of the ways to decode/receive serially emitted debug-info is to use a digital memory-oscilloscope with built-in serial protocol-decoding.

I happen to use a Rigol DS1054Z, which has software-options (sold separately...) for decoding I2C, SPI, UART-serial and probably more.

Serial decoding in action:


An advantage of using an oscilloscope is that you can see the actual signal as well as the decoded data. (A debugging-mechanism should be reliable, and poor signal-quality can really ruin your day.)

A disadvantage is that you might have to transfer logged data to a PC for further analysis and backup, in case the scope's features w.r.t. serial decoding are too basic.

SPI-to-USB bridge

Like UART-to-USB bridges (such as these), there also exist SPI-to-USB bridges/dongles (such as these). These devices sit between a host-PC and a device with SPI-interface, allowing communication between both ends.

I have almost no experience with these devices.

I can imagine they are not as transparent as UART-to-USB bridges to use from software, since SPI-configuration (speed, clock-polarity/-phase, master/slave role assignment) may not be offered by the OS in a standard way. (Compare this to programmatically configuring an UART w.r.t. speed and framing, which is offered by all common OSes.)

The Bus Pirate is a multi-functional tool that can, among many other things, act as an SPI-to-USB bridge.

At the host-side of such a bridge, raw data can e.g. be displayed or streamed to a file for analysis, using your favourite software-tools.


The software presented here basically consists of only 1 macro and 1 function to emit debugging-info:

  • "DBG_TRACE" : trace-macro
  • "dbg_trace" : proof-of-concept trace-function

Scope and dependency

Only the formatting of debug-values and the concept of "debug-message" is implemented. The user must implement the following underlying functions for emitting raw bytes and message-elements:

  • "dbg_emit_start()" : start-of-message behaviour/appearance.
  • "dbg_emit_byte()" : emit single message-byte.
  • "dbg_emit_field_sep()" : handling of separators between message-fields
  • "dbg_emit_end()" : end-of-message behaviour/appearance.

For example, when using SPI to emit debug-values, these functions could control the CS/SS-, SCLK- and MOSI-lines.

Using a function-implementation (dbg_trace)

The "dbg_trace"-function was made later, and only exists to compare code-size between function- and macro-implementation.


    dbg_trace( format, val1, val2, ... )

where "format" specifies the way in which debug-values "val1", "val2" etc are emitted.

The format-string consists of letters, each indicating the width and type of the argument at that position:

  • b : print first byte of argument
  • w : print first 2 bytes of argument ("word")
  • s : interpret argument as string (nul-terminated)
  • a : the next 2 arguments are assumed to be a pointer-to-data, and size-of-data

Example of use:

    dbg_trace( "sba", "pos", u8, au8, sizeof( au8 ) );

outputs the string "pos" and the value of variables "u8" and "au8" as (s)tring, (b)yte and (a)rray, respectively.

More examples and output on PC- and AVR-platform follow later.

Using a macro-implementation (DBG_TRACE)

The "DBG_TRACE" macro is similar to "dbg_trace", except that it doesn't require a format-specifier and is much more flexible w.r.t. emitting arguments.

Apart from "datapointer-and-datasize"-tuples, all arguments passed to "DBG_TRACE" are automatically converted to the appropriate binary representation:

  • for string-type arguments, binary string-data is emitted up to the trailing nul-character
  • for array-type arguments, the binary representation is emitted, where "sizeof()" determines the number of bytes
  • for scalar arguments, the binary representation is emitted, where "sizeof()" again determines the number of bytes

Note that by "array-type arguments", actual arrays (that is, objects whose size is known at compile time) are meant, not pointers.

"Pointer-to-data" arguments are handled differently. Because the compiler has no way of knowing the number of bytes to be emitted, this number has to be given by the user. Macro "DBG_REGION" is used to pass these datapointer-and-size tuples to "DBG_TRACE".

Example of using "DBG_REGION" (the two macro-instances do exactly the same):

    int ai[] = { 1, 2, 3 };
    DBG_TRACE( ai );
    int *pi = ai;
    DBG_TRACE( DBG_REGION( pi, sizeof( ai ) ) );

If interested, see the source of file "dbg_int.h" for what I thought was a rather clever way of looping through varargs macro-arguments. (Not my own idea.)

Generic selection

The mechanism behind "DBG_TRACE" is that of C11's generic selection.

A generic selection is a compile-time mechanism, evaluating one out of a number of expressions based on the type of a given argument.

As an example, the following macro...

    #define PRINT_TYPE_OF( Val )                              \
        puts( _Generic( ( Val ),                              \
                    int     : "'" #Val "' is an integer",     \
                    char *  : "'" #Val "' is a string"  ,     \
                    default : "'" #Val "' is a mystery" ) )

used in this code...

    #include <stdio.h>

    int main( void )
    {
        // Print some integers.
        int i = 42;
        PRINT_TYPE_OF( i   );
        PRINT_TYPE_OF( 123 );

        // Print some strings.
        char *s = "tweet";
        PRINT_TYPE_OF(  s     );
        PRINT_TYPE_OF( "oink" );

        // Mixed bag.
        float f  = 1.23;
        void *pv = 0;
        PRINT_TYPE_OF( f    );
        PRINT_TYPE_OF( pv   );
        PRINT_TYPE_OF( 1L   );
        PRINT_TYPE_OF( NULL );
        // (avoid 'unused variable' warnings)
        ( void )i;
        ( void )s;
        ( void )f;
        ( void )pv;
        return 0;
    }

produces the following output:

    'i' is an integer
    '123' is an integer
    's' is a string
    '"oink"' is a string
    'f' is a mystery
    'pv' is a mystery
    '1L' is a mystery
    'NULL' is a mystery

Note that both literals and variables (or any expression, for that matter) may be passed to the "PRINT_TYPE_OF" macro, as long as it is valid C. (The "_Generic"-construct is not a macro, and thus its "case-expressions", even if never evaluated, must still be valid C.)

Here is another nice page with examples of generic selection.

(BTW, in the above definition of "PRINT_TYPE_OF", the result of the generic selection (a string-literal) cannot be concatenated with another string-literal - hence the duplication of string "Val is a ". I think the reason for this is that string-concatenation happens in an earlier translation-phase than the evaluation of generic selection.)

Caveat: byte-ordering

Arguments passed to "DBG_TRACE" are emitted byte by byte, in machine order. This can be confusing when emitting e.g. bigger numerals - 32-bit numeral "7" is emitted as 0x07, 0x00, 0x00, 0x00 on little-endian platforms (Intel/AMD/AVR).

Test on PC

To run the debug-trace code on a PC, 4 underlying functions have to be implemented in a way that makes sense for a PC-platform:

    void dbg_emit_start( void )        { printf( "[" );          }
    void dbg_emit_byte( uint8_t byte ) { printf( "%02x", byte ); }
    void dbg_emit_field_sep( void )    { printf( " " );          }
    void dbg_emit_end( void )          { printf( "]\n" );        }

As an example, consider the following definitions:

    uint8_t u8 = 1;
    short   s  = 2;
    int     i  = 3;

    const uint8_t  au8[] = { 4, 5, 6 };
    const uint8_t *pu8   = au8;
    const char    *str   = "bla";

In this case, the following debug-trace...

    DBG_TRACE( u8, s, i, au8, DBG_REGION( pu8, sizeof( au8 ) ), str );

has the following output:

    [def:01 def:0200 def:03000000 cuc*040506 ptr:040506 cc*:626c61]

(Printing of "def:", "cuc*", "ptr:" etc can be enabled (change "#if 0" to "#if 1") or disabled as follows, in file "dbg_int.h".)

    #if 0
    #define dbg_print   printf
    #else
    #define dbg_print   ( void )
    #endif

Note that even strings are emitted as raw byte-values: string "bla" is emitted as 0x62, 0x6c, 0x61.

Actual use on Atmel AVR

My setup for doing debug-tracing on Atmel AVR looks like this:


(Top left: AVR-programming dongle; top right: AVR target; bottom: Bus Pirate.)

Sending trace-info: ATtiny2313

A LED-matrix driver I made a long time ago was the nearest-by thing that could run AVR-code:


It is basically an ATtiny2313 on a board, with a header soldered onto it. (This thing was originally connected to an 8x8 LED block.)

Apart from a ground-connection (through the programming-cable), the 3 coloured wires on the right going off to the Bus Pirate, take the function of CS/SS, MOSI and SCLK.

The programming-dongle...


is basically an FT232RL soldered onto a universal AVR-devboard I made a while ago. See how professional it looks with the hot glue, messy wires and handwritten text.

Decoding trace-info: Bus Pirate

This was the first time I used this tool. You can talk with it using a terminal-emulator at 115k2 8N1.

About the Bus Pirate tool

I guess I like this tool - it can save a lot of time getting something to work, fast. Then again, the support/documentation is... questionable.

Perhaps I am too spoiled, expecting coherent documentation to exist for a mature product - I don't know. Documentation seemed to be all over the place. Admittedly, all over DP's site, not the entire internet. I wanted to contact them about this, but the site's contact-link didn't work. Oh well.

One flaw I noticed was a mismatch between colours of supplied ribbon-cable and colour-names in firmware:


Why anyone would put something as volatile as the colour of cable strands in firmware is beyond me, but anyway, so be it. It would also have been nice if the actual pin-out was printed on the PCB itself.


(See Bus Pirate pinout for details.)


The tool can operate in one of several modes, which can be selected by entering "m":

    1. HiZ
    2. 1-WIRE
    3. UART
    4. I2C
    5. SPI
    6. 2WIRE
    7. 3WIRE
    8. LCD
    9. DIO
    x. exit(without change)

At power-on, all I/O-pins are put in hi-Z state, which is a Good Thing.

For testing the software, I chose the SPI-mode, and when asked, configured it as follows:

  • "speed": 30 kHz (default; irrelevant, since we'll be sniffing/receiving)
  • "clock polarity": idle low (default)
  • "output clock edge": idle to active
  • "input sample phase": middle (default)
  • "CS": /CS (default)
  • "output type": normal/driven

The tool can go into an SPI "sniffer" mode, by entering "(1)":

    Any key to exit

where it displays any traffic within CS/SS-enabled periods.


The same C-definitions as used in the PC-example...

    uint8_t u8 = 1;
    short   s  = 2;
    int     i  = 3;
    const uint8_t  au8[] = { 4, 5, 6 };
    const uint8_t *pu8   = au8;
    const char    *str   = "bla";

with the same debug-trace as earlier...

    DBG_TRACE( u8, s, i, au8, DBG_REGION( pu8, sizeof( au8 ) ), str );

is sniffed by the Bus Pirate as follows:


Opening and closing square brackets denote activating respectively deactivating CS/SS. The values in parentheses correspond to data on MISO; bare values correspond to data on MOSI.

After stripping uninteresting/undefined MISO-data and adding whitespace between message-fields...

    [01 0200 0300 040506 040506 626C61 ]

this is pretty much the same as the corresponding PC-output...

    [def:01 def:0200 def:03000000 cuc*040506 ptr:040506 cc*:626c61]

with the exception of the way "i" (an int) is represented: 16 bits on AVR, and 32 bits on my PC.


It would have been nice if the Bus Pirate could print newlines after CS/SS was released, since it already has a concept of "messages" or "bursts".


Functions vs. macros in terms of code-size

The only reason for making a function-implementation ("dbg_trace"), was to be able to compare it to a macro-implementation ("DBG_TRACE") in terms of resulting code-size.

To do this, I compiled code with 1, 2, 5, 10 and 20 instances/calls of a simple, medium and complex debug-trace, both for function- and macro-implementation.

The same C-definitions as given earlier were used once more:

    uint8_t u8 = 1;
    short   s  = 2;
    int     i  = 3;
    const uint8_t  au8[] = { 4, 5, 6 };
    const uint8_t *pu8   = au8;
    const char    *str   = "bla";

Tested function-calls:

    "simple"  : dbg_trace( "b", u8 );
    "medium"  : dbg_trace( "sba", "pos", u8, au8, sizeof( au8 ) );
    "complex" : dbg_trace( "bwwaas", u8, s, i,  au8, sizeof( au8 ),  pu8, sizeof( au8 ),  str );

Tested corresponding macro-instances:

    "simple"  : DBG_TRACE( u8 );
    "medium"  : DBG_TRACE( "pos", u8, au8 );
    "complex" : DBG_TRACE( u8, s, i, au8, DBG_REGION( pu8, sizeof( au8 ) ), str );

Result is displayed below. The number of function-calls (solid lines) and macro-instances (dashed lines) - on the horizontal scale - affect the resulting binary code-size (vertical scale) as follows:

(Chart: binary code-size versus number of trace-calls, for function-calls and macro-instances.)

As can be seen, function-implementation for similar behaviour is always smaller in terms of code-size, but it doesn't make a big difference. (I forgot to take into account the space taken up by format-strings for function-calls; in a real-life situation, that will of course have impact.)

Given the fact that using trace-macros is so much easier, I think I'll use macros whenever possible.

For reference: source-code

Sorry for the bloat - but I would like to avoid linking to such text-files, when I might as well include them. So there.

"dbg.h" (API)

    #ifndef DBG_H_INCLUDED
    #define DBG_H_INCLUDED
    #include <stdint.h>
    #include <stdio.h>
    #include "dbg_int.h"
    // Define the following functions in user-code to implement the actual trace-behaviour.
    // An obvious choice when debugging using e.g. a serial connection, is to insert
    // pretty-printing such as whitespace between fields, and newline-characters at 
    // end-of-message.
    // When instead looking at e.g. SPI-output on a scope, it could make more sense 
    // to use delays between fields, and activate/deactivate CS at message-start/-end.
    void dbg_emit_start( void );          // Start-of-message behaviour/appearance.
    void dbg_emit_byte( uint8_t byte );   // Emit single message-byte.
    void dbg_emit_field_sep( void );      // Separator-behaviour/-appearance between message-fields (not -bytes).
    void dbg_emit_end( void );            // End-of-message behaviour/appearance.
    // Trace-implementation using macros - scales less well than function-call, but initial footprint can be small.
    // Example: 
    //      uint8_t u8 = 1;
    //      short   s  = 2;
    //      int     i  = 3;
    //      const uint8_t au8[] = { 4, 5, 6 };
    //      DBG_TRACE( u8, s, i, au8, "meep" );
    // Numerals, strings and arrays (in literals or variables) are automatically traced,
    // byte by byte, in the order in which data is stored in memory.
    // To emit data referenced by a pointer, use the 'DBG_REGION' helper-macro, as described below.
    #define DBG_TRACE(  ... )   DBG_TRACE2( __VA_ARGS__, 9, 8, 7, 6, 5, 4, 3, 2, 1 )
    // Macro for use within a "DBG_TRACE" argument-list, to bind array and array-size (a "region") together.
    // For example, the 2 following debug-statements do exactly the same:
    //      int ai[] = { 1, 2, 3 };
    //      DBG_TRACE( ai );
    //      int *pi = ai;
    //      DBG_TRACE( DBG_REGION( pi, sizeof( ai ) ) );
    #define DBG_REGION( Ptr, Len )                                            \
        ({                                                                    \
            dbg_emit_array( ( const void * )( intptr_t )( Ptr ), ( Len ) );   \
            ( DBG_REGION_MARKER_TYPE )0;                                      \
        })
    // "Traditional" trace-implementation - scales better than macros, but has bigger initial footprint.
    // Instead of automatic compile-time processing of arguments, this function needs a format-specifier,
    // much like "printf()", but with a different (simplified) syntax.
    // Format-string consists of letters, each indicating the width and type of the argument at that position:
    //      b : emit first byte of argument
    //      w : emit first 2 bytes of argument ("word")
    //      s : interpret argument as string (nul-terminated)
    //      a : next 2 arguments are assumed to be a pointer-to-data, and size-of-data
    // Example:
    //      dbg_trace( "sba", "pos", u8, au8, sizeof( au8 ) );
    // ...outputs '"pos"', "u8" and "au8" as (s)tring, (b)yte and (a)rray, respectively.
    // This function is just a proof-of-concept, to allow for comparison against its macro-implementation, above.
    void dbg_trace( const char *fmt, ... );
    #endif // ndef DBG_H_INCLUDED

"pc.c" (driver for PC, using simple "printf()")

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <stdlib.h>
    #include "dbg.h"
    void dbg_emit_start( void )        { printf( "[" );          }
    void dbg_emit_byte( uint8_t byte ) { printf( "%02x", byte ); }
    void dbg_emit_field_sep( void )    { printf( " " );          }
    void dbg_emit_end( void )          { printf( "]\n" );        }
    int main( void )
    {
        // Your trace-statements here, using "DBG_TRACE" and/or "dbg_trace".
        return 0;
    }

"avr.c" (driver for Atmel AVR, using bitbanging SPI)

    #include <avr/io.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <util/delay.h>
    #include "util.h"
    #include "dbg.h"
    // Board physical pins (might make little sense to you):
    //  a1  R   pin12   B0  CS     (header-pin, top left corner of PCB)
    //  c4      pin8    D4
    //  c6      pin11   D6
    //  a4  R   pin15   B3  CLK
    //  c1      pin3    D1
    //  a2  R   pin13   B1  MOSI (we are master, dammit)
    //  (R = 470)
    #define OUTP_SPI   PORTB
    #define DDR_SPI   DDRB
    #define IOM_CS     _BV( 0 )
    #define IOM_CLK    _BV( 3 )
    #define IOM_MOSI   _BV( 1 )
    #define CS_EN    BITS_CLR( OUTP_SPI, IOM_CS )
    #define CS_DIS   BITS_SET( OUTP_SPI, IOM_CS )
    #define CLK_HI   BITS_SET( OUTP_SPI, IOM_CLK )
    #define CLK_LO   BITS_CLR( OUTP_SPI, IOM_CLK )
    #define MOSI_HI  BITS_SET( OUTP_SPI, IOM_MOSI )
    #define MOSI_LO  BITS_CLR( OUTP_SPI, IOM_MOSI )
    // Mental note: delays were added to make SPI-sniffing using Bus Pirate work.
    // (What I saw was "CS" sometimes going high, then low, within a single packet. 
    // At F_CPU of 8 MHz and a ribbon-cable of about 15 cm, I guess there is some coupling.
    // Didn't investigate.)
    static void my_emit_spi_bit( bool bit )
    {
        // Set up data while the clock idles low, then clock it out with one high pulse.
        if ( bit ) MOSI_HI;
        else       MOSI_LO;
        _delay_us( 1 );
        CLK_HI;
        _delay_us( 1 );
        CLK_LO;
        _delay_us( 1 );
    }
    // Output a byte using SPI, with positive clock-polarity and latch-output-on-rising-clock-edge.
    static void my_emit_spi_byte( uint8_t byte )
    {
        int i;
        for ( i = 0; i < 8; i++ )
        {
            // Most significant bit goes first.
            bool bit = !!( byte & 0x80 );
            my_emit_spi_bit( bit );
            byte <<= 1;
        }
    }
    // Implemented for module 'dbg'.
    void dbg_emit_start( void )        { CS_EN; _delay_us( 100 );   }
    void dbg_emit_byte( uint8_t byte ) { my_emit_spi_byte( byte );  }
    void dbg_emit_field_sep( void )    { _delay_us( 10 );           }
    void dbg_emit_end( void )          { CS_DIS; _delay_ms( 1000 ); }
    static void my_io_init( void )  { DDR_SPI  =  IOM_CS | IOM_CLK | IOM_MOSI; }
    int main( void )
    {
        my_io_init();
        // Your trace-statements here, using "DBG_TRACE" and/or "dbg_trace".
        while ( 1 ) ;
    }

"dbg_int.h" (macro-implementations, included by "dbg.h")

    #ifndef DBG_INT_H_INCLUDED
    #define DBG_INT_H_INCLUDED
    void dbg_emit_array( const void *p, size_t N );
    void dbg_emit_string( const char *p );
    // Debug-helper, used to print info about "DBG_TRACE" macro-arguments. (Only meant for PC-build.)
    #if 0
    #define dbg_print   printf
    #else
    #define dbg_print   ( void )
    #endif
    // Reserved type used as marker during generic selection, in case pointer-and-size tuple was 
    // passed to "DBG_TRACE" using macro "DBG_REGION". (Pointer-and-size tuples are a special case.)
    typedef void *****DBG_REGION_MARKER_TYPE;
    // Generic selection expression dealing with string-type arguments to "DBG_TRACE".
    #define DBG_TRACE_STRING( Descr, Var )                            \
        ({                                                            \
            dbg_print( Descr "*:" );                                  \
            dbg_emit_string( ( const char * )( intptr_t )( Var ) );   \
        })
    // Generic selection expression dealing with array-type arguments to "DBG_TRACE".
    #define DBG_TRACE_ARRAY(  Descr, Var )                                          \
        ({                                                                          \
            dbg_print( Descr );                                                     \
            dbg_emit_array( ( const void * )( intptr_t )( Var ), sizeof( Var ) );   \
        })
    // Generic selection expression dealing with numeric/scalar arguments to "DBG_TRACE".
    #define DBG_TRACE_NUMERAL( Var )               \
        ({                                         \
            dbg_print( "def:" );                   \
            __auto_type _x = ( Var );              \
            dbg_emit_array( &_x, sizeof( _x ) );   \
        })
    // Generic selection expression dealing with "region"-type arguments to "DBG_TRACE".
    // Each region-type argument is a pointer-and-datasize tuple, wrapped in a
    // "DBG_REGION"-instance. The DBG_REGION-macro, when expanded, does the actual tracing
    // of a region-type argument.
    // This "DBG_TRACE_REGION" helper only prints additional type-info (if enabled),
    // and expands the DBG_REGION-macro, causing the wrapped region-argument to be traced.
    #define DBG_TRACE_REGION( Var )   \
        ({                            \
            dbg_print( "ptr:" );      \
            Var;                      \
        })
    // "DBG_TRACE_ARRAY_LIST" and friends turn a basic type (e.g. "int" or "long") into a
    // list with derived fully qualified/specified types (e.g. "const", "volatile",
    // "const volatile unsigned", etc). 
    // This list is then used as a "catch-all" clause for arrays of that type, to make
    // sure it is eventually handled OK by "DBG_TRACE_ARRAY" (instead of e.g. being interpreted 
    // as a scalar.)
    #define DBG_TRACE_ARRAY_LIST( Type, Descr, Var )              \
        DBG_TRACE_ARRAY_LIST_2(   signed Type, "s" Descr, Var )   \
        DBG_TRACE_ARRAY_LIST_2( unsigned Type, "u" Descr, Var )
    #define DBG_TRACE_ARRAY_LIST_2( Type, Descr, Var )            \
        DBG_TRACE_ARRAY_LIST_3(          Type,     Descr, Var )   \
        DBG_TRACE_ARRAY_LIST_3( volatile Type, "v" Descr, Var )
    #define DBG_TRACE_ARRAY_LIST_3( Type, Descr, Var )          \
              Type * : DBG_TRACE_ARRAY(     Descr "*", Var ),   \
        const Type * : DBG_TRACE_ARRAY( "c" Descr "*", Var ),  
    // "DBG_TRACE_ITEM" handles a single argument to "DBG_TRACE".
    // Arguments are divided into 3 categories, and handled accordingly:
    //      - string-type arguments (anything looking like a pointer-to-char
    //        without signed/unsigned specifiers) are assumed to point to an
    //        ASCIIZ-string, which is then traced up to the trailing nul-char.
    //      - array-type arguments (anything other than string-type argument
    //        and compatible with a pointer) is traced as a real array. 
    //        That is, the size of the argument (as per "sizeof()") is used to 
    //        determine how many bytes to trace. 
    //        This is probably not compatible with all non-GCC compilers, 
    //        where e.g. "char[ 3 ]" is not handled by a "char *" clause 
    //        in generic selection. ("Clang" apparently makes this difference.)
    //      - anything else is treated as a scalar, where again the size of the 
    //        argument is used to determine how many bytes to trace.
    // Furthermore, there is a special case for region-type arguments, inserted
    // into the argument-list when using "DBG_REGION".
    #define DBG_TRACE_ITEM( x )                                           \
        _Generic( ( x ),                                                  \
                               char * : DBG_TRACE_STRING( "c"   , x ),    \
                         const char * : DBG_TRACE_STRING( "cc"  , x ),    \
                volatile       char * : DBG_TRACE_STRING( "vc"  , x ),    \
                volatile const char * : DBG_TRACE_STRING( "vcc" , x ),    \
                        DBG_TRACE_ARRAY_LIST( char      , "c"   , x )     \
                        DBG_TRACE_ARRAY_LIST( short     , "s"   , x )     \
                        DBG_TRACE_ARRAY_LIST( int       , "i"   , x )     \
                        DBG_TRACE_ARRAY_LIST( long      , "l"   , x )     \
                        DBG_TRACE_ARRAY_LIST( long long , "ll"  , x )     \
               DBG_REGION_MARKER_TYPE : DBG_TRACE_REGION(         x ),    \
                              default : DBG_TRACE_NUMERAL(        x ) )  
    // Trace (e.g. "print") a list of arguments.
    // The way in which each argument is traced/interpreted (e.g. as string, array,
    // scalar) is determined in "DBG_TRACE_ITEM", which is run on each given argument.
    // The argument-counting trick is neat - non-given formal arguments do not yield 
    // any code. Up to 9 arguments can be given, which is probably plenty.
    // (Origin:
    #define DBG_TRACE(  ... )   DBG_TRACE2( __VA_ARGS__, 9, 8, 7, 6, 5, 4, 3, 2, 1 )
    #define DBG_TRACE2( x1, x2, x3, x4, x5, x6, x7, x8, x9, n, ... )         \
        do                                                                   \
        {                                                                    \
            dbg_emit_start();                                                \
                                                   DBG_TRACE_ITEM( x1 );     \
            if ( n > 1 )  { dbg_emit_field_sep();  DBG_TRACE_ITEM( x2 ); }   \
            if ( n > 2 )  { dbg_emit_field_sep();  DBG_TRACE_ITEM( x3 ); }   \
            if ( n > 3 )  { dbg_emit_field_sep();  DBG_TRACE_ITEM( x4 ); }   \
            if ( n > 4 )  { dbg_emit_field_sep();  DBG_TRACE_ITEM( x5 ); }   \
            if ( n > 5 )  { dbg_emit_field_sep();  DBG_TRACE_ITEM( x6 ); }   \
            if ( n > 6 )  { dbg_emit_field_sep();  DBG_TRACE_ITEM( x7 ); }   \
            if ( n > 7 )  { dbg_emit_field_sep();  DBG_TRACE_ITEM( x8 ); }   \
            if ( n > 8 )  { dbg_emit_field_sep();  DBG_TRACE_ITEM( x9 ); }   \
            dbg_emit_end();                                                  \
        } while ( 0 )
    #endif // ndef DBG_INT_H_INCLUDED

"dbg.c" (implementation of trace-functions and -helpers)

    #include <stdlib.h>
    #include <string.h>
    #include <stdarg.h>
    #include <stdbool.h>
    #include "dbg.h"
    void dbg_emit_array( const void *pv, size_t N )
    {
        const uint8_t *p = pv;
        for ( ; N--; p++ )  dbg_emit_byte( *p );
    }
    void dbg_emit_string( const char *p )  { dbg_emit_array(  p, strlen( p ) ); }
    // Emit the lowest "num_byte" bytes of "ui", least significant byte first.
    static void dbg_trace_uint( unsigned int ui, uint8_t num_byte )
    {
        while ( num_byte-- )
        {
            dbg_emit_byte( ui & 0xff );
            ui >>= 8;
        }
    }
    void dbg_trace( const char *fmt, ... )
    {
        bool first_arg = true;
        va_list vl;
        va_start( vl, fmt );
        // Each format-letter describes one argument (or two, in case of 'a').
        while ( *fmt )
        {
            if ( !first_arg )  dbg_emit_field_sep();
            first_arg = false;
            if ( strchr( "bw", *fmt ) )
            {
                unsigned int ui  =  va_arg( vl, unsigned int );
                if ( *fmt == 'b' )  dbg_trace_uint( ui, 1 );
                else      /* 'w' */ dbg_trace_uint( ui, 2 );
            }
            else if ( *fmt == 's' )
            {
                const char *s = va_arg( vl, const char * );
                dbg_emit_string( s );
            }
            else if ( *fmt == 'a' )
            {
                const void *p = va_arg( vl, const void * );
                size_t      N = va_arg( vl, size_t );   // Callers pass "sizeof( ... )", which is a "size_t".
                dbg_emit_array( p, N );
            }
            fmt++;
        }
        va_end( vl );
    }

"util.h" (general helper-macros)

    #ifndef UTIL_H_INCLUDED
    #define UTIL_H_INCLUDED
    // Bit-tests/-manipulations.
    #define BITS_SET( Reg, Mask )   do { ( Reg ) |=  ( Mask ); } while ( 0 )
    #define BITS_CLR( Reg, Mask )   do { ( Reg ) &= ~( Mask ); } while ( 0 )
    #define BITS_INV( Reg, Mask )   do { ( Reg ) ^=  ( Mask ); } while ( 0 )
    #define BITS_SET_OR_CLR( Reg, Mask, Set )   do { if ( Set ) BITS_SET( Reg, Mask ); else BITS_CLR( Reg, Mask ); } while ( 0 )
    #define BITS_ARE_ZERO(    Reg, Mask )   ( !( ( Reg ) & ( Mask ) ) )
    #define BITS_ARE_NONZERO( Reg, Mask )   ( !BITS_ARE_ZERO( ( Reg ), ( Mask ) ) )
    // Clip a value.
    #define CLIP_MIN(  Val, Min      )   do { if ( ( Val ) < ( Min ) ) ( Val ) = ( Min ); } while ( 0 )
    #define CLIP_MAX(  Val,      Max )   do { if ( ( Val ) > ( Max ) ) ( Val ) = ( Max ); } while ( 0 )
    #define CLIP_BOTH( Val, Min, Max )   do { CLIP_MIN( Val, Min ); CLIP_MAX( Val, Max ); } while ( 0 )
    #endif // ndef UTIL_H_INCLUDED