GNU Binutils: the ELF Swiss Army Knife | Interrupt

There’s a lot that takes place between the C code you write and the binary that winds up executing on a device. Understanding how to look at and inspect what is emitted by the compiler saves time and can improve your efficiency in many areas of the development lifecycle – such as debugging system problems, identifying issues with compilers or debug info emitted, reducing the size of binaries, and optimizing an application for performance and latency.


This is a companion discussion topic for the original entry at https://interrupt.memfault.com/blog/gnu-binutils

Thanks for putting this together, Chris! I learned a lot!

Thanks @rary. Glad to hear you enjoyed!

Great post.

I have some comments that I first thought of posting under a previous article (Tracking Firmware Code Size | Interrupt), but I think they're more relevant to the topics of this one.

My question concerns the VMA and LMA, which can be set in the linker script with the AT keyword.

Let's take the following linker script:

/* Entry Point */
ENTRY(main)

/* Specify the memory areas */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 128K
  RAM (xrw)      : ORIGIN = 0x20000000, LENGTH = 32K
}

/* Define output sections */
SECTIONS
{
  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH

  /* Constant data goes into FLASH */
  .rodata :
  {
    . = ALIGN(4);
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    . = ALIGN(4);
  } >FLASH

  .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
  .ARM : {
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;
  } >FLASH

  /* Initialized data sections go into RAM, load LMA copy after code */
  .data : 
  {
    . = ALIGN(4);
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
  } >RAM AT> FLASH
  
  /* Uninitialized data section */
  .bss :
  {
    . = ALIGN(4);
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
  } >RAM

  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }  
}

I'd like to draw your attention to the .data and .bss sections:

  .data : 
  {
    . = ALIGN(4);
    *(.data*)          /* .data* sections */

    . = ALIGN(4);
  } >RAM AT> FLASH
  
  /* Uninitialized data section */
  .bss :
  {
    . = ALIGN(4);
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
  } >RAM

To put it in words: we are saying that .data has its VMA in RAM but its LMA in FLASH, whereas for .bss I expect the VMA and LMA to be identical and both in RAM.

This is explained in 3.1 Basic Linker Script Concepts and in 3.6.8.2 Output Section LMA of the ld manual.
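For readers less familiar with the VMA/LMA split, here is a minimal sketch of what startup code typically does with these two sections before main() runs. The function name and its parameters are hypothetical; on a real target the addresses would come from linker-script symbols, which the script above does not define.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper showing the usual pre-main() work of a reset handler.
 * data_vma: where .data lives at run time (RAM)
 * data_lma: where the initial values of .data are stored (FLASH)
 * bss_vma:  where .bss lives at run time (RAM) */
static void init_ram_sections(uint8_t *data_vma, const uint8_t *data_lma,
                              size_t data_size,
                              uint8_t *bss_vma, size_t bss_size)
{
    /* .data: copy the initial values from the load address to the run address. */
    memcpy(data_vma, data_lma, data_size);

    /* .bss: there is no stored image to copy, so it is simply zero-filled. */
    memset(bss_vma, 0, bss_size);
}
```

Only .data needs a load image in FLASH; .bss is filled with zeros at run time, so nothing has to be stored for it.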

Consider a very simple source file:

static int g_my_global_bss;
static int g_my_global_data = 37;
const int g_my_global_rodata = 45;
int main(void)
{
    g_my_global_bss = g_my_global_data + g_my_global_rodata;
    return 0;
}

I build the ELF with this makefile:

all: main.elf

CFLAGS = \
	-std=gnu11 \
	-mcpu=cortex-m4 \
	-mthumb \
	-specs=nano.specs \
	-O0 \
	-Wall \
	-ffunction-sections \
	-fdata-sections \
	-c \
	-Werror

LDFLGS =  \
	-mcpu=cortex-m4 \
	-mthumb \
	-specs=nano.specs \
	-Wl,--gc-sections \
	-Wl,--print-memory-usage

LD_SCRIPT = STM32F410RBTx_FLASH.ld

CC = arm-none-eabi-gcc

main.elf: main.o
	$(CC) $(LDFLGS) -T$(LD_SCRIPT) -Wl,--cref,-Map=$(@:.elf=.map) -o $@ $^

main.o: main.c
	$(CC) $(CFLAGS) -o $@ $< 

.PHONY: clean

clean:
	rm *.o *.elf *.map

The linker tells me:

Memory region         Used Size  Region Size  %age Used
	   FLASH:          64 B       128 KB      0.05%
	     RAM:           8 B        32 KB      0.02%

and size tells me:

max@jarvis:~/Dropbox/test_mem$ arm-none-eabi-size -G main.elf
      text       data        bss      total filename
	60          4          4         68 main.elf
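The two reports agree with each other, which can be checked with a little arithmetic. This is only a sketch with hypothetical helper names, and it assumes (as is the case in this build) that there is no read-only data, so the data column of size is exactly the .data section.

```c
#include <stdint.h>

/* FLASH holds the code plus the load image of the initialized data. */
static uint32_t flash_used(uint32_t text, uint32_t data)
{
    return text + data;
}

/* RAM holds the run-time copy of the initialized data plus the zeroed data. */
static uint32_t ram_used(uint32_t data, uint32_t bss)
{
    return data + bss;
}
```

With text = 60, data = 4 and bss = 4, this gives 64 bytes of FLASH and 8 bytes of RAM, matching the --print-memory-usage figures. Note that .bss contributes nothing to FLASH.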

Then I extract the VMA and LMA information from the ELF using objdump:

max@jarvis:~/Dropbox/test_mem$ arm-none-eabi-objdump -h main.elf

main.elf:     file format elf32-littlearm

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000003c  08000000  08000000  00010000  2**2
		  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000000  0800003c  0800003c  00020004  2**0
		  CONTENTS, ALLOC, LOAD, DATA
  2 .data         00000004  20000000  0800003c  00020000  2**2
		  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000004  20000004  08000040  00020004  2**2
		  ALLOC
  4 .ARM.attributes 0000002a  00000000  00000000  00020004  2**0
		  CONTENTS, READONLY
  5 .comment      00000079  00000000  00000000  0002002e  2**0
		  CONTENTS, READONLY

Again, I highlight .data and .bss (and also .text for comparison):

Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000003c  08000000  08000000  00010000  2**2
  2 .data         00000004  20000000  0800003c  00020000  2**2
  3 .bss          00000004  20000004  08000040  00020004  2**2
  • .text has its VMA and LMA coinciding, both in FLASH. I expected this.
  • .data has its VMA in RAM and its LMA in FLASH. I expected this.
  • .bss has its VMA in RAM and its LMA in FLASH, just like .data. I wasn't expecting that.

Why this unexpected behavior? What am I missing? Is this a bug?

Best regards,
Max

great post.

Thanks, glad you enjoyed!

  • .bss has vma in RAM and lma in FLASH as .data . I wasn’t expecting that.
    why this unexpected behavior? What am I missing? Is this a bug?

There's a set of rules that are applied if AT or AT> are not explicitly provided in the linker script. You can find all the details in the binutils ld docs, but it looks like you are falling into case 3:

  • […]
  • Otherwise if a memory region can be found that is compatible with the current section, and this region contains at least one section, then the LMA is set so the difference between the VMA and LMA is the same as the difference between the VMA and LMA of the last section in the located region.
  • […]
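Applying that rule to the numbers in the objdump output above reproduces the .bss LMA exactly. A minimal sketch (the function name is made up for illustration; this is not ld's actual code):

```c
#include <stdint.h>

/* Implicit LMA per the quoted rule: keep the same VMA-to-LMA offset as the
 * last section placed in the same memory region. */
static uint32_t implicit_lma(uint32_t vma, uint32_t prev_vma, uint32_t prev_lma)
{
    uint32_t offset = prev_vma - prev_lma; /* VMA - LMA of the previous section */
    return vma - offset;
}
```

Here the previous section in RAM is .data (VMA 0x20000000, LMA 0x0800003c), so for .bss at VMA 0x20000004 the result is 0x20000004 - (0x20000000 - 0x0800003c) = 0x08000040, which is what objdump reports. Since .bss is flagged ALLOC only (no CONTENTS or LOAD), that LMA is just bookkeeping: no bytes are stored in FLASH for it, which is consistent with the 64 B of FLASH usage reported by the linker.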

@chrisc Awesome post! I would like to incorporate this with my Segger Embedded Studio toolchain and I followed along but I can’t seem to get it working. Do I need to install the ARM compiler separately?

I was excited reading this blog post! I didn't expect there to be so much treasure in the GNU binary utilities! Thanks! I learned a lot here!

Thanks for clarifying these gcc utilities, which are murky to us newbies. Awesome quality post, as usual.