Background
It occurred to me recently that I never hear of anyone linking their programs manually - meaning, running ld directly rather than through a wrapper utility like the gcc or clang drivers.
Given two simple translation units, I thought, one should be able to:
- invoke gcc with the -c flag, which skips the linking step and outputs the object files
- invoke ld with the object files, asking it politely to output a statically-linked binary
- run the binary and move on with my life
The Program
We use a very simple program, defined across two files, square.c and main.c, for this exploration.
// square.c
int square(int n) { return n * n; }
The program simply exits with status code 16 by calling square(int), defined in a different translation unit from main, to square 4.
//main.c
int square(int);
int main() { return square(4); }
By running gcc -c main.c square.c, we get our two object files: main.o and square.o. The file utility describes both of these files as ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped. To parse this a bit:
- ELF is the predominant file format for executables, libraries, and object files on Unix-like systems.
- relocatable means that ld can use it in the relocation process to produce an executable or shared object.
- not stripped means the ELF file still carries its symbol table (and any debug symbols).
- x86-64, version 1 (SYSV) is the execution environment or ‘target’ we compiled for.
- LSB here stands for ‘least significant byte’, indicating the target is little-endian.
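Most of what file reports can be read straight out of the ELF header. The following is a minimal sketch, assuming a Linux system that ships <elf.h> and a host with the same byte order as the file being inspected; the struct and function names (elf_summary, summarise_elf) are made up for illustration and are not part of any library.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the header fields that `file` prints. */
struct elf_summary {
    int is_64bit;         /* e_ident[EI_CLASS] == ELFCLASS64   ("64-bit")      */
    int is_little_endian; /* e_ident[EI_DATA]  == ELFDATA2LSB  ("LSB")         */
    int is_relocatable;   /* e_type            == ET_REL       ("relocatable") */
    int is_x86_64;        /* e_machine         == EM_X86_64    ("x86-64")      */
};

/* Fills *out from the first bytes of an ELF file held in buf.
 * Returns 0 on success, -1 if buf is too short or lacks the ELF magic.
 * Assumption: multi-byte fields are read in host byte order. */
int summarise_elf(const unsigned char *buf, size_t len, struct elf_summary *out)
{
    Elf64_Ehdr hdr;

    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, buf, sizeof hdr);

    /* Every ELF file starts with the four magic bytes 0x7f 'E' 'L' 'F'. */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;

    out->is_64bit = hdr.e_ident[EI_CLASS] == ELFCLASS64;
    out->is_little_endian = hdr.e_ident[EI_DATA] == ELFDATA2LSB;
    out->is_relocatable = hdr.e_type == ET_REL;
    out->is_x86_64 = hdr.e_machine == EM_X86_64;
    return 0;
}
```

To try it, read the first 64 bytes of main.o into a buffer and pass them to summarise_elf; all four fields should come back non-zero for the object files in this post.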
Our troubles _start
Running ld main.o square.o doesn’t fail, but prints ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000. The resulting binary simply segfaults when run. Uh oh.
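The address in that warning is recorded in the e_entry field of the output file's ELF header, which is where execution begins once the program is loaded. Here is a small sketch for reading it back, assuming a 64-bit ELF file, a host with matching byte order, and <elf.h> being available; elf64_entry is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies the entry point address of a 64-bit ELF image into *entry.
 * Returns 0 on success, -1 if buf is too short, not ELF, or not 64-bit.
 * Assumption: the file's byte order matches the host's. */
int elf64_entry(const unsigned char *buf, size_t len, uint64_t *entry)
{
    Elf64_Ehdr hdr;

    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, buf, sizeof hdr);

    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    *entry = hdr.e_entry; /* virtual address where execution starts */
    return 0;
}
```

Fed the header of the a.out produced above, this should yield the same address the linker printed in its warning.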
The message is quite straightforward: it looks like the linker tries to use the _start symbol as the entry point for this target, but cannot find it. We can confirm that this symbol is not present in either of our object files using nm - a neat utility that lists the symbols in an object file:
$ nm main.o square.o

main.o:
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T main
                 U square

square.o:
0000000000000000 T square
Note that main.o contains an undefined (U in the output) reference to _GLOBAL_OFFSET_TABLE_. In short, the Global Offset Table (GOT) is a structure, primarily operated on by the runtime linker or loader, which allows for symbols to be located during execution. This is why only main.o, which references an external (‘global’) symbol in square, has a reference to the GOT.
Having ruled out gcc -c introducing the _start symbol in either of our translation units, we may surmise that it must come from another relocatable.
Peeling the onion
If the entrypoint symbol is not provided by the programmer, and not introduced by the compiler, where does it come from? We can use GCC’s verbose mode to observe how it produces a statically-linked executable from object files; gcc --verbose --static main.o square.o uncovers no direct invocation of ld, but rather a wrapper utility called collect2. Although collect2 did not have a manual entry, it turned out to have a flag to toggle verbose mode just like the GCC driver, which revealed the ld invocation we’ve been looking for:
/usr/bin/ld -v -plugin /usr/lib/gcc/x86_64-linux-gnu/9/liblto_plugin.so
-plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
-plugin-opt=-fresolution=/tmp/ccXoZG5w.res
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_eh
-plugin-opt=-pass-through=-lc --build-id
-m elf_x86_64 --hash-style=gnu --as-needed -static -z relro
/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crt1.o
/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/9/crtbeginT.o
-L/usr/lib/gcc/x86_64-linux-gnu/9
-L/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu
-L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib
-L/lib/x86_64-linux-gnu
-L/lib/../lib -L/usr/lib/x86_64-linux-gnu
-L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/9/../../..
main.o square.o
--start-group -lgcc -lgcc_eh -lc --end-group
/usr/lib/gcc/x86_64-linux-gnu/9/crtend.o
/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crtn.o
Wow - that is a lot of flags.
Surely, not all of them are needed for our toy program? We don’t even utilize the standard library. With a little bit of trial and error, I removed a handful of inputs and related flags (those dealing with link time optimization, for example) and was still able to link a working binary with this command:
/usr/bin/ld -static /usr/lib/x86_64-linux-gnu/crt1.o \
  /usr/lib/x86_64-linux-gnu/crti.o \
  -L/usr/lib/gcc/x86_64-linux-gnu/9 \
  main.o square.o \
  --start-group -lgcc -lgcc_eh -lc --end-group \
  /usr/lib/gcc/x86_64-linux-gnu/9/crtend.o \
  /usr/lib/x86_64-linux-gnu/crtn.o
What does each remaining input supply?
- To answer our original question, nm /usr/lib/x86_64-linux-gnu/crt1.o reveals that _start is defined within the text segment:
$ nm /usr/lib/x86_64-linux-gnu/crt1.o
0000000000000000 D __data_start
0000000000000000 W data_start
0000000000000030 T _dl_relocate_static_pie
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 R _IO_stdin_used
                 U __libc_csu_fini
                 U __libc_csu_init
                 U __libc_start_main
                 U main
0000000000000000 T _start
- the undefined __libc_csu_init is found in /lib/x86_64-linux-gnu/libc.a …
- … which depends on the _init symbol defined in /usr/lib/x86_64-linux-gnu/crti.o and printf implementations in libgcc …
… and so on and so forth. The statically-linked binary for our program that does [nearly] nothing is a whopping 851 KiB. nm reveals 1704 symbols, of which 726 are defined in the text section, including malloc, qsort and fprintf. It works, but surely we can do better.
Becoming a self-starter
Let’s recall how we came down this rabbit-hole - ld was looking for an appropriate entrypoint named _start, which happened to be defined in crt1.o. While main is the program entrypoint from most programmers’ perspective, _start is usually the name of the code that serves as the entrypoint from the operating system’s perspective. There is quite a bit of work involved in program startup on Linux (for example, __libc_csu_init, which we saw referenced in crt1.o, ends up calling the constructors of global objects in C++).
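C has no constructors, but GCC and Clang offer a rough analogue that makes this pre-main work visible. The sketch below assumes the GCC-style constructor attribute and a normal link against the usual startup files; the names before_main and startup_hook_ran are invented for illustration.

```c
/* Set by the hook below; stays 0 if no startup code ever runs the hook. */
static int startup_ran = 0;

/* GCC/Clang extension (an assumption, not standard C): functions marked
 * `constructor` are registered so that the startup code calls them
 * before main is entered. */
__attribute__((constructor))
static void before_main(void)
{
    startup_ran = 1;
}

/* Returns 1 if the hook has already run when this is called. */
int startup_hook_ran(void)
{
    return startup_ran;
}
```

Calling startup_hook_ran() as the first statement of main in a normally linked program should return 1. As far as I can tell, a binary whose _start jumps straight to main without performing that initialization would see 0 instead, which is exactly the kind of work the default startup files take care of.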
Luckily for us, our program is not most programs. It is much simpler - which should make a lot of the heavy lifting before main optional.
What do we need from _start?
In short, _start needs to:
- Do anything that “needs to be done” before passing control to main. This is intentionally vague and depends on the programming language and execution environment (like constructing globals, aligning the start of the stack at a nice boundary, etc).
- If main returns control, do any cleanup before returning control to the OS via a system call like exit.
For our toy program, there is almost no additional accounting required. We can implement our setup and teardown logic in about 5 lines:
// custom_start.c
#include <sys/syscall.h>

int main();

// In the System V AMD-64 ABI, the first integer arg in
// user-level applications is passed in register %rdi,
// so %rdi holds `main`'s return value. The second arg
// is passed in %rsi, which here holds the syscall number
// for the `exit` system call.
// Note: this relies on the compiler leaving %rdi and %rsi
// untouched before the asm statement.
void call_exit(int code, int exit_syscall_num) {
  asm("mov %rsi, %rax;" // Copy syscall number into %rax
      "syscall;");      // exit's arg is already in %rdi
}

void _start() { call_exit(main(), SYS_exit); }
The target machine described in our ELF file was x86-64, version 1 (SYSV). The System V AMD-64 (or x86-64) ABI specifies how such an execution environment should act - right down to describing which registers or sections of memory must be used to pass values to and from functions, known as the calling convention.
We use these guidelines to implement our custom startup and teardown code. The section we’re most interested in is A.2.1:
- “The number of the syscall has to be passed in register %rax” - we’ll want to put the code for exit in %rax before executing the syscall instruction.
- “User-level applications use integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9 …” - we never pass more than two arguments, so for our purposes, %rdi is argument 0, and %rsi is argument 1 for both user-level functions and system calls.
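These two rules can be spelled out with GCC's extended inline assembly, which lets us name the registers explicitly instead of relying on where the arguments happen to sit. This is a sketch, assuming GCC or Clang on x86-64 Linux; raw_syscall1 is a hypothetical helper name, and on other targets the code falls back to libc's generic syscall() wrapper.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* asks glibc's <unistd.h> to declare syscall() */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a system call that takes one argument. */
long raw_syscall1(long number, long arg0)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile(
        "syscall"
        : "=a"(ret)                /* the kernel's result comes back in %rax */
        : "a"(number),             /* syscall number goes in %rax            */
          "D"(arg0)                /* argument 0 goes in %rdi                */
        : "rcx", "r11", "memory"); /* the syscall instruction clobbers these */
    return ret;
#else
    /* Fallback for other targets: let libc place the arguments. */
    return syscall(number, arg0);
#endif
}
```

For example, raw_syscall1(SYS_getpid, 0) should return the same value as getpid(), and raw_syscall1(SYS_exit, 16) would end the process with status 16 without depending on how the compiler laid out the surrounding function.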
Let’s give it a try:
Running gcc -c -static custom_start.c main.c square.c and linking with ld -static main.o square.o custom_start.o gives us an executable which exits with the value 16 (if you’re following along, you can run ./a.out ; echo "$?").
Our new statically-linked executable is 9.1 KiB, versus the 851 KiB from our last attempt. There are 7 symbols in total, down from 1704:
0000000000404000 R __bss_start
0000000000401027 T call_exit
0000000000404000 R _edata
0000000000404000 R _end
0000000000401000 T main
0000000000401014 T square
000000000040103d T _start
(9.1 KiB is still a little larger than I would have expected for such a barebones executable - worth a deeper look.)