Skip to main content

Lab 12 - Linking

Task: Linking a Single File

Access the directory drills/tasks/single-link/support/example/. We want to follow the linking commands for a single C source file. The source file is hello.c.

In the three subdirectories, you will find support files for the following scenarios:

  • a-dynamic/: creating a dynamic executable file
  • b-static/: creating a static executable file
  • c-standalone/: creating a standalone executable file without the standard C library

In each subdirectory, we use the make command to compile the executable file hello. We use the command file hello to check whether the file is compiled dynamically or statically.

In the Makefile files, the linking command uses gcc. An equivalent command that directly uses ld is commented out. To track the direct usage of ld, we can comment out the gcc command and uncomment the ld command.

In the case of c-standalone/, since we are not using the standard C library or C runtime library, we need to replace their functionalities. The functionalities are replaced in the start.asm and puts.asm files. These files implement the _start function/symbol and the puts function, respectively. The _start function/symbol is, by default, the entry point of an executable program. The _start function is responsible for calling the main function and terminating the program. Because there is no standard library, these two files are written in assembly language and use system calls.

Bonus: Add a command in the Makefile in the c-standalone/ directory that explicitly uses ld for linking.

Access the directory drills/tasks/single-link/support/diy/. We want to compile and link the source files in each subdirectory, following the model of the previous exercise.

Copy the Makefile files and update them in each subdirectory to obtain the executable file.

If you're having difficulties solving this exercise, go through this reading material.

Task: Linking Multiple Files

Access the directory drills/tasks/multiple-link/support/example/. We want to follow the linking commands from multiple C source files: main.c, add.c, sub.c.

As in the previous exercises, there are three subdirectories for three different scenarios:

  • a-no-header/: external function declarations are made directly in the C source file (main.c)
  • b-header/: external function declarations are made in a separate header file (ops.h)
  • c-lib/: external function declarations are made in a separate header file, and linking is done using a static library

In each subdirectory, we use the make command to compile the executable file main.

Access the directory drills/tasks/multiple-link/support/diy/. We want to compile and link the source files in each subdirectory, following the model of the previous exercise.

Copy the Makefile files and update them in each subdirectory to obtain the executable file.

If you're having difficulties solving this exercise, go through this reading material.

Task: Fixing the Entry Point 1

Access the directory drills/tasks/entry-fix-1/support/. We want to track issues with defining the main() function.

Go to the subdirectory a-c/. Run the make command, interpret the encountered error, and resolve it by editing the hello.c file.

Go to the subdirectory b-asm/. Run the make command, interpret the encountered error, and resolve it by editing the hello.asm file.

Bonus: In the subdirectories c-extra-nolibc/ and d-extra-libc/, find solutions that do not modify the source code of hello.c. These solutions instead modify the build system to use a different function, other than main(), as the program's entry point.

If you're having difficulties solving this exercise, go through this reading material.

Task: Fixing the Entry Point

Access the directory drills/tasks/entry-fix-2/support/. Run the make command, interpret the encountered error, and resolve it by editing the hello.c file.

If you're having difficulties solving this exercise, go through this reading material.

Task: Warning (not an error)

Access the directory drills/tasks/include-fix/support/. Run the make command. A warning appears, but it is from the preprocessing/compilation process. Resolve this warning by editing the hello.c file.

Bonus: Fix the warning without using the #include directive.

If you're having difficulties solving this exercise, go through this reading material.

Task: Fixing Export Issues

Access the directory drills/tasks/export-fix/support/. Each subdirectory (a-func/, b-var/, c-var-2/) contains a problem related to the export of symbols (functions or variables). In each subdirectory, run the make command, identify the issue, and edit the necessary files to resolve it.

If you're having difficulties solving this exercise, go through this reading material.

Task: Using Symbols (Variables and Functions)

Access the directory drills/tasks/10-var-func-fix/support/. Run the make command, interpret the encountered error, and resolve it by editing the source files.

If you're having difficulties solving this exercise, go through this reading material.

Task: Fixing Library Issues

Access the directory drills/tasks/lib-fix/support/. Run the make command, interpret the encountered error, and resolve it by editing the Makefile. Refer to the Makefile in the directory drills/tasks/multiple-link/support/example/c-lib/.

If you're having difficulties solving this exercise, go through this reading material.

Task: Linking an Object File (without Source Code)

Access the directory drills/tasks/obj-link-dev/support/. The file shop.o exposes an interface (functions and variables) that allows displaying messages. Edit the main.c file to properly call the exposed interface and display the messages:

price is 21
quantity is 42

Explore the interface and the content of the functions in the shop.o file using nm and objdump.

If you're having difficulties solving this exercise, go through this reading material.

Linking

Linking is the final phase of the building process. Linking combines multiple object files into an executable file.

To obtain an executable file from object files, the linker performs the following actions:

  1. Symbol resolution: locating the undefined symbols of one object file in other object files.
  2. Section merging: merging sections of the same type from different object files into a single section in the executable file.
  3. Address binding: after merging, the effective address symbols within the executable file can be established.
  4. Symbol relocation: once the symbol addresses are set, the instructions and data referring to those addresses in the executable must be updated.
  5. Entry point establishment: specifying the address of the first instruction to be executed.

Linker Invocation

The linker is generally invoked by the compilation utility (gcc, clang, cl). Thus, invoking the linker is transparent to the user. In specific cases, such as creating a kernel image or images for embedded systems, the user will invoke the linker directly.

If we have a source C file app.c, we will use the compiler to obtain the object file app.o:

gcc -c -o app.o app.c

Then, to obtain the executable file app from the object file app.o, we use the gcc utility again:

gcc -o app app.o

In the background, gcc will invoke the linker and build the executable app. The linker will also link against the standard C library (libc).

The linking process will work only if the file app.c has the main() function defined, which is the main function of the program. Linked files must have a single main() function in order to produce an executable.

If we have multiple C source files, we invoke the compiler for each file and then the linker:

gcc -c -o helpers.o helpers.c
gcc -c -o app.o app.c
gcc -o app app.o helpers.o

The last command is the linking command, which links the object files app.o and helpers.o into the executable file app.

In the case of C++ source files, we will use the g++ command:

g++ -c -o helpers.o helpers.cpp
g++ -c -o app.o app.cpp
g++ -o app app.o helpers.o

We can also use the gcc command for linking, specifying linking with the standard C++ library (libc++):

gcc -o app app.o helpers.o -lstdc++

The Linux linking utility, ld, is invoked transparently by gcc or g++. To see how the linker is invoked, we use the -v option of the gcc utility, with the following output:

/usr/lib/gcc/x86_64-linux-gnu/7/collect2 -plugin /usr/lib/gcc/x86_64-linux-gnu/7/liblto_plugin.so
-plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper -plugin-opt=-fresolution=/tmp/ccwnf5NM.res
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --eh-frame-hdr -m elf_i386 --hash-style=gnu
--as-needed -dynamic-linker /lib/ld-linux.so.2 -z relro -o hello
/usr/lib/gcc/x86_64-linux-gnu/7/../../../i386-linux-gnu/crt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../i386-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/7/32/crtbegin.o
-L/usr/lib/gcc/x86_64-linux-gnu/7/32 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../i386-linux-gnu
-L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib32 -L/lib/i386-linux-gnu -L/lib/../lib32 -L/usr/lib/i386-linux-gnu
-L/usr/lib/../lib32 -L/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../i386-linux-gnu
-L/usr/lib/gcc/x86_64-linux-gnu/7/../../.. -L/lib/i386-linux-gnu -L/usr/lib/i386-linux-gnu hello.o -lgcc --push-state
--as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state
/usr/lib/gcc/x86_64-linux-gnu/7/32/crtend.o /usr/lib/gcc/x86_64-linux-gnu/7/../../../i386-linux-gnu/crtn.o
COLLECT_GCC_OPTIONS='-no-pie' '-m32' '-v' '-o' 'hello' '-mtune=generic' '-march=i686'

The collect2 utility is, in fact, a wrapper around the ld utility. The result of running the command is complex. A "manual" invocation of the ld command would look like this:

ld -dynamic-linker /lib/ld-linux.so.2 -m elf_i386 -o app /usr/lib32/crt1.o /usr/lib32/crti.o app.o helpers.o -lc /usr/lib32/crtn.o

The arguments of the above command have the following meanings:

  • -dynamic-linker /lib/ld-linux.so.2: specifies the dynamic loader/linker used for loading the dynamic executable
  • -m elf_i386: links files for the x86 architecture (32-bit, i386)
  • /usr/lib32/crt1.o, /usr/lib32/crti.o, /usr/lib32/crtn.o: represent the C runtime library (crt - C runtime) that provides the necessary support for loading the executable
  • -lc: links against standard C library (libc)

File Inspection

To track the linking process, we use static analysis utilities such as nm, objdump, and readelf.

We use the nm utility to display symbols from an object file or an executable file:

$ nm hello.o
00000000 T main
U puts

$ nm hello
0804a01c B __bss_start
0804a01c b completed.7283
0804a014 D __data_start
0804a014 W data_start
08048370 t deregister_tm_clones
08048350 T _dl_relocate_static_pie
080483f0 t __do_global_dtors_aux
08049f10 t __do_global_dtors_aux_fini_array_entry
0804a018 D __dso_handle
08049f14 d _DYNAMIC
0804a01c D _edata
0804a020 B _end
080484c4 T _fini
080484d8 R _fp_hw
08048420 t frame_dummy
08049f0c t __frame_dummy_init_array_entry
0804861c r __FRAME_END__
0804a000 d _GLOBAL_OFFSET_TABLE_
w __gmon_start__
080484f0 r __GNU_EH_FRAME_HDR
080482a8 T _init
08049f10 t __init_array_end
08049f0c t __init_array_start
080484dc R _IO_stdin_used
080484c0 T __libc_csu_fini
08048460 T __libc_csu_init
U __libc_start_main@@GLIBC_2.0
08048426 T main
U puts@@GLIBC_2.0
080483b0 t register_tm_clones
08048310 T _start
0804a01c D __TMC_END__
08048360 T __x86.get_pc_thunk.bx

The nm command displays three columns:

  • the symbol's address
  • the section and type where the symbol is located
  • the symbol's name

A symbol is the name of a global variable or function. It is used by the linker to make connections between different object modules. Symbols are not necessary for executables, which is why executables can be stripped.

The symbol's address is actually the offset within a section for object files and is the effective address for executables.

The second column specifies the section and type of the symbol. If it is uppercase, then the symbol is exported and can be used by another module. If it is lowercase, then the symbol is not exported and is local to the object module, making it unusable in other modules. Thus:

  • d: the symbol is in the initialized data area (.data), unexported
  • D: the symbol is in the initialized data area (.data), exported
  • t: the symbol is in the code area (.text), unexported
  • T: the symbol is in the code area (.text), exported
  • r: the symbol is in the read-only data area (.rodata), unexported
  • R: the symbol is in the read-only data area (.rodata), exported
  • b: the symbol is in the uninitialized data area (.bss), unexported
  • B: the symbol is in the uninitialized data area (.bss), exported
  • U: the symbol is undefined (it is used in the current module but defined in another module)

More information can be found in the manual page for the nm utility.

Using the objdump command, we can disassemble the code of object files and executables. This way, we can see the assembly code and how the modules operate.

The readelf command is used to inspect object or executable files. With the readelf command, we can view the headers of files. An important piece of information in the header of executable files is the entry point, the address of the first instruction executed:

$ readelf -h hello
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x8048310
Start of program headers: 52 (bytes into file)
Start of section headers: 8076 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 9
Size of section headers: 40 (bytes)
Number of section headers: 35
Section header string table index: 34

Using the readelf command, we can see the sections of an executable/object file:

$ readelf -S hello
There are 35 section headers, starting at offset 0x1f8c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 08048154 000154 000013 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 08048168 000168 000020 00 A 0 0 4
[ 3] .note.gnu.build-i NOTE 08048188 000188 000024 00 A 0 0 4
[...]

Also, with the readelf command, we can list (dump) the contents of a specific section:

$ readelf -x .rodata hello

Hex dump of section '.rodata':
0x080484d8 03000000 01000200 48656c6c 6f2c2057 ........Hello, W
0x080484e8 6f726c64 2100 orld!.

Guide: Linking C and C++

Access the directory drills/tasks/cpp-obs/support/. We want to observe how linking is performed with mixed sources: C and C++.

In the bad/ subdirectory, we have two directories, c-calls-cpp/ and cpp-calls-c/, where we combine C and C++ sources. In both cases, using make displays errors. This occurs because C++ symbols are mangled.

If we use the nm command on object modules obtained from C source code, we get:

$ nm add.o
0000000000000000 T _Z3addii

$ nm sub.o
0000000000000000 T _Z3subii

The symbol names are not add and sub, but rather _Z3addii and _Z3subii. C++ symbol names are mangled and define the function signature. This allows for functions with the same name but different signatures. Details about name mangling can be found here.

To resolve this, symbols defined in C and imported into C++, or vice versa, must be prefixed with the directive extern "C". This way, the C++ compiler will use simple names for the imported/exported symbols, allowing them to be used together with C modules. This is implemented in the good/ subdirectory. Details about the extern "C" directive can be found here.