ITSE 3242:
Systems Programming
Program execution and translation
Objective
 Discuss program execution steps.
 Understand what the different phases of program
  translation perform
 Understand the different types of object code and how they
  are related to program translation
 Understand how linking works
 Understand how loading works
Linking
 Step 1: Take the text segment from each .o file and put them together.
 Step 2: Take the data segment from each .o file, put them together, and
  concatenate this onto the end of the text segments.
 Step 3: Relocate and resolve references
    Go through unresolved references and resolve them
    Fill in all absolute addresses
Figure: the linker combines .o file 1 (text 1, data 1, info 1) and .o file 2
 (text 2, data 2, info 2) into a.out, which contains relocated text 1,
 relocated text 2, relocated data 1, and relocated data 2, in that order.
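The three steps can be modeled in a few lines of C. This is a toy sketch, not how a real linker is written: the hypothetical `struct obj_file` records only section sizes, sections are packed back to back without the alignment padding real linkers insert, and the starting address passed as `base` is an assumed value.

```c
#include <stddef.h>

/* Hypothetical, simplified object file: only the sizes of its sections. */
struct obj_file {
    size_t text_size;
    size_t data_size;
};

/* Where each object's sections end up in the output file. */
struct placement {
    size_t text_base;
    size_t data_base;
};

/* Steps 1 and 2: place all text sections back to back, then all data
   sections after the last text section. No alignment padding is added. */
void lay_out(const struct obj_file *objs, size_t n, size_t base,
             struct placement *out) {
    size_t next = base;
    for (size_t i = 0; i < n; i++) {      /* text 1, text 2, ... */
        out[i].text_base = next;
        next += objs[i].text_size;
    }
    for (size_t i = 0; i < n; i++) {      /* then data 1, data 2, ... */
        out[i].data_base = next;
        next += objs[i].data_size;
    }
}

/* Step 3 (relocation): a symbol recorded as an offset inside object i's
   data section gets its absolute address from that section's new base. */
size_t relocate_data_symbol(const struct placement *p, size_t i, size_t offset) {
    return p[i].data_base + offset;
}
```

With two objects of sizes (text 0x100, data 0x20) and (text 0x80, data 0x10) and a base of 0x400000, data 2 starts at 0x4001A0, so a variable at offset 4 inside it is relocated to 0x4001A4.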
Linker cont.…
Relocation
Symbol Resolution
 To resolve references:
    search for reference (data or label) in all “user” symbol
     tables
    if not found, search library files (for example, for printf)
    once absolute address is determined, fill in the machine
     code appropriately
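The search order above can be sketched in C. Everything here is a simplified model under assumptions: symbol tables are plain arrays of hypothetical `struct symbol` entries that already hold final absolute addresses, lookup is a linear scan, and the hypothetical `patch32` assumes the machine code holds a 4-byte little-endian address field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol-table entry: a name and its final absolute address. */
struct symbol {
    const char *name;
    unsigned long address;
};

struct symtab {
    const struct symbol *entries;
    size_t count;
};

/* Linear search of one table; returns 1 and sets *addr if found. */
static int lookup(const struct symtab *t, const char *name, unsigned long *addr) {
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0) {
            *addr = t->entries[i].address;
            return 1;
        }
    }
    return 0;
}

/* Search every user symbol table first, then the library tables.
   Returns 1 on success, 0 for an undefined reference (a link error). */
int resolve(const char *name,
            const struct symtab *user, size_t n_user,
            const struct symtab *libs, size_t n_libs,
            unsigned long *addr) {
    for (size_t i = 0; i < n_user; i++)
        if (lookup(&user[i], name, addr)) return 1;
    for (size_t i = 0; i < n_libs; i++)
        if (lookup(&libs[i], name, addr)) return 1;
    return 0;
}

/* Fill in the machine code: write the low 32 bits of the address into a
   4-byte field at `offset`, least significant byte first. */
void patch32(unsigned char *code, size_t offset, unsigned long value) {
    for (int b = 0; b < 4; b++)
        code[offset + b] = (unsigned char)((value >> (8 * b)) & 0xff);
}
```

A reference to display is satisfied from a user table, while printf falls through to the library tables; a name found in neither is reported as undefined.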
 In the context of a linker, there are three different kinds of
  symbols:
 Global symbols:
   Symbols defined by module m that can be referenced by other
    modules.
   Global linker symbols correspond to:
      non-static C functions and
      global variables that are defined in the module.
Symbol Resolution
 Global symbols (external references) that are
  referenced by module m but defined by some
  other module.
   Such symbols are called externals.
 Local symbols: symbols that are defined and referenced
  exclusively by module m.
   These are C functions and global variables that are defined with the
    static attribute.
Symbols
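#include <stdio.h>

int errno;          /* global: defined here, can be used elsewhere
                       (note: errno is reserved by the C library; real code should not define it) */
int x = 27;         /* global: defined here, can be used elsewhere */
static int y = 53;  /* local: defined and used only in this file */

void display(void); /* external: defined elsewhere, used here */

int main(void) {    /* main: global, defined here */
    int sum = 23;   /* stack variable: not a linker symbol at all */
    printf("hello, world\n");  /* printf: external, defined in the C library */
    display();      /* display: external, defined in another module */
    return 0;
}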
Symbol Types
 The compiler records symbol definitions in a symbol
  table.
 The symbol table keeps track of the symbols defined and referenced in the program.
 The compiler exports each global symbol as either strong
  or weak.
 Strong symbols:
    Functions
    Initialized global variables
 Weak symbols:
   Uninitialized global variables
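One translation unit can show which definitions are strong, which are weak, and which never take part in strong/weak resolution at all. The names are hypothetical. Caveat: treating uninitialized globals as weak ("common") symbols depends on the compiler; GCC 10 and later default to -fno-common, where an uninitialized global is an ordinary definition and a duplicate in another file is a multiple-definition link error unless -fcommon is given.

```c
/* Each comment gives the symbol's strength under the rules above. */

int limit = 100;        /* initialized global variable: strong */
int counter;            /* uninitialized global variable: weak (tentative definition) */

static int hidden = 1;  /* static: a local symbol, so neither strong nor weak */

int bump(void) {        /* function: strong */
    counter += hidden;  /* counter starts at 0: uninitialized globals are zero-filled */
    return counter < limit;
}
```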
Linker’s Symbol Rules
 Rule 1: Multiple strong symbols with the same name are
  not allowed in a single executable.
 Each item can be defined only once.
 Otherwise, the linker reports an error.
 Question: What will happen if the two programs are linked
  together?
Linker’s Symbol Rules
 Rule 2: Given a strong symbol and multiple weak symbols
  with the same name, the linker chooses the strong symbol.
   References to the weak symbol resolve to the strong symbol
 Question: What will happen if the two programs are linked
  together and the program is executed?
Linker’s Symbol Rules
 Rule 3: If there are multiple weak symbols with the same
  name, the linker picks an arbitrary one.
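Rules 1 to 3 can be restated as a small decision procedure. The sketch below is a toy model, not linker source: `struct def` and `resolve_symbol` are hypothetical, each definition is reduced to a name, a strong/weak flag, and a module number, and for Rule 3 it simply keeps the first weak definition it sees, which is one arbitrary choice among many.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one module's definition of a global symbol. */
struct def {
    const char *name;
    int strong;     /* 1 = strong (function / initialized global), 0 = weak */
    int module;     /* which .o file defines it */
};

enum { RESOLVE_OK = 0, RESOLVE_MULTIPLE_STRONG = -1, RESOLVE_UNDEFINED = -2 };

/* Applies the three rules to every definition of `name`:
   Rule 1: two strong definitions -> link error.
   Rule 2: one strong plus any weak ones -> the strong one wins.
   Rule 3: only weak ones -> any may be chosen; this model keeps the first.
   On success *chosen is the index of the winning definition. */
int resolve_symbol(const struct def *defs, size_t n, const char *name,
                   size_t *chosen) {
    int found_strong = 0, found_any = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(defs[i].name, name) != 0)
            continue;
        if (defs[i].strong) {
            if (found_strong)
                return RESOLVE_MULTIPLE_STRONG;   /* Rule 1 */
            found_strong = 1;
            *chosen = i;                          /* Rule 2: strong overrides weak */
        } else if (!found_any) {
            *chosen = i;                          /* Rule 3: first weak seen */
        }
        found_any = 1;
    }
    return found_any ? RESOLVE_OK : RESOLVE_UNDEFINED;
}
```

A weak x in modules 1 and 3 and a strong x in module 2 resolve to module 2's definition; two strong definitions of main are rejected.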
Linking cont.….
 Avoid global variables if you can; otherwise:
    Use static if you can
    Initialize a global variable when you define it
    Use extern when you use a global variable defined in another module
 Static Variables
    In C, the keyword static affects the lifetime and linkage (visibility) of a variable.
    A static global variable, declared at the top of a source file, is visible only within that
     source file.
    The linker will not resolve any reference from another object file to it.
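The guidelines above can be illustrated in one translation unit. This is a sketch: the names `hits`, `total_calls`, `next_id`, and `hit_count` are hypothetical, and the `extern` declaration a second module would use appears only in the note that follows.

```c
/* File-scope static: internal linkage, so the linker never lets
   another object file reference it. */
static int hits = 0;

/* Initialized global: a strong symbol that other modules may use
   through an extern declaration. */
int total_calls = 0;

int next_id(void) {
    static int id = 0;   /* function-scope static: initialized once,
                            keeps its value across calls, visible only here */
    hits++;
    total_calls++;
    return ++id;
}

int hit_count(void) {    /* the only way other modules can read hits */
    return hits;
}
```

Another .c file would write `extern int total_calls;` to read the counter directly; it has no way to name `hits` or `id`.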
Packaging common Libraries
 How to package functions commonly used by programmers?
    Like printf, scanf, strcmp.
 Option 1: Put all functions in a single source file.
    Programmers link one big object file into every program, which is very inefficient in space and time.
                    gcc -o myprog myprog.o somebiglibraryfile.o
 Option 2: Put each routine in a separate object file.
    Programmers explicitly link the appropriate object files into their programs,
    which is tedious and error-prone
                    gcc -o myprog myprog.o printf.o scanf.o strcmp.o .....
Packaging common Libraries
 Solution: Static libraries
 Combine multiple object files into a single archive file (file extension
  “.a”).
 Linker can also take archive files as input: Linker searches the .o files
  within the .a file for needed references and links them into the
  executable.
                 gcc -o myprog myprog.o /usr/lib/libc.a
 We can create a static library file using the UNIX ar command
                 ar rs libc.a atoi.o printf.o random.o ...
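How the linker uses an archive can be modeled as a fixed-point loop: it pulls in a member only if that member defines a symbol that is still undefined, and each member pulled in may add new undefined references. This is a simplified sketch: symbols are numbered bits in an `unsigned` mask (so at most 32 symbols and 32 members), `struct member` and `scan_archive` are hypothetical, and the caller is assumed to have already removed symbols defined by the program's own .o files from `undefined`.

```c
#include <stddef.h>

/* Hypothetical archive member: bit i set = symbol i defined / referenced. */
struct member {
    unsigned defines;
    unsigned refs;
};

/* Repeatedly scan the members, selecting any member that defines a
   still-undefined symbol, until a pass adds nothing new.
   Returns a mask of selected members (bit j = member j); *undefined is
   left holding the references that the archive could not satisfy. */
unsigned scan_archive(const struct member *m, size_t n, unsigned *undefined) {
    unsigned selected = 0, defined = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t j = 0; j < n; j++) {
            if ((selected >> j) & 1u)
                continue;                          /* already linked in */
            if (m[j].defines & *undefined) {
                selected |= 1u << j;
                defined |= m[j].defines;
                /* new references from this member, minus what is now defined */
                *undefined = (*undefined | m[j].refs) & ~defined;
                changed = 1;
            }
        }
    }
    return selected;
}
```

Members that define only unreferenced symbols are never selected, which is why linking against libc.a copies only the routines a program actually needs.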
Packaging common Libraries
 Commonly used static libraries
 libc.a (the C standard library)
    2.8 MB archive of 1400 object files.
    I/O, memory allocation, signal handling, string handling, date and time, ...
 libm.a (the C math library)
    0.5 MB archive of 400 object files.
    floating point math (sin, cos, tan, log, exp, sqrt, …)
 Static libraries have the following disadvantages:
    Lots of code duplication in the resulting executable files, since every
     C program needs the standard C library.
 e.g., every program that calls printf() has its own copy of the printf()
  code in its executable. Very wasteful!
    The OS has to allocate memory for a separate copy of the standard C library
     routines used by each running program.
    Any change to a system library requires relinking every binary that uses it.
Packaging common Libraries
 Solution: Shared libraries
 Object files that contain code and data that are loaded and
  linked into an application dynamically, at either load-time or run-time
    On UNIX, “.so” filename extension is used
    On Windows, “.dll” filename extension is used (dynamic link libraries)
 When the OS runs a program, it checks whether the executable was
  linked against any shared library (.so) files.
 If so, it performs the linking and loading of the shared
  libraries on the fly.
    Example: gcc -o myprog main.o /usr/lib/libc.so
 We can create our own shared libraries using gcc -shared (the object files should be compiled as position-independent code with -fPIC)
    gcc -shared -o mylib.so main.o swap.o
Dynamic Linking
 Dynamic linking can occur when the executable is first loaded
  and run (load-time linking)
   Common case for Linux, handled automatically by the dynamic
    linker (ld-linux.so)
   Standard C library (libc.so) usually dynamically linked
 Dynamic linking can also occur after program has begun
  execution (run-time linking)
 Shared library routines can be shared by multiple
  processes.
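Run-time linking is exposed on Unix-like systems through the POSIX dlopen/dlsym interface. The sketch assumes Linux with glibc, where the math library's shared object is named libm.so.6; with glibc older than 2.34, link with -ldl. The helper `call_at_runtime` is hypothetical and assumes the looked-up symbol has type double (*)(double).

```c
#include <dlfcn.h>
#include <stdio.h>

/* Loads shared library `lib` after the program has started, looks up the
   function `sym`, calls it on x, then unloads the library.
   Sets *ok to 1 on success, 0 if the library or symbol is missing. */
double call_at_runtime(const char *lib, const char *sym, double x, int *ok) {
    *ok = 0;
    void *handle = dlopen(lib, RTLD_LAZY);   /* run-time linking happens here */
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 0.0;
    }
    /* POSIX permits converting dlsym's void * result to a function pointer */
    double (*fn)(double) = (double (*)(double))dlsym(handle, sym);
    if (fn == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return 0.0;
    }
    double result = fn(x);
    dlclose(handle);
    *ok = 1;
    return result;
}
```

For example, call_at_runtime("libm.so.6", "cos", 0.0, &ok) returns 1.0 with ok set to 1, even though the program was never linked against libm.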
Executable File Formats
 The system has a format by which it expects the code and data of a
  program to be laid out on disk, which we call an executable file format.
 Each system has its own file format; the major ones that have been
  used are outlined here:
 a.out (Assembler OUTput) — the oldest UNIX format, but did not have
  adequate support for dynamic linking.
 COFF (Common Object File Format) — An older Unix format that is no
  longer used, but forms the basis for some other executable file formats
  used today.
 PE (Portable Executable) — The Windows executable format, which
  includes a COFF section as well as extensions to deal with dynamic linking
  and things like .net code.
 ELF (Executable and Linkable Format) — The modern Unix/Linux format.
 Mach-O — The Mac OSX format, based on the Mach research kernel
  developed at CMU in the mid-1980s.
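Loaders and tools tell these formats apart by the magic number at the start of the file. The sketch recognizes only ELF, the MZ header that every PE file begins with, and little-endian 32-bit and 64-bit Mach-O. `guess_format` is a hypothetical helper; a complete PE check would also follow the offset stored in the MZ header to the PE signature, which this sketch omits.

```c
#include <stddef.h>
#include <string.h>

/* Guesses an executable file format from its first bytes ("magic number").
   a.out and COFF variants are not recognized. */
const char *guess_format(const unsigned char *buf, size_t n) {
    /* "\x7f" "ELF" is split so 'E' is not read as part of the hex escape */
    if (n >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return "ELF";
    if (n >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "MZ/PE";   /* PE files start with an MS-DOS "MZ" stub header */
    if (n >= 4 && buf[0] == 0xcf && buf[1] == 0xfa && buf[2] == 0xed && buf[3] == 0xfe)
        return "Mach-O (64-bit, little-endian)";  /* 0xfeedfacf stored LE */
    if (n >= 4 && buf[0] == 0xce && buf[1] == 0xfa && buf[2] == 0xed && buf[3] == 0xfe)
        return "Mach-O (32-bit, little-endian)";  /* 0xfeedface stored LE */
    return "unknown";
}
```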
Loading Files
 Input: Executable Code (e.g., a.out),
 Output: (program is run)
 Executable files are stored on disk. When one is run, the
  loader’s job is to load it into memory and start it running.
 In practice, the loader is part of the operating system (OS):
   loading is one of the OS’s tasks
Loading Files
 Functions of a loader
   Reads executable file’s header to determine size of text and data
    segments
   Creates new address space for program large enough to hold text
    and data segments, along with a stack segment
   Copies instructions and data from executable file into the new
    address space.
   Copies arguments passed to the program onto the stack
   Initializes machine registers
   Jumps to a start-up routine that copies the program’s
    arguments from the stack into argument registers and calls main
      If main returns, the start-up routine terminates the program with
        the exit system call
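The first loader steps can be imitated with ordinary memory allocation. This is a toy model under stated assumptions: the hypothetical `struct toy_header` holds only two sizes, the executable image is the text bytes immediately followed by the data bytes, and a single calloc'd buffer stands in for the new address space. A real loader maps pages, sets their permissions, and then transfers control, which plain C cannot show.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical executable header: just the segment sizes. */
struct toy_header {
    size_t text_size;
    size_t data_size;
};

/* Toy "address space": one buffer holding text, data, and a stack region. */
struct toy_process {
    unsigned char *memory;
    size_t text_base, data_base, stack_top;
};

/* Read the header sizes, allocate room for text + data + stack,
   and copy both segments in. Returns 0 on success, -1 on failure. */
int toy_load(const struct toy_header *hdr, const unsigned char *image,
             size_t stack_size, struct toy_process *proc) {
    size_t total = hdr->text_size + hdr->data_size + stack_size;
    proc->memory = calloc(total, 1);       /* zero-filled, like fresh pages */
    if (proc->memory == NULL)
        return -1;
    proc->text_base = 0;
    proc->data_base = hdr->text_size;      /* data follows text */
    proc->stack_top = total;               /* stack grows down from the top */
    memcpy(proc->memory + proc->text_base, image, hdr->text_size);
    memcpy(proc->memory + proc->data_base, image + hdr->text_size, hdr->data_size);
    return 0;
}
```

The caller owns `proc->memory` and releases it with free when the toy process is finished.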
Dynamic Loading
 A routine is not loaded until it is called.
 Better memory-space utilization:
  unused routines are never loaded.
 Useful when large amounts of code are needed to handle
  infrequently occurring cases.
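Dynamic loading is often arranged by calling a routine through a pointer that initially points at a loader stub. In this toy sketch nothing is actually read from disk: the hypothetical `load_stub` just counts the "load" and redirects the pointer, standing in for what a real system would do, for example, with dlopen/dlsym.

```c
static int load_count = 0;

/* The infrequently used routine we want to load only on demand. */
static int rare_case_handler(int x) {
    return x * 2;
}

static int load_stub(int x);

/* Callers always go through this pointer; nothing is "loaded" yet. */
static int (*handler)(int) = load_stub;

/* First call: pretend to bring the routine into memory, patch the
   pointer so later calls skip the stub, then forward the call. */
static int load_stub(int x) {
    load_count++;
    handler = rare_case_handler;
    return handler(x);
}

int handle_rare_case(int x) { return handler(x); }
int loads_so_far(void) { return load_count; }
```

If handle_rare_case is never called, the "load" never happens, which is the memory saving the slide describes.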