Symbol Resolution and Relocation
The essentials of MCLinker
Symbol Resolution
  Resolving references across symbols
  Merging multiple symbol tables into one
Relocation
  Performing section merge
  Resolving all resolvable relocations
  Replacing symbolic references with actual addresses (binding)
[Figure: MCLinker overview. Bitcode, object file, and archive inputs pass through the bitcode reader, object reader, and archive reader into MCLDFile instances. Each MCLDFile holds a symbol table, sections (a, b, c), and relocation data. Symbol resolution and relocation then produce the output.]
http://code.google.com/p/mclinker
11/11/18
MCLinker
MCLDFile
Instances of MCLDFile are the inputs and outputs of MCLinker
MCLDFile provides a consistent abstraction of object files across a variety of targets and formats. It has:
symbol tables
sections
relocation entries
Linking operations on MCLDFile are efficient
Symbol lookup is fast and uses limited memory
Memory mapped I/O keeps physical memory usage as low as possible
Loading on demand keeps virtual memory usage as low as possible
Symbol Table In MCLinker (1/2)
MCLinker avoids copying symbols between symbol tables
Symbol tables record only references to symbols, not instances
A common symbol pool stores the symbol instances of all symbol tables
MCLinker keeps the number of walks over symbol tables as small as possible
MCLinker avoids merging symbol tables in order to resolve symbols
MCLinker resolves symbols while it reads the symbol tables of the inputs
MCLinker visits only the symbols it needs
Dynamic and common symbols are grouped into different sets
Symbol Table In MCLinker (2/2)
[Figure: Symbol Table 1 and Symbol Table 2 point into the Common Symbol Pool. Figure labels: Dynamic, Common, Non-Dynamic, Reference symbols, Define symbols.]
Symbol tables store only references to symbols
Symbol tables group symbols into different categories
Symbols are resolved when a symbol is added to the common symbol pool
The common symbol pool records the real instances of symbols
Symbol in MCLinker
MCLinker defines a format-independent abstraction of symbols, called LDSymbol
Supports Mach-O, ELF, and COFF
Supports both 32- and 64-bit
MCLinker transforms symbols of different formats into LDSymbol, as shown in the following figure:
[Figure: ELF Symbol, MachO Nlist, and COFF Symbol are each transformed into LDSymbol. Fields of each structure, with bit widths where the figure gives them:]
LDSymbol: name, is_dyn : 1, type : 2, bind : 2, in_section, value : 64, size : 64, other : 8
ELF Symbol: st_name, st_info, st_shndx, st_value, st_size, st_other
MachO Nlist: n_un, n_type, n_desc, n_sect, n_value
COFF Symbol: Name, Type, StorageClass, SectionNum, Value, NumAux
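The transformation from a format-specific symbol to LDSymbol can be sketched for the ELF case. This is a minimal illustration, not MCLinker's actual code: the names SymType, SymBind, ElfSym, and fromElf are hypothetical, the Local binding is an assumption (the slides only name global and weak), and the bit widths from the figure are noted in comments rather than enforced. The ELF constants and the st_info layout follow the ELF specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical enumerations; the slides name only the values used in Table 1.
enum class SymType : std::uint8_t { Reference, Common, Defined };
enum class SymBind : std::uint8_t { Local, Weak, Global };

// Format-independent symbol, following the field list in the figure.
struct LDSymbol {
  std::string name;
  bool is_dyn;               // 1 bit in the figure
  SymType type;              // 2 bits in the figure
  SymBind bind;              // 2 bits in the figure
  std::uint16_t in_section;
  std::uint64_t value;       // 64 bits
  std::uint64_t size;        // 64 bits
  std::uint8_t other;        // 8 bits
};

// Mirrors the layout of a 64-bit ELF symbol table entry.
struct ElfSym {
  std::uint32_t st_name;     // index into the string table
  std::uint8_t st_info;      // binding in the high 4 bits, type in the low 4 bits
  std::uint8_t st_other;
  std::uint16_t st_shndx;
  std::uint64_t st_value;
  std::uint64_t st_size;
};

// Values from the ELF specification.
constexpr std::uint16_t kShnUndef = 0;
constexpr std::uint16_t kShnCommon = 0xfff2;
constexpr unsigned kStbLocal = 0;
constexpr unsigned kStbWeak = 2;

// `name` is the string already looked up through st_name.
LDSymbol fromElf(const ElfSym& sym, const std::string& name, bool fromSharedObject) {
  LDSymbol out;
  out.name = name;
  out.is_dyn = fromSharedObject;
  if (sym.st_shndx == kShnUndef) {
    out.type = SymType::Reference;   // undefined: only referenced here
  } else if (sym.st_shndx == kShnCommon) {
    out.type = SymType::Common;
  } else {
    out.type = SymType::Defined;
  }
  const unsigned elfBind = sym.st_info >> 4;
  if (elfBind == kStbLocal) {
    out.bind = SymBind::Local;
  } else if (elfBind == kStbWeak) {
    out.bind = SymBind::Weak;
  } else {
    out.bind = SymBind::Global;      // simplification: every other binding is treated as global
  }
  out.in_section = sym.st_shndx;
  out.value = sym.st_value;
  out.size = sym.st_size;
  out.other = sym.st_other;
  return out;
}
```

A Mach-O or COFF reader would fill the same LDSymbol from its own fields, which is what lets the rest of the linker stay format-independent.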
Symbol Resolution
Steps
1. Get an input symbol from an input file
2. If no symbol in the output symbol table has the same name, add the input symbol to the output symbol table
3. Otherwise, compare the input symbol with the existing output symbol according to Table 1
4. Depending on the result of the comparison, discard the input symbol or override the output symbol
Table 1. The priorities of attribute values in symbol comparison
Attribute   Priority of attribute values
is_dyn      not a dynamic object > is a dynamic object
type        defined > common > reference
bind        global > weak
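The resolution steps can be sketched as a small table keyed by symbol name. This is an illustration under stated assumptions, not MCLinker's implementation: Symbol, OutputSymbolTable, and rank are hypothetical names, and comparing the attributes in the order is_dyn, then type, then bind is an assumption, since Table 1 gives the priority within each attribute but not how the attributes combine. Diagnostics such as multiple-definition errors are left out.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <unordered_map>

// Enumerator values are ordered so that a larger value means higher priority.
enum class SymType { Reference = 0, Common = 1, Defined = 2 };
enum class SymBind { Weak = 0, Global = 1 };

struct Symbol {
  std::string name;
  bool is_dyn;
  SymType type;
  SymBind bind;
};

// Larger tuple means higher priority. The attribute order is an assumption.
static std::tuple<int, int, int> rank(const Symbol& s) {
  return std::make_tuple(s.is_dyn ? 0 : 1,
                         static_cast<int>(s.type),
                         static_cast<int>(s.bind));
}

class OutputSymbolTable {
 public:
  // Returns true if the input symbol was added or overrode the existing
  // symbol, and false if the input symbol was discarded.
  bool resolve(const Symbol& input) {
    assert(!input.name.empty());
    auto it = symbols_.find(input.name);
    if (it == symbols_.end()) {
      symbols_.emplace(input.name, input);  // no symbol with this name yet
      return true;
    }
    if (rank(input) > rank(it->second)) {
      it->second = input;                   // input wins: override
      return true;
    }
    return false;                           // existing symbol wins: discard
  }

  // Returns nullptr when no symbol with this name exists.
  const Symbol* lookup(const std::string& name) const {
    auto it = symbols_.find(name);
    return it == symbols_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, Symbol> symbols_;
};
```

Because each input symbol is resolved as it arrives, no separate pass over the input symbol tables is needed afterwards.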
Sections in MCLinker
MCLDFile reuses the definitions of sections in LLVM machine code (MC) layer
MCSection holds the attributes of a section, such as its name and type
MCSectionData records the size and offset of a section
MCFragment is the storage of data
Readers in MCLinker transform into MCSection only those sections that hold information defined by the program
In general, readers transform only text and data sections
In ELF, readers transform only sections of type SHT_PROGBITS with the SHF_ALLOC flag
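The ELF rule above amounts to a simple predicate over the section header. The following is a minimal sketch, assuming a trimmed-down header with only the two relevant fields; SectionHeader and shouldTransform are hypothetical names, while the constant values come from the ELF specification.

```cpp
#include <cassert>
#include <cstdint>

// Values from the ELF specification.
constexpr std::uint32_t kShtProgbits = 1;    // SHT_PROGBITS
constexpr std::uint64_t kShfAlloc = 0x2;     // SHF_ALLOC

// Only the two header fields the filter needs.
struct SectionHeader {
  std::uint32_t sh_type;
  std::uint64_t sh_flags;
};

// True for sections that hold program-defined contents and occupy memory at
// run time, such as .text and .data.
bool shouldTransform(const SectionHeader& sh) {
  return sh.sh_type == kShtProgbits && (sh.sh_flags & kShfAlloc) != 0;
}
```

Under this filter a typical .text section (SHT_PROGBITS, allocated and executable) passes, while .bss (SHT_NOBITS) and .comment (SHT_PROGBITS but not allocated) do not.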
Relocation Entries in MCLinker
MCLinker defines a format-independent relocation called LDRelocation
Supports Mach-O, ELF, and COFF
Like BFD, MCLinker uses a target-independent relocation algorithm for all targets
LDRelocation has a target-independent data structure, howto, that describes how to apply a relocation
This relieves porters from implementing separate relocation functions for every target
TargetBackend additionally provides target-dependent relocation functions to improve performance as needed
[Figure: LDRelocation and the howto structure it refers to. Fields of each structure:]
LDRelocation: symbol, offset, addend, howto
howto: type, right_shift, size, bit_size, pcrel, bit_position, overflow, target_callback, src_mask, dst_mask, pcrel_offset
Applying Relocations by howto
Steps
1. Compute the relocation value
Relocation = S + A - P
S : the value of the symbol
A : the value of the addend
P : the value derived from offset
2. Shift the relocation value right by right_shift (>>=) and then left by bit_position (<<=)
3. Apply the relocation value (as shown in the following figure)
4. Write the final result back to the location being relocated
[Figure: worked example of applying a relocation to the Immed 8 field of the ARM instruction SUB R0, R1, #1024 (fields: Rn, Rd, Rotate, Immed 8)]
instruction                                                    11100010010000010000111100111111
~dst_mask                                                      11111111111111111111111100000000
result high (instruction and ~dst_mask)                        111000100100000100001111
final value of relocation (offset + addend + symbol address)   11111111111111111111110000000100
src_mask                                                       11111111
dst_mask                                                       11111111
result low (sum and dst_mask)                                  01000011
final result                                                   11100010010000010000111101000011
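The howto-driven steps can be sketched in two small functions. This is a hedged illustration in the style of BFD's generic relocation routine, not MCLinker's code: Howto here carries only the fields the sketch uses, relocationValue and patchWord are hypothetical names, and two points are assumptions, namely that the relocation value is S + A - P and that P is subtracted only when pcrel is set. Overflow checking and the target callback are omitted.

```cpp
#include <cassert>
#include <cstdint>

// Subset of the howto fields needed by this sketch.
struct Howto {
  unsigned right_shift;
  unsigned bit_position;
  bool pcrel;
  std::uint32_t src_mask;   // selects the bits read from the existing word
  std::uint32_t dst_mask;   // selects the bits the relocation may overwrite
};

// Steps 1 and 2: compute the value, then shift it into position.
std::uint32_t relocationValue(std::int64_t S, std::int64_t A, std::int64_t P,
                              const Howto& h) {
  assert(h.right_shift < 64 && h.bit_position < 32);
  std::int64_t v = S + A;
  if (h.pcrel) {
    v -= P;                 // assumption: P applies to PC-relative relocations only
  }
  v >>= h.right_shift;      // arithmetic shift on mainstream compilers (guaranteed since C++20)
  // Shift left in unsigned arithmetic to avoid shifting a negative value.
  return static_cast<std::uint32_t>(static_cast<std::uint64_t>(v) << h.bit_position);
}

// Steps 3 and 4: merge the value into the word at the relocated location and
// return the word to write back.
std::uint32_t patchWord(std::uint32_t word, std::uint32_t value, const Howto& h) {
  const std::uint32_t sum = (word & h.src_mask) + value;  // in-place field plus value
  return (word & ~h.dst_mask) | (sum & h.dst_mask);       // result high | result low
}
```

With the numbers from the figure (instruction 0xE2410F3F, relocation value 0xFFFFFC04, both masks 0xFF), patchWord keeps the upper 24 bits, computes 0x3F + 0x04 = 0x43 in the low byte, and yields 0xE2410F43, which matches the final result shown on the slide.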
Memory Allocation Policy in MCLinker
MCLinker has its own memory allocator, called MemoryArea
Unfortunately, we do not use LLVM MemoryBuffer directly
Linkers' demands on the memory allocation policy differ from compilers'
The average size of object files differs from that of source files
Linkers perform more file operations than compilers, so their performance is more sensitive to the use of memory mapped I/O
LLVM MemoryBuffer is designed for compilers, not linkers
The average size of the members of libc.a is slightly less than one page
However, LLVM MemoryBuffer uses memory mapped I/O only when the request is larger than four pages
Policy: Memory mapped I/O, mmap()
  Advantage: ~2x faster file copy
  Disadvantage: 1. Start address must be on a page boundary
                2. Memory size must be a multiple of the page size
Policy: Dynamic memory, malloc() + read()
  Advantage: No constraints on either the start address or the requested size
  Disadvantage: Slow file copy
Components of MemoryArea (1/2)
Three layers of MemoryArea
MemoryArea
  Clients request virtual memory space from MemoryArea
  MemoryArea creates MemorySpaces and MemoryRegions to satisfy clients' requests
  MemoryArea decides whether to use dynamic memory or memory mapped I/O
MemorySpace
  MemorySpace is a container for a non-overlapping, contiguous range of virtual addresses
  Virtual memory is allocated by either malloc or memory mapped I/O
  Clients do not access memory directly through a MemorySpace. Instead, they access memory through MemoryRegions
MemoryRegion
  MemoryRegion marks a range of virtual memory space in a MemorySpace
  Clients access memory through MemoryRegions
  Several MemoryRegions can map to the same MemorySpace
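The relationship between the three layers can be sketched as follows. This is a simplified illustration, not MCLinker's classes: the member functions (request, spaceCount, operator[]) are hypothetical, every MemorySpace is heap-backed and zero-filled rather than loaded from a file, and the choice between dynamic memory and memory mapped I/O is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// A contiguous range of bytes. A real implementation would fill it from the
// file with either mmap or malloc + read.
class MemorySpace {
 public:
  explicit MemorySpace(std::size_t size) : bytes_(size) {}
  std::uint8_t* data() { return bytes_.data(); }
  std::size_t size() const { return bytes_.size(); }

 private:
  std::vector<std::uint8_t> bytes_;
};

// A window onto part of a MemorySpace. Clients touch memory only through it.
class MemoryRegion {
 public:
  MemoryRegion(std::shared_ptr<MemorySpace> space, std::size_t offset, std::size_t length)
      : space_(std::move(space)), offset_(offset), length_(length) {
    assert(offset_ + length_ <= space_->size());
  }
  std::uint8_t& operator[](std::size_t i) {
    assert(i < length_);
    return space_->data()[offset_ + i];
  }
  std::size_t size() const { return length_; }

 private:
  std::shared_ptr<MemorySpace> space_;
  std::size_t offset_;
  std::size_t length_;
};

// Hands out regions and creates spaces on demand.
class MemoryArea {
 public:
  MemoryRegion request(std::size_t fileOffset, std::size_t length) {
    // Reuse an existing space when it already covers the requested range.
    for (const Entry& e : spaces_) {
      if (fileOffset >= e.fileOffset &&
          fileOffset + length <= e.fileOffset + e.space->size()) {
        return MemoryRegion(e.space, fileOffset - e.fileOffset, length);
      }
    }
    spaces_.push_back(Entry{fileOffset, std::make_shared<MemorySpace>(length)});
    return MemoryRegion(spaces_.back().space, 0, length);
  }
  std::size_t spaceCount() const { return spaces_.size(); }

 private:
  struct Entry {
    std::size_t fileOffset;
    std::shared_ptr<MemorySpace> space;
  };
  std::vector<Entry> spaces_;
};
```

Two requests that fall inside the same range share one MemorySpace, so a write through one MemoryRegion is visible through the other, which is the sharing the last bullet describes.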
Components of MemoryArea (2/2)
[Figure: LDObjectReader reads or writes memory through MemoryRegions. Inside MemoryArea, every MemoryRegion maps to a MemorySpace, and the MemorySpace ranges mapped by MemoryRegions may overlap. Parts of a file in secondary storage are copied to a MemorySpace by memory mapped I/O (mmap) or dynamic memory.]
How does MemoryArea allocate memory?
[Figure: LDObjectReader, MemoryRegion, MemoryArea, MemorySpace, and 2nd storage, connected by the following steps]
1. LDObjectReader requests memory space with the specified size
2. MemoryArea decides the memory allocation policy by size and threshold, as in Table 2
3. MemoryArea maps a MemorySpace to a MemoryRegion
4. The reader reads and writes memory only through the MemoryRegion
Table 2.
                             Request Size >= Threshold     Request Size < Threshold
Memory Policy                Using memory mapped I/O       Allocating dynamic memory
Allocated MemorySpace Size   Page alignment                As requested size
Feature                      Fast memory read and write    Reducing memory fragments
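The decision in Table 2 can be sketched as a pure function that plans an allocation without performing it. This is an illustration under stated assumptions: planAllocation, AllocationPlan, and MemoryPolicy are hypothetical names, and the threshold is passed in as a parameter because the slides do not state its value.

```cpp
#include <cassert>
#include <cstddef>

enum class MemoryPolicy { MemoryMappedIO, DynamicMemory };

struct AllocationPlan {
  MemoryPolicy policy;
  std::size_t start;   // file offset at which the MemorySpace begins
  std::size_t size;    // number of bytes the MemorySpace covers
};

AllocationPlan planAllocation(std::size_t offset, std::size_t length,
                              std::size_t pageSize, std::size_t threshold) {
  assert(pageSize != 0);
  if (length < threshold) {
    // Small request: dynamic memory, exactly as requested.
    return {MemoryPolicy::DynamicMemory, offset, length};
  }
  // Large request: memory mapped I/O. The start is rounded down to a page
  // boundary and the end is rounded up, so the size is a page multiple.
  const std::size_t start = offset - offset % pageSize;
  const std::size_t end = offset + length;
  const std::size_t alignedEnd = (end + pageSize - 1) / pageSize * pageSize;
  return {MemoryPolicy::MemoryMappedIO, start, alignedEnd - start};
}
```

For example, with 4096-byte pages and a 4096-byte threshold, a 10000-byte request at offset 5000 is planned as a mapping that starts at offset 4096 and spans 12288 bytes, while a 100-byte request at the same offset stays a 100-byte dynamic allocation.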