Building and Using a Cross Development Tool Chain
Robert Schiele
rschiele@uni-mannheim.de
Abstract

When building ready-to-run applications from source, a compiler is not sufficient; libraries, an assembler, a linker, and possibly some other tools are also needed. We call the whole set of these tools a development tool chain. Building a native tool chain to build applications for the compiler’s platform is well documented and supported. As clusters become more and more widespread, it becomes interesting for developers to use the enormous CPU power of such a cluster to build their applications for various platforms by using cross development tool chains.

We describe how a development tool chain is structured and which steps have to be taken by its parts to build an executable from source. We also evaluate whether the characteristics of each step imply that a special version of this tool is needed for the cross development tool chain. Furthermore, we explain what has to be done to build a complete cross development tool chain. This is more involved than building a native tool chain, because intrinsic dependencies that exist between some parts of the tool chain must be explicitly resolved. Finally, we also show how such a cross compiler is used and how it can be integrated into a build environment on a heterogeneous Linux/Unix cluster.

1 Motivation

1.1 Unix Standard System Installations

Although in recent years some Unix vendors stopped shipping development tools with their operating systems, it is still quite common on most systems to have a C compiler, an assembler and a linker installed. Often system administrators use these tools to compile applications for their systems when binary packages are not available for their platform or when the setup of the binary package is not applicable to their local setup. For such scenarios, the system compiler is quite sufficient.

1.2 Development Usage

Although this so-called system compiler can also be used by a software developer to build the product he is developing, and this is often done, it is in most cases not the best solution. There are several reasons for not using the system compiler for development:

• In development you often have a large number of development machines that can be used in a compiler cluster to speed up compilation. Tools for this purpose are available, such as distcc by Martin Pool, ppmake from Stephan Zimmermann with some improvements from my side, or many other tools that do similar things. The problem is that when using the system compiler, you can only use
214 • GCC Developers Summit
other development machines that are of the same architecture and operating system, because you cannot mix object files generated for different platforms.

• As a developer, you normally want to support multiple platforms, but in most cases, you have a large number of fast machines for one platform, but only a few slow machines for another one. If you used only the system compiler in that case, you would end up with long compilation times for those platforms where you only have a few slow machines.

• Last but not least, you often also want to build for a different glibc release etc. than the one installed on your system for compatibility reasons. This is also not possible in all cases with a system compiler pre-configured for your system’s binutils release and other system specific parameters.

1.3 Compiling for a Foreign Platform

We can solve all those problems by making clear to ourselves that a compiler does not necessarily have to build binaries for the platform it is running on. A compiler that does build binaries for the platform it is running on, like the system compiler, is called a native compiler. Otherwise, the compiler is called a cross compiler.

We also need a cross compiler for bootstrapping a new platform that does not already ship a compiler to bootstrap a system with. But this cannot really be a motivation for this paper, as people that bootstrap systems most likely do not need the information contained in this paper to build a cross development tool chain.

In the following section we will show some basic principles of a development tool chain, how the single parts work and whether their characteristics require them to be handled specially when used in a cross development tool chain. In section 3, we will show what must be done to build a complete cross development tool chain and some tricks to work around some problems. In section 4, we show how to integrate the cross development tool chain into build systems to gain a more efficient development tool chain. Finally, we will draw some conclusions in the last section.

2 How a Compiler Works

To understand how a compiler works and thus what we have to set up for a cross compiler, we need to have a look at the C development tool chain. This is normally not a monolithic tool that is fed by C sources and produces executables, but consists of a chain of tools, where each of these tools executes a specific transformation. An overview of this tool chain can be found in Figure 1. In the following, I will show those parts and explain what they do.

This section is not intended to provide a complete overview of compiler technology, but only discusses some principles that help us to understand why cross development tool chains work the way they do. If you would like to have some detailed information about compiler technology, I recommend reading the so-called Dragon book [ASU86].

2.1 The C Preprocessor

The C preprocessor is quite a simple tool. It just removes all comments from the source code and processes all commands that have a hash mark (#) in the first column of a line. This means, for example, it includes header files at the position where we placed #include directives, it does conditional compiling on behalf of #if... direc-
GCC Developers Summit 2003 • 215
tives and expands all macros used within the C source code. The output of the C preprocessor is again C source code, but without comments and without any preprocessor directive.

  C source file
      |  C preprocessor (cpp)
      v
  C preprocessed source file
      |  C compiler (cc1), front end
      v
  intermediate language
      |  C compiler (cc1), back end
      v
  assembler file
      |  assembler (as)
      v
  object file
      |  linker (ld)
      v
  executable

Figure 1: tool chain

Note that most programming languages other than C do not have a preprocessor. It should be noted that preprocessor directives and especially macros lead some hackers to produce really ugly code, but in general, it is quite a useful tool.

It can easily be seen that the C preprocessor itself should not be platform dependent, as it is a simple C-to-C translator. But in fact, on most systems the preprocessor defines platform-specific macros, e.g. __i386__ on an ia32 architecture, and it must be configured to include the correct platform specific header files. Apart from that, in many compilers the preprocessor is integrated into the actual C compiler for performance reasons and to solve some data flow issues. For these reasons, the C preprocessor is actually not really platform-independent.

2.2 The C Compiler

The actual C compiler is responsible for transforming the preprocessed C source code to assembler code that can be further processed by the assembler tool. Some compilers have an integrated assembler, i.e. they bypass the assembler source code and compile directly to binary object code.

We can divide the compiler into a front end and a back end, but you should note that in most cases these two parts are integrated into one tool.

2.2.1 The Compiler Front End

The front end is responsible for transforming the C source code to some proprietary intermediate language. This intermediate language should ideally be designed to be independent of both the source language and the destination platform to allow easy replacement of the front end and the back end. For that reason the front end is independent of the destination platform.

2.2.2 The Compiler Back End

The back end does the translation of the intermediate language representation to assembler code. As the assembler code is obviously platform-dependent, the back end is as well.

As a result, although the front end is platform-independent, the whole C compiler is not, because it is an integration of both
the front end and the back end, where the latter is not platform-independent.

2.3 The Assembler

The assembler is the tool that translates assembler code to relocatable binary object code. Relocatable means that there are no absolute addresses built into the object code, but instead, if an absolute address is necessary, there are markers that will be replaced with the actual address by the linker. The object code files include a table of exported symbols that can be used by other object files, and undefined symbols that require definition in a different object file. As both the input and the output of this tool are platform-specific, the assembler obviously depends on the platform it should generate code for.

2.4 The Linker

The linker can be considered the final part in the development tool chain. It puts all binary object code files together into one file, replacing the markers by absolute addresses and linking function calls or symbol accesses to other object files to the actual definition of the symbol. Some of those object files might be fetched from external libraries, for example the C library. We do not explain how linking to shared objects works, as it just makes things a bit more complicated, but does not make a real difference to the principles that are necessary to understand the development tool chain. The result of this tool is normally an executable. For the same reasons as with the assembler, the linker clearly depends on the destination platform.

More detailed information on the principles of linkers can be found in [Lev00].

3 Building the tool chain

As we now have some basic knowledge about how a development tool chain is structured, we can start building our cross development tool chain. We can find both the C preprocessor and compiler in the gcc package [GCC], which is the most commonly used compiler for Linux and for many other Unix and Unix-like platforms.

We use the assembler and linker from the GNU binutils package [Bin]. As an alternative linker for ELF platforms, there is the one from the elfutils by Ulrich Drepper, but this one is at a very early point in its life cycle, and I would not currently recommend using these tools in a production environment. For the GNU assembler, there are also various alternatives available, but as an assembler does only a straightforward translation job, and thus no improvement of the results is to be expected from changing it, it is not worth integrating another assembler into the tool chain.

These are all tools for our tool chain, but we are still missing something: As every C application uses functions from the C library, we need a C library for the destination platform. We will use glibc [Gli] here. If we wanted to link our applications to additional libraries, we would need them also, but we will skip this part here. The essential support libraries for other gcc supported languages like C++ are shipped and thus built with gcc anyway.

The following examples are for building a cross development tool chain for a Linux system with glibc on a PowerPC. The cross compiler is built on, and will itself run on, a Linux system on an ia32 architecture processor. Although some details might be different for other system combinations, the principles are the same.
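To make the stages from section 2 concrete for the target used in the following examples, here is a minimal shell sketch. It only prints the command each stage would run, so it works without a cross tool chain installed. The function name stage_commands and the file name hello are assumptions for illustration; the tool prefix powerpc-linux- follows the naming convention described in section 4, and -E and -S are the usual gcc driver options to stop after preprocessing and after compilation.

```shell
# Print one command per tool chain stage for a given tool prefix.
# $1: tool prefix, e.g. "powerpc-linux-" (empty string for native tools)
# $2: base name of the C source file, without the .c suffix
stage_commands() {
    prefix=$1
    base=$2
    # preprocessor: C source -> preprocessed C source
    echo "${prefix}gcc -E ${base}.c -o ${base}.i"
    # compiler proper: preprocessed source -> assembler file
    echo "${prefix}gcc -S ${base}.i -o ${base}.s"
    # assembler: assembler file -> relocatable object file
    echo "${prefix}as ${base}.s -o ${base}.o"
    # linker, called through the gcc driver so that the C library
    # and startup files of the target are found
    echo "${prefix}gcc ${base}.o -o ${base}"
}

stage_commands powerpc-linux- hello
```

Piping the output into sh would run the stages once the cross tool chain built in the following sections is installed and on $PATH.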
3.1 The Binutils

The simplest thing to start with is the binutils package, because it depends neither on the gcc compiler nor on the glibc of the destination platform. And we need the binutils anyway when we want to build object files for the destination platform, which is obviously done for the glibc, but even gcc provides a library with some primitive functionality for some operations that are too complex for the destination platform processor to execute directly.

From a global point of view we have dependencies between the three packages as shown in figure 2.

[Figure 2: Dependencies between the packages (binutils, gcc, glibc)]

So we fetch a binutils package, unpack it and create a build directory somewhere (it is recommended not to build in the source directory), where we then call

    ../binutils-2.13.90.0.20/configure \
        --prefix=/local/cross \
        --enable-shared \
        --host=i486-suse-linux \
        --target=powerpc-linux

We set the prefix to the directory we want the cross development tool chain to be installed into, we enable shared object support, as we want that on current systems, and we tell configure the host platform, i.e. the platform the tools will run on later, and the target platform, i.e. the platform for which code should be generated by the tools later. Afterwards, we run a quick make, make install, and the binutils are done.

As long as there is no serious bug in the binutils package used, this step is quite unlikely to fail, as there are no dependencies on other tools of the tool chain we build. For the following parts we should expect some trouble because of intrinsic dependencies between gcc and glibc.

From this point on, we should add the bin/ directory from our installation directory to $PATH, as the following steps will need the tools installed here.

3.2 A Simple C Compiler

Now we run into the ugly part of the story: We need a C library. To build it, we obviously need a C compiler. The problem is now that gcc ships with a library (libgcc) that in some configurations depends on parts of the C library.

For this reason, I recommend building the C library and all the other libraries on a native system and copying the binaries to the cross compiler tool chain or using pre-built binaries, if possible. If you build a cross compiler that compiles code for a commercial platform like Solaris, you have to do so anyway, as you normally do not have the option to compile the Solaris libc on your own. If you decide to build the C library with your cross compiler, continue here, otherwise skip to building the full-featured compiler.

[Figure 3: Dependencies with simple C compiler (binutils, gcc, simple gcc, glibc)]

We cannot build a full-featured compiler now, as the runtime libraries obviously depend on
the C library. This cycle in the dependency graph can be seen in figure 2. We can resolve this cycle by introducing a simple C compiler that does not ship these additional libraries, so that we get dependencies as shown in figure 3. But because of the reason mentioned above, for most configurations we cannot even build a simple C only compiler. That means we can build the compiler itself, but the support libraries might fail. So we just start by doing

    CFLAGS="-O2 -Dinhibit_libc" \
    ../gcc-3.2.3/configure \
        --enable-languages=c \
        --prefix=/local/cross \
        --target=powerpc-linux \
        --disable-nls \
        --disable-multilib \
        --disable-shared \
        --enable-threads=single

and then starting the actual build with make. The configure command disables everything that is not absolutely necessary for building the C library in order to limit the possible problems to a minimum. Sometimes it also helps to set the inhibit_libc macro to tell the compiler that there is no libc yet, so we add this also. In case the build completes without an error, we are lucky and, after doing a make install, can just continue with building the C library.

Otherwise, we must install the incomplete compiler. In this case, the compiler will most likely not be sufficient to build all parts of the C library, but it should be sufficient to build the major parts of it, and with those we might be able to recompile a complete simple C compiler. We have to iterate between building this compiler and the C library, until at least the C library is complete.

The installation of an incomplete package can be done either by manually copying the built files to the destination directory, by removing the failing parts from the makefiles and continuing the build afterwards, or by just touching the files that fail to build. The last option forces make to silently build and install corrupted libraries, but if we keep this in mind, this is not really problematic, as we can just rebuild the whole thing later and thus replace the broken parts with sane ones.

The simplest way of installing an incomplete compiler when using GNU make is calling make and make install with the additional parameter -k so that make automatically continues on errors. This will then just skip the failing parts, i.e. the support libraries.

3.3 The C Library

After having built a simple C compiler, we can build the C library. As already mentioned, this may have to be part of an iterative build process together with the compiler itself. To build the glibc we also need some kernel headers, so we unpack the kernel sources somewhere and do some basic configuration by typing

    make ARCH=ppc symlinks include/linux/version.h

Now we configure glibc with the following command:

    ../glibc-2.3.2/configure \
        --host=powerpc-linux \
        --build=i486-suse-linux \
        --prefix=/local/cross/powerpc-linux \
        --with-headers=/local/linux/include \
        --disable-profile \
        --enable-add-ons
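The platform names in the configure calls so far follow one pattern, which the following sketch spells out with shell variables. The variable and function names are assumptions for illustration; the values are the ones used in this paper. The functions only print argument lists and run nothing.

```shell
# Values taken from the examples in this paper.
BUILD=i486-suse-linux     # platform the tool chain is built and run on
TARGET=powerpc-linux      # platform the generated code runs on
PREFIX=/local/cross       # installation directory of the tool chain

# Arguments for the tools (e.g. binutils): they run on $BUILD and
# generate code for $TARGET.
tool_args() {
    echo "--prefix=$PREFIX --host=$BUILD --target=$TARGET"
}

# Arguments for glibc: the library itself runs on $TARGET, so that
# platform becomes its host, and it is installed into the
# target-specific subdirectory of $PREFIX.
glibc_args() {
    echo "--prefix=$PREFIX/$TARGET --host=$TARGET --build=$BUILD"
}

tool_args
glibc_args
```

Keeping the three values in one place makes it harder to mix up the roles of the two platforms when the builds have to be repeated.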
Afterwards, we do the usual make and make install.

Note that the --host parameter here differs from the one used for the tools, as the glibc should actually run on the target platform and not, like the tools, on the build host. The --prefix is also different, as the glibc has to be placed into the target specific subdirectory within the installation directory, and not directly into the installation directory. Additionally, we have to tell configure where to find the kernel headers and that we do not need profiling support, but we want the add-ons like linuxthreads enabled.

In case building the full glibc fails because building the C compiler was incomplete before, the same hints for installing the incomplete library apply that were explained for the incomplete compiler. Additionally, it might help to touch the file powerpc-linux/include/gnu/stubs.h within the installation directory, in case it does not exist yet. This file does not contain important information for building the simple C compiler, but for some platforms it is just necessary for it to be there because other files used during the build include it.

After installation of the glibc (even the incomplete one), we also have to install the kernel headers manually by copying include/linux to powerpc-linux/include/linux within the installation directory and include/asm-ppc to powerpc-linux/include/asm. The latest kernels also want include/asm-generic to be copied to powerpc-linux/include/asm-generic. Systems other than Linux might have similar requirements.

3.4 A Full-featured Compiler

After we have a complete C library, we can build the full-featured compiler. That means we now rebuild the compiler again, but with all languages and runtime libraries we want to have included.

With a complete C library, this should no longer be a problem, so we should manage to do this by just typing

    ../gcc-3.2.3/configure \
        --enable-languages=c,c++,f77,objc \
        --prefix=/local/cross \
        --disable-libgcj \
        --with-gxx-include-dir=/local/cross/include/g++ \
        --with-system-zlib \
        --enable-shared \
        --enable-__cxa_atexit \
        --target=powerpc-linux

and again doing the build and installation by make and make install.

4 Using the Tool Chain on a Cluster

We now have a full-featured cross development tool chain. We can use these tools by just adding the bin/ path where we installed them to the system’s search path and calling them by the tool name with the platform name prefixed, e.g. for calling gcc as a cross compiler for platform powerpc-linux, we call powerpc-linux-gcc. The tools should behave in the same way the native tools on the host system do, except that they produce code for a different platform.

But our plan was to use the cross compiler on a cluster to speed up compilation of large appli-
cations. There are various methods for doing so. In the following we will show two of them.

4.1 Using a Parallel Virtual Machine (PVM)

We obtain the best scalability by dispatching all jobs that produce some workload to the nodes in the cluster. make is a wonderful tool to do so. A long time ago, Stephan Zimmermann implemented a tool called ppmake that behaved like a simple shell that distributed the commands to execute on the nodes of a cluster based on PVM. He stopped the development of the tool in 1997. As I wanted to have some improvements for the tool, I agreed with him to put the tool under GPL and started to implement some improvements. You can fetch the current development state from [ppm], but note that the documentation is really out of date and that I also stopped further development for several reasons.

If you want to use this tool, you just have to fetch the package, build it and tell make to use this shell instead of the standard /bin/sh shell by setting the make variable SHELL to the ppmake executable. Obviously you have to set up a PVM cluster before this can work. Information on how to set up a PVM cluster can be found at [PVMa]. To gain something from your cluster you should also do parallel builds by specifying the parameter -j on the make command line.

For example, if you had a cluster consisting of 42 nodes configured in your PVM software and ppmake installed in /usr/, you call

    make -j 42 SHELL=/usr/bin/ppmconnect ...

instead of just

    make ...

The CVS head revision replaced ppmconnect by the integrated binary ppmake.

There is also a script provided in the package that does most of these things automatically, but I do not like the way this script handles the process, so I do not use it personally, and as such it is a bit out of date.

Note that there is a similar ongoing project [PVMb] by Jean Labrousse which aims at integrating similar functionality directly into GNU make. You may want to consider looking at this project also.

You should note that it is necessary for this approach that all files used in the build process are available on the whole cluster within a homogeneous file system structure, for example by placing them on an NFS server and mounting it on all nodes at the same place. Additionally, it is necessary that all commands used within the makefiles behave in the same way on all nodes of the cluster. Otherwise, you will get random results, which is most likely not what you want. This means you should always call the platform-specific compiler explicitly, e.g. by powerpc-linux-gcc instead of gcc, and the same releases of the compiler, the linker and the libraries should be installed on all nodes.

4.2 Using with distcc

The biggest disadvantage of the method described above is that it relies on central file storage and on identical library installations on all nodes. You can avoid these constraints at the cost of limiting the amount of workload that will be distributed among the nodes in the cluster to the compilation and assembling step. Preprocessing and linking is done directly on the system where the build process was started and thus not parallelized. Only compilation jobs are parallelized, all other commands
are directly executed on the system where the build process was invoked. Although this limits the amount of workload that really runs in parallel, this is in most cases not a real problem, as most build processes spend most of their time with compilation anyway.

The advantage of this approach is that you only need to have the cross compiler and assembler on each node. Include files and libraries are necessary only on the system on which the build is invoked.

Such an approach is implemented in Martin Pool’s distcc package [dis]. This tool is a replacement for the gcc compiler driver. Preprocessing and linking is done almost in the same way the standard compiler driver does, but the actual compile and assemble jobs are distributed among various nodes on the network.

Although this solution obviously does not give the same amount of scalability, as not all jobs can be parallelized, it is for most situations a better solution, as from my experience it seems that many system administrators are not capable of installing a homogeneous build environment on a cluster of systems.

5 Conclusion

Finally, we can conclude that it is not really difficult to build and use a cross development tool chain, but in most cases, building the whole tool chain is not as simple as described in the compiler’s documentation because building cross development tool chains is not as well tested as building native tool chains is. Thus, you should expect numerous minor bugs in the code and in the build environment. But with some basic knowledge about how such a system works and, thus, what the source of those problems is, in most cases they can be easily fixed or worked around.

At least if you have a number of office systems idling almost all of their time, it is worth investing some time in building up such an infrastructure to use their CPU power for your build processes.

As this is a tutorial paper, its contents are intended for people that do not have extensive knowledge of the topic described, to help them understand it. If you think something is unclear, some information should be added or you find an error, please send a mail to rschiele@uni-mannheim.de.

References

[ASU86] A.V. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986.

[Bin] GNU Binutils. http://sources.redhat.com/binutils/.

[dis] distcc: a fast, free distributed C and C++ compiler. http://distcc.samba.org/.

[GCC] GCC Home Page - GNU Project - Free Software Foundation (FSF). http://gcc.gnu.org/.

[Gli] GNU libc. http://sources.redhat.com/glibc/.

[Lev00] John R. Levine. Linkers and Loaders. Morgan Kaufmann Publishers, 340 Pine Street, Sixth Floor, San Francisco, CA 94104-3205, 2000.

[ppm] SourceForge.net: Project Info - PVM Parallel Make (ppmake). http://sourceforge.net/projects/ppmake/.
[PVMa] PVM: Parallel Virtual Machine. http://www.epm.ornl.gov/pvm/.

[PVMb] PVMGmake. http://pvmgmake.sourceforge.net/.