ccg is an aid to the development and implementation of dynamic
code generators. The main components are a preprocessor (the program
that is actually called ``ccg'') and a collection of ``runtime
assemblers'' (a set of header files whose names are utterly
unimportant).
The preprocessor accepts as input a C program that contains
embedded dynamic code sections. (For the sake of brevity we will
consider ``C'' to cover both C and C++.) Each dynamic
code section is a sequence of assembly language statements. The
output from the preprocessor is an equivalent C program in which
the dynamic code sections have been replaced with (pure ANSI) C
code that generates the corresponding binary instructions at
runtime.
The majority of code generation work is performed by a runtime
assembler. Each runtime assembler is implemented as a set of cpp
macros in a header file specific to the target platform. Since the
entire assembler is implemented in macros (on which the compiler
can perform constant folding), the majority of instructions are
assembled completely at (static) compile time. They are reduced to
nothing more elaborate than sequences of ``move #constant, address''
in the compiled program.
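As a rough illustration of why constant operands cost nothing at run
time, the sketch below shows how one such macro could be written. It is
not taken from the ccg headers: the sk_buf type, the SK_ prefixed names
and the bounds check are assumptions made for this example. Only the
i386 encoding of ``movl $imm32, %reg'' (opcode 0xB8 plus the register
number, followed by the immediate in little-endian order) is standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output buffer; not ccg's actual bookkeeping. */
typedef struct {
    uint8_t *pc;   /* next byte to be written */
    uint8_t *end;  /* one past the last writable byte */
} sk_buf;

/* Register numbers as they appear in i386 instruction encodings. */
#define SK_EAX 0
#define SK_ECX 1

/* Store one byte and advance; the assert guards against overflow. */
#define SK_BYTE(b, v) \
    (assert((b)->pc < (b)->end), *(b)->pc++ = (uint8_t)(v))

/* Store a 32-bit value in little-endian order. The argument v is
   evaluated four times, so it must have no side effects. */
#define SK_LONG(b, v)                  \
    (SK_BYTE(b, (uint32_t)(v)),        \
     SK_BYTE(b, (uint32_t)(v) >> 8),   \
     SK_BYTE(b, (uint32_t)(v) >> 16),  \
     SK_BYTE(b, (uint32_t)(v) >> 24))

/* movl $imm, %reg */
#define SK_MOVLir(b, imm, reg) \
    (SK_BYTE(b, 0xB8 | ((reg) & 7)), SK_LONG(b, imm))

/* Both operands are literals, so every shift, mask and cast above is a
   constant expression; an optimising compiler can reduce the body to
   stores of the bytes B8 2A 00 00 00. Returns the bytes written. */
size_t sk_emit_mov42(uint8_t *out, size_t cap)
{
    sk_buf b = { out, out + cap };
    SK_MOVLir(&b, 42, SK_EAX);
    return (size_t)(b.pc - out);
}
```

Calling sk_emit_mov42() on a buffer of at least five bytes leaves the
encoded instruction at the start of that buffer; making the memory
executable is a separate, platform-specific step not shown here.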
On the other hand, since the runtime assemblers are also
complete, and presented to the client program as calls on cpp
macros in which each operand is passed as an integer
argument, the client program has total freedom in specializing all
aspects of the generated code: literal constants, registers,
branch/jump displacements, the various elements of more complex
addressing modes, and so on, can all be computed by the program
during code generation. Moreover, the client program only pays a
performance penalty for those operands which are not constant at
compile time (and which therefore cannot be ``constant folded out of
existence'' by the compiler).
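A minimal sketch of operands computed during code generation, assuming
an i386 target. The register, the immediate and the jump target are all
ordinary run-time values, and the jump displacement is derived from
buffer offsets. The SK_ macro names, the helper functions sk_put8() and
sk_put32(), and the offset-based interface are hypothetical; for brevity
the stores are small functions here, whereas ccg's assemblers are
written entirely as macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one byte at code[at] and return the following offset. */
static size_t sk_put8(uint8_t *code, size_t at, uint32_t v)
{
    code[at] = (uint8_t)v;
    return at + 1;
}

/* Store a 32-bit value in little-endian order; return the next offset. */
static size_t sk_put32(uint8_t *code, size_t at, uint32_t v)
{
    at = sk_put8(code, at, v);
    at = sk_put8(code, at, v >> 8);
    at = sk_put8(code, at, v >> 16);
    return sk_put8(code, at, v >> 24);
}

/* movl $imm, %reg : opcode 0xB8 + reg, then imm32. */
#define SK_MOVLir(code, at, imm, reg)                                  \
    sk_put32(code,                                                     \
             sk_put8(code, at, 0xB8u | ((uint32_t)(reg) & 7u)),        \
             (uint32_t)(imm))

/* jmp rel32 : opcode 0xE9, then the displacement measured from the end
   of this 5-byte instruction. The argument at is evaluated twice. */
#define SK_JMPm(code, at, target)                                      \
    sk_put32(code, sk_put8(code, at, 0xE9u),                           \
             (uint32_t)((target) - ((at) + 5)))

/* Emit "movl $value, %reg; jmp target" at offset 0 of code, where reg,
   value and target (an offset into code) are known only at run time.
   Returns the number of bytes written. */
size_t sk_emit_load_and_jump(uint8_t *code, size_t cap,
                             int reg, int32_t value, size_t target)
{
    size_t at = 0;
    assert(cap >= 10);               /* 5 bytes for movl, 5 for jmp */
    at = SK_MOVLir(code, at, value, reg);
    at = SK_JMPm(code, at, target);
    return at;
}
```

Only the parts that vary (the register bits, the immediate and the
displacement) require arithmetic at run time; the opcode bytes 0xB8 and
0xE9 remain compile-time constants.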
The targets currently supported are:
Since the entire ccg system is implemented in ANSI C, and
the output from the preprocessor is also ANSI, it can be used on a
wide variety of platforms. Most notably it is suitable for use on
platforms where other common ``solutions'' to dynamic code generation
(such as the use of inline asm and many related gcc-specific
extensions) are not available, e.g. MacOS.
How ccg works: a quick appetiser
The programmer includes assembly language fragments in the high-level source program. For example:
movl $42, %eax
The preprocessor converts these statements into calls on macros that are implemented in the runtime assembler for the target platform (which is not necessarily the same as the host platform). After preprocessing, the above statement is transformed into:
MOVLir (42,_EAX);
The runtime assembler provides the definitions of the macro
MOVLir() and the constant _EAX (representing the register
``%eax''). Since both arguments to the MOVLir() macro are
constants, the compiler can fold them away entirely and produce ``code
generation code'' for the above statement that will be something
similar to this:
*(char *)(someAddress) = 0xB8;      /* insn: move imm32 -> %eax */
*(long *)(someAddress + 1) = 0x2A;  /* constant: 42 */
This is the code that is actually compiled into the client program. The final stage of assembly (the placing of binary code into memory) happens at runtime, at the moment when the above statements are executed.
(The insatiably curious can run ccg without command-line
arguments on the programs in the ccg/examples directory to
see how the preprocessor modifies the source code. The runtime
assemblers are in the ccg/asm-arch.h files and should be
easily comprehensible. Nevertheless, begin by looking at the PowerPC
and Sparc assembler files before making any attempt to understand the
assembler for the Pentium!)
ccg is obviously not platform-independent. A given program file
can generate code that runs on one architecture only, and includes
assembly language statements for that architecture. Portability of
dynamic code generation is not the goal of ccg, although the
support for portability across compilers and operating systems for a
given processor architecture is one of the major goals.
ccg was developed to support the very lowest (platform-dependent)
layer in a portable, optimising, dynamic code generator. This system
performs the majority of the code generation and optimisation work
using a platform-independent ``intermediate'' representation, and then
calls a pluggable ``back end'' (implementing a platform-independent
interface) to convert the intermediate form into machine instructions.
We have found that porting the back end to a new platform is
considerably easier with a tool such as ccg, and requires only a
day or two of work.