Virtual Virtual Machine

1. Introduction

ccg is an aid to the development and implementation of dynamic code generators. The main components are a preprocessor (the program that is actually called ``ccg'') and a collection of ``runtime assemblers'' (a set of header files whose names are utterly unimportant).

1.1 Overview

The preprocessor accepts as input a C program that contains embedded dynamic code sections. (For the sake of brevity we will consider ``C'' to cover both C and C++.) Each dynamic code section is a sequence of assembly language statements. The output from the preprocessor is an equivalent C program in which the dynamic code sections have been replaced with (pure ANSI) C code that generates the corresponding binary instructions at runtime.

The majority of code generation work is performed by a runtime assembler. Each runtime assembler is implemented as a set of cpp macros in a header file specific to the target platform. Since the entire assembler is implemented in macros (on which the compiler can perform constant folding), the majority of instructions are assembled completely at (static) compile time. They are reduced to nothing more elaborate than sequences of ``move #constant, address'' in the compiled program.

On the other hand, the runtime assemblers are also complete, and are presented to the client program as calls on cpp macros in which each operand is passed as an integer argument. The client program therefore has total freedom in specializing all aspects of the generated code: literal constants, registers, branch/jump displacements, the various elements of more complex addressing modes, and so on, can all be computed by the program during code generation. Moreover, the client program only pays a performance penalty for those operands which are not constant at compile time (and which therefore cannot be ``constant folded out of existence'' by the compiler).
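As a sketch of an operand that can only be known during code generation, the fragment below computes a jump displacement at runtime. The function sk_emit_jmp and its parameters are hypothetical names invented for this illustration and are not part of ccg. It assumes the IA-32 ``jmp rel32'' encoding (opcode 0xE9 followed by a little-endian 32-bit displacement measured from the address of the next instruction).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emit `jmp rel32` at buf[at], targeting buf[target].
 * Returns the offset just past the emitted instruction. */
size_t sk_emit_jmp(unsigned char *buf, size_t at, size_t target)
{
    size_t   next = at + 5;  /* jmp rel32 occupies 5 bytes */
    /* Unsigned subtraction wraps, giving the two's complement displacement
     * for backward jumps. */
    uint32_t rel  = (uint32_t)target - (uint32_t)next;

    assert(buf != NULL);
    buf[at]     = 0xE9;                            /* opcode: jmp rel32 */
    buf[at + 1] = (unsigned char)(rel & 0xFF);     /* displacement, low byte first */
    buf[at + 2] = (unsigned char)((rel >> 8) & 0xFF);
    buf[at + 3] = (unsigned char)((rel >> 16) & 0xFF);
    buf[at + 4] = (unsigned char)((rel >> 24) & 0xFF);
    return next;
}
```

Here neither the position nor the target is a compile-time constant, so the displacement arithmetic survives into the compiled program; only the opcode byte is a constant store.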

The targets currently supported are:

PowerPC
Sparc
Pentium

Since the entire ccg system is implemented in ANSI C, and the output from the preprocessor is also ANSI, it can be used on a wide variety of platforms. Most notably it is suitable for use on platforms where other common ``solutions'' to dynamic code generation (such as the use of inline asm and many related gcc-specific extensions) are not available, e.g. MacOS.

1.2 How ccg works: a quick appetiser

The programmer includes assembly language fragments in the high-level source program. For example:


movl    $42, %eax

The preprocessor converts these statements into calls on macros that are implemented in the runtime assembler for the target platform (which is not necessarily the same as the host platform). After preprocessing, the above statement is transformed into:


MOVLir  (42,_EAX);

The runtime assembler provides the definitions of the macro MOVLir() and the constant _EAX (representing the register ``%eax''). Since both arguments to the MOVLir() macro are constants, the compiler can fold them away entirely and produce ``code generation code'' for the above statement that will be something similar to this:


*(char *)(someAddress)     = 0xB8;      /* insn: move imm32 -> %eax */
*(long *)(someAddress + 1) = 0x2A;      /* constant: 42 */

This is the code that is actually compiled into the client program. The final stage of assembly (the placing of binary code into memory) happens at runtime, at the moment when the above statements are executed.
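To make the folding step concrete, here is a minimal sketch of how a macro of this kind could be written. The names SK_EAX, SK_BYTE, SK_LONG, SK_MOVLir, sk_pc and sk_emit_answer are hypothetical and invented for this illustration; they are not the definitions shipped in ccg's header files. The sketch assumes the standard IA-32 encoding of ``movl $imm32, %reg'' (opcode 0xB8 plus the register number, followed by a little-endian 32-bit immediate). It only writes the bytes into a buffer and does not execute them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register numbers, in IA-32 encoding order. */
#define SK_EAX 0
#define SK_ECX 1

/* Emission pointer: where the next instruction byte will be stored. */
unsigned char *sk_pc;

/* Store one byte, or one little-endian 32-bit word, and advance sk_pc.
 * The comma operator sequences the individual stores. */
#define SK_BYTE(B)  (*sk_pc++ = (unsigned char)(B))
#define SK_LONG(L)  (SK_BYTE((uint32_t)(L)),       SK_BYTE((uint32_t)(L) >> 8), \
                     SK_BYTE((uint32_t)(L) >> 16), SK_BYTE((uint32_t)(L) >> 24))

/* movl $IMM, %REG: with constant arguments every stored value is a
 * compile-time constant, so only the stores themselves remain. */
#define SK_MOVLir(IMM, REG)  (SK_BYTE(0xB8 + (REG)), SK_LONG(IMM))

/* Emit `movl $42, %eax` into buf; returns the number of bytes written. */
size_t sk_emit_answer(unsigned char *buf)
{
    assert(buf != NULL);
    sk_pc = buf;
    SK_MOVLir(42, SK_EAX);
    return (size_t)(sk_pc - buf);
}
```

Calling sk_emit_answer() on a buffer of at least five bytes leaves the instruction bytes B8 2A 00 00 00 in it.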

(The insatiably curious can run ccg without command-line arguments on the programs in the ccg/examples directory to see how the preprocessor modifies the source code. The runtime assemblers are in the ccg/asm-arch.h files and should be easily comprehensible. Nevertheless, begin by looking at the PowerPC and Sparc assembler files before making any attempt to understand the assembler for the Pentium!)

1.3 Raison d'être

ccg is obviously not platform-independent. A given program file includes assembly language statements for one architecture, and so can generate code that runs on that architecture only. Portability of dynamic code generation across architectures is not the goal of ccg, although portability across compilers and operating systems for a given processor architecture is one of its major goals.

ccg was developed to support the very lowest (platform-dependent) layer in a portable, optimising, dynamic code generator. This system performs the majority of the code generation and optimisation work using a platform-independent ``intermediate'' representation, and then calls a pluggable ``back end'' (implementing a platform-independent interface) to convert the intermediate form into machine instructions.
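The layering described above can be sketched as a table of function pointers. This is an illustration only, under stated assumptions: the interface struct sk_backend, the two back ends sk_x86 and sk_ppc, and the driver sk_gen_return_const are hypothetical names, not the interface of the actual system. The encodings assumed are IA-32 ``movl $imm32, %reg'' (0xB8 plus register) and ``ret'' (0xC3), and PowerPC ``li rD, simm16'' (addi rD, 0, simm16) and ``blr'' (0x4E800020), with PowerPC words stored big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform-independent back-end interface. */
struct sk_backend {
    int    result_reg;  /* register that holds a function's return value */
    size_t (*load_const)(unsigned char *buf, int reg, int16_t value);
    size_t (*ret)(unsigned char *buf);
};

/* IA-32 back end: movl $imm32, %reg, immediate stored little-endian. */
static size_t x86_load_const(unsigned char *buf, int reg, int16_t value)
{
    uint32_t imm = (uint32_t)(int32_t)value;  /* sign-extend to 32 bits */
    buf[0] = (unsigned char)(0xB8 + reg);
    buf[1] = (unsigned char)(imm & 0xFF);
    buf[2] = (unsigned char)((imm >> 8) & 0xFF);
    buf[3] = (unsigned char)((imm >> 16) & 0xFF);
    buf[4] = (unsigned char)((imm >> 24) & 0xFF);
    return 5;
}
static size_t x86_ret(unsigned char *buf) { buf[0] = 0xC3; return 1; }

/* PowerPC back end: fixed-size 32-bit instructions, stored big-endian. */
static size_t ppc_word(unsigned char *buf, uint32_t w)
{
    buf[0] = (unsigned char)(w >> 24);
    buf[1] = (unsigned char)((w >> 16) & 0xFF);
    buf[2] = (unsigned char)((w >> 8) & 0xFF);
    buf[3] = (unsigned char)(w & 0xFF);
    return 4;
}
static size_t ppc_load_const(unsigned char *buf, int reg, int16_t value)
{
    /* li rD, simm16 is addi rD, 0, simm16 */
    return ppc_word(buf, 0x38000000u | ((uint32_t)reg << 21) | (uint16_t)value);
}
static size_t ppc_ret(unsigned char *buf) { return ppc_word(buf, 0x4E800020u); }

const struct sk_backend sk_x86 = { 0, x86_load_const, x86_ret };  /* %eax */
const struct sk_backend sk_ppc = { 3, ppc_load_const, ppc_ret };  /* r3 */

/* Platform-independent layer: emit "return <value>" through any back end.
 * Returns the number of bytes written to buf. */
size_t sk_gen_return_const(const struct sk_backend *be, unsigned char *buf,
                           int16_t value)
{
    size_t n;
    assert(be != NULL && buf != NULL);
    n  = be->load_const(buf, be->result_reg, value);
    n += be->ret(buf + n);
    return n;
}
```

In this arrangement only the small per-target functions need rewriting for a new processor; the driver never mentions an opcode.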

We have found that porting the back-end to a new platform is considerably easier with a tool such as ccg, and requires only a day or two of work.