Simple Virtual Machine

In this post I want to present, how to create a simple virtual machine with it’s own bytecode interpreter. While it’s created using C language, it doesn’t use any complex constructs and could be possibly implemented the same with any imperative language.

NOTE: while in this example bytecode instruction set is limited to the simple integer operations, real life virtual machines usually have a lot more complex operands (eg. object creation instruction in VMs with managed memory or dynamic method invocations). Have in mind that even if your Intermediate Representation may look like some BASIC/ASM language, it really could abstract much higher level functionalities.

Virtual Machine

Lets start with precising, what our virtual machine should consist of. In this example, we’ll create a stack-based virtual machine. What does it mean? There is no notion of registers and all operation are performed through virtual stack. This results in simpler (since our instruction set doesn’t have to include virtual registers operations), but also longer bytecode – because of more operations have to be executed than in alternative option, which may support a virtual register calls.

In this example, our virtual machine will consist of:

Executable code pointer – in order to execute the program, a VM has to now what program it is ;) Our bytecode contains opcodes (which are integers used to define instructions to be executed by an interpreter) and numbers used for things such as constant values, memory addresses or stack offsets.
Stack – virtual stack will work as fragment of memory used for storing intermediate results and additional data of each instruction. When a procedure call will be invoked, it will also save a state of the program at moment, when procedure has been called, to retrieve it back after program will return from that call.
Local data – unlike the stack, which uses LIFO (Last-In First-Out) operations, local data may be used as random access memory. It also may change it’s size according to program needs. In this example we may use it to store both global and local variables.
Program counter – it contains an address of currently executed instruction. Program will be executed by moving PC through instruction set, reading opcodes and executing them.
Stack pointer – it contains information about number of elements stored on the stack. It always points on the top of it.
Frame pointer – since we need to differentiate a locally and globally scoped values (this includes function arguments), we need to know where our scope is located on the stack. All local variables are assigned relatively to FP position.

// stack will have fixed size
#define STACK_SIZE 100

typedef struct {
    int* locals;    // local scoped data
    int* code;      // array od byte codes to be executed
    int* stack;     // virtual stack
    int pc;         // program counter (aka. IP - instruction pointer)
    int sp;         // stack pointer
    int fp;         // frame pointer (for local scope)
} VM;

VM* newVM(int* code,    // pointer to table containing a bytecode to be executed
    int pc,             // address of instruction to be invoked as first one - entrypoint/main func
    int datasize) {      // total locals size required to perform a program operations
    VM* vm = (VM*)malloc(sizeof(VM));
    vm->code = code;
    vm->pc = pc;
    vm->fp = 0;
    vm->sp = -1;
    vm->locals = (int*)malloc(sizeof(int) * datasize);
    vm->stack = (int*)malloc(sizeof(int) * STACK_SIZE);

    return vm;
}

    void delVM(VM* vm){
        free(vm->locals);
        free(vm->stack);
        free(vm);
    }

Bytecode instruction set

At the begginig we’ll start from defining a set of instructions to be handled by our language. For sake of simplicity we’ll focus on basic integer operations, printing their result, variable creation and procedure-oriented calls.

enum {
    ADD_I32 = 1,    // int add
    SUB_I32 = 2,    // int sub
    MUL_I32 = 3,    // int mul
    LT_I32 = 4,     // int less than
    EQ_I32 = 5,     // int equal
    JMP = 6,        // branch
    JMPT = 7,       // branch if true
    JMPF = 8,       // branch if false
    CONST_I32 = 9,  // push constant integer
    LOAD = 10,      // load from local
    GLOAD = 11,     // load from global
    STORE = 12,     // store in local
    GSTORE = 13,    // store in global memory
    PRINT = 14,     // print value on top of the stack
    POP = 15,       // throw away top of the stack
    HALT = 16,      // stop program
    CALL = 17,      // call procedure
    RET = 18        // return from procedure
};

Just like in standard assembly code, each instruction is represented by specific byte number. In more realistic virtual machine those numbers may have specific values (for example to perform bitwise masking).

Main interpreter loop

It’s time to create our code execution loop. How does it work? It’s a simple loop with switch statement used to interpret particular operation codes found in our program. While this is not the fastest solution, it gives us a simple understanding of how fetch/decode/execute cycle works.

#define PUSH(vm, v) vm->stack[++vm->sp] = v // push value on top of the stack
#define POP(vm)     vm->stack[vm->sp--]     // pop value from top of the stack
#define NCODE(vm)   vm->code[vm->pc++]      // get next bytecode

void run(VM* vm){
    do{
        int opcode = NCODE(vm);        // fetch
        int v, addr, offset, a, b, argc, rval;

        switch (opcode) {   // decode
        case HALT: return;  // stop the program
        case CONST_I32: ...
        case ADD_I32: ...
        case SUB_I32: ...
        case MUL_I32: ...
        case LT_I32: ...
        case EQ_I32: ...
        case JMP: ...
        case JMPT: ...
        case JMPF: ...
        case LOAD:  ...
        case GLOAD: ...
        case GSTORE: ...
        case CALL: ...
        case RET: ...
        case POP:
            --vm->sp;      // throw away value at top of the stack
            break;
        case PRINT:
            v = POP(vm);        // pop value from top of the stack ...
            printf("%d\n", v);  // ... and print it
            break;
        default:
            break;
        }

    }while(1);
}

Here we’ve already defined HALT, POP and PRINT opcodes. In more realistic example, you probably don’t want to define value display in form of dedicated instruction, but this will allow us to simply see result of our operations.

Integer operators

Now we can define some integer operations. While we’re defined a plenty of them, here I’ll show only one example for each group: constant initialization, arithmetic and logic operations.

What is worth emphasizing, is that all value swaps are performed through stack (there are no registers), so each instruction takes it’s operands from top of the stack and also puts a computed result on top of it. Also, since stacks are LIFO-type collections, all values taken from the stack are in reversed order – for example look at ADD_I32 instruction below.

case CONST_I32:
    v = NCODE(vm);   // get next value from code ...
    PUSH(vm, v);     // ... and move it on top of the stack
    break;
case ADD_I32:
    b = POP(vm);        // get second value from top of the stack ...
    a = POP(vm);        // ... then get first value from top of the stack ...
    PUSH(vm, a + b);    // ... add those two values and put result on top of the stack
    break;
case LT_I32:
    b = POP(vm);        // get second value from top of the stack ...
    a = POP(vm);        // ... then get first value from top of the stack ...
    PUSH(vm, (a<b) ? 1 : 0); // ... compare those two values, and put result on top of the stack
    break;

Instructions such as SUB_I32, MUL_I32 and EQ_I32 won’t be defined in scope of this post, but I think when you’ll understand example above, they should be easy to implement.

Jump instructions

Next, we have to define jump instructions in our code. They are basically equivalent of goto keyword and may be used to implement statements such as loops and if/else conditions.

case JMP:
    vm->pc = NCODE(vm);  // unconditionaly jump with program counter to provided address
    break;
case JMPT:
    addr = NCODE(vm);  // get address pointer from code ...
    if(POP(vm)) {      // ... pop value from top of the stack, and if it's true ...
        vm->pc = addr; // ... jump with program counter to provided address
    }
    break;

Again, JUMPF (conditional jump if false) implementation should be analogous to JMPT.

Move memory globally

Now, when all basic constructs are defined, we shall provide a way to define a variables in our program. This paragraph will cover problem of globally-scoped ones, while the next one will show how to implement locally-scoped variables.

Unlike stack, a local data is a random access memory of elastic size, so all in/out operations performed on it using memory addresses (aka. pointers). Therefore in our example all load/store opcodes should be followed by number being an address pointer to the local data.

case GLOAD:
    addr = POP(vm);             // get pointer address from code ...
    v = vm->locals[addr];         // ... load value from memory of the provided addres ...
    PUSH(vm, v);                // ... and put that value on top of the stack
    break;
case GSTORE:
    v = POP(vm);                // get value from top of the stack ...
    addr = NCODE(vm);           // ... get pointer address from code ...
    vm->locals[addr] = v;         // ... and store value at address received
    break;

Manipulate local scope memory

This is similar to previously defined GLOAD and GSTORE instruction. However there is a one significant difference – instead of taking an locals address directly from provided value, we compute it relatively to current frame pointer.

case LOAD:                  // load local value or function arg
    offset = NCODE(vm);     // get next value from code to identify local variables offset start on the stack
    PUSH(vm, vm->stack[vm->fp+offset]); // ... put on the top of the stack variable stored relatively to frame pointer
    break;
case STORE:                 // store local value or function arg
    v = POP(vm);            // get value from top of the stack ...
    offset = NCODE(vm);     // ... get the relative pointer address from code ...
    vm->locals[vm->fp+offset] = v;  // ... and store value at address received relatively to frame pointer
    break;

Procedure calls

All of previous instructions were fairly simple to execute. Now it’s time to do something slightly more complex – function/procedure calls. Before we begin, lets describe how they works. To perform the call, we have to first save the current state of the program at the moment before the call will be performed – it will let us freely alter program state in scope of the function. To do so, we have to save three values:

Address of the caller instruction – this is a current value of program counter and it’s need to be stored, so that VM will know, where program should return from finished procedure call.
Frame pointer – this is necessary, since we want to be able to use locally scoped variables when we enter into new procedure (tl;dr – it gives us a function scope feature).
Number of function/procedure arguments – before call will be performed, all necessary arguments will be put on top of the stack. However returning from function should dispose all of the provided arguments, so that we could retrieve back state of the stack from before the function call. If we want to know how many arguments from the stack should be thrown away, firstly we have to store that count – it will be also put on top of the stack.

RET instruction works in reverse order – it takes value from top of the stack as function return value, rollbacks frame pointers, moves program counter to the caller instruction, disposes all of function call arguments from the stack and replaces all of these with return value.

Because we’ve stored all of those values on top of the stack, we’re able to store next consecutive calls in the same manner. It means that, we’re able to nest our procedure calls. However, since each call stores some portion of data on the stack, at some point we may run out of memory, which cause stack overflow error.

case CALL:
    // we expect all args to be on the stack
    addr = NCODE(vm); // get next instruction as an address of procedure jump ...
    argc = NCODE(vm); // ... and next one as number of arguments to load ...
    PUSH(vm, argc);   // ... save num args ...
    PUSH(vm, vm->fp); // ... save function pointer ...
    PUSH(vm, vm->pc); // ... save instruction pointer ...
    vm->fp = vm->sp;  // ... set new frame pointer ...
    vm->pc = addr;    // ... move instruction pointer to target procedure address
    break;
case RET:
    rval = POP(vm);     // pop return value from top of the stack
    vm->sp = vm->fp;    // ... return from procedure address ...
    vm->pc = POP(vm);   // ... restore instruction pointer ...
    vm->fp = POP(vm);   // ... restore framepointer ...
    argc = POP(vm);     // ... hom many args procedure has ...
    vm->sp -= argc;     // ... discard all of the args left ...
    PUSH(vm, rval);     // ... leave return value on top of the stack
    break;

Example code

Now, when our bytecode interpreter is finally complete, we may test it on some example program. Please note numbers in comments on the right – they show actual opcode instruction address, and will be used for conditional jumps and procedure calls.

const int fib = 0;  // address of the fibonacci procedure
int program[] = {
    // int fib(n) {
    //     if(n == 0) return 0;
    LOAD, -3,       // 0 - load last function argument N
    CONST_I32, 0,   // 2 - put 0
    EQ_I32,         // 4 - check equality: N == 0
    JMPF, 10,       // 5 - if they are NOT equal, goto 10
    CONST_I32, 0,   // 7 - otherwise put 0
    RET,            // 9 - and return it
    //     if(n < 3) return 1;
    LOAD, -3,       // 10 - load last function argument N
    CONST_I32, 3,   // 12 - put 3
    LT_I32,         // 14 - check if 3 is less than N
    JMPF, 20,       // 15 - if 3 is NOT less than N, goto 20
    CONST_I32, 1,   // 17 - otherwise put 1
    RET,            // 19 - and return it
    //     else return fib(n-1) + fib(n-2);
    LOAD, -3,       // 20 - load last function argument N
    CONST_I32, 1,   // 22 - put 1
    SUB_I32,        // 24 - calculate: N-1, result is on the stack
    CALL, fib, 1,   // 25 - call fib function with 1 arg. from the stack
    LOAD, -3,       // 28 - load N again
    CONST_I32, 2,   // 30 - put 2
    SUB_I32,        // 32 - calculate: N-2, result is on the stack
    CALL, fib, 1,   // 33 - call fib function with 1 arg. from the stack
    ADD_I32,        // 36 - since 2 fibs pushed their ret values on the stack, just add them
    RET,            // 37 - return from procedure
    // entrypoint - main function
    CONST_I32, 6,   // 38 - put 6 
    CALL, fib, 1,   // 40 - call function: fib(arg) where arg = 6;
    PRINT,          // 43 - print result
    HALT            // 44 - stop program
};

// initialize virtual machine
VM* vm = newVM(program,   // program to execute
                   38,    // start address of main function
                   0);    // locals to be reserved, fib doesn't require them
run(vm);

As you may have seen, there is a lot of LOAD, -3 operations. Why -3? Again, when we perform a procedure call, we need to store current program state on top of the stack. To do so, we’ll put here a 3 values: program counter (as return address), frame pointers before the call, and number of arguments to be used by called procedure. Therefore everything below them (-3 and less) will be an actual function arguments saved on the stack. You could optimize it by adding something like LDARG, x opcode.

Why?

There are numerous reasons, why developers may want to use intermediate bytecode representations for their languages:

Performance – as you’ve seen, execution of the bytecode is realized mostly by just moving (mostly incrementing) PC pointer. When interpreter works directly by evaluating and traversing an AST tree, there is a lot of pointer jumping and structure blocks, which is slightly more complex and slower. Bytecode allows you to faster code interpretation as well as it may facilitate some optimizations.
Code size – full code operates on human readable strings, which may consume some memory and are harder to interpret. Intermediate program representation uses only a simple integer arrays.
Well-fitting – since you may describe your own IR, you may fit it directly to your needs. It’s easier to parse, and it could remove some repetitive work, when you’ll want to implement your language on the various OSes.