Virtual Machine: Architecture and Design

In this short chapter we will outline our virtual machine design choices. These are substantially a matter of pragmatic dynamic language implementation points and as such, borrow heavily from uncomplicated prior work such as Lua 5 and Crafting Interpreters.


We already discussed our Lua-inspired bytecode in a previous chapter. To recap, we are using 32 bit fixed-width opcodes with space for 8 bit register identifiers and 16 bit literals.

The stack

Following the example of Crafting Interpreters we'll maintain two separate stack data structures:

  • the register stack for storing stack values
  • the call frame stack

In our case, these are best separated out because the register stack will be composed entirely of TaggedCellPtrs.

To store call frames on the register stack we would have to either:

  1. allocate every stack frame on the heap with pointers to them from the register stack
  2. or coerce a call frame struct type into the register stack type

Neither of these is attractive so we will maintain the call frame stack as an independent data structure.

The register stack

The register stack is a homogeneous array of TaggedCellPtrs. Thus, no object is allocated directly on the stack, all objects are heap allocated and the stack only consists of pointers to heap objects. The exception is literal integers that fit within the range allowed by a tagged pointer.

Since this is a register virtual machine, not following stack push and pop semantics, and bytecode operands are limited to 8 bit register indexes, a function is limited to addressing a maximum of 256 contiguous registers.

Due to function call nesting, the register stack may naturally grow much more than a length of 256.

This requires us to implement a sliding window into the register stack which will move as functions are called and return. The call frame stack will contain the stack base pointer for each function call. We can then happily make use a Rust slice to implement the window of 256 contiguous stack slots which a function call is limited to.

The call frame stack

A call frame needs to store three critical data points:

  • a pointer to the function being executed
  • the return instruction pointer when a nested function is called
  • the stack base pointer for the function call

These three items can form a simple struct and we can define an Array<CallFrame> type for optimum performance.

Global values

To store global values, we have all we need: the Dict type that maps Symbols to another value. The VM will, of course, have an abstraction over the internal Dict to enforce Symbols only as keys.


In the classic upvalues implementation from Lua 5, followed also by Crafting Interpreters, a linked list of upvalues is used to map stack locations to shared variables.

In every respect but one, our implementation will be similar.

In our implementation, we'll use the Dict type that we already have available to do this mapping of stack locations to shared variables.

As the language and compiler will implement lexical scoping, the compiler will have static knowledge of the relative stack locations of closed-over variables and can generate the appropriate bytecode operands for the virtual machine to calculate the absolute stack locations at runtime. Thus, absolute stack locations can be mapped to Upvalue objects and so a Dict can be employed to facilitate the mapping. This obviates the need to implement a linked list data structure.

The compiler must issue instructions to tell the VM when to make a closure data structure. It can do so, of course, because simple analysis shows whether a function references nonlocal bindings. A closure data structure as generated by the compiler must reference the function that will be called and the list of relative stack locations that correspond to each nonlocal binding.

The VM, when executing the instruction to make a closure, will calculate the absolute stack locations for each nonlocal binding and create the closure environment - a List<Upvalue>. VM instructions within the function code, as in Lua, indirectly reference nonlocal bindings by indexing into this environment.

Partial functions

Here is one point where we will introduce a less common construct in our virtual machine. Functions will be first class, that is they are objects that can be passed around as values and arguments. On top of that, we'll allow passing insufficient arguments to a function when it is called. The return value of such an operation will, instead of an error, be a Partial instance. This value must carry with it the arguments given and a pointer to the function waiting to be called.

This is insufficient for a fully featured currying implementation but is an interesting extension to first class functions, especially as it allows us to not require lambdas to be constructed syntactically every time they might be used.

By that we mean the following: if we have a function (def mul (x y) (* x y)), to turn that into a function that multiplies a number by 3 we'd normally have to define a second function, or lambda, (lambda (x) (mul x 3)) and call it instead. However, with a simple partial function implementation we can avoid the lambda definition and call (mul 3) directly, which will collect the function pointer for mul and argument 3 into a Partial and wait for the final argument before calling into the function mul with both required arguments.

Note: We can use the same struct for both closures and partial functions. A closure is a yet-to-be-called function carrying a list of references to values. or a list of values. A partial is a yet-to-be-called function carrying a list of arguments. They look very similar, and it's possible, of course, to partially apply arguments to a closure.

Instruction dispatch

In dispatch, one optimal outcome is to minimize the machine code overhead between each VM instruction code. This overhead, where the next VM instruction is fetched, decoded and mapped to the entry point of the instruction code, is the dispatch code. The other axis of optimization is code ergonomics.

Prior research into implementing dispatch in Rust concludes that simple switch-style dispatch is the only cross-platform construct we can reasonably make use of. Other mechanisms come with undesirable complexity or are platform dependent. For the most part, with modern CPU branch prediction, the overhead of switch dispatch is small.

What this looks like: a single match expression with a pattern to represent each bytecode discriminant, all wrapped in a loop. To illustrate:

loop {
    let opcode = get_next_opcode();
    match opcode {
        Opcode::Add(a, x, y) => { ... },
        Opcode::Call(f, r, p) => { ... },

That's it!

Next we'll look at the counterpart of VM design - compiler design.