Defining the allocation API

Let's look back at the allocator prototype API we defined in the introductory chapter.

trait AllocRaw {
    fn alloc<T>(&self, object: T) -> *const T;
}

This will quickly prove to be inadequate and non-idiomatic. For starters, there is no way to report that allocation failed except for perhaps returning a null pointer. That is certainly a workable solution but is not going to feel idiomatic or ergonomic for how we want to use the API. Let's make a couple changes:

trait AllocRaw {
    fn alloc<T>(&self, object: T) -> Result<RawPtr<T>, AllocError>;
}

Now we're returning a Result, the failure side of which is an error type where we can distinguish between different allocation failure modes. This is often not that useful but working with Result is far more idiomatic Rust than checking a pointer for being null. We'll allow for distinguishing between Out Of Memory and an allocation request that for whatever reason is invalid.

#[derive(Copy, Clone, Debug, PartialEq)]
pub enum AllocError {
    /// Some attribute of the allocation, most likely the size requested,
    /// could not be fulfilled
    BadRequest,
    /// Out of memory - allocating the space failed
    OOM,
}

The second change is that instead of a *const T value in the success discriminant we'll wrap a pointer in a new struct: RawPtr<T>. This wrapper will amount to little more than containing a std::ptr::NonNull instance and some functions to access the pointer.

pub struct RawPtr<T: Sized> {
    ptr: NonNull<T>,
}

This'll be better to work with on the user-of-the-crate side.

It'll also make it easier to modify internals or even swap out entire implementations. This is a motivating factor for the design of this interface as we'll see as we continue to amend it to account for object headers now.

Object headers

The purpose of an object header is to provide the allocator, the language runtime and the garbage collector with information about the object that is needed at runtime. Typical data points that are stored might include:

  • object size
  • some kind of type identifier
  • garbage collection information such as a mark flag

We want to create a flexible interface to a language while also ensuring that the interpreter will provide the information that the allocator and garbage collector in this crate need.

We'll define a trait for the user to implement.

pub trait AllocHeader: Sized {
    /// Associated type that identifies the allocated object type
    type TypeId: AllocTypeId;

    /// Create a new header for object type O
    fn new<O: AllocObject<Self::TypeId>>(size: u32, size_class: SizeClass, mark: Mark) -> Self;

    /// Create a new header for an array type
    fn new_array(size: ArraySize, size_class: SizeClass, mark: Mark) -> Self;

    /// Set the Mark value to "marked"
    fn mark(&mut self);

    /// Get the current Mark value
    fn is_marked(&self) -> bool;

    /// Get the size class of the object
    fn size_class(&self) -> SizeClass;

    /// Get the size of the object in bytes
    fn size(&self) -> u32;

    /// Get the type of the object
    fn type_id(&self) -> Self::TypeId;
}

Now we have a bunch more questions to answer! Some of these trait methods are straightforward - fn size(&self) -> u32 returns the object size; mark() and is_marked() must be GC related. Some are less obvious, such as new_array() which we'll cover at the end of this chapter.

But this struct references some more types that must be defined and explained.

Type identification

What follows is a set of design trade-offs made for the purposes of this book; there are many ways this could be implemented.

The types described next are all about sharing compile-time and runtime object type information between the allocator, the GC and the interpreter.

We ideally want to make it difficult for the user to make mistakes with this and leak undefined behavior. We would also prefer this to be a safe-Rust interface, while at the same time being flexible enough for the user to make interpreter-appropriate decisions about the header design.

First up, an object header implementation must define an associated type

pub trait AllocHeader: Sized {
    type TypeId: AllocTypeId;
}

where AllocTypeId is define simply as:

pub trait AllocTypeId: Copy + Clone {}

This means the interpreter is free to implement a type identifier type however it pleases, the only constraint is that it implements this trait.

Next, the definition of the header constructor,

pub trait AllocHeader: Sized {
    ...

    fn new<O: AllocObject<Self::TypeId>>(
        size: u32,
        size_class: SizeClass,
        mark: Mark
    ) -> Self;

    ...
}

refers to a type O that must implement AllocObject which in turn must refer to the common AllocTypeId. The generic type O is the object for which the header is being instantiated for.

And what is AllocObject? Simply:

pub trait AllocObject<T: AllocTypeId> {
    const TYPE_ID: T;
}

In summary, we have:

  • AllocHeader: a trait that the header type must implement
  • AllocTypeId: a trait that a type identifier must implement
  • AllocObject: a trait that objects that can be allocated must implement

An example

Let's implement a couple of traits to make it more concrete.

The simplest form of type identifier is an enum. Each discriminant describes a type that the interpreter will use at runtime.

#[derive(PartialEq, Copy, Clone)]
enum MyTypeId {
    Number,
    String,
    Array,
}

impl AllocTypeId for MyTypeId {}

A hypothetical numeric type for our interpreter with the type identifier as associated constant:

struct BigNumber {
    value: i64
}

impl AllocObject<MyTypeId> for BigNumber {
    const TYPE_ID: MyTypeId = MyTypeId::Number;
}

And finally, here is a possible object header struct and the implementation of AllocHeader::new():

struct MyHeader {
    size: u32,
    size_class: SizeClass,
    mark: Mark,
    type_id: MyTypeId,
}

impl AllocHeader for MyHeader {
    type TypeId = MyTypeId;

    fn new<O: AllocObject<Self::TypeId>>(
        size: u32,
        size_class: SizeClass,
        mark: Mark
    ) -> Self {
        MyHeader {
            size,
            size_class,
            mark,
            type_id: O::TYPE_ID,
        }
    }

    ...
}

These would all be defined and implemented in the interpreter and are not provided by the Sticky Immix crate, while all the functions in the trait AllocHeader are intended to be called internally by the allocator itself, not on the interpreter side.

The types SizeClass and Mark are provided by this crate and are enums.

The one drawback to this scheme is that it's possible to associate an incorrect type id constant with an object. This would result in objects being misidentified at runtime and accessed incorrectly, likely leading to panics.

Fortunately, this kind of trait implementation boilerplate is ideal for derive macros. Since the language side will be implementing these structs and traits, we'll defer until the relevant interpreter chapter to go over that.

Back to AllocRaw

Now that we have some object and header definitions and constraints, we need to apply them to the AllocRaw API. We can't allocate an object unless it implements AllocObject and has an associated constant that implements AllocTypeId. We also need to expand the interface with functions that the interpreter can use to reliably get the header for an object and the object for a header.

We will add an associated type to tie the allocator API to the header type and indirectly to the type identification that will be used.

pub trait AllocRaw {

    type Header: AllocHeader;

    ...
}

Then we can update the alloc() function definition to constrain the types that can be allocated to only those that implement the appropriate traits.

pub trait AllocRaw {
    ...

    fn alloc<T>(&self, object: T) -> Result<RawPtr<T>, AllocError>
    where
        T: AllocObject<<Self::Header as AllocHeader>::TypeId>;

    ...
}

We need the user and the garbage collector to be able to access the header, so we need a function that will return the header given an object pointer.

The garbage collector does not know about concrete types, it will need to be able to get the header without knowing the object type. It's likely that the interpreter will, at times, also not know the type at runtime.

Indeed, one of the functions of an object header is to, at runtime, given an object pointer, derive the type of the object.

The function signature therefore cannot refer to the type. That is, we can't write

pub trait AllocRaw {
    ...

    // looks good but won't work in all cases
    fn get_header<T>(object: RawPtr<T>) -> NonNull<Self::Header>
    where
        T: AllocObject<<Self::Header as AllocHeader>::TypeId>;

    ...
}

even though it seems this would be good and right. Instead this function will have to be much simpler:

pub trait AllocRaw {
    ...

    fn get_header(object: NonNull<()>) -> NonNull<Self::Header>;

    ...
}

We also need a function to get the object from the header:

pub trait AllocRaw {
    ...

    fn get_object(header: NonNull<Self::Header>) -> NonNull<()>;

    ...
}

These functions are not unsafe but they do return NonNull which implies that dereferencing the result should be considered unsafe - there is no protection against passing in garbage and getting garbage out.

Now we have an object allocation function, traits that constrain what can be allocated, allocation header definitions and functions for switching between an object and it's header.

There's one missing piece: we can allocate objects of type T, but such objects always have compile-time defined size. T is constrained to Sized types in the RawPtr definition. So how do we allocate dynamically sized objects, such as arrays?

Dynamically sized types

Since we can allocate objects of type T, and each T must derive AllocObject and have an associated const of type AllocTypeId, dynamically sized allocations must fit into this type identification scheme.

Allocating dynamically sized types, or in short, arrays, means there's some ambiguity about the type at compile time as far as the allocator is concerned:

  • Are we allocating one object or an array of objects? If we're allocating an array of objects, we'll have to initialize them all. Perhaps we don't want to impose that overhead up front?
  • If the allocator knows how many objects compose an array, do we want to bake fat pointers into the interface to carry that number around?

In the same way, then, that the underlying implementation of std::vec::Vec is backed by an array of u8, we'll do the same. We shall define the return type of an array allocation to be of type RawPtr<u8> and the size requested to be in bytes. We'll leave it to the interpreter to build layers on top of this to handle the above questions.

As the definition of AllocTypeId is up to the interpreter, this crate can't know the type id of an array. Instead, we will require the interpreter to implement a function on the AllocHeader trait:

pub trait AllocHeader: Sized {
    ...

    fn new_array(size: ArraySize, size_class: SizeClass, mark: Mark) -> Self;

    ...
}

This function should return a new object header for an array of u8 with the appropriate type identifier.

We will also add a function to the AllocRaw trait for allocating arrays that returns the RawPtr<u8> type.

pub trait AllocRaw {
    ...

    fn alloc_array(&self, size_bytes: ArraySize) -> Result<RawPtr<u8>, AllocError>;

    ...
}

Our complete AllocRaw trait definition now looks like this:

pub trait AllocRaw {
    /// An implementation of an object header type
    type Header: AllocHeader;

    /// Allocate a single object of type T.
    fn alloc<T>(&self, object: T) -> Result<RawPtr<T>, AllocError>
    where
        T: AllocObject<<Self::Header as AllocHeader>::TypeId>;

    /// Allocating an array allows the client to put anything in the resulting data
    /// block but the type of the memory block will simply be 'Array'. No other
    /// type information will be stored in the object header.
    /// This is just a special case of alloc<T>() for T=u8 but a count > 1 of u8
    /// instances.  The caller is responsible for the content of the array.
    fn alloc_array(&self, size_bytes: ArraySize) -> Result<RawPtr<u8>, AllocError>;

    /// Given a bare pointer to an object, return the expected header address
    fn get_header(object: NonNull<()>) -> NonNull<Self::Header>;

    /// Given a bare pointer to an object's header, return the expected object address
    fn get_object(header: NonNull<Self::Header>) -> NonNull<()>;
}

In the next chapter we'll build out the AllocRaw trait implementation.