Arrays
Before we get to the basics of compilation, we need another data structure: the humble array. The first use for arrays will be to store the bytecode sequences that the compiler generates.
Rust already provides Vec
but as we're implementing everything in terms of our
memory management abstraction, we cannot directly use Vec
. Rust does not
(yet) expose the ability to specify a custom allocator type as part of Vec
,
nor are we interested in replacing the global allocator.
Our only option is to write our own version of Vec
! Fortunately we can
learn a lot from Vec
itself and it's underlying implementation. Jump over to
the Rustonomicon for a primer on the internals of Vec
.
The first thing we'll learn is to split the implementation into a RawArray<T>
type and an Array<T>
type. RawArray<T>
will provide an unsafe abstraction
while Array<T>
will make a safe layer over it.
RawArray
If you've just come back from Implementing Vec in the Nomicon, you'll
recognize what we're doing below with RawArray<T>
:
pub struct RawArray<T: Sized> {
/// Count of T-sized objects that can fit in the array
capacity: ArraySize,
ptr: Option<NonNull<T>>,
}
Instead of Unique<T>
for the pointer, we're using Option<NonNull<T>>
.
One simple reason is that Unique<T>
is likely to be permanently unstable and
only available internally to std
collections. The other is that we can
avoid allocating the backing store if no capacity is requested yet, setting
the value of ptr
to None
.
For when we do know the desired capacity, there is
RawArray<T>::with_capacity()
. This method, because it allocates, requires
access to the MutatorView
instance. If you'll recall from the chapter on
the allocation API, the API provides an array allocation method with
signature:
AllocRaw::alloc_array(&self, size_bytes: ArraySize) -> Result<RawPtr<u8>, AllocError>;
This method is wrapped on the interpreter side by Heap
and MutatorView
and
in both cases the return value remains, simply, RawPtr<u8>
in the success
case. It's up to RawArray<T>
to receive the RawPtr<u8>
value and maintain
it safely. Here's with_capcity()
, now:
pub fn with_capacity<'scope>(
mem: &'scope MutatorView,
capacity: u32,
) -> Result<RawArray<T>, RuntimeError> {
// convert to bytes, checking for possible overflow of ArraySize limit
let capacity_bytes = capacity
.checked_mul(size_of::<T>() as ArraySize)
.ok_or(RuntimeError::new(ErrorKind::BadAllocationRequest))?;
Ok(RawArray {
capacity,
ptr: NonNull::new(mem.alloc_array(capacity_bytes)?.as_ptr() as *mut T),
})
}
Resizing
If a RawArray<T>
's content will exceed it's capacity, there is
RawArray<T>::resize()
. It allocates a new backing array using the
MutatorView
method alloc_array()
and copies the content of the old
over to the new, finally swapping in the new backing array for the old.
The code for this is straightforward but a little longer, go check it out
in interpreter/src/rawarray.rs
.
Accessing
Since RawArray<T>
will be wrapped by Array<T>
, we need a couple more
methods to access the raw memory:
impl<T: Sized> RawArray<T> {
pub fn capacity(&self) -> ArraySize {
self.capacity
}
pub fn as_ptr(&self) -> Option<*const T> {
match self.ptr {
Some(ptr) => Some(ptr.as_ptr()),
None => None,
}
}
}
And that's it! Now for the safe wrapper.
Array
The definition of the struct wrapping RawArray<T>
is as follows:
#[derive(Clone)]
pub struct Array<T: Sized + Clone> {
length: Cell<ArraySize>,
data: Cell<RawArray<T>>,
borrow: Cell<BorrowFlag>,
}
Here we have three members:
length
- the length of the arraydata
- theRawArray<T>
being wrappedborrow
- a flag serving as a runtime borrow check, allowingRefCell
runtime semantics, since we're in a world of interior mutability patterns
We have a method to create a new array - Array::alloc()
impl<T: Sized + Clone> Array<T> {
pub fn alloc<'guard>(
mem: &'guard MutatorView,
) -> Result<ScopedPtr<'guard, Array<T>>, RuntimeError>
where
Array<T>: AllocObject<TypeList>,
{
mem.alloc(Array::new())
}
}
In fact we'll extend this pattern of a method named "alloc" to any data structure for convenience sake.
There are many more methods for Array<T>
and it would be exhausting to be
exhaustive. Let's go over the core methods used to read and write elements
and then an example use case.
Reading and writing
First of all, we need a function that takes an array index and returns a pointer to a memory location, if the index is within bounds:
impl<T: Sized + Clone> Array<T> {
fn get_offset(&self, index: ArraySize) -> Result<*mut T, RuntimeError> {
if index >= self.length.get() {
Err(RuntimeError::new(ErrorKind::BoundsError))
} else {
let ptr = self
.data
.get()
.as_ptr()
.ok_or_else(|| RuntimeError::new(ErrorKind::BoundsError))?;
let dest_ptr = unsafe { ptr.offset(index as isize) as *mut T };
Ok(dest_ptr)
}
}
}
There are two bounds checks here - firstly, the index should be within the
(likely non-zero) length values; secondly, the RawArray<T>
instance
should have a backing array allocated. If either of these checks fail, the
result is an error. If these checks pass, we can be confident that there
is array backing memory and that we can return a valid pointer to somewhere
inside that memory block.
For reading a value in an array, we need two methods:
- one that handles move/copy semantics and returns a value
- one that handles reference semantics and returns a reference to the original value in it's location in the backing memory
First, then:
impl<T: Sized + Clone> Array<T> {
fn read<'guard>(
&self,
_guard: &'guard dyn MutatorScope,
index: ArraySize,
) -> Result<T, RuntimeError> {
unsafe {
let dest = self.get_offset(index)?;
Ok(read(dest))
}
}
}
and secondly:
impl<T: Sized + Clone> Array<T> {
pub fn read_ref<'guard>(
&self,
_guard: &'guard dyn MutatorScope,
index: ArraySize,
) -> Result<&T, RuntimeError> {
unsafe {
let dest = self.get_offset(index)?;
Ok(&*dest as &T)
}
}
}
Writing, or copying, an object to an array is implemented as simply as follows:
impl<T: Sized + Clone> Array<T> {
pub fn read_ref<'guard>(
&self,
_guard: &'guard dyn MutatorScope,
index: ArraySize,
) -> Result<&T, RuntimeError> {
unsafe {
let dest = self.get_offset(index)?;
Ok(&*dest as &T)
}
}
}
These simple functions should only be used internally by Array<T>
impl
methods. We have numerous methods that wrap the above in more appropriate
semantics for values of T
in Array<T>
.
The Array interfaces
To define the interfaces to the Array, and other collection types, we define a
number of traits. For example, a collection that behaves as a stack implements
StackContainer
; a numerically indexable type implements IndexedContainer
,
and so on. As we'll see, there is some nuance, though, when it comes to a
difference between collections of non-pointer types and collections of pointer
types.
For our example, we will describe the stack interfaces of Array<T>
.
First, the general case trait, with methods for accessing values stored in the array (non-pointer types):
pub trait StackContainer<T: Sized + Clone>: Container<T> {
/// Push can trigger an underlying array resize, hence it requires the ability to allocate
fn push<'guard>(&self, mem: &'guard MutatorView, item: T) -> Result<(), RuntimeError>;
/// Pop returns a bounds error if the container is empty, otherwise moves the last item of the
/// array out to the caller.
fn pop<'guard>(&self, _guard: &'guard dyn MutatorScope) -> Result<T, RuntimeError>;
/// Return the value at the top of the stack without removing it
fn top<'guard>(&self, _guard: &'guard dyn MutatorScope) -> Result<T, RuntimeError>;
}
These are unremarkable functions, by now we're familiar with the references to
MutatorScope
and MutatorView
in method parameter lists.
In any instance of Array<T>
, T
need only implement Clone
and cannot be
dynamically sized. Thus T
can be any primitive type or any straightforward
struct.
What if we want to store pointers to other objects? For example, if we want a
heterogenous array, such as Python's List
type, what would we provide in
place of T
? The answer is to use the TaggedCellPtr
type. However,
an Array<TaggedCellPtr
, because we want to interface with pointers and
use the memory access abstractions provided, can be made a little more
ergonomic. For that reason, we have separate traits for containers of type
Container<TaggedCellPtr
. In the case of the stack interface this looks like:
pub trait StackAnyContainer: StackContainer<TaggedCellPtr> {
/// Push can trigger an underlying array resize, hence it requires the ability to allocate
fn push<'guard>(
&self,
mem: &'guard MutatorView,
item: TaggedScopedPtr<'guard>,
) -> Result<(), RuntimeError>;
/// Pop returns a bounds error if the container is empty, otherwise moves the last item of the
/// array out to the caller.
fn pop<'guard>(
&self,
_guard: &'guard dyn MutatorScope,
) -> Result<TaggedScopedPtr<'guard>, RuntimeError>;
/// Return the value at the top of the stack without removing it
fn top<'guard>(
&self,
_guard: &'guard dyn MutatorScope,
) -> Result<TaggedScopedPtr<'guard>, RuntimeError>;
}
As you can see, these methods, while for T = TaggedCellPtr
, provide an
interface based on passing and returning TaggedScopedPtr
.
Let's look at the implementation of one of these methods - push()
- for
both StackContainer
and StackAnyContainer
.
Here's the code for StackContainer::push()
:
impl<T: Sized + Clone> StackContainer<T> for Array<T> {
fn push<'guard>(&self, mem: &'guard MutatorView, item: T) -> Result<(), RuntimeError> {
if self.borrow.get() != INTERIOR_ONLY {
return Err(RuntimeError::new(ErrorKind::MutableBorrowError));
}
let length = self.length.get();
let mut array = self.data.get(); // Takes a copy
let capacity = array.capacity();
if length == capacity {
if capacity == 0 {
array.resize(mem, DEFAULT_ARRAY_SIZE)?;
} else {
array.resize(mem, default_array_growth(capacity)?)?;
}
// Replace the struct's copy with the resized RawArray object
self.data.set(array);
}
self.length.set(length + 1);
self.write(mem, length, item)?;
Ok(())
}
}
In summary, the order of operations is:
- Check that a runtime borrow isn't in progress. If it is, return an error.
- Since we must implement interior mutability, the member
data
of theArray<T>
struct is aCell
. We have toget()
the content in order to use it. - We then ask whether the array backing store needs to be grown. If so,
we resize the
RawArray<T>
and, since it's kept in aCell
onArray<T>
, we have toset()
value back intodata
to save the change. - Now we have an
RawArray<T>
that has enough capacity, the length is incremented and the object to be pushed is written to the next memory location using the internalArray<T>::write()
method detailed earlier.
Fortunately we can implement StackAnyContainer::push()
in terms of
StackContainer::push()
:
impl StackAnyContainer for Array<TaggedCellPtr> {
fn push<'guard>(
&self,
mem: &'guard MutatorView,
item: TaggedScopedPtr<'guard>,
) -> Result<(), RuntimeError> {
StackContainer::<TaggedCellPtr>::push(self, mem, TaggedCellPtr::new_with(item))
}
}
One last thing
To more easily differentiate arrays of type Array<T>
from arrays of type
Array<TaggedCellPtr>
, we make a type alias List
where:
pub type List = Array<TaggedCellPtr>;
In conclusion
We referenced how Vec
is implemented internally and followed the same pattern
of defining a RawArray<T>
unsafe layer with a safe Array<T>
wrapper. Then
we looked into the stack interface for Array<T>
and the implementation of
push()
.
There is more to arrays, of course - indexed access the most obvious, and also
a few convenience methods. See the source code in interpreter/src/array.rs
for the full detail.
In the next chapter we'll put Array<T>
to use in a Bytecode
type!