Graphics API Replay (GAPIR)
GAPIR is a stack-based virtual machine that can execute programs formed from a very small instruction set.
Evaluation of existing VMs
Before embarking on building a new virtual machine from scratch, we evaluated our needs and compared them against a number of existing lightweight, open-source VMs (Lua, Parrot, Neko, etc.).
We opted for building a custom VM because:
- Our required instruction set was significantly smaller than those provided by other VMs. We have no need for functions or any type of control flow, and by reducing the instruction set to only what we absolutely require, we’ve avoided unnecessary complexity in testing and in generating the command stream.
- We have no need for standard libraries (math functions, I/O functions, etc.), which come bundled with some VMs and can be tricky to separate out.
- We desired a very custom memory system that would have been difficult to fit into other VMs.
- Some of the VMs of interest had licences that were incompatible with our needs.
- Our speed requirements are very high. Because we profile based on the VM playback, we need as little overhead as possible per draw call.
Memory pools
GAPIR has 3 distinct types of memory pools.
Volatile memory
Volatile memory is pre-allocated memory that is free to be modified by any opcode during execution. It can be used for temporary or semi-persistent storage.
Constant memory
Along with a sequence of opcodes, a replay request contains a block of constant data. This may be read from at any point in the execution of the replay, but is immutable for the entire replay.
Absolute pointers
Memory that’s not allocated by the replay system may still need to be read or written to in order to perform a replay. Pointers returned by glGetString or glMapBufferRange are examples of memory that’s not allocated by the replay system, but may need to be accessed.
Data types
The GAPIR virtual machine supports the following primitive data types:
Type | Description |
---|---|
Bool | true / false value |
Int8 | 8-bit signed integer |
Int16 | 16-bit signed integer |
Int32 | 32-bit signed integer |
Int64 | 64-bit signed integer |
Uint8 | 8-bit unsigned integer |
Uint16 | 16-bit unsigned integer |
Uint32 | 32-bit unsigned integer |
Uint64 | 64-bit unsigned integer |
Float | 32-bit floating point number |
Double | 64-bit floating point number |
AbsolutePointer | Pointer to an absolute address |
ConstantPointer | Pointer within the constant pool |
VolatilePointer | Pointer within the volatile pool |
Stack
The VM uses a standard LIFO stack where each element is a type-value pair. The size of the stored elements is unified to the size of the largest storable type, and all of the elements are aligned.
Each operation, except for CLONE, consumes its operands from the current stack and pushes the result back to the stack.
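A minimal sketch of such a stack, for illustration only. The class name, the use of type names as strings, and the optional type check on pop are assumptions made for this example and are not taken from the GAPIR sources.

```python
from typing import List, Optional, Tuple, Union

Value = Union[bool, int, float]


class Stack:
    """LIFO stack whose elements are (type name, value) pairs."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Value]] = []

    def push(self, type_name: str, value: Value) -> None:
        """Push a value together with its type tag."""
        self._items.append((type_name, value))

    def pop(self, expected_type: Optional[str] = None) -> Value:
        """Pop the top element, optionally checking its type tag first."""
        if not self._items:
            raise IndexError("pop from empty stack")
        type_name, value = self._items[-1]
        if expected_type is not None and type_name != expected_type:
            raise TypeError(f"expected {expected_type}, found {type_name}")
        self._items.pop()
        return value

    def __len__(self) -> int:
        return len(self._items)
```

Storing the type next to each value is what lets an operation such as CALL check its operands before using them.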
Opcodes
Each opcode is 32 bits long where the first 6 bits are the instruction code and the rest of the bits contain the instruction data. This leaves room for additional instructions to be added in the future.
Notation: <field_name:field_size_in_bits>
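As a rough illustration of this layout, the sketch below splits a 32-bit opcode into its instruction code and instruction data. It assumes that "first 6 bits" means the six most significant bits. The function names are hypothetical.

```python
from typing import Tuple

CODE_BITS = 6
DATA_BITS = 32 - CODE_BITS  # 26 bits of instruction data


def split_opcode(word: int) -> Tuple[int, int]:
    """Split a 32-bit opcode into (instruction code, instruction data)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("opcode must fit in 32 bits")
    return word >> DATA_BITS, word & ((1 << DATA_BITS) - 1)


def join_opcode(code: int, data: int) -> int:
    """Build a 32-bit opcode from a 6-bit code and 26 bits of data."""
    if not 0 <= code < (1 << CODE_BITS):
        raise ValueError("instruction code must fit in 6 bits")
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("instruction data must fit in 26 bits")
    return (code << DATA_BITS) | data
```

A 6-bit code allows up to 64 distinct instructions, which is where the room for future additions comes from.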
CALL(push-return, api, function)
[-{arg-count} (any type) / +{push-return} (any type)]
<code:6> <padding:1> <push-return:1> <padding:4> <api:4> <function id:16>
Calls the specified function in the given API and if push-return is 1 then saves the return value to the stack; otherwise the return value is discarded.
The arguments are popped from the stack and type-checked against the parameters of the called function.
The arguments have to be pushed onto the stack in order (the last argument is on the top of the stack).
Function IDs in range 0xff00-0xffff are reserved.
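The CALL field layout can be sketched as follows. This assumes the fields are packed from the most significant bit downwards, in the order listed in the notation. The numeric instruction code for CALL is not given here, so it is passed in as a parameter. All function names are hypothetical.

```python
from typing import Tuple


def encode_call(code: int, push_return: bool, api: int, function_id: int) -> int:
    """Pack <code:6> <padding:1> <push-return:1> <padding:4> <api:4> <function id:16>."""
    if not 0 <= code < 64:
        raise ValueError("code must fit in 6 bits")
    if not 0 <= api < 16:
        raise ValueError("api must fit in 4 bits")
    if not 0 <= function_id <= 0xFFFF:
        raise ValueError("function id must fit in 16 bits")
    return (code << 26) | (int(push_return) << 24) | (api << 16) | function_id


def decode_call(word: int) -> Tuple[int, bool, int, int]:
    """Return (code, push_return, api, function_id) from a CALL opcode."""
    return (
        (word >> 26) & 0x3F,
        bool((word >> 24) & 0x1),
        (word >> 16) & 0xF,
        word & 0xFFFF,
    )


def is_reserved_function_id(function_id: int) -> bool:
    """Function IDs in the range 0xff00-0xffff are reserved."""
    return 0xFF00 <= function_id <= 0xFFFF
```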
PUSH_I(type, data)
[+1 (type)]
<code:6> <type:6> <data:20>
Pushes data to the top of the stack.
If the data type is an integer or a pointer type, then the data is copied into the least-significant-bits of the target word, sign-extending if the type is signed.
If the data type is a float or double, then the value is written to the sign and exponent bits of the floating point number, and the fractional bits are set to 0.
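The two cases above can be sketched as follows. The sketch assumes IEEE 754 layouts (1 sign bit plus 8 exponent bits for Float, 1 sign bit plus 11 exponent bits for Double) and assumes that the sign and exponent are taken from the low bits of the 20-bit data field. The function names are hypothetical.

```python
import struct

DATA_BITS = 20  # width of the PUSH_I data field


def push_i_integer(data: int, signed: bool) -> int:
    """Value pushed for an integer or pointer type."""
    data &= (1 << DATA_BITS) - 1
    if signed and data & (1 << (DATA_BITS - 1)):
        data -= 1 << DATA_BITS  # sign-extend from bit 19
    return data


def push_i_float(data: int) -> float:
    """Value pushed for Float: sign and exponent set, fraction bits zero."""
    bits = (data & 0x1FF) << 23
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def push_i_double(data: int) -> float:
    """Value pushed for Double: sign and exponent set, fraction bits zero."""
    bits = (data & 0xFFF) << 52
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

Under these assumptions, `push_i_float(0x7F)` yields 1.0, because 0x7F is the biased exponent of 1.0 with a clear sign bit.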
LOAD_C(type, address)
[+1 (type)]
<code:6> <type:6> <constant-address:20>
Pushes data loaded from constant-address to the top of the stack.
LOAD_V(type, address)
[+1 (type)]
<code:6> <type:6> <volatile-address:20>
Pushes data loaded from volatile-address to the top of the stack.
LOAD(type)
[-1 (pointer) / +1 (type)]
<code:6> <type:6> <padding:20>
Pops a memory address from the top of the stack and pushes the data at that address to the top of the stack.
POP(count)
[-{count} (any type)]
<code:6> <count:26>
Pops and discards count values from the top of the stack.
STORE_V(volatile-address)
[-1 (any type)]
<code:6> <volatile-address:26>
Pops the top value from the stack and saves it to volatile-address.
All pointer values, regardless of the pointer type on the stack, will be stored as an absolute pointer address.
STORE()
[-2 (pointer, any type)]
<code:6> <padding:26>
Pops the target address and then the value from the top of the stack, and then stores the value to the target address. All pointer values, regardless of the pointer type on the stack, will be stored as an absolute pointer address.
RESOURCE(resource-id)
[-1 (pointer)]
<code:6> <resource-id:26>
Pops the address from the top of the stack and then loads the resource resource-id to that address.
POST()
[-2 (Uint32, pointer)]
<code:6> <padding:26>
Pops size and then a pointer from the top of the stack and posts size bytes of data from the address to the server.
COPY(count)
[-2 (pointer, pointer)]
<code:6> <count:26>
Pops the target address then the source address from the top of the stack, and then copies count bytes from source to target.
CLONE(n)
[+1 (any type)]
<code:6> <n:26>
Copies the n-th element from the top of the stack to the new top of the stack.
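A minimal sketch of CLONE on a list-backed stack of type-value pairs. Whether n counts from zero is not stated above. This example assumes that n = 0 refers to the current top of the stack, which is an assumption of the sketch rather than a documented fact.

```python
from typing import List, Tuple, Union

Element = Tuple[str, Union[bool, int, float]]


def clone(stack: List[Element], n: int) -> None:
    """Copy the n-th element from the top (assumed zero-based) to the new top."""
    if not 0 <= n < len(stack):
        raise IndexError("CLONE index is outside the stack")
    stack.append(stack[-1 - n])
```

Unlike the other operations, nothing is popped: the stack grows by exactly one element.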
STRCPY(max-count)
[-2 (pointer, pointer)]
<code:6> <max-count:26>
Pops the target address then the source address from the top of the stack, and then copies at most max-count minus one bytes from source to target. If the max-count is greater than the source string length, then the target will be padded with 0s. The destination buffer will always be 0-terminated.
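The copy rule can be sketched on byte strings as follows. The function returns the bytes that would be written to the target, and it assumes that exactly max-count bytes are written, in the manner of strncpy with a forced terminator. The behaviour for a max-count below 1 is not described above, so the sketch rejects it. The function name is hypothetical.

```python
def strcpy_bytes(source: bytes, max_count: int) -> bytes:
    """Return the max_count bytes written to the target buffer."""
    if max_count < 1:
        raise ValueError("max_count must be at least 1 in this sketch")
    end = source.find(b"\x00")
    text = source if end < 0 else source[:end]  # stop at the source terminator
    copied = text[: max_count - 1]              # leave room for the final 0
    return copied + b"\x00" * (max_count - len(copied))
```

For example, copying `b"hello\x00"` with a max-count of 4 truncates to `b"hel\x00"`, while a max-count of 8 pads the result to `b"hello\x00\x00\x00"`.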
EXTEND(value)
[no change]
<code:6> <value:26>
Extends the value at the top of the stack with the given data, in-place.
If the data type of the top of the stack is an integer or a pointer type, then the value on the stack is left-shifted by 26 bits and is bitwise-OR’ed with the specified value.
If the data type is a float or double, then the fractional part of the floating point value on the stack is left-shifted by 26 bits and is bitwise-OR’ed with the specified value. Bits shifted beyond the fractional part of the floating point number are discarded.
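The sketch below illustrates how a value wider than an immediate can be assembled by EXTEND after an initial push. It operates on raw bit patterns and assumes IEEE 754 layouts (23 fraction bits for Float, 52 for Double). It also assumes that the result is masked to the width of the type after the OR. The function names are hypothetical.

```python
import struct

EXTEND_BITS = 26  # width of the EXTEND value field


def extend_integer(current: int, value: int, width: int) -> int:
    """Shift the current bit pattern left by 26 and OR in the new bits."""
    return ((current << EXTEND_BITS) | value) & ((1 << width) - 1)


def extend_float_bits(bits: int, value: int, fraction_bits: int = 23) -> int:
    """Extend only the fraction; the sign and exponent bits are kept."""
    fraction_mask = (1 << fraction_bits) - 1
    fraction = (((bits & fraction_mask) << EXTEND_BITS) | value) & fraction_mask
    return (bits & ~fraction_mask) | fraction


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a Float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Under these assumptions, the Uint32 value 0x12345678 can be built by pushing its top bits (0x4) and extending with the low 26 bits (0x2345678). The Float 1.5 can be built from sign and exponent bits 0x7F followed by an extend with fraction 0x400000.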
ADD(count)
[-{count} (any type) / +1 (any type)]
<code:6> <count:26>
Pops and sums count values from the top of the stack, and then pushes the result to the top of the stack.
All summed value types must be equal.
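A minimal sketch of ADD on a list-backed stack of type-value pairs. It checks that all operand types match before popping anything. Overflow and wrap-around behaviour is not described above and is not modelled here. The function name is hypothetical.

```python
from typing import List, Tuple, Union

Element = Tuple[str, Union[int, float]]


def add(stack: List[Element], count: int) -> None:
    """Pop count values of one type and push their sum."""
    if not 1 <= count <= len(stack):
        raise IndexError("ADD count is outside the stack")
    operands = stack[-count:]
    types = {type_name for type_name, _ in operands}
    if len(types) != 1:
        raise TypeError("all summed value types must be equal")
    del stack[-count:]
    stack.append((types.pop(), sum(value for _, value in operands)))
```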
LABEL(value)
[no change]
<code:6> <value:26>
Sets the current debug label to value.
The label value is displayed in debug messages or in the case of a crash.
Resources
GAPIR is designed to be run on desktop and Android devices. When replaying on Android, the communication between GAPIS and GAPIR is usually performed over USB 2, which has a peak throughput of around 60 megabytes per second. It’s not uncommon for capture files to be hundreds of megabytes in size, and in rare cases an order of magnitude greater than that.
It is typical for many replay requests to be made for the same capture file - for example clicking around the draw calls in the client will usually result in a replay request per click. The bulk of the data in replay requests of the same capture file is identical - the large assets are typically static textures and mesh data.
To avoid repeated transmission of these large assets over USB, GAPIR has a memory cache for storing resource data.
A list of resources used in the replay is included as part of the replay request payload header. This list consists of the identifiers and sizes of all the resources used by the replay stream. Upon receiving the header, GAPIR can check which of the resources it already has in its cache, and request the resource data for those that are missing.
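The cache check described above can be sketched as follows. The header is modelled as a list of (resource identifier, size) pairs and the cache as a dictionary keyed by identifier. Both representations, and the function name, are assumptions made for this example rather than the actual GAPIR data structures.

```python
from typing import Dict, List, Tuple


def missing_resources(
    header: List[Tuple[str, int]], cache: Dict[str, bytes]
) -> List[str]:
    """Return the identifiers listed in the header that are not cached yet."""
    return [resource_id for resource_id, _size in header if resource_id not in cache]
```

Only the identifiers returned here would need to be requested, so repeated replays of the same capture transfer the large static assets once.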