Motivation
In February of 2024, I competed at RowdyHacks during the 9th
annual Hackathon. It was my second time attending; I had
placed second the previous year.
My understanding of programming and of Rust had grown significantly over
that time, and I wanted to one-up myself. I had just finished my Computer
Organization class, where we learned to program in x86 assembly. The class
made me realize that the design of an assembly language has a profound
impact on every higher-level language built on top of it. As a Rust
programmer who cares a great deal about safety and soundness guarantees, it
troubles me that I have to compile to an inherently unsafe assembly
language. I wanted to find out how feasible it would be to write a new
assembly language designed for easy static analysis and safety guarantees.
I am very satisfied with the result, which I named Forte.
Specifications
Instructions
Forte is an assembly language and bytecode for a hypothetical 128-bit
processor. It contains 26 instructions, but no directly exposed registers.
Name | Mnemonic |
Push | push |
Pop | pop |
Duplicate | dup |
Add | add |
Difference | diff |
Multiply | mul |
Divide | div |
Remainder | rem |
Bitwise And | and |
Bitwise Or | or |
Bitwise Xor | xor |
Shift Right | shr |
Shift Left | shl |
Branch Equal | beq |
Branch Unequal | bne |
Branch Greater | bgt |
Branch Lesser | blt |
Function Start | fun |
Call | call |
Return | ret |
Loop | loop |
Iterate Loop | iter |
Begin Execution | exe |
Store | sto |
Load | lod |
Stack Length | len |
The Three Stacks
The stack is an essential part of any assembly language, but it is
often a point of vulnerability. Stack-smashing and stack-overflow
vulnerabilities can easily allow attackers to overwrite function
return addresses and execute arbitrary code. This is why Forte uses
three distinct stacks.
The Program Stack
The program stack, or p-stack for short, contains all of the state for the
user-space program. When you push or pop a value in Forte, you are
interacting with the p-stack. What's most interesting about the p-stack is
what it does not contain: pointers. There is no way to jump to an address
on the p-stack, as its contents are considered untrusted.
The Control Stack
In order to call functions, return addresses must be stored somewhere.
While most assembly languages place these on the stack alongside other
values, Forte segregates them into a separate control stack (c-stack for
short). This is similar to the "shadow stack" option some compilers use,
except implemented at a hardware level. The only way to interact with the
c-stack is through call and return instructions.
The Function Stack
Most assembly languages have a "jump" instruction which moves the program
counter to an arbitrary point in memory. This is useful but not
particularly safe. Forte only allows jumping to valid functions. The
locations of these functions are kept in the function stack (f-stack).
Functions are added to the f-stack after they are validated during the
warmup phase, which I will discuss next.
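The division of labour between the three stacks can be sketched in Rust. This is only an illustration, not Forte's actual implementation: the `Machine` struct, its field names, and the idea that `call` takes an index into the f-stack (rather than a raw address) are all assumptions made for the example.

```rust
/// Hypothetical machine state; every name here is illustrative.
struct Machine {
    p_stack: Vec<u128>,  // program values; never used as jump targets
    c_stack: Vec<usize>, // return addresses; touched only by call/ret
    f_stack: Vec<usize>, // validated function entry points
    pc: usize,           // program counter
}

impl Machine {
    fn new(f_stack: Vec<usize>) -> Self {
        Machine { p_stack: Vec::new(), c_stack: Vec::new(), f_stack, pc: 0 }
    }

    /// `call`: the operand is assumed to be an index into the f-stack,
    /// so the only reachable targets are validated functions.
    fn call(&mut self, function_index: usize) -> Result<(), String> {
        let target = *self
            .f_stack
            .get(function_index)
            .ok_or_else(|| format!("no function at index {}", function_index))?;
        // The return address goes on the c-stack, never on the p-stack.
        self.c_stack.push(self.pc + 1);
        self.pc = target;
        Ok(())
    }

    /// `ret`: pops the return address from the c-stack.
    fn ret(&mut self) -> Result<(), String> {
        self.pc = self.c_stack.pop().ok_or("ret with an empty c-stack")?;
        Ok(())
    }
}
```

Because nothing in this sketch ever copies a value from `p_stack` into `pc`, data pushed by the program cannot redirect control flow.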
Warmup
When a Forte program begins, it does not immediately start executing
instructions. Instead, it validates the program's correctness and builds
the f-stack. Every fun (function start) instruction pushes the address of
that instruction to the f-stack. Every instruction that interacts with the
stack increments or decrements an internal register accordingly. If this
value falls below zero or rises above the maximum stack size, then the
program is determined to be unsafe and execution is cancelled before it
begins. The warmup phase takes O(N) time, where N is the number of
instructions.
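A warmup pass of this kind might look like the following Rust sketch. It is a minimal illustration under stated assumptions, not the actual Forte code: the `Op` enum covers only a handful of instructions, the per-instruction stack deltas are guesses at how a typical stack machine would behave, and the depth counter is tracked across the whole linear program rather than per function.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Push,
    Pop,
    Dup,
    Add,
    Fun,
    Exe,
}

/// Assumed net change in stack depth for each instruction.
fn stack_delta(op: Op) -> i64 {
    match op {
        Op::Push | Op::Dup => 1,
        Op::Pop | Op::Add => -1, // add is assumed to pop two values and push one
        Op::Fun | Op::Exe => 0,
    }
}

/// Single linear scan: builds the f-stack and rejects the program if the
/// tracked depth ever leaves the range 0..=max_depth.
fn warmup(program: &[Op], max_depth: i64) -> Result<Vec<usize>, String> {
    let mut f_stack = Vec::new();
    let mut depth: i64 = 0;
    for (addr, &op) in program.iter().enumerate() {
        match op {
            Op::Exe => return Ok(f_stack), // warmup ends at exe
            Op::Fun => f_stack.push(addr), // record the function's address
            _ => {}
        }
        depth += stack_delta(op);
        if depth < 0 || depth > max_depth {
            return Err(format!("unsafe stack depth {} at instruction {}", depth, addr));
        }
    }
    // Assumption: a program without an exe instruction is rejected.
    Err(String::from("no exe instruction found"))
}
```

Each instruction is visited exactly once, which is where the O(N) bound comes from. For example, `warmup(&[Op::Fun, Op::Pop, Op::Exe], 8)` is rejected because the counter would drop below zero.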
Recital
After the exe (begin execution) instruction is reached, the
Recital phase begins. The program counter moves to the address at the
top of the f-stack, which is the last defined function. When the program
counter reaches the exe instruction again, the program has terminated
successfully. Together, these design decisions mean that the program
counter can never be lower than the first function address or higher than
the address of the exe instruction. This makes arbitrary code execution
attacks more difficult to pull off.
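The entry point and the bounds on the program counter can be expressed as two small Rust helpers. Both function names are hypothetical, and the sketch assumes the f-stack holds function addresses in the order they were defined, so its first entry is the lowest address.

```rust
/// Entry point for the Recital phase: the top of the f-stack,
/// which is the last function defined in the program.
fn entry_point(f_stack: &[usize]) -> Option<usize> {
    f_stack.last().copied()
}

/// The program counter is in bounds while it sits between the first
/// function address and the exe instruction, inclusive.
fn pc_in_bounds(pc: usize, f_stack: &[usize], exe_addr: usize) -> bool {
    match f_stack.first() {
        Some(&first) => first <= pc && pc <= exe_addr,
        None => false, // no functions defined, so nothing may execute
    }
}
```

With an f-stack of `[2, 7, 11]` and exe at address 20, execution would start at address 11, and any program counter outside 2 through 20 would be rejected.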
Safety
Creating a memory-safe assembly language presents very different challenges
from creating a compiled language like Rust. There are no compile-time checks
or guarantees: any string of bytes could be interpreted as a "program." The
core design principle of Forte is to make insecure programs impossible (or
at least very difficult) to express, and to make static analysis simple.
This is a delicate balance to strike, because limiting Forte's capabilities
necessarily makes it more cumbersome and less performant. I believe we are
at a point in history where making this trade-off is the right move. Modern
processors are unbelievably fast, and security concerns far outweigh
performance concerns in most use cases. Writing assembly by hand is
far less common today, so the ergonomics of an assembly language are
also less of a concern.
Limitations
Forte is the first programming language I've written, and it was hacked
together in less than 24 hours. It is certainly not production ready, and
not every feature is implemented yet. Other features are technically functional,
but exist mostly as placeholders until I can find a better way to implement them.
Evaluating branching and looping code during the Warmup is very difficult.
I very quickly run into the Halting Problem, and would run into feasibility
limits were I to try to implement this sort of algorithm on real
hardware. Solutions I've considered are requiring code branches to have the same net effect
on stack size, forcing branches to execute another function rather than exist inline, replacing
loops with some form of recursion, or outright removing branches and loops altogether. Without
solving this problem, it is impossible to make the Warmup phase do its job, and so it is mostly
vestigial. As of now, Forte is only really capable of simple linear programs.
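The first option listed above, requiring both arms of a branch to have the same net effect on stack size, could be checked roughly as follows. This Rust sketch is hypothetical and not part of Forte today: the `Op` subset, the stack deltas, and the function names are assumptions, and it only handles straight-line code inside each arm.

```rust
#[derive(Clone, Copy)]
enum Op {
    Push,
    Pop,
    Dup,
    Add,
}

/// Assumed net change in stack depth over a straight-line run of instructions.
fn net_effect(ops: &[Op]) -> i64 {
    ops.iter()
        .map(|op| match op {
            Op::Push | Op::Dup => 1,
            Op::Pop | Op::Add => -1, // add is assumed to pop two values and push one
        })
        .sum()
}

/// A branch is accepted only if both arms leave the stack at the same depth,
/// so the warmup counter stays valid whichever arm runs.
fn branch_is_balanced(taken: &[Op], fallthrough: &[Op]) -> bool {
    net_effect(taken) == net_effect(fallthrough)
}
```

Under this rule, the analysis after a branch can continue with a single known depth instead of having to explore both paths.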
Try it Out