Forte

Programming Language - Hackathon

Motivation

In February of 2024, I competed at RowdyHacks during the 9th annual Hackathon. This was my second time attending, having placed second the previous year. My understanding of programming and of Rust had grown significantly over that time, and I wanted to one-up myself. I had just finished my Computer Organization class, where we learned to program in x86 assembly. The class made me realize that the design of an assembly language has a profound impact on every higher-level language built on top of it. As a Rust programmer who cares a great deal about safety and soundness guarantees, it troubles me that I have to compile to an inherently unsafe assembly language. I wanted to find out how feasible it would be to write a new assembly language designed for easy static analysis and safety guarantees. I am very satisfied with the result, which I named Forte.

Specifications

Instructions

Forte is an assembly language and bytecode for a hypothetical 128-bit processor. It contains 26 instructions, but no directly exposed registers.
Name              Mnemonic
Push              push
Pop               pop
Duplicate         dup
Add               add
Difference        diff
Multiply          mul
Divide            div
Remainder         rem
Bitwise And       and
Bitwise Or        or
Bitwise Xor       xor
Shift Right       shr
Shift Left        shl
Branch Equal      beq
Branch Unequal    bne
Branch Greater    bgt
Branch Lesser     blt
Function Start    fun
Call              call
Return            ret
Loop              loop
Iterate Loop      iter
Begin Execution   exe
Store             sto
Load              lod
Stack Length      len

The Three Stacks

The stack is an essential part of any assembly language, but it is often a point of vulnerability. Stack-smashing and stack-overflow vulnerabilities can very easily allow attackers to overwrite function return addresses and execute arbitrary code. This is why Forte uses three distinct stacks.

The Program Stack

The program stack, or p-stack for short, contains all of the state for the user-space program. When you push or pop a value in Forte, you are interacting with the p-stack. What's most interesting about the p-stack is what it does not contain: pointers. There is no way to jump to an address on the p-stack, as its contents are considered untrusted.

The Control Stack

In order to call functions, return addresses must be stored somewhere. While most assembly languages place these on the stack alongside other values, Forte segregates them into a separate control stack (c-stack for short). This is similar to the "shadow stack" option some compilers use, except implemented at the hardware level. The only way to interact with the c-stack is through call and return instructions.
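To make the separation concrete, here is a minimal Rust sketch of a machine that keeps return addresses apart from program data. It is an illustration under stated assumptions, not Forte's actual implementation: the Machine type, its field and method names, and the use of an instruction index as the address are all hypothetical, and the call target is assumed to have been validated already.

```rust
/// Hypothetical model of a machine with separate program and control stacks.
/// All names here are illustrative and are not taken from Forte itself.
pub struct Machine {
    /// Program stack (p-stack): untrusted user data, modeled as 128-bit values.
    pub p_stack: Vec<u128>,
    /// Control stack (c-stack): return addresses. It is private so that only
    /// `call` and `ret` below can touch it.
    c_stack: Vec<usize>,
    /// Program counter, modeled as an index into the instruction list.
    pub pc: usize,
}

impl Machine {
    pub fn new() -> Self {
        Machine { p_stack: Vec::new(), c_stack: Vec::new(), pc: 0 }
    }

    /// push: user code can only ever write to the p-stack.
    pub fn push(&mut self, value: u128) {
        self.p_stack.push(value);
    }

    /// pop: removes the top p-stack value, if there is one.
    pub fn pop(&mut self) -> Option<u128> {
        self.p_stack.pop()
    }

    /// call: save the address of the next instruction on the c-stack,
    /// then jump to `target` (assumed to be a validated function address).
    pub fn call(&mut self, target: usize) {
        self.c_stack.push(self.pc + 1);
        self.pc = target;
    }

    /// ret: restore the program counter from the c-stack.
    /// Returns false if there is no pending call to return from.
    pub fn ret(&mut self) -> bool {
        match self.c_stack.pop() {
            Some(addr) => {
                self.pc = addr;
                true
            }
            None => false,
        }
    }
}
```

Because push and pop never see the c-stack, no sequence of p-stack operations in this model can change where ret lands, which is the property the control stack is meant to provide.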

The Function Stack

Most assembly languages have a "jump" instruction which moves the program counter to an arbitrary point in memory. This is useful but not particularly safe. Forte only allows jumping to valid functions. The locations of these functions are kept in the function stack (f-stack). Functions are added to the f-stack after they are validated during the warmup phase, which I will discuss next.

Warmup

When a Forte program begins, it does not immediately start executing instructions. Instead, it validates the program's correctness and builds the f-stack. Every fun (function start) instruction will push the address of that instruction to the f-stack. Every instruction that interacts with the stack will increment or decrement an internal register accordingly. If this value falls below zero or above the maximum stack size, then the program is determined to be unsafe and execution is cancelled before it begins. The warmup phase will take O(N) time, where N is the number of instructions.
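The warmup pass described above can be sketched as a single linear scan. The Rust below is a rough illustration, not Forte's real validator: the Instr subset, the per-instruction stack effects, the MAX_DEPTH value, and the WarmupError type are all assumptions made for the example, and the depth counter tracks only the net effect of each instruction, as the description suggests.

```rust
/// A small, hypothetical subset of Forte instructions (not the real encoding).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr {
    Push(u128),
    Pop,
    Dup,
    Add,
    Fun,
    Exe,
}

/// Reasons the sketch rejects a program; each variant that carries a value
/// holds the address (instruction index) where the problem was found.
#[derive(Debug, PartialEq)]
pub enum WarmupError {
    Underflow(usize),
    Overflow(usize),
    MissingExe,
}

/// Assumed maximum p-stack depth; the real limit is not given in the text.
pub const MAX_DEPTH: i64 = 1024;

/// Scan the program once, building the f-stack and tracking stack depth.
/// Returns the f-stack on success. Runs in O(N) for N instructions.
pub fn warmup(program: &[Instr]) -> Result<Vec<usize>, WarmupError> {
    let mut f_stack = Vec::new();
    let mut depth: i64 = 0; // stands in for the "internal register"

    for (addr, instr) in program.iter().enumerate() {
        // Net stack effect of each instruction (assumed for this sketch).
        let delta = match instr {
            Instr::Push(_) | Instr::Dup => 1,
            Instr::Pop | Instr::Add => -1,
            Instr::Fun => {
                f_stack.push(addr); // record the function's address
                0
            }
            // Warmup ends when the exe instruction is reached.
            Instr::Exe => return Ok(f_stack),
        };

        depth += delta;
        if depth < 0 {
            return Err(WarmupError::Underflow(addr));
        }
        if depth > MAX_DEPTH {
            return Err(WarmupError::Overflow(addr));
        }
    }

    // The scan ran off the end without finding exe.
    Err(WarmupError::MissingExe)
}
```

Calling warmup on a program such as fun, pop, exe returns an Underflow error at address 1, so the unsafe pop is rejected before any instruction runs.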

Recital

After the exe (begin execution) instruction is reached, the Recital phase begins. The program counter jumps to the address at the top of the f-stack, which belongs to the last function defined. When the program counter reaches the exe instruction again, the program has terminated successfully. Together, these design decisions mean that the program counter can never be lower than the first function address or higher than the address of the exe instruction. This makes arbitrary code execution attacks more difficult to pull off.
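The bounds on the program counter can be written down directly. The following Rust sketch assumes the f-stack is available as a list of instruction indices in definition order and that the address of exe is known. The Recital type and its methods are hypothetical names chosen for illustration.

```rust
/// Hypothetical model of the Recital phase's program-counter bounds.
pub struct Recital {
    /// Address of the first function defined (bottom of the f-stack).
    first_fun: usize,
    /// Address of the exe instruction.
    exe_addr: usize,
    /// Current program counter.
    pub pc: usize,
}

impl Recital {
    /// Begin the Recital: start at the last function defined, i.e. the top
    /// of the f-stack. Returns None if no functions were defined.
    pub fn start(f_stack: &[usize], exe_addr: usize) -> Option<Self> {
        let first_fun = *f_stack.first()?;
        let entry = *f_stack.last()?;
        Some(Recital { first_fun, exe_addr, pc: entry })
    }

    /// An address is reachable only if it lies between the first function
    /// and the exe instruction, inclusive.
    pub fn in_bounds(&self, addr: usize) -> bool {
        self.first_fun <= addr && addr <= self.exe_addr
    }

    /// The program has terminated successfully once pc reaches exe again.
    pub fn finished(&self) -> bool {
        self.pc == self.exe_addr
    }
}
```

For an f-stack of [2, 7] and exe at address 12, the Recital starts with pc at 7, and any address below 2 or above 12 fails the in_bounds check.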

Safety

Creating a memory-safe assembly language presents very different challenges from creating a compiled language like Rust. There are no compile-time checks or guarantees; any string of bytes could be interpreted as a "program." The core design principle of Forte is to make insecure programs impossible (or at least very difficult) to express, and to make static analysis simple. This is a delicate balance to strike, because limiting Forte's capabilities necessarily makes it more cumbersome and less performant.