Forte

Programming Language - Hackathon

Motivation

In February of 2024, I competed at the 9th annual RowdyHacks hackathon. This was my second time attending, having placed second the previous year. My understanding of programming and of Rust had grown significantly over that time, and I wanted to one-up myself. I had just finished my Computer Organization class, where we learned to program in x86 assembly. The class made me realize that the design of an assembly language has a profound impact on every higher-level language built on top of it. As a Rust programmer who cares a great deal about safety and soundness guarantees, I am troubled that I have to compile to an inherently unsafe assembly language. I wanted to find out how feasible it would be to write a new assembly language designed for easy static analysis and safety guarantees. I am very satisfied with the result, which I named Forte.

Specifications

Instructions

Forte is an assembly language and bytecode for a hypothetical 128-bit processor. It contains 26 instructions, but no directly exposed registers.
Name               Mnemonic
Push               push
Pop                pop
Duplicate          dup
Add                add
Difference         diff
Multiply           mul
Divide             div
Remainder          rem
Bitwise And        and
Bitwise Or         or
Bitwise Xor        xor
Shift Right        shr
Shift Left         shl
Branch Equal       beq
Branch Unequal     bne
Branch Greater     bgt
Branch Lesser      blt
Function Start     fun
Call               call
Return             ret
Loop               loop
Iterate Loop       iter
Begin Execution    exe
Store              sto
Load               lod
Stack Length       len
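To make the instruction set concrete, the sketch below shows one hypothetical way to represent it in Rust. The Opcode enum, the TABLE constant, and the mnemonic/from_mnemonic helpers are illustrative assumptions, not Forte's actual source; the sketch only translates between opcodes and their mnemonics.

```rust
// Hypothetical Rust encoding of Forte's instruction set. The type and
// function names are illustrative; they are not taken from Forte's source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Opcode {
    Push, Pop, Dup, Add, Diff, Mul, Div, Rem, And, Or, Xor, Shr, Shl,
    Beq, Bne, Bgt, Blt, Fun, Call, Ret, Loop, Iter, Exe, Sto, Lod, Len,
}

// One row per instruction, mirroring the table above.
const TABLE: [(Opcode, &str); 26] = [
    (Opcode::Push, "push"), (Opcode::Pop, "pop"), (Opcode::Dup, "dup"),
    (Opcode::Add, "add"), (Opcode::Diff, "diff"), (Opcode::Mul, "mul"),
    (Opcode::Div, "div"), (Opcode::Rem, "rem"), (Opcode::And, "and"),
    (Opcode::Or, "or"), (Opcode::Xor, "xor"), (Opcode::Shr, "shr"),
    (Opcode::Shl, "shl"), (Opcode::Beq, "beq"), (Opcode::Bne, "bne"),
    (Opcode::Bgt, "bgt"), (Opcode::Blt, "blt"), (Opcode::Fun, "fun"),
    (Opcode::Call, "call"), (Opcode::Ret, "ret"), (Opcode::Loop, "loop"),
    (Opcode::Iter, "iter"), (Opcode::Exe, "exe"), (Opcode::Sto, "sto"),
    (Opcode::Lod, "lod"), (Opcode::Len, "len"),
];

impl Opcode {
    /// The assembly mnemonic for this opcode.
    fn mnemonic(self) -> &'static str {
        TABLE
            .iter()
            .find(|(op, _)| *op == self)
            .map(|(_, m)| *m)
            .expect("every opcode has a row in TABLE")
    }

    /// Parse a mnemonic back into an opcode; unknown text yields None.
    fn from_mnemonic(text: &str) -> Option<Opcode> {
        TABLE.iter().find(|(_, m)| *m == text).map(|(op, _)| *op)
    }
}

fn main() {
    let source = "push push add";
    let decoded: Vec<Opcode> = source
        .split_whitespace()
        .filter_map(Opcode::from_mnemonic)
        .collect();
    println!("{:?}", decoded);
}
```

Keeping the mapping in a single table means the enum, the assembler, and any disassembler cannot drift out of sync with each other.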

The Three Stacks

The stack is an essential part of any assembly language; however, it is often a point of vulnerability. Stack-smashing and stack-overflow vulnerabilities can easily allow attackers to overwrite function return addresses and execute arbitrary code. This is why Forte uses three distinct stacks.

The Program Stack

The program stack, or p-stack for short, contains all of the state for the user-space program. When you push or pop a value in Forte, you are interacting with the p-stack. What's most interesting about the p-stack is what it does not contain: pointers. There is no way to jump to an address on the p-stack, as its contents are considered untrusted.
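A minimal sketch of the p-stack in Rust, assuming 128-bit words (u128), a fixed capacity, and wrapping arithmetic for add; PStack and StackError are hypothetical names. Its only point is that the stack holds values, never addresses: every operation either manipulates data or reports an error, and none of them moves the program counter.

```rust
// Hypothetical model of the p-stack: 128-bit words and a fixed capacity.
// The words are plain data; no operation here turns one into an address.
#[derive(Debug, PartialEq)]
enum StackError {
    Overflow,
    Underflow,
}

struct PStack {
    values: Vec<u128>,
    capacity: usize,
}

impl PStack {
    fn new(capacity: usize) -> Self {
        PStack { values: Vec::with_capacity(capacity), capacity }
    }

    /// `push`: fails instead of writing past the end of the stack.
    fn push(&mut self, value: u128) -> Result<(), StackError> {
        if self.values.len() >= self.capacity {
            return Err(StackError::Overflow);
        }
        self.values.push(value);
        Ok(())
    }

    /// `pop`: fails instead of reading below the bottom of the stack.
    fn pop(&mut self) -> Result<u128, StackError> {
        self.values.pop().ok_or(StackError::Underflow)
    }

    /// `add`: replace the top two words with their sum (wrapping is an assumption).
    fn add(&mut self) -> Result<(), StackError> {
        if self.values.len() < 2 {
            return Err(StackError::Underflow);
        }
        let b = self.values.pop().unwrap();
        let a = self.values.pop().unwrap();
        // Net effect is -1, so the capacity cannot be exceeded here.
        self.values.push(a.wrapping_add(b));
        Ok(())
    }
}

fn main() {
    let mut stack = PStack::new(4);
    stack.push(2).unwrap();
    stack.push(3).unwrap();
    stack.add().unwrap();
    println!("{:?}", stack.pop());
}
```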

The Control Stack

In order to call functions, return addresses must be stored somewhere. While most assembly languages place these on the stack alongside other values, Forte segregates them into a separate control stack (c-stack for short). This is similar to the "shadow stack" option some compilers use, except implemented at the hardware level. The only way to interact with the c-stack is through the call and ret instructions.
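The separation can be sketched in Rust by keeping return addresses in a private field that only call and ret can reach. The Machine type, its fields, and the convention that a call returns to the next instruction (pc + 1) are assumptions made for illustration, not Forte's actual design.

```rust
// Hypothetical model: the c-stack lives in a private field, so in this sketch
// only `call` and `ret` can read or write return addresses.
mod cpu {
    pub struct Machine {
        pub pc: usize,
        pub p_stack: Vec<u128>, // user data, freely pushed and popped
        c_stack: Vec<usize>,    // return addresses, private to this module
    }

    impl Machine {
        pub fn new(pc: usize) -> Self {
            Machine { pc, p_stack: Vec::new(), c_stack: Vec::new() }
        }

        /// `call`: save the address after the call, then jump to `target`.
        pub fn call(&mut self, target: usize) {
            self.c_stack.push(self.pc + 1);
            self.pc = target;
        }

        /// `ret`: resume at the most recently saved return address.
        pub fn ret(&mut self) -> Result<(), &'static str> {
            self.pc = self.c_stack.pop().ok_or("ret with an empty c-stack")?;
            Ok(())
        }
    }
}

use cpu::Machine;

fn main() {
    let mut m = Machine::new(10);
    m.call(3);
    // A misbehaving function can trash its own p-stack data...
    m.p_stack.push(0xDEAD_BEEF);
    m.p_stack.clear();
    // ...but the saved return address is out of its reach.
    m.ret().unwrap();
    println!("resumed at {}", m.pc);
}
```

Here Rust's module privacy plays the role that the hardware plays in Forte: code outside `cpu` has no way to name the c-stack at all.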

The Function Stack

Most assembly languages have a "jump" instruction, which moves the program counter to an arbitrary point in memory. This is useful but not particularly safe. Forte only allows jumping to valid functions. The locations of these functions are kept in the function stack (f-stack). Functions are added to the f-stack after they are validated during the warmup phase, which I will discuss next.
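A sketch of how the f-stack could gate jumps, assuming a call's target is given as an instruction address and is accepted only if warmup recorded a fun instruction there. FStack, record, and validate_jump are hypothetical names. Because warmup scans the program front to back, the recorded addresses are already sorted, so membership can be checked with a binary search.

```rust
// Hypothetical f-stack. Assumes a jump target is given as an instruction
// address and is accepted only if warmup recorded a `fun` there.
struct FStack {
    addrs: Vec<usize>,
}

impl FStack {
    fn new() -> Self {
        FStack { addrs: Vec::new() }
    }

    /// Called by warmup for each `fun` instruction. Warmup scans front to
    /// back, so addresses arrive in ascending order.
    fn record(&mut self, addr: usize) {
        debug_assert!(self.addrs.last().map_or(true, |&last| last < addr));
        self.addrs.push(addr);
    }

    /// Accept `target` only if it is a recorded function start. The vector is
    /// sorted, so a binary search keeps each check O(log n).
    fn validate_jump(&self, target: usize) -> Result<usize, String> {
        match self.addrs.binary_search(&target) {
            Ok(_) => Ok(target),
            Err(_) => Err(format!("address {target} is not a function start")),
        }
    }
}

fn main() {
    let mut f = FStack::new();
    for addr in [0, 7, 12] {
        f.record(addr);
    }
    println!("{:?}", f.validate_jump(7));
    println!("{:?}", f.validate_jump(8));
}
```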

Warmup

When a Forte program begins, it does not immediately start executing instructions. Instead, it validates the program's correctness and builds the f-stack. Every fun (function start) instruction pushes its own address onto the f-stack. Every instruction that interacts with the stack increments or decrements an internal register accordingly. If this value falls below zero or rises above the maximum stack size, the program is deemed unsafe and execution is cancelled before it begins. The warmup phase takes O(N) time, where N is the number of instructions.
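A minimal sketch of the warmup pass over a straight-line program, assuming a small subset of instructions and the (pops, pushes) signatures in signature(), which are guesses rather than Forte's documented semantics. Tracking pops and pushes separately, instead of only the net change, also catches an add on a one-element stack, which has a net effect of -1 but still needs two operands. Branches and loops are omitted for the reasons given under Limitations.

```rust
// Hypothetical warmup pass over a straight-line program. Each instruction is
// given an assumed (pops, pushes) signature.
#[derive(Debug, Clone, Copy)]
enum Instr {
    Push, // operand omitted: only stack depth matters here
    Pop,
    Dup,
    Add,
    Len,
    Fun,
    Ret,
    Exe,
}

fn signature(instr: Instr) -> (usize, usize) {
    match instr {
        Instr::Push | Instr::Len => (0, 1),
        Instr::Pop => (1, 0),
        Instr::Dup => (1, 2),
        Instr::Add => (2, 1),
        // These touch the f-stack or c-stack, not the p-stack.
        Instr::Fun | Instr::Ret | Instr::Exe => (0, 0),
    }
}

/// One O(N) pass: records every `fun` address into the f-stack and rejects
/// the program if the tracked depth would leave [0, max_depth].
fn warmup(program: &[Instr], max_depth: usize) -> Result<Vec<usize>, String> {
    let mut depth = 0usize;
    let mut f_stack = Vec::new();
    for (addr, &instr) in program.iter().enumerate() {
        if matches!(instr, Instr::Fun) {
            f_stack.push(addr);
        }
        let (pops, pushes) = signature(instr);
        depth = depth
            .checked_sub(pops)
            .ok_or_else(|| format!("underflow at instruction {addr}"))?;
        depth += pushes;
        if depth > max_depth {
            return Err(format!("overflow at instruction {addr}"));
        }
    }
    Ok(f_stack)
}

fn main() {
    use Instr::*;
    let program = [Fun, Push, Push, Add, Pop, Ret, Exe];
    println!("{:?}", warmup(&program, 4));
}
```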

Recital

After the exe (begin execution) instruction is reached, the Recital phase begins. The program counter jumps to the address at the top of the f-stack, which belongs to the last defined function. When the program counter reaches exe again, the program has terminated successfully. Together, these design decisions mean that the program counter can never be lower than the first function's address or higher than the address of the exe instruction. This makes arbitrary code execution attacks more difficult to pull off.
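The Recital bounds can be sketched as a small interpreter loop, assuming the f-stack produced by warmup, an entry point at its last entry, and execution that simply falls through to the first exe. The Instr subset, wrapping addition, and the recital function are hypothetical; the check at the top of the loop is the invariant described above.

```rust
// Hypothetical recital loop for a straight-line program.
#[derive(Debug, Clone, Copy)]
enum Instr {
    Fun,
    Push(u128),
    Pop,
    Add,
    Exe,
}

fn recital(program: &[Instr], f_stack: &[usize]) -> Result<Vec<u128>, String> {
    let exe_addr = program
        .iter()
        .position(|i| matches!(i, Instr::Exe))
        .ok_or("program has no exe")?;
    let first_fun = *f_stack.first().ok_or("f-stack is empty")?;
    // Entry point: the top of the f-stack, i.e. the last defined function.
    let mut pc = *f_stack.last().ok_or("f-stack is empty")?;
    let mut p_stack = Vec::new();
    while pc != exe_addr {
        // Invariant from the text: first function <= pc <= exe.
        if pc < first_fun || pc > exe_addr {
            return Err(format!("pc {pc} left the executable range"));
        }
        match program[pc] {
            Instr::Fun => {} // marks a function start; nothing to do at run time
            Instr::Push(v) => p_stack.push(v),
            Instr::Pop => {
                p_stack.pop().ok_or("p-stack underflow")?;
            }
            Instr::Add => {
                let b = p_stack.pop().ok_or("p-stack underflow")?;
                let a = p_stack.pop().ok_or("p-stack underflow")?;
                p_stack.push(a.wrapping_add(b));
            }
            Instr::Exe => unreachable!("the loop stops at the first exe"),
        }
        pc += 1;
    }
    Ok(p_stack)
}

fn main() {
    use Instr::*;
    // Two functions (at 0 and 2); the entry is the last one, at address 2.
    let program = [Fun, Push(1), Fun, Push(2), Push(3), Add, Exe];
    println!("{:?}", recital(&program, &[0, 2]));
}
```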

Safety

Creating a memory-safe assembly language presents very different challenges from a compiled language like Rust. There are no compile-time checks or guarantees; any string of bytes could be interpreted as a "program." The core design principle of Forte is to make insecure programs impossible (or at least very difficult) to express, and to make static analysis simple. This is a delicate balance to strike: limiting Forte's capabilities necessarily makes it more cumbersome and less performant. I believe we are at a point in history where making this trade-off is the right move. Modern processors are unbelievably fast, and security concerns far outweigh performance concerns in most use cases. Writing assembly by hand is far less common today, so the ergonomics of an assembly language are also less of a concern.

Limitations

Forte is the first programming language I've written, and it was hacked together in less than 24 hours. It is certainly not production ready, and not every feature is implemented yet. Other features are technically functional but exist mostly as placeholders until I can find a better way to implement them.

Evaluating branching and looping code during the Warmup is very difficult. I very quickly run into the Halting Problem, and into feasibility limits were I to try to implement this sort of algorithm on real hardware. Solutions I've considered include requiring code branches to have the same net effect on stack size, forcing branches to execute another function rather than exist inline, replacing loops with some form of recursion, or removing branches and loops altogether. Without solving this problem, the Warmup phase cannot do its job, so it is mostly vestigial. As of now, Forte is only really capable of simple linear programs.
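The first option, requiring both arms of a branch to have the same net effect on stack size, can be sketched as follows. It assumes that start is the p-stack depth after the branch instruction has consumed its own operands, uses hypothetical (pops, pushes) signatures, and omits the maximum-depth check for brevity. check_branch and depth_after are illustrative names, not part of Forte.

```rust
// Hypothetical check: both arms of a branch must leave the p-stack at the
// same depth. `start` is assumed to be the depth after the branch
// instruction has consumed its own operands.
#[derive(Clone, Copy)]
enum Instr {
    Push,
    Pop,
    Dup,
    Add,
}

fn signature(instr: Instr) -> (usize, usize) {
    match instr {
        Instr::Push => (0, 1),
        Instr::Pop => (1, 0),
        Instr::Dup => (1, 2),
        Instr::Add => (2, 1),
    }
}

/// Depth after running `arm` from `start`, or an error on underflow.
fn depth_after(start: usize, arm: &[Instr]) -> Result<usize, String> {
    let mut depth = start;
    for (offset, &instr) in arm.iter().enumerate() {
        let (pops, pushes) = signature(instr);
        depth = depth
            .checked_sub(pops)
            .ok_or_else(|| format!("underflow at arm offset {offset}"))?
            + pushes;
    }
    Ok(depth)
}

/// If both arms agree, warmup can continue past the join point with a single
/// known depth, so the number of paths to check does not multiply with
/// each branch.
fn check_branch(start: usize, taken: &[Instr], not_taken: &[Instr]) -> Result<usize, String> {
    let a = depth_after(start, taken)?;
    let b = depth_after(start, not_taken)?;
    if a == b {
        Ok(a)
    } else {
        Err(format!("arms disagree: {a} vs {b}"))
    }
}

fn main() {
    use Instr::*;
    println!("{:?}", check_branch(2, &[Push, Add], &[Dup, Pop]));
    println!("{:?}", check_branch(2, &[Push], &[]));
}
```

This addresses branches only; loops would still need a similar rule, such as requiring each iteration to leave the stack depth unchanged.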

Try it Out