finished on build systems

Logan 2024-09-15 00:24:07 -05:00
parent 9f0ab19962
commit 3af7a3d1f6
2 changed files with 121 additions and 53 deletions

@ -103,8 +103,8 @@ figcaption {
line-height: 110%;
}
img[src],iframe,code {
border-radius: 0.25em;
img,iframe,code {
border-radius: 0.4em;
}
figure img {
@ -112,6 +112,7 @@ figure img {
width: 100%
}
form {
border: dashed black;
}

@ -1,36 +1,47 @@
# on build systems
Recently I have been thinking about what makes for a good build system. I
want to analyze the major pain points I have encountered building software,
and identify where these systems go wrong. To do this I have picked several
languages I am already familiar with to use as case studies
## definitions
and identify where these systems go wrong. Seeing as C is one of the most
frustrating languages to build for in my experience, I will use the language
as a case study
## definition
I think of build systems as a very broad category of software; the goal of
which is to automate the process of packaging or executing other software.
This typically involves several subtasks. Resolving dependencies, compiling,
interpreting, linking, deploying, packaging, executing - software that does one
or more of these things counts as a build system in my book
which is to automate the process of building other software.
This typically involves several tasks, mainly:
* linting
* formatting
* interpreting
* compiling
* linking
* packaging
* deploying
* executing
Software that performs one or more of these tasks is a build tool. Any number of
build tools can make up a build system
## c and c++
The C family of languages has a quite complicated ecosystem of competing build
systems. To start with, there are the compilers themselves:
[GCC](https://gcc.gnu.org/) and [Clang](https://clang.llvm.org/). A typical
invocation of looks like this:
invocation of either looks like this:
```bash
cc main.c foo.c -Iinclude/ -Llib/ -O2 -oProgram
```
This is quite verbose as far as build commands go. The path of every source file
must be specified, as well as separate folders for library headers and object
files. Most software also makes use of numerous compiler flags, most of which
have incredibly cryptic names.
This is quite verbose as far as CLI tools go. The path of every source file
must be specified, as well as the location of libraries to be linked. Path
variables also play a role in the linking process, adding a layer of hidden
complexity. Most software also makes use of numerous compiler flags, all of
which must be typed every time.
To compile a C project using only the compiler requires first learning
its structure, wrangling each of its dependencies manually, and reading
documentation to find the appropriate build flags for your platform. For any
non-trivial program, simply typing the build command becomes a challenge
documentation to find the appropriate build flags for your platform. This
is not a reasonable ask for any developer
## make
The problems introduced by the C family of compilers have proven intractable,
and so another layer of abstraction is necessary. Makefile is a rudimentary
scripting language primarily used to build C family languages.
Compiler developers have decided that these problems are out of scope, and
so another layer of abstraction is necessary. Make is a rudimentary scripting
language primarily used to build C projects.
```make
# Example Makefile taken from:
# https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
@ -53,36 +64,36 @@ program.
As software becomes more complex, so too does the task of building it. The
limitations of C make this problem particularly egregious, given its fragile
dependency resolution and lack of meta-programming. Makefiles are an attempt to
bridge this gap, and are a Turing-Complete language in their own right.
dependency resolution and lack of meta-programming. Make is an attempt to
bridge this gap, and is a Turing-complete language in its own right.
The Makefile which [builds the Linux kernel](https://github.com/torvalds/linux/blob/master/Makefile)
is over 2000 lines as of writing. The massive demands placed on this
intermediary language have exposed its weak points, mainly that it is
stringly-typed and full of cryptic, unintuitive syntax. Maintaining complex
Makefiles contributes to the difficulty of building software almost as much as
it reduces
it helps
## cmake
Just as Makefiles abstract away the complexity of compiling C, CMake abstracts
away the complexity of creating Makefiles. CMake is a great example of what
happens to software development when there are no adults in the room, so to
speak. Compiling a C program should be a simple task, ideally one that requires
nothing more than a C compiler. Failing that, a simple build scripting language
should be more than enough to handle even industrial use cases. When our build
system needs a build system, we have completely lost the plot and need to
reevaluate the problem from square one
Just as Make acts as an abstraction over C compilers, CMake acts as an
abstraction over Makefiles. CMake is a great example of what happens to software
development when there are no adults in the room, so to speak. Compiling a C
program should be a simple task, ideally one that requires nothing more than a
C compiler. Failing that, a simple build scripting language should be more than
enough to handle even industrial use cases. When our build system needs a build
system, we have completely lost the plot and need to reevaluate the problem from
square one
```
CMake -> Makefile -> gcc/clang -> Assembly
```
## compile targets
Imagine a world where the Makefile language was more expressive, functional, and
Imagine a world where the Make language was more expressive, functional, and
well-thought-out. Suddenly the idea of CMake becomes silly; clearly introducing
another language into the mix would only slow down development and introduce an
entirely new category of bugs. CMake can only exist because Makefile failed to
entirely new category of bugs. CMake can only exist because Make failed to
accomplish its goal. The same could be said for the GCC and Clang compilation
syntax. Rather than fix the underlying issue, we treat the failed product as a
new compile target and build a new thing to abstract away (never replace!) the
new compilation target and build a new thing to abstract away (never replace!) the
old thing.
Developers are not (generally) stupid; this pattern exists for a reason. In the
@ -90,8 +101,8 @@ case of C, it is sometimes necessary to execute arbitrary code at build-time.
The obvious solution is to create a new language to handle this need - but why
is the original language not sufficient? Make is written in C, so by definition
C can do anything Make can do. The issue is that C source files do not
contain enough information for the compiler to build the entire package. This
information must be embedded in another, nonstandard format, which itself must
contain enough information for the compiler to build the entire program. This
information must be embedded in another nonstandard format, which itself must
be parsed and executed by a nonstandard build tool
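To make the claim that C can do anything Make can do a little more concrete, here is a minimal sketch of the first thing a C build script would need: assembling a compiler command from a list of source files. The `build_command` helper and its signature are hypothetical and exist only for illustration; they are not part of any real tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins a compiler name, a list of source files, and an
 * output name into a single command line.
 * Returns 0 on success, or -1 if the buffer is too small. */
int build_command(char *buf, size_t size, const char *compiler,
                  const char *const sources[], size_t count,
                  const char *output)
{
    assert(buf != NULL && size > 0);

    /* Start with the compiler name. */
    int n = snprintf(buf, size, "%s", compiler);
    if (n < 0 || (size_t)n >= size) return -1;
    size_t used = (size_t)n;

    /* Append each source file, separated by a space. */
    for (size_t i = 0; i < count; i++) {
        n = snprintf(buf + used, size - used, " %s", sources[i]);
        if (n < 0 || (size_t)n >= size - used) return -1;
        used += (size_t)n;
    }

    /* Finish with the output flag. */
    n = snprintf(buf + used, size - used, " -o %s", output);
    if (n < 0 || (size_t)n >= size - used) return -1;
    return 0;
}
```

The resulting string could be handed to `system()` from `<stdlib.h>` to actually run the compiler. Nothing here is beyond plain C; what is missing is a standard place to put this information.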
## shebang
@ -118,8 +129,8 @@ describes how to run itself to the shell which invokes it. Since the `#`
character is used as a comment in Python, the line can be safely ignored by
any other programs or tools that read the file. This system is not perfect: the
name or path of the Python executable may vary between systems, and the shebang
relies on a shell to interpret it (technically a build system). Expanding on
this concept may help alleviate our build system woes.
relies on a shell to interpret it (technically a build system). However,
expanding on this concept may help alleviate our build system woes
## doing better
Let's look at how C syntax could be changed to adopt some of these ideals,
@ -161,7 +172,7 @@ int main(int argc, char* argv[]) {
if (strcmp("gcc", compiler.name) != 0
|| compiler.semver.major <= 12) {
// Abort build with error
emit_error("Incompatible compiler version!\n");
emit_error("Incompatible compiler version!");
return 1;
}
@ -189,27 +200,83 @@ The `compile.h` and `link.h` includes are compiler implemented, and so do not
need to be linked from the system's libc. All flags passed to the compiler are
handed off to the `main()` function. It is easy to imagine an
equivalent to `make clean` that erases all build artifacts, or a caching system
that only rebuilds modified files.
that only rebuilds modified files
## going further
I am not a C developer by any stretch, and so I will spare you any more
pseudocode. I hope these examples show that replacing Makefiles with pure
C is not such an unreasonable idea. Still, we can go even further; imagine
if we split the `compile()` function into lexing, parsing, and IR (llvm/gcc
bytecode) conversion functions. This would make meta-programming simple and
if we split the `compile()` function into lexing, parsing, and IR generating
intermediary functions. This would make meta-programming simple and
straightforward, and even allow for the introduction of program-specific syntax.
Developers could create libraries for common build tasks such as cloning dependency
git repositories, running tests, or submitting binaries to package managers.
Developers could create libraries for common build tasks such as cloning git
repositories, running tests, or submitting binaries to package managers
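As a rough idea of what an exposed lexing stage might hand to a build script, here is a toy tokenizer. It is a sketch under heavy assumptions: the `Token` type and `next_token` function are hypothetical, and the rules are far simpler than real C lexing (no string literals, comments, or multi-character operators).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* points into the original source text */
    size_t length;
} Token;

/* Reads one token starting at *cursor and advances the cursor past it. */
Token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;

    /* Skip leading whitespace. */
    while (isspace((unsigned char)*p)) p++;

    Token tok = { TOK_END, p, 0 };
    if (*p == '\0') {
        *cursor = p;
        return tok;
    }

    if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else {
        /* Anything else is treated as a single punctuation character. */
        tok.kind = TOK_PUNCT;
        p++;
    }

    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

A meta-program could loop over `next_token` until it sees `TOK_END`, rewriting or inspecting tokens before passing them on to a parsing stage.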
## disadvantages
## setbacks
Comparing my pseudocode to the Makefile example, it is obvious which is more
idiomatic and understandable. This is partially due to my lack of creativity
and skill as a C developer. However, I imagine Make will always have an advantage
here, at least when it comes to small projects. Even worse, on closer examination
we have not yet achieved the goal of in-lining build information. While our build
system is now in C, and does not rely on external tools, it is still a separate
entity from the project itself. The minimum possible C program is now two files
rather than just one. So far I have tried to conform as closely as possible to
standard C syntax and grammar, but this approach will always feel more like
a hack than a well-thought-out language feature
idiomatic and understandable. This is partially due to my lack of creativity and
skill as a C developer. However, I imagine Make will always have an advantage
here, at least when it comes to small projects. While our build system is
now in C, it is still a separate entity from the project itself. The minimum
possible C program is now two files rather than just one. So far I have tried
to conform as closely as possible to standard C syntax and grammar, but this
approach will always feel like a hack more than a well-thought-out language
feature
## the ouroboros
Most languages draw a very strong distinction between compile-time and run-time
code. Typically, compile-time execution may happen only within macros or
constant functions, if it is even allowed at all. This habit can be traced back
to assembly programmers who deemed self-modifying code a dangerous antipattern.
This mindset is what I believe drives us to create these leaning towers of build
systems.
What would a language built around meta-programming look like? I suspect that
a language with a truly infinite degree of self-reflection is possible. Such a
language could be far more expressive than its peers using less syntax. Imagine
if a library could implement a new language-wide keyword, and even implement
that keyword using the same keyword. Perhaps concepts as basic as structs,
enumerated types, and integers could be defined within the language itself. The
line between compilation and execution disappears. The line between language
and program grows thin. I imagine this process as if the language were eating
itself, like an ouroboros.
<section>
<a href="https://commons.wikimedia.org/wiki/File:Serpiente_alquimica.jpg">
<img
width="50%"
src="https://upload.wikimedia.org/wikipedia/commons/7/71/Serpiente_alquimica.jpg">
</a>
</section>
If such a language existed, it follows that every other language would simply
be a strict subset of this language (let's call it "every-lang"). For example,
we could write an every-lang library which implements every piece of
[Lua](https://www.lua.org/) syntax and grammar on a meta-program level. A user
that imports this library could then simply write code in Lua, and compile the
program using the every-lang compiler. This library would effectively be a Lua
build system, that is also an every-lang build system, that is also an "every
language" build system
```lua
-- A Lua program? or an every-lang program?
require "everylang"
function fact(n)
if n == 0 then
return 1
else
return n * fact(n-1)
end
end
print(fact(5))
```
## on crashing and burning
This every-lang is, to put it lightly, a little far-fetched. Such a language
would be nearly impossible to implement or reason about. A practically useful
language must make compromises with its users and with the fundamental laws of
computation. I believe that the next frontier for language design will be
pushing the boundary on this front - how close can we get to every-lang without
crashing and burning?