finished on build systems

Logan 2024-09-15 00:24:07 -05:00
parent 9f0ab19962
commit 3af7a3d1f6
2 changed files with 121 additions and 53 deletions

@ -103,8 +103,8 @@ figcaption {
line-height: 110%; line-height: 110%;
} }
img[src],iframe,code { img,iframe,code {
border-radius: 0.25em; border-radius: 0.4em;
} }
figure img { figure img {
@ -112,6 +112,7 @@ figure img {
width: 100% width: 100%
} }
form { form {
border: dashed black; border: dashed black;
} }

@ -1,36 +1,47 @@
# on build systems # on build systems
Recently I have been thinking about what makes for a good build system. I Recently I have been thinking about what makes for a good build system. I
want to analyze the major pain points I have encountered building software, want to analyze the major pain points I have encountered building software,
and identify where these systems go wrong. To do this I have picked several and identify where these systems go wrong. Seeing as C is one of the most
languages I am already familiar with to use as case studies frustrating languages to build for in my experience, I will use the language
## definitions as a case study
## definition
I think of build systems as a very broad category of software; the goal of I think of build systems as a very broad category of software; the goal of
which is to automate the process of packaging or executing other software. which is to automate the process of building other software.
This typically involves several subtasks. Resolving dependencies, compiling, This typically involves several tasks, mainly:
interpreting, linking, deploying, packaging, executing - software that does one * linting
or more of these things counts as a build system in my book * formatting
* interpreting
* compiling
* linking
* packaging
* deploying
* executing
Software that performs one or more of these tasks is a build tool. Any number of
build tools can make up a build system
## c and c++ ## c and c++
The C family of languages has a quite complicated ecosystem of competing build The C family of languages has a quite complicated ecosystem of competing build
systems. To start with, there are the compilers themselves: systems. To start with, there are the compilers themselves:
[GCC](https://gcc.gnu.org/) and [Clang](https://clang.llvm.org/). A typical [GCC](https://gcc.gnu.org/) and [Clang](https://clang.llvm.org/). A typical
invocation of looks like this: invocation of either looks like this:
```bash ```bash
cc main.c foo.c bar.h -Iinclude/ -Llib/ -O2 -oProgram cc main.c foo.c bar.h -Iinclude/ -Llib/ -O2 -oProgram
``` ```
This is quite verbose as far as build commands go. The path of every source file This is quite verbose as far as CLI tools go. The path of every source file
must be specified, as well as separate folders for library headers and object must be specified, as well as the location of libraries to be linked. Path
files. Most software also makes use of numerous compiler flags, most of which variables also play a role in the linking process, adding a layer of hidden
have incredibly cryptic names. complexity. Most software also makes use of numerous compiler flags, all of
which must be typed every time.
To compile a C project using only the compiler requires first learning To compile a C project using only the compiler requires first learning
its structure, wrangling each of its dependencies manually, and reading its structure, wrangling each of its dependencies manually, and reading
documentation to find the appropriate build flags for your platform. For any documentation to find the appropriate build flags for your platform. This
non-trivial program, simply typing the build command becomes a challenge is not a reasonable ask for any developer
## make ## make
The problems introduced by the C family of compilers have proven intractable, Compiler developers have decided that these problems are out of scope, and
and so another layer of abstraction is necessary. Makefile is a rudimentary so another layer of abstraction is necessary. Make is a rudimentary scripting
scripting language primarily used to build C family languages. language primarily used to build C projects.
```make ```make
# Example Makefile taken from: # Example Makefile taken from:
# https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/ # https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
@ -53,36 +64,36 @@ program.
As software becomes more complex, so too does the task of building it. The As software becomes more complex, so too does the task of building it. The
limitations of C make this problem particularly egregious, given its fragile limitations of C make this problem particularly egregious, given its fragile
dependency resolution and lack of meta-programming. Makefiles are an attempt to dependency resolution and lack of meta-programming. Make is an attempt to
bridge this gap, and are a Turing-Complete language in their own right. bridge this gap, and is a Turing-Complete language in its own right.
The Makefile which [builds the Linux kernel](https://github.com/torvalds/linux/blob/master/Makefile) The Makefile which [builds the Linux kernel](https://github.com/torvalds/linux/blob/master/Makefile)
is over 2000 lines as of writing. The massive demands placed on this is over 2000 lines as of writing. The massive demands placed on this
intermediary language have exposed its weak points, mainly that it is intermediary language have exposed its weak points, mainly that it is
stringly-typed and full of cryptic, unintuitive syntax. Maintaining complex stringly-typed and full of cryptic, unintuitive syntax. Maintaining complex
Makefiles contributes to the difficulty of building software almost as much as Makefiles contributes to the difficulty of building software almost as much as
it reduces it helps
## cmake ## cmake
Just as Makefiles abstract away the complexity of compiling C, CMake abstracts Just as Make acts as an abstraction over C compilers, CMake acts as an
away the complexity of creating Makefiles. CMake is a great example of what abstraction over Makefiles. CMake is a great example of what happens to software
happens to software development when there are no adults in the room, so to development when there are no adults in the room, so to speak. Compiling a C
speak. Compiling a C program should be a simple task, ideally one that requires program should be a simple task, ideally one that requires nothing more than a
nothing more than a C compiler. Failing that, a simple build scripting language C compiler. Failing that, a simple build scripting language should be more than
should be more than enough to handle even industrial use cases. When our build enough to handle even industrial use cases. When our build system needs a build
system needs a build system, we have completely lost the plot and need to system, we have completely lost the plot and need to reevaluate the problem from
reevaluate the problem from square one square one
``` ```
CMake -> Makefile -> gcc/clang -> Assembly CMake -> Makefile -> gcc/clang -> Assembly
``` ```
## compile targets ## compile targets
Imagine a world where the Makefile language was more expressive, functional, and Imagine a world where the Make language was more expressive, functional, and
well-thought-out. Suddenly the idea of CMake becomes silly; clearly introducing well-thought-out. Suddenly the idea of CMake becomes silly; clearly introducing
another language into the mix would only slow down development and introduce an another language into the mix would only slow down development and introduce an
entirely new category of bugs. CMake can only exist because Makefile failed to entirely new category of bugs. CMake can only exist because Make failed to
accomplish its goal. The same could be said for the GCC and Clang compilation accomplish its goal. The same could be said for the GCC and Clang compilation
syntax. Rather than fix the underlying issue, we treat the failed product as a syntax. Rather than fix the underlying issue, we treat the failed product as a
new compile target and build a new thing to abstract away (never replace!) the new compilation target and build a new thing to abstract away (never replace!) the
old thing. old thing.
Developers are not (generally) stupid; this pattern exists for a reason. In the Developers are not (generally) stupid; this pattern exists for a reason. In the
@ -90,8 +101,8 @@ case of C, it is sometimes necessary to execute arbitrary code at build-time.
The obvious solution is to create a new language to handle this need - but why The obvious solution is to create a new language to handle this need - but why
is the original language not sufficient? Make is written in C, so by definition is the original language not sufficient? Make is written in C, so by definition
C can do anything Make can do. The issue is that C source files do not C can do anything Make can do. The issue is that C source files do not
contain enough information for the compiler to build the entire package. This contain enough information for the compiler to build the entire program. This
information must be embedded in another, nonstandard format, which itself must information must be embedded in another nonstandard format, which itself must
be parsed and executed by a nonstandard build tool be parsed and executed by a nonstandard build tool
## shebang ## shebang
@ -118,8 +129,8 @@ describes how to run itself to the shell which invokes it. Since the `#`
character is used as a comment in Python, the line can be safely ignored by character is used as a comment in Python, the line can be safely ignored by
any other programs or tools that read the file. This system is not perfect, the any other programs or tools that read the file. This system is not perfect, the
name or path of the python executable may vary between systems, and the shebang name or path of the python executable may vary between systems, and the shebang
relies on a shell to interpret it (technically a build system). Expanding on relies on a shell to interpret it (technically a build system). However,
this concept may help alleviate our build system woes. expanding on this concept may help alleviate our build system woes
## doing better ## doing better
Let's look at how C syntax could be changed to adopt some of these ideals, Let's look at how C syntax could be changed to adopt some of these ideals,
@ -161,7 +172,7 @@ int main(int argc, char* argv[]) {
if (strcmp("gcc", compiler.name) != 0 if (strcmp("gcc", compiler.name) != 0
|| compiler.semver.major <= 12) { || compiler.semver.major <= 12) {
// Abort build with error // Abort build with error
emit_error("Incompatible compiler version!\n"); emit_error("Incompatible compiler version!");
return 1; return 1;
} }
@ -189,27 +200,83 @@ The `compile.h` and `link.h` includes are compiler implemented, and so do not
need to be linked from the system's libc. All flags passed to the compiler are need to be linked from the system's libc. All flags passed to the compiler are
handed off to the `main()` function. It is easy to imagine an handed off to the `main()` function. It is easy to imagine an
equivalent to `make clean` that erases all build artifacts, or a caching system equivalent to `make clean` that erases all build artifacts, or a caching system
that only rebuilds modified files. that only rebuilds modified files
## going further ## going further
I am not a C developer by any stretch, and so I will spare you any more I am not a C developer by any stretch, and so I will spare you any more
pseudocode. I hope these examples show that replacing Makefiles with pure pseudocode. I hope these examples show that replacing Makefiles with pure
C is not such an unreasonable idea. Still, we can go even further; imagine C is not such an unreasonable idea. Still, we can go even further; imagine
if we split the `compile()` function into lexing, parsing, and IR (llvm/gcc if we split the `compile()` function into lexing, parsing, and IR generating
bytecode) conversion functions. This would make meta-programming simple and intermediary functions. This would make meta-programming simple and
straightforward, and even allow for the introduction of program-specific syntax. straightforward, and even allow for the introduction of program-specific syntax.
Developers could create libraries for common build tasks such as cloning dependency Developers could create libraries for common build tasks such as cloning git
git repositories, running tests, or submitting binaries to package managers. repositories, running tests, or submitting binaries to package managers
## disadvantages ## setbacks
Comparing my pseudocode to the Makefile example, it is obvious which is more Comparing my pseudocode to the Makefile example, it is obvious which is more
idiomatic and understandable. This is partially due to my lack of creativity idiomatic and understandable. This is partially due to my lack of creativity and
and skill as a C developer. However, I imagine Make will always have an advantage skill as a C developer. However, I imagine Make will always have an advantage
here, at least when it comes to small projects. Even worse, on closer examination here, at least when it comes to small projects. While our build system is
we have not yet achieved the goal of in-lining build information. While our build now in C, it is still a separate entity from the project itself. The minimum
system is now in C, and does not rely on external tools, it is still a separate possible C program is now two files rather than just one. So far I have tried
entity from the project itself. The minimum possible C program is now two files to conform as closely as possible to standard C syntax and grammar, but this
rather than just one. So far I have tried to conform as closely as possible to approach will always feel like a hack more than a well-thought-out language
standard C syntax and grammar, but this approach will always feel more like feature
a hack than a well-thought-out language feature
## the ouroboros
Most languages draw a very strong distinction between compile-time and run-time
code. Typically, compile-time execution may happen only within macros or
constant functions, if it is even allowed at all. This habit can be traced back
to assembly programmers who deemed self-modifying code a dangerous antipattern.
This mindset is what I believe drives us to create these leaning towers of build
systems.
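As a minimal sketch of that distinction in standard C11 (the macro, buffer, and
function names here are hypothetical, chosen only for illustration), the
compile-time side is limited to macros and constant expressions, while anything
more involved has to wait until run-time:

```c
#include <assert.h>  /* static_assert (C11) and assert */
#include <stddef.h>  /* size_t */

/* Compile-time: expanded by the preprocessor and folded by the compiler. */
#define KIB(n) ((n) * 1024u)
#define BUFFER_SIZE KIB(4)

/* Compile-time: checked during compilation; a failure aborts the build. */
static_assert(BUFFER_SIZE == 4096u, "BUFFER_SIZE must be 4 KiB");
static_assert(BUFFER_SIZE % 64u == 0, "BUFFER_SIZE must be a multiple of 64");

/* The array length must be a constant expression known to the compiler. */
static unsigned char buffer[BUFFER_SIZE];

/* Run-time: an ordinary function, executed only when the program runs. */
size_t buffer_capacity(void) {
    return sizeof buffer;
}
```

Everything above `buffer_capacity()` is resolved before the program ever runs,
but none of it can open a file, loop over a directory, or call another
function. That gap is what external build tools end up filling.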
What would a language built around meta-programming look like? I suspect that
a language with a truly infinite degree of self-reflection is possible. Such a
language could be far more expressive than its peers using less syntax. Imagine
if a library could implement a new language-wide keyword, and even implement
that keyword using the same keyword. Perhaps concepts as basic as structs,
enumerated types, and integers could be defined within the language itself. The
line between compilation and execution disappears. The line between language
and program grows thin. I imagine this process as if the language were eating
itself, like an ouroboros.
<section>
<a href="https://commons.wikimedia.org/wiki/File:Serpiente_alquimica.jpg">
<img
width="50%"
src="https://upload.wikimedia.org/wikipedia/commons/7/71/Serpiente_alquimica.jpg">
</a>
</section>
If such a language existed, it follows that every other language would simply
be a strict subset of this language (let's call it "every-lang"). For example,
we could write an every-lang library which implements every piece of
[Lua](https://www.lua.org/) syntax and grammar on a meta-program level. A user
that imports this library could then simply write code in Lua, and compile the
program using the every-lang compiler. This library would effectively be a Lua
build system, that is also an every-lang build system, that is also an "every
language" build system
```lua
-- A Lua program? or an every-lang program?
require "everylang"
function fact(n)
if n == 0 then
return 1
else
return n * fact(n-1)
end
end
print(fact(5))
```
## on crashing and burning
This every-lang is, to put it lightly, a little far-fetched. Such a language
would be nearly impossible to implement or reason about. A practically useful
language must make compromises with its users and with the fundamental laws of
computation. I believe that the next frontier for language design will be
pushing the boundary on this front - how close can we get to every-lang without
crashing and burning?