work on build systems writup

Logan 2024-09-14 21:48:48 -05:00
parent 486cbd5b9a
commit 6244a86a61
2 changed files with 166 additions and 13 deletions


@ -23,6 +23,7 @@ body {
max-width: 40em; max-width: 40em;
display: flex; display: flex;
margin: auto; margin: auto;
tab-size: 2;
} }
main { main {
@ -35,13 +36,19 @@ main {
font-size: 1.2em; font-size: 1.2em;
} }
p {
text-align: justify;
hyphens: auto;
}
code { code {
display: inline-block; display: inline-block;
font-family: cascadia; font-family: cascadia;
font-size: 0.9em; font-size: 0.9em;
background: #1e1e22; background: #1e1e22;
color: #dddde1; color: #dddde1;
padding: 0.1em 0.5em 0.1em; padding: 0.1em;
margin: 0.1em 0em 0.1em;
} }
table { table {
@ -83,7 +90,7 @@ section {
text-align: center; text-align: center;
} }
section h2,p { section h2,section p {
text-align: left; text-align: left;
} }


@ -45,17 +45,17 @@ OBJ=hellomake.o hellofunc.o
hellomake: $(OBJ) hellomake: $(OBJ)
$(CC) -o $@ $^ $(CFLAGS) $(CC) -o $@ $^ $(CFLAGS)
``` ```
Makefiles attempt to abstract away the complexity of the C compilation process. Make attempts to abstract away the complexity of the C compilation process.
Variables and pattern matching of file names are particularly well suited for Variables and pattern matching of file names are particularly well suited for
managing compiler flags and object files. However, by far the most attractive managing compiler flags and object files. However, by far the most attractive
feature of Makefiles is the ability to simply type `make` to compile the entire feature of Make is the ability to simply type `make` to compile the entire
program. program.
As software becomes more complex, so too does the task of building it. The As software becomes more complex, so too does the task of building it. The
limitations of C make this problem particularly egregious, given its fragile limitations of C make this problem particularly egregious, given its fragile
dependency resolution and lack of meta-programming. Makefiles have attempted dependency resolution and lack of meta-programming. Makefiles are an attempt to
to bridge this gap, and are a Turing-Complete language in their own right. bridge this gap, and are a Turing-Complete language in their own right.
The [Makefile which builds the Linux kernel](https://github.com/torvalds/linux/blob/master/Makefile) The Makefile which [builds the Linux kernel](https://github.com/torvalds/linux/blob/master/Makefile)
is over 2000 lines as of writing. The massive demands placed on this is over 2000 lines as of writing. The massive demands placed on this
intermediary language have exposed its weak points, mainly that it is intermediary language have exposed its weak points, mainly that it is
stringly-typed and full of cryptic, unintuitive syntax. Maintaining complex stringly-typed and full of cryptic, unintuitive syntax. Maintaining complex
@ -63,7 +63,153 @@ Makefiles contributes to the difficulty of building software almost as much as
it reduces it reduces
## cmake ## cmake
When I first learned about CMake and what it does, I actually laughed out loud. Just as Makefiles abstract away the complexity of compiling C, CMake abstracts
I am lucky enough to have never written a CMake file, so this section will away the complexity of creating Makefiles. CMake is a great example of what
be brief. Just as Makefiles abstract away the complexity of building C, CMake happens to software development when there are no adults in the room, so to
abstracts away the complexity of building Makefiles. speak. Compiling a C program should be a simple task, ideally one that requires
nothing more than a C compiler. Failing that, a simple build scripting language
should be more than enough to handle even industrial use cases. When our build
system needs a build system, we have completely lost the plot and need to
reevaluate the problem from square one.
```
CMake -> Makefile -> gcc/clang -> Assembly
```
## compile targets
Imagine a world where the Makefile language was more expressive, functional, and
well-thought-out. Suddenly the idea of CMake becomes silly; clearly introducing
another language into the mix would only slow down development and introduce an
entirely new category of bugs. CMake can only exist because Makefile failed to
accomplish its goal. The same could be said for the GCC and Clang compilation
syntax. Rather than fix the underlying issue, we treat the failed product as a
new compile target and build a new thing to abstract away (never replace!) the
old thing.
Developers are not (generally) stupid; this pattern exists for a reason. In the
case of C, it is sometimes necessary to execute arbitrary code at build-time.
The obvious solution is to create a new language to handle this need - but why
is the original language not sufficient? Make is written in C, so by definition
C can do anything Make can do. The issue is that C source files do not
contain enough information for the compiler to build the entire package. This
information must be embedded in another, nonstandard format, which itself must
be parsed and executed by a nonstandard build tool.
## shebang
Developers have become overly complacent with build systems. Look at any project
today, and in the root directory you will see a layer of congealed fat:
`package.json`, `CMakeLists.txt`, `Cargo.toml`, `build.gradle`, maybe a Python
virtual environment, along with ignore files, linter configs, and so on. Every
new tool, language, and config file means another program to install and another
step in the build process. Every one of these dependencies makes the project
more fragile and less portable. Meanwhile, we are not making good use of the
tools we already have. We ought to be demanding more from language designers.
The build process should not be an afterthought left for developers to figure
out; it should be a core consideration when designing grammar and syntax.
If you have ever used a scripting language, you are probably familiar with the
shebang line.
```python
#!/usr/bin/env python3
print("Hello World!")
```
This wonderfully useful one-liner captures what I mean by making use of existing
tools, and treating the build process as a grammar concern. This Python file
describes how to run itself to whatever system launches it. Since the `#`
character begins a comment in Python, the line can be safely ignored by
any other programs or tools that read the file. This system is not perfect: the
name or path of the Python executable may vary between systems, and the shebang
relies on the operating system's program loader, not the language itself, to
interpret it (technically a build system). Expanding on this concept may help
alleviate our build system woes.
## doing better
Let's look at how C syntax could be changed to adopt some of these ideals,
starting with a simple example:
```c
#!/usr/bin/gcc -E
// Warn or error if specific compiler not used
#compiler gcc 12.2.0
#semver 1.0.0
#ifdef RELEASE
#opt o2
#endif
#warn all
#libs lib/
#output build/MyProgram
// Warn or error if library semver does not match
#include "mylib.h" "0.2.*"
int main() {
  /* ... */
  return 0;
}
```
Here, I have replaced command-line flags with a special `#` prefixed syntax.
Since all the compiler directives are in-line with the source code itself,
we can take advantage of the shebang line just like scripting languages do.
Using the `#ifdef` directive, we can even conditionally enable flags for
release mode. Let's see what we can do with an even more radical approach:
```c
// build.c
#include <string.h>  // strcmp
#include <compile.h> // hypothetical, compiler-provided
#include <link.h>    // hypothetical, compiler-provided
#define RELEASE 0
int main(int argc, char* argv[]) {
  // Struct representing the invoked compiler
  compiler_t compiler = get_compiler();
  // Require gcc 12 or newer
  if (strcmp("gcc", compiler.name) != 0
      || compiler.semver.major < 12) {
    // Abort build with error
    emit_error("Incompatible compiler version!\n");
    return 1;
  }
  semver_t version = {.major=1, .minor=0, .patch=0};
  int opt_level;
  // We could easily check for a flag
  // in argv here instead
  if (RELEASE) {
    opt_level = 2;
  } else {
    opt_level = 0;
  }
  // A realistic function would probably take
  // some structure containing compile directives
  artifact_t executable = compile(
    &compiler, "main.c", opt_level, version
  );
  artifact_t mylib = load_dylib("lib/mylib.so");
  link(&executable, &mylib);
  write_artifact(&executable, "bin/MyProgram");
  return 0;
}
```
In this example, we create a new file `build.c` which acts as a pseudo-Makefile.
The `compile.h` and `link.h` headers are implemented by the compiler itself, and
so do not need to be linked from the system's libc. All flags passed to the
compiler are handed off to the `main()` function through `argc` and `argv`. It is easy to imagine an
equivalent to `make clean` that erases all build artifacts, or a caching system
that only rebuilds modified files.
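The caching idea can be approximated in today's C without any compiler support. The following is only a minimal sketch, assuming a POSIX system where `stat()` is available; the helper name `needs_rebuild` is hypothetical and not part of the imagined `compile.h` API. It applies the same rule Make uses: rebuild when the output is missing or older than its source.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: returns true when `output` should be rebuilt
 * from `source`. Compares whole-second modification times only. */
bool needs_rebuild(const char *source, const char *output) {
    struct stat src, out;

    if (stat(source, &src) != 0) {
        /* Source is missing or unreadable: report "rebuild" so the
         * compile step itself produces the real error message. */
        return true;
    }
    if (stat(output, &out) != 0) {
        /* No artifact has been produced yet. */
        return true;
    }
    /* Rebuild only if the source is strictly newer than the artifact. */
    return src.st_mtime > out.st_mtime;
}
```

A `build.c` in the style above could call this before each `compile()` and skip files whose artifacts are already up to date. Because the comparison uses whole seconds, an edit made within the same second as the last build would be missed; a more careful version would compare sub-second timestamps or content hashes.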
## going further
I am not a C developer by any stretch, and so I will spare you any more
pseudocode. I hope these examples show that replacing Makefiles with pure
C is not such an unreasonable idea. Still, we can go even further; imagine
if we split the `compile()` function into lexing, parsing, and IR (LLVM IR or
GCC's GIMPLE) conversion functions. This would make meta-programming simple and
straightforward, and even allow for the introduction of program-specific syntax.
Developers could create libraries for common build tasks such as cloning dependency
git repositories, running tests, or submitting binaries to package managers.
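A small build-task library of this kind is already possible in standard C, although it has to shell out to the compiler rather than call into it. The sketch below is an assumption-laden illustration: the function names `format_compile_command` and `compile_file` are hypothetical, and the flag layout (`-O<n> -o <output> <source>`) assumes a gcc- or clang-style command line.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes a gcc/clang-style compile command into `buf`.
 * Returns 0 on success, -1 if the command does not fit in `size` bytes. */
int format_compile_command(char *buf, size_t size, const char *cc,
                           const char *source, const char *output,
                           int opt_level) {
    int n = snprintf(buf, size, "%s -O%d -o %s %s",
                     cc, opt_level, output, source);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: formats the command and hands it to the host's
 * command processor. Returns whatever system() returns, or -1 if the
 * command could not be formatted. */
int compile_file(const char *cc, const char *source,
                 const char *output, int opt_level) {
    char cmd[512];
    if (format_compile_command(cmd, sizeof cmd, cc, source, output,
                               opt_level) != 0) {
        return -1;
    }
    return system(cmd);
}
```

Calling `compile_file("cc", "main.c", "bin/MyProgram", 2)` would run `cc -O2 -o bin/MyProgram main.c`. The sketch does no quoting, so paths containing spaces or shell metacharacters would break it, which is one more argument for having the compiler expose these steps as real functions.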
## disadvantages
Comparing my pseudocode to the Makefile example, it is obvious which is more
idiomatic and understandable. This is partially due to my lack of creativity
and skill as a C developer. However, I imagine Make will always have an advantage
here, at least when it comes to small projects. Even worse, on closer examination
we have not yet achieved the goal of in-lining build information. While our build
system is now in C, and does not rely on external tools, it is still a separate
entity from the project itself. The minimum possible C program is now two files
rather than just one. So far I have tried to conform as closely as possible to
standard C syntax and grammar, but this approach will always feel more like
a hack than a well-thought-out language feature.