From 6244a86a616b39b40d40748eaf5c84516445d138 Mon Sep 17 00:00:00 2001
From: Logan
Date: Sat, 14 Sep 2024 21:48:48 -0500
Subject: [PATCH] work on build systems writup

---
 src/css/style.css               |  11 ++-
 src/writings/1-build-systems.md | 168 +++++++++++++++++++++++++++++---
 2 files changed, 166 insertions(+), 13 deletions(-)

diff --git a/src/css/style.css b/src/css/style.css
index 943c4db..5960e9e 100644
--- a/src/css/style.css
+++ b/src/css/style.css
@@ -23,6 +23,7 @@ body {
 	max-width: 40em;
 	display: flex;
 	margin: auto;
+	tab-size: 2;
 }
 
 main {
@@ -35,13 +36,19 @@ main {
 	font-size: 1.2em;
 }
 
+p {
+	text-align: justify;
+	hyphens: auto;
+}
+
 code {
 	display: inline-block;
 	font-family: cascadia;
 	font-size: 0.9em;
 	background: #1e1e22;
 	color: #dddde1;
-	padding: 0.1em 0.5em 0.1em;
+	padding: 0.1em;
+	margin: 0.1em 0em 0.1em;
 }
 
 table {
@@ -83,7 +90,7 @@ section {
 	text-align: center;
 }
 
-section h2,p {
+section h2,section p {
 	text-align: left;
 }
 
diff --git a/src/writings/1-build-systems.md b/src/writings/1-build-systems.md
index 1debe3a..ad4ff5e 100644
--- a/src/writings/1-build-systems.md
+++ b/src/writings/1-build-systems.md
@@ -45,25 +45,171 @@ OBJ=hellomake.o hellofunc.o
 hellomake: $(OBJ)
 	$(CC) -o $@ $^ $(CFLAGS)
 ```
-Makefiles attempt to abstract away the complexity of the C compilation process.
+Make attempts to abstract away the complexity of the C compilation process.
 Variables and pattern matching of file names are particularly well suited for
 managing compiler flags and object files. However, by far the most attractive
-feature of Makefiles is the ability to simply type `make` to compile the entire
+feature of Make is the ability to simply type `make` to compile the entire
 program.
 
-As software becomes more complex, so too does the task of building it. The
+As software becomes more complex, so too does the task of building it. The
 limitations of C make this problem particularly egregious, given its fragile
-dependency resolution and lack of meta-programming. Makefiles have attempted
-to bridge this gap, and are a Turing-Complete language in their own right.
-The [Makefile which builds the Linux kernel](https://github.com/torvalds/linux/blob/master/Makefile)
+dependency resolution and lack of meta-programming. Makefiles are an attempt to
+bridge this gap, and are a Turing-complete language in their own right.
+The Makefile which [builds the Linux kernel](https://github.com/torvalds/linux/blob/master/Makefile)
 is over 2000 lines as of writing. The massive demands placed on this
 intermediary language have exposed its weak points, mainly that it is
-stringly-typed and full of cryptic, unintuitive syntax. Maintaining complex
+stringly-typed and full of cryptic, unintuitive syntax. Maintaining complex
 Makefiles contributes to the difficulty of building software almost as much as
 it reduces
 
 ## cmake
-When I first learned about CMake and what it does, I actually laughed out loud.
-I am lucky enough to have never written a CMake file, so this section will
-be brief. Just as Makefiles abstract away the complexity of building C, CMake
-abstracts away the complexity of building Makefiles.
+Just as Makefiles abstract away the complexity of compiling C, CMake abstracts
+away the complexity of creating Makefiles. CMake is a great example of what
+happens to software development when there are no adults in the room, so to
+speak. Compiling a C program should be a simple task, ideally one that requires
+nothing more than a C compiler. Failing that, a simple build scripting language
+should be more than enough to handle even industrial use cases. When our build
+system needs a build system, we have completely lost the plot and need to
+reevaluate the problem from square one.
+```
+CMake -> Makefile -> gcc/clang -> Assembly
+```
+
+## compile targets
+Imagine a world where the Makefile language were more expressive, functional, and
+well-thought-out. Suddenly the idea of CMake becomes silly; clearly, introducing
+another language into the mix would only slow down development and introduce an
+entirely new category of bugs. CMake can only exist because Make failed to
+accomplish its goal. The same could be said for the GCC and Clang command-line
+syntax. Rather than fix the underlying issue, we treat the failed product as a
+new compile target and build a new thing to abstract away (never replace!) the
+old thing.
+
+Developers are not (generally) stupid; this pattern exists for a reason. In the
+case of C, it is sometimes necessary to execute arbitrary code at build time.
+The obvious solution is to create a new language to handle this need, but why
+is the original language not sufficient? Make is written in C, so by definition
+C can do anything Make can do. The issue is that C source files do not
+contain enough information for the compiler to build the entire package. This
+information must be embedded in another, nonstandard format, which itself must
+be parsed and executed by a nonstandard build tool.
+
+## shebang
+Developers have become overly complacent with build systems. Look at any project
+today, and in the root directory you will see a layer of congealed fat:
+`package.json`, `CMakeLists.txt`, `Cargo.toml`, `build.gradle`, maybe a Python
+virtual environment, along with any ignore files, linter configs, etc. Every
+new tool, language, and config file means another program to install and another
+step in the build process. Every one of these dependencies makes the project
+more fragile and less portable. Meanwhile, we are not making good use of the
+tools we already have. We ought to be demanding more from language designers.
+The build process should not be an afterthought left for developers to figure
+out; it should be a core consideration when designing grammar and syntax.
+
+If you have ever used a scripting language, you are probably familiar with the
+shebang line.
+```python
+#!/usr/bin/env python3
+print("Hello World!")
+```
+This wonderfully useful one-liner captures what I mean by making use of existing
+tools and treating the build process as a grammar concern. This Python file
+tells whatever executes it how it should be run. Since the `#`
+character begins a comment in Python, the line is safely ignored by the
+interpreter and by any other tools that read the file. This system is not perfect: the
+name or path of the Python executable may vary between systems, and the shebang
+relies on the operating system, not the language itself, to interpret it. Expanding on
+this concept may help alleviate our build system woes.
+
+## doing better
+Let's look at how C syntax could be changed to adopt some of these ideals,
+starting with a simple example:
+```c
+#!/usr/bin/gcc -E
+// Warn or error if specific compiler not used
+#compiler gcc 12.2.0
+#semver 1.0.0
+#ifdef RELEASE
+#opt o2
+#endif
+#warn all
+#libs lib/
+#output build/MyProgram
+// Warn or error if library semver does not match
+#include "mylib.h" "0.2.*"
+
+int main() {
+	/* ... */
+	return 0;
+}
+```
+Here, I have replaced command-line flags with a special `#`-prefixed syntax.
+Since all the compiler directives are inline with the source code itself,
+we can take advantage of the shebang line just like scripting languages do.
+Using the `#ifdef` directive, we can even conditionally enable flags for
+release mode. Let's see what we can do with an even more radical approach:
+```c
+// build.c
+#include <string.h>
+#include <compile.h>
+#include <link.h>
+#define RELEASE 0
+
+int main(int argc, char* argv[]) {
+	// Struct representing the invoked compiler
+	compiler_t compiler = get_compiler();
+	if (strcmp("gcc", compiler.name) != 0
+		|| compiler.semver.major < 12) {
+		// Abort build with error
+		emit_error("Incompatible compiler version!\n");
+		return 1;
+	}
+
+	semver_t version = {.major=1, .minor=0, .patch=0};
+	int opt_level;
+	// We could easily check for a flag
+	// in argv here instead
+	if (RELEASE) {
+		opt_level = 2;
+	} else {
+		opt_level = 0;
+	}
+	// A realistic function would probably take
+	// some structure containing compile directives
+	artifact_t executable = compile(
+		&compiler, "main.c", opt_level, version
+	);
+	artifact_t mylib = load_dylib("lib/mylib.so");
+	link(&executable, &mylib);
+	write_artifact(&executable, "bin/MyProgram");
+}
+```
+In this example, we create a new file `build.c` which acts as a pseudo-Makefile.
+The `compile.h` and `link.h` headers are provided by the compiler itself, so they
+would not need to be linked from the system's libc. All flags passed to the compiler are
+handed off to `main()` through `argv`. It is easy to imagine an
+equivalent to `make clean` that erases all build artifacts, or a caching system
+that only rebuilds modified files.
+
+## going further
+I am not a C developer by any stretch, and so I will spare you any more
+pseudocode. I hope these examples show that replacing Makefiles with pure
+C is not such an unreasonable idea. Still, we can go even further: imagine
+if we split the `compile()` function into lexing, parsing, and IR conversion
+functions (emitting, say, LLVM IR or GCC's GIMPLE). This would make meta-programming simple and
+straightforward, and even allow for the introduction of program-specific syntax.
+Developers could create libraries for common build tasks such as cloning the Git
+repositories of dependencies, running tests, or submitting binaries to package managers.
+
+## disadvantages
+Next to the Makefile example, my pseudocode is clearly less
+idiomatic and understandable. This is partially due to my lack of creativity
+and skill as a C developer. However, I imagine Make will always have an advantage
+here, at least when it comes to small projects. Even worse, on closer examination
+we have not yet achieved the goal of inlining build information. While our build
+system is now in C and does not rely on external tools, it is still a separate
+entity from the project itself. The minimum possible C program is now two files
+rather than just one. So far I have tried to conform as closely as possible to
+standard C syntax and grammar, but this approach will always feel more like
+a hack than a well-thought-out language feature.