finished on build systems

parent 9f0ab19962, commit 3af7a3d1f6

```diff
@@ -103,8 +103,8 @@ figcaption {
   line-height: 110%;
 }
 
-img[src],iframe,code {
-  border-radius: 0.25em;
+img,iframe,code {
+  border-radius: 0.4em;
 }
 
 figure img {
@@ -112,6 +112,7 @@ figure img {
   width: 100%
 }
 
+
 form {
   border: dashed black;
 }
```

@@ -1,36 +1,47 @@

# on build systems

Recently I have been thinking about what makes for a good build system. I
want to analyze the major pain points I have encountered building software,
and identify where these systems go wrong. Seeing as C is one of the most
frustrating languages to build for in my experience, I will use it as a
case study.

## definition

I think of build systems as a very broad category of software, the goal of
which is to automate the process of building other software. This typically
involves several tasks, mainly:

* linting
* formatting
* interpreting
* compiling
* linking
* packaging
* deploying
* executing

Software that performs one or more of these tasks is a build tool. Any number of
build tools can make up a build system.

## c and c++

The C family of languages has quite a complicated ecosystem of competing build
systems. To start with, there are the compilers themselves:
[GCC](https://gcc.gnu.org/) and [Clang](https://clang.llvm.org/). A typical
invocation of either looks like this:

```bash
cc main.c foo.c -Iinclude/ -Llib/ -O2 -o program
```

This is quite verbose as far as CLI tools go. The path of every source file
must be specified, as well as the location of libraries to be linked.
Environment variables such as `LIBRARY_PATH` and `LD_LIBRARY_PATH` also play a
role in the linking process, adding a layer of hidden complexity. Most software
also makes use of numerous compiler flags, all of which must be typed every
time.

Compiling a C project using only the compiler requires first learning its
structure, wrangling each of its dependencies manually, and reading
documentation to find the appropriate build flags for your platform. This is
not a reasonable ask for any developer.

## make

Compiler developers have decided that these problems are out of scope, and
so another layer of abstraction is necessary. Make is a rudimentary scripting
language primarily used to build C projects.

```make
# Example Makefile taken from:
# https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
```

@@ -53,36 +64,36 @@ program.

As software becomes more complex, so too does the task of building it. The
limitations of C make this problem particularly egregious, given its fragile
dependency resolution and lack of meta-programming. Make is an attempt to
bridge this gap, and is a Turing-complete language in its own right.
The top-level Makefile which [builds the Linux kernel](https://github.com/torvalds/linux/blob/master/Makefile)
is over 2000 lines as of writing. The massive demands placed on this
intermediary language have exposed its weak points, mainly that it is
stringly-typed and full of cryptic, unintuitive syntax. Maintaining complex
Makefiles contributes to the difficulty of building software almost as much as
it helps.

## cmake

Just as Make acts as an abstraction over C compilers, CMake acts as an
abstraction over Makefiles. CMake is a great example of what happens to software
development when there are no adults in the room, so to speak. Compiling a C
program should be a simple task, ideally one that requires nothing more than a
C compiler. Failing that, a simple build scripting language should be more than
enough to handle even industrial use cases. When our build system needs a build
system, we have completely lost the plot and need to reevaluate the problem from
square one.

```
CMake -> Makefile -> gcc/clang -> Assembly
```

## compile targets

Imagine a world where the Make language was more expressive, functional, and
well-thought-out. Suddenly the idea of CMake becomes silly; clearly introducing
another language into the mix would only slow down development and introduce an
entirely new category of bugs. CMake can only exist because Make failed to
accomplish its goal. The same could be said for the GCC and Clang compilation
syntax. Rather than fix the underlying issue, we treat the failed product as a
new compilation target and build a new thing to abstract away (never replace!)
the old thing.

Developers are not (generally) stupid; this pattern exists for a reason. In the
case of C, it is sometimes necessary to execute arbitrary code at build time.
The obvious solution is to create a new language to handle this need, but why
is the original language not sufficient? GNU Make is itself written in C, so by
definition C can do anything Make can do. The issue is that C source files do
not contain enough information for the compiler to build the entire program.
This information must be embedded in another nonstandard format, which itself
must be parsed and executed by a nonstandard build tool.
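
As a rough illustration of that point, here is a minimal sketch of the information a Makefile rule carries (an output file, its sources, and its flags) written as ordinary C data, together with a function that turns it into the compiler invocation from earlier. The `build_target` struct and the `append_word` and `format_build_command` functions are hypothetical names of my own, not part of any existing tool, and error handling is deliberately minimal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical build description: the same facts a Makefile rule holds,
 * written as ordinary C data. */
struct build_target {
    const char *output;         /* file to produce, e.g. "program" */
    const char *const *sources; /* NULL-terminated list of .c files */
    const char *const *flags;   /* NULL-terminated list of compiler flags */
};

/* Appends `word` to `buf`, separated by a space. Returns 0 on success or -1
 * if the result would not fit in `size` bytes including the final NUL. */
static int append_word(char *buf, size_t size, size_t *len, const char *word)
{
    size_t sep = (*len > 0) ? 1 : 0; /* no space before the first word */
    size_t wlen = strlen(word);

    if (*len + sep + wlen + 1 > size)
        return -1;
    if (sep)
        buf[(*len)++] = ' ';
    memcpy(buf + *len, word, wlen);
    *len += wlen;
    buf[*len] = '\0';
    return 0;
}

/* Writes the `cc` command line for `t` into `buf`. Returns the length of
 * the command on success, or -1 if `buf` is too small. */
int format_build_command(char *buf, size_t size, const struct build_target *t)
{
    size_t len = 0;

    assert(buf != NULL && t != NULL); /* caller must pass valid pointers */
    if (size == 0)
        return -1;
    buf[0] = '\0';

    if (append_word(buf, size, &len, "cc") != 0)
        return -1;
    for (const char *const *s = t->sources; *s != NULL; s++)
        if (append_word(buf, size, &len, *s) != 0)
            return -1;
    for (const char *const *f = t->flags; *f != NULL; f++)
        if (append_word(buf, size, &len, *f) != 0)
            return -1;
    if (append_word(buf, size, &len, "-o") != 0
        || append_word(buf, size, &len, t->output) != 0)
        return -1;
    return (int)len;
}
```

A build program's `main()` could then hand the resulting string to the standard `system()` function from `<stdlib.h>` to run the compiler. Keeping the description as plain data is the point of the sketch: the C compiler already understands it, so no separate format or parser is needed.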

## shebang

@@ -118,8 +129,8 @@

describes how to run itself to the shell which invokes it. Since the `#`
character is used as a comment in Python, the line can be safely ignored by
any other programs or tools that read the file. This system is not perfect: the
name or path of the Python executable may vary between systems, and the shebang
relies on the operating system to interpret it (technically a build system).
However, expanding on this concept may help alleviate our build system woes.

## doing better

Let's look at how C syntax could be changed to adopt some of these ideals,

@@ -161,7 +172,7 @@

```c
int main(int argc, char* argv[]) {
    if (strcmp("gcc", compiler.name) != 0
        || compiler.semver.major <= 12) {
        // Abort build with error
        emit_error("Incompatible compiler version!");
        return 1;
    }
```
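
For comparison, the closest thing today's C offers to the hypothetical `compiler.name` check above is the preprocessor, which runs at build time but can only inspect macros the compiler predefines. A minimal sketch, assuming GCC or Clang: GCC sets `__GNUC__` to its major version number, while Clang (among others) also defines `__GNUC__` for compatibility, so it has to be excluded via `__clang__`. The `compiler_is_supported` name is hypothetical.

```c
/* Returns 1 if this translation unit is being compiled by GCC newer than
 * version 12 (mirroring the hypothetical check above), 0 otherwise.
 * The branch is chosen by the preprocessor, i.e. at build time. */
int compiler_is_supported(void)
{
#if defined(__GNUC__) && !defined(__clang__)
    /* GCC sets __GNUC__ to its major version number. */
    return __GNUC__ > 12;
#else
    /* Clang (which also defines __GNUC__) and other compilers land here. */
    return 0;
#endif
}
```

To abort the build instead of reporting at run time, the same condition can guard an `#error "Incompatible compiler version!"` directive. Either way, only what the preprocessor can see is available; there is no way to query the compiler as a value the way the pseudocode does.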

@@ -189,27 +200,83 @@

The `compile.h` and `link.h` headers are implemented by the compiler itself, so
they do not need to be linked against the system's libc. All flags passed to the
compiler are handed off to the `main()` function. It is easy to imagine an
equivalent to `make clean` that erases all build artifacts, or a caching system
that only rebuilds modified files.
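
The caching idea boils down to the timestamp comparison Make performs for every rule. A minimal sketch of that check in C, assuming a POSIX system (`stat()` and `struct stat` come from POSIX `<sys/stat.h>`, not from ISO C); `needs_rebuild` is a hypothetical name, and the dependency list is passed in by hand rather than discovered automatically.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: returns 1 if `target` must be rebuilt from its
 * `ndeps` dependencies, 0 if it is up to date, or -1 if a dependency is
 * missing. Timestamps are compared at one-second resolution (st_mtime). */
int needs_rebuild(const char *target, const char *const deps[], size_t ndeps)
{
    struct stat target_st, dep_st;

    /* A target that does not exist yet always has to be built. */
    if (stat(target, &target_st) != 0)
        return 1;

    for (size_t i = 0; i < ndeps; i++) {
        if (stat(deps[i], &dep_st) != 0)
            return -1; /* nothing to build from */
        /* Rebuild if a dependency was modified after the target. */
        if (difftime(dep_st.st_mtime, target_st.st_mtime) > 0)
            return 1;
    }
    return 0;
}
```

A C build program would call this before each compile step and skip targets for which it returns 0. Working at one-second resolution means an edit made within the same second as the last build can be missed; a more careful version could use the nanosecond `st_mtim` timestamps that POSIX.1-2008 provides.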

## going further

I am not a C developer by any stretch, and so I will spare you any more
pseudocode. I hope these examples show that replacing Makefiles with pure
C is not such an unreasonable idea. Still, we can go even further: imagine
if we split the `compile()` function into lexing, parsing, and IR-generating
intermediary functions. This would make meta-programming simple and
straightforward, and even allow for the introduction of program-specific syntax.
Developers could create libraries for common build tasks such as cloning git
repositories, running tests, or submitting binaries to package managers.

## setbacks

Comparing my pseudocode to the Makefile example, the Makefile is clearly the
more idiomatic and understandable of the two. This is partially due to my lack
of creativity and skill as a C developer. However, I imagine Make will always
have an advantage here, at least when it comes to small projects. While our
build system is now in C, it is still a separate entity from the project
itself: the minimum possible C program is now two files rather than just one.
So far I have tried to conform as closely as possible to standard C syntax and
grammar, but this approach will always feel more like a hack than a
well-thought-out language feature.

## the ouroboros

Most languages draw a very strong distinction between compile-time and run-time
code. Typically, compile-time execution may happen only within macros or
constant functions, if it is even allowed at all. This habit can be traced back
to assembly programmers who deemed self-modifying code a dangerous antipattern.
This mindset is what I believe drives us to create these leaning towers of build
systems.
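
C shows this split clearly. A minimal sketch, assuming C11 or later for `static_assert`: an ordinary function such as `fact` cannot appear in a constant expression, so anything the compiler must know during the build has to be spelled out separately, here as the `FACT5` macro. Both names are illustrative; even C23's `constexpr` applies only to object definitions, not to functions.

```c
#include <assert.h>

/* Run-time code: an ordinary function cannot appear in a constant
 * expression, so its result is not available to the compiler
 * (for array sizes, static_assert, and so on). */
unsigned long fact(unsigned n)
{
    return n == 0 ? 1UL : n * fact(n - 1);
}

/* Compile-time code: C only evaluates constant expressions during the
 * build, so the same computation must be unrolled by hand. */
#define FACT5 (5UL * 4UL * 3UL * 2UL * 1UL)

/* Checked by the compiler; the build fails if the condition is false. */
static_assert(FACT5 == 120UL, "5! should be 120");
```

The two definitions compute the same value, but only the second is visible to the compiler, which is exactly the kind of duplication a more self-reflective language would remove.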

What would a language built around meta-programming look like? I suspect that
a language with a truly infinite degree of self-reflection is possible. Such a
language could be far more expressive than its peers while using less syntax.
Imagine if a library could implement a new language-wide keyword, and even
implement that keyword using the same keyword. Perhaps concepts as basic as
structs, enumerated types, and integers could be defined within the language
itself. The line between compilation and execution disappears. The line between
language and program grows thin. I imagine this process as if the language were
eating itself, like an ouroboros.

<section>
  <a href="https://commons.wikimedia.org/wiki/File:Serpiente_alquimica.jpg">
    <img
      width="50%"
      src="https://upload.wikimedia.org/wikipedia/commons/7/71/Serpiente_alquimica.jpg">
  </a>
</section>

If such a language existed, it follows that every other language would simply
be a strict subset of this language (let's call it "every-lang"). For example,
we could write an every-lang library which implements every piece of
[Lua](https://www.lua.org/) syntax and grammar on a meta-program level. A user
that imports this library could then simply write code in Lua, and compile the
program using the every-lang compiler. This library would effectively be a Lua
build system that is also an every-lang build system, that is also an "every
language" build system.

```lua
-- A Lua program? or an every-lang program?
require "everylang"

function fact(n)
  if n == 0 then
    return 1
  else
    return n * fact(n-1)
  end
end

print(fact(5))
```

## on crashing and burning

This every-lang is, to put it lightly, a little far-fetched. Such a language
would be nearly impossible to implement or reason about. A practically useful
language must make compromises with its users and with the fundamental laws of
computation. I believe that the next frontier for language design will be
pushing the boundary on this front: how close can we get to every-lang without
crashing and burning?