finished on build systems
parent
9f0ab19962
commit
3af7a3d1f6
|
@@ -103,8 +103,8 @@ figcaption {
|
||||||
line-height: 110%;
|
line-height: 110%;
|
||||||
}
|
}
|
||||||
|
|
||||||
img[src],iframe,code {
|
img,iframe,code {
|
||||||
border-radius: 0.25em;
|
border-radius: 0.4em;
|
||||||
}
|
}
|
||||||
|
|
||||||
figure img {
|
figure img {
|
||||||
|
@@ -112,6 +112,7 @@ figure img {
|
||||||
width: 100%
|
width: 100%
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
form {
|
form {
|
||||||
border: dashed black;
|
border: dashed black;
|
||||||
}
|
}
|
||||||
|
|
|
@@ -1,36 +1,47 @@
|
||||||
# on build systems
|
# on build systems
|
||||||
Recently I have been thinking about what makes for a good build system. I
|
Recently I have been thinking about what makes for a good build system. I
|
||||||
want to analyze the major pain points I have encountered building software,
|
want to analyze the major pain points I have encountered building software,
|
||||||
and identify where these systems go wrong. To do this I have picked several
|
and identify where these systems go wrong. Seeing as C is one of the most
|
||||||
languages I am already familiar with to use as case studies
|
frustrating languages to build for in my experience, I will use the language
|
||||||
## definitions
|
as a case study
|
||||||
|
## definition
|
||||||
I think of build systems as a very broad category of software; the goal of
|
I think of build systems as a very broad category of software; the goal of
|
||||||
which is to automate the process of packaging or executing other software.
|
which is to automate the process of building other software.
|
||||||
This typically involves several subtasks. Resolving dependencies, compiling,
|
This typically involves several tasks, mainly:
|
||||||
interpreting, linking, deploying, packaging, executing - software that does one
|
* linting
|
||||||
or more of these things counts as a build system in my book
|
* formatting
|
||||||
|
* interpreting
|
||||||
|
* compiling
|
||||||
|
* linking
|
||||||
|
* packaging
|
||||||
|
* deploying
|
||||||
|
* executing
|
||||||
|
|
||||||
|
Software that performs one or more of these tasks is a build tool. Any number of
|
||||||
|
build tools can make up a build system
|
||||||
## c and c++
|
## c and c++
|
||||||
The C family of languages has a quite complicated ecosystem of competing build
|
The C family of languages has a quite complicated ecosystem of competing build
|
||||||
systems. To start with, there are the compilers themselves:
|
systems. To start with, there are the compilers themselves:
|
||||||
[GCC](https://gcc.gnu.org/) and [Clang](https://clang.llvm.org/). A typical
|
[GCC](https://gcc.gnu.org/) and [Clang](https://clang.llvm.org/). A typical
|
||||||
invocation of looks like this:
|
invocation of either looks like this:
|
||||||
```bash
|
```bash
|
||||||
cc main.c foo.c bar.h -Iinclude/ -Llib/ -O2 -oProgram
|
cc main.c foo.c bar.h -Iinclude/ -Llib/ -O2 -oProgram
|
||||||
```
|
```
|
||||||
This is quite verbose as far as build commands go. The path of every source file
|
This is quite verbose as far as CLI tools go. The path of every source file
|
||||||
must be specified, as well as separate folders for library headers and object
|
must be specified, as well as the location of libraries to be linked. Path
|
||||||
files. Most software also makes use of numerous compiler flags, most of which
|
variables also play a role in the linking process, adding a layer of hidden
|
||||||
have incredibly cryptic names.
|
complexity. Most software also makes use of numerous compiler flags, all of
|
||||||
|
which must be typed every time.
|
||||||
|
|
||||||
To compile a C project using only the compiler requires first learning
|
To compile a C project using only the compiler requires first learning
|
||||||
its structure, wrangling each of its dependencies manually, and reading
|
its structure, wrangling each of its dependencies manually, and reading
|
||||||
documentation to find the appropriate build flags for your platform. For any
|
documentation to find the appropriate build flags for your platform. This
|
||||||
non-trivial program, simply typing the build command becomes a challenge
|
is not a reasonable ask for any developer
|
||||||
|
|
||||||
## make
|
## make
|
||||||
The problems introduced by the C family of compilers have proven intractable,
|
Compiler developers have decided that these problems are out of scope, and
|
||||||
and so another layer of abstraction is necessary. Makefile is a rudimentary
|
so another layer of abstraction is necessary. Make is a rudimentary scripting
|
||||||
scripting language primarily used to build C family languages.
|
language primarily used to build C projects.
|
||||||
```make
|
```make
|
||||||
# Example Makefile taken from:
|
# Example Makefile taken from:
|
||||||
# https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
|
# https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
|
||||||
|
@@ -53,36 +64,36 @@ program.
|
||||||
|
|
||||||
As software becomes more complex, so too does the task of building it. The
|
As software becomes more complex, so too does the task of building it. The
|
||||||
limitations of C make this problem particularly egregious, given its fragile
|
limitations of C make this problem particularly egregious, given its fragile
|
||||||
dependency resolution and lack of meta-programming. Makefiles are an attempt to
|
dependency resolution and lack of meta-programming. Make is an attempt to
|
||||||
bridge this gap, and are a Turing-Complete language in their own right.
|
bridge this gap, and is a Turing-Complete language in its own right.
|
||||||
The Makefile which [builds the Linux kernel](https://github.com/torvalds/linux/blob/master/Makefile)
|
The Makefile which [builds the Linux kernel](https://github.com/torvalds/linux/blob/master/Makefile)
|
||||||
is over 2000 lines as of writing. The massive demands placed on this
|
is over 2000 lines as of writing. The massive demands placed on this
|
||||||
intermediary language have exposed its weak points, mainly that it is
|
intermediary language have exposed its weak points, mainly that it is
|
||||||
stringly-typed and full of cryptic, unintuitive syntax. Maintaining complex
|
stringly-typed and full of cryptic, unintuitive syntax. Maintaining complex
|
||||||
Makefiles contributes to the difficulty of building software almost as much as
|
Makefiles contributes to the difficulty of building software almost as much as
|
||||||
it reduces
|
it helps
|
||||||
|
|
||||||
## cmake
|
## cmake
|
||||||
Just as Makefiles abstract away the complexity of compiling C, CMake abstracts
|
Just as Make acts as an abstraction over C compilers, CMake acts as an
|
||||||
away the complexity of creating Makefiles. CMake is a great example of what
|
abstraction over Makefiles. CMake is a great example of what happens to software
|
||||||
happens to software development when there are no adults in the room, so to
|
development when there are no adults in the room, so to speak. Compiling a C
|
||||||
speak. Compiling a C program should be a simple task, ideally one that requires
|
program should be a simple task, ideally one that requires nothing more than a
|
||||||
nothing more than a C compiler. Failing that, a simple build scripting language
|
C compiler. Failing that, a simple build scripting language should be more than
|
||||||
should be more than enough to handle even industrial use cases. When our build
|
enough to handle even industrial use cases. When our build system needs a build
|
||||||
system needs a build system, we have completely lost the plot and need to
|
system, we have completely lost the plot and need to reevaluate the problem from
|
||||||
reevaluate the problem from square one
|
square one
|
||||||
```
|
```
|
||||||
CMake -> Makefile -> gcc/clang -> Assembly
|
CMake -> Makefile -> gcc/clang -> Assembly
|
||||||
```
|
```
|
||||||
|
|
||||||
## compile targets
|
## compile targets
|
||||||
Imagine a world where the Makefile language was more expressive, functional, and
|
Imagine a world where the Make language was more expressive, functional, and
|
||||||
well-thought-out. Suddenly the idea of CMake becomes silly; clearly introducing
|
well-thought-out. Suddenly the idea of CMake becomes silly; clearly introducing
|
||||||
another language into the mix would only slow down development and introduce an
|
another language into the mix would only slow down development and introduce an
|
||||||
entirely new category of bugs. CMake can only exist because Makefile failed to
|
entirely new category of bugs. CMake can only exist because Make failed to
|
||||||
accomplish its goal. The same could be said for the GCC and Clang compilation
|
accomplish its goal. The same could be said for the GCC and Clang compilation
|
||||||
syntax. Rather than fix the underlying issue, we treat the failed product as a
|
syntax. Rather than fix the underlying issue, we treat the failed product as a
|
||||||
new compile target and build a new thing to abstract away (never replace!) the
|
new compilation target and build a new thing to abstract away (never replace!) the
|
||||||
old thing.
|
old thing.
|
||||||
|
|
||||||
Developers are not (generally) stupid; this pattern exists for a reason. In the
|
Developers are not (generally) stupid; this pattern exists for a reason. In the
|
||||||
|
@@ -90,8 +101,8 @@ case of C, it is sometimes necessary to execute arbitrary code at build-time.
|
||||||
The obvious solution is to create a new language to handle this need - but why
|
The obvious solution is to create a new language to handle this need - but why
|
||||||
is the original language not sufficient? Make is written in C, so by definition
|
is the original language not sufficient? Make is written in C, so by definition
|
||||||
C can do anything Make can do. The issue is that C source files do not
|
C can do anything Make can do. The issue is that C source files do not
|
||||||
contain enough information for the compiler to build the entire package. This
|
contain enough information for the compiler to build the entire program. This
|
||||||
information must be embedded in another, nonstandard format, which itself must
|
information must be embedded in another nonstandard format, which itself must
|
||||||
be parsed and executed by a nonstandard build tool
|
be parsed and executed by a nonstandard build tool
|
||||||
|
|
||||||
## shebang
|
## shebang
|
||||||
|
@@ -118,8 +129,8 @@ describes how to run itself to the shell which invokes it. Since the `#`
|
||||||
character is used as a comment in Python, the line can be safely ignored by
|
character is used as a comment in Python, the line can be safely ignored by
|
||||||
any other programs or tools that read the file. This system is not perfect, the
|
any other programs or tools that read the file. This system is not perfect, the
|
||||||
name or path of the python executable may vary between systems, and the shebang
|
name or path of the python executable may vary between systems, and the shebang
|
||||||
relies on a shell to interpret it (technically a build system). Expanding on
|
relies on a shell to interpret it (technically a build system). However,
|
||||||
this concept may help alleviate our build system woes.
|
expanding on this concept may help alleviate our build system woes
|
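To make the shebang idea concrete, here is a minimal sketch in C of the parsing step: take the first line of a file and pull out the interpreter path. The function name `shebang_interpreter` and its return convention are hypothetical, chosen only for illustration; real implementations also handle an optional argument after the path, which this sketch ignores.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a real system API): copies the interpreter path
 * from a shebang line such as "#!/usr/bin/env python3" into out.
 * Returns 0 on success, -1 if the line is not a shebang or out is too small. */
int shebang_interpreter(const char *line, char *out, size_t out_size) {
    assert(out != NULL && out_size > 0);

    if (line == NULL || strncmp(line, "#!", 2) != 0) {
        return -1; /* not a shebang line */
    }
    line += 2;
    line += strspn(line, " \t");           /* skip optional blanks after "#!" */
    size_t len = strcspn(line, " \t\r\n"); /* path ends at first blank or newline */
    if (len == 0 || len >= out_size) {
        return -1; /* empty path, or it does not fit in out */
    }
    memcpy(out, line, len);
    out[len] = '\0';
    return 0;
}
```

The point of the sketch is how little machinery is involved: one line of in-band metadata, hidden behind the language's own comment character, is enough to tell a tool how to run the file.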
||||||
|
|
||||||
## doing better
|
## doing better
|
||||||
Let's look at how C syntax could be changed to adopt some of these ideals,
|
Let's look at how C syntax could be changed to adopt some of these ideals,
|
||||||
|
@@ -161,7 +172,7 @@ int main(int argc, char* argv[]) {
|
||||||
if (strcmp("gcc", compiler.name) != 0
|
if (strcmp("gcc", compiler.name) != 0
|
||||||
|| compiler.semver.major <= 12) {
|
|| compiler.semver.major <= 12) {
|
||||||
// Abort build with error
|
// Abort build with error
|
||||||
emit_error("Incompatible compiler version!\n");
|
emit_error("Incompatible compiler version!");
|
||||||
return 1;
|
return 1;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@@ -189,27 +200,83 @@ The `compile.h` and `link.h` includes are compiler implemented, and so do not
|
||||||
need to be linked from the system's libc. All flags passed to the compiler are
|
need to be linked from the system's libc. All flags passed to the compiler are
|
||||||
handed off to the `main()` function. It is easy to imagine an
|
handed off to the `main()` function. It is easy to imagine an
|
||||||
equivalent to `make clean` that erases all build artifacts, or a caching system
|
equivalent to `make clean` that erases all build artifacts, or a caching system
|
||||||
that only rebuilds modified files.
|
that only rebuilds modified files
|
||||||
|
|
||||||
## going further
|
## going further
|
||||||
I am not a C developer by any stretch, and so I will spare you any more
|
I am not a C developer by any stretch, and so I will spare you any more
|
||||||
pseudocode. I hope these examples show that replacing Makefiles with pure
|
pseudocode. I hope these examples show that replacing Makefiles with pure
|
||||||
C is not such an unreasonable idea. Still, we can go even further; imagine
|
C is not such an unreasonable idea. Still, we can go even further; imagine
|
||||||
if we split the `compile()` function into lexing, parsing, and IR (llvm/gcc
|
if we split the `compile()` function into lexing, parsing, and IR generating
|
||||||
bytecode) conversion functions. This would make meta-programming simple and
|
intermediary functions. This would make meta-programming simple and
|
||||||
straightforward, and even allow for the introduction of program-specific syntax.
|
straightforward, and even allow for the introduction of program-specific syntax.
|
||||||
Developers could create libraries for common build tasks such as cloning dependency
|
Developers could create libraries for common build tasks such as cloning git
|
||||||
git repositories, running tests, or submitting binaries to package managers.
|
repositories, running tests, or submitting binaries to package managers
|
||||||
|
|
||||||
## disadvantages
|
## setbacks
|
||||||
Comparing my pseudocode to the Makefile example, it is obvious which is more
|
Comparing my pseudocode to the Makefile example, it is obvious which is more
|
||||||
idiomatic and understandable. This is partially due to my lack of creativity
|
idiomatic and understandable. This is partially due to my lack of creativity and
|
||||||
and skill as a C developer. However, I imagine Make will always have an advantage
|
skill as a C developer. However, I imagine Make will always have an advantage
|
||||||
here, at least when it comes to small projects. Even worse, on closer examination
|
here, at least when it comes to small projects. While our build system is
|
||||||
we have not yet achieved the goal of in-lining build information. While our build
|
now in C, it is still a separate entity from the project itself. The minimum
|
||||||
system is now in C, and does not rely on external tools, it is still a separate
|
possible C program is now two files rather than just one. So far I have tried
|
||||||
entity from the project itself. The minimum possible C program is now two files
|
to conform as closely as possible to standard C syntax and grammar, but this
|
||||||
rather than just one. So far I have tried to conform as closely as possible to
|
approach will always feel like a hack more than a well-thought-out language
|
||||||
standard C syntax and grammar, but this approach will always feel more like
|
feature
|
||||||
a hack than a well-thought-out language feature
|
|
||||||
|
|
||||||
|
## the ouroboros
|
||||||
|
Most languages draw a very strong distinction between compile-time and run-time
|
||||||
|
code. Typically, compile-time execution may happen only within macros or
|
||||||
|
constant functions, if it is even allowed at all. This habit can be traced back
|
||||||
|
to assembly programmers who deemed self-modifying code a dangerous antipattern.
|
||||||
|
This mindset is what I believe drives us to create these leaning towers of build
|
||||||
|
systems.
|
||||||
|
|
||||||
|
What would a language built around meta-programming look like? I suspect that
|
||||||
|
a language with a truly infinite degree of self reflection is possible. Such a
|
||||||
|
language could be far more expressive than its peers using less syntax. Imagine
|
||||||
|
if a library could implement a new language-wide keyword, and even implement
|
||||||
|
that keyword using the same keyword. Perhaps concepts as basic as structs,
|
||||||
|
enumerated types, and integers could be defined within the language itself. The
|
||||||
|
line between compilation and execution disappears. The line between language
|
||||||
|
and program grows thin. I imagine this process as if the language were eating
|
||||||
|
itself, like an ouroboros.
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<a href="https://commons.wikimedia.org/wiki/File:Serpiente_alquimica.jpg">
|
||||||
|
<img
|
||||||
|
width="50%"
|
||||||
|
src="https://upload.wikimedia.org/wikipedia/commons/7/71/Serpiente_alquimica.jpg">
|
||||||
|
</a>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
If such a language existed, it follows that every other language would simply
|
||||||
|
be a strict subset of this language (let's call it "every-lang"). For example,
|
||||||
|
we could write an every-lang library which implements every piece of
|
||||||
|
[Lua](https://www.lua.org/) syntax and grammar on a meta-program level. A user
|
||||||
|
that imports this library could then simply write code in Lua, and compile the
|
||||||
|
program using the every-lang compiler. This library would effectively be a Lua
|
||||||
|
build system, that is also an every-lang build system, that is also an "every
|
||||||
|
language" build system
|
||||||
|
|
||||||
|
```lua
|
||||||
|
-- A Lua program? or an every-lang program?
|
||||||
|
require "everylang"
|
||||||
|
|
||||||
|
function fact(n)
|
||||||
|
if n == 0 then
|
||||||
|
return 1
|
||||||
|
else
|
||||||
|
return n * fact(n-1)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
print(fact(5))
|
||||||
|
```
|
||||||
|
|
||||||
|
## on crashing and burning
|
||||||
|
This every-lang is, to put it lightly, a little far-fetched. Such a language
|
||||||
|
would be nearly impossible to implement or reason about. A practically useful
|
||||||
|
language must make compromises with its users, and the fundamental laws of
|
||||||
|
computation. I believe that the next frontier for language design will be
|
||||||
|
pushing the boundary on this front - how close can we get to every-lang without
|
||||||
|
crashing and burning?
|
||||||
|
|