HTML Templating Engine View source on GitHub

Front End - Parser Design

Motivation

HTML is a powerful tool for creating static web pages, but is not easily modular or scalable. Front end frameworks like React provide reusable components. I want to use components without the overhead of a front-end framework. I want the final output to be statically generated rather than generated on the fly by the server or the client, minified and with inlined CSS and JavaScript.

Approach

After looking at some of the HTML parsing libraries, I was unsatisfied with the approaches most developers took. The DOM is represented as a tree structure, where each node keeps a pointer to its children.

// From the html_parser crate
pub enum Node {
    Text(String),
    Element(Element),
    Comment(String),
}

pub struct Element {
    pub name: String,
    pub attributes: HashMap<String, Option<String>>,
    pub children: Vec<Node>,
    // other fields omitted
}
This is bad for cache locality and effectively forces the use of recursion in order to traverse the DOM. Recursion is undesirable for performance and robustness reasons. My approach is to create a flat array of elements. Parsing is done using a non-recursive parser combinator model.

// My implementation (fields omitted)
pub enum HtmlElement {
  DocType,
  Comment(/* */),
  OpenTag { /* */ },
  CloseTag { /* */ },
  Text( /* */ ),
  Script { /* */ },
  Style { /* */ },
  Directive { /* */ },
}
This approach lends itself to linear iterative algorithms. An element's children can be represented as a slice of the DOM array, from the opening tag to its close tag.

Templates

I define a "template" as a user defined element which expands into a larger piece of HTML. Templates can pass on children and attributes to the final output. Some special templates can perform other functions, like inlining a CSS file or syntax highlighting a code block.

<!-- Here is a template definition -->
<Echo> 
  <h1 class=@class1> 
    <@children/>
  </h1>
  <h2 class=@class2> 
    <@children/> 
  </h2>
  <h3 class=@class3>  
    <@children/> 
  </h3>
</Echo>

<!-- And this is how to use the template -->
<Echo class1="some_class1" class2="some_class2" class3="some_class3">
  Echo!
</Echo>

<!-- Which expands to this -->
<h1 class="some_class1"> 
  Echo!
</h1>
<h2 class="some_class2"> 
  Echo!
</h2>
<h3 class="some_class3">  
  Echo!
</h3>
The example above shows how templates can inherit attributes and child elements from their invocation. Special templates begin with the '@' symbol, and perform some sort of meta-function. In addition to <@children/>, <@code/> performs syntax highlighting, and <@style/> inlines a CSS file.

Dependencies

The templating engine only relies on two crates for performing syntax highlighting of code blocks. It uses inkjet, a thin wrapper over treesitter. Treesitter is an efficient error-tolerant language parser commonly used in IDEs. While it is fast, it is by far the slowest and most cumbersome part of the program. Good looking formatted code is important for my website, and implementing parsers for every language I use is out of scope, so this is a necessary evil.

// Snippet from Cargo.toml
[dependencies]
inkjet = "0.10"
v_htmlescape = "0.15"