Compiling to WebAssembly with Binaryen
Compile your own programming language to WebAssembly binary instruction format (Wasm) using Binaryen’s JavaScript API.
If you (like me) don’t excel in any system language such as C/C++ or Rust you might never have thought about creating a compiler. You may have created an interpreter or a transpiler though.
For playing around, an interpreter is usually just fine. Moreover, a transpiler can produce a language that could be further compiled into executable binaries.
Both approaches have pros and cons, interpreters tend to be slow in execution and transpilers are usually suboptimal as they translate code into languages that were not meant to be a compilation target.
WebAssembly, on the other hand, is designed as a portable compilation target for programming languages. Nowadays, one can see Wasm compilers for almost every existing language and there is a good reason for that: it’s super easy!
Binaryen is, like LLVM, a compiler and toolchain infrastructure library. Binaryen is targeting WebAssembly and offers a simple C and JavaScript API. We will focus on the latter.
Brainfuck
Because designing a programming language is not the point of this text (compiling is), we will simply use brainfuck as an example.
Brainfuck is a popular Turing-complete esoteric language that consists of only eight single-character commands:
>
Move the pointer to the right.<
Move the pointer to the left.+
Increment the memory cell at the pointer.-
Decrement the memory cell at the pointer..
Output the character signified by the cell at the pointer.,
Input a character and store it in the cell at the pointer.[
Jump past the matching ] if the cell at the pointer is 0.]
Jump back to the matching [ if the cell at the pointer is nonzero.
Brainfuck operates on an array of memory cells, each initially set to zero. Values are interpreted as ASCII characters.
Typical brainfuck code looks like follows:
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.
This program prints Hello
.
To play around with brainfuck you can use an online interpreter.
Getting Started
Let’s create a new Node.js project and add Binaryen as a dependency:
cd brainfuck2wasm/ npm init -y npm i binaryen
We need an entry point index.js
to load the source file and trigger the compilation:
export function main(args) { const sourceFile = args[0]; const input = fs.readFileSync(sourceFile, 'utf-8'); const binary = compile(input); fs.writeFileSync(`${sourceFile}.wasm`, binary); }
We will create a two-phase compiler: 1. paring the source code and potential optimizations, 2. compiling into Wasm using Binaryen.
The compile method from index.js looks like follows:
function compile(input) { const parser = new Parser(); const ast = parser.parse(input); const compiler = new Compiler(); return compiler.compile(ast); }
Parsing Source Code
We need a map of brainfuck commands and their ASCII representation:
cmdMap.set(43, '+'); cmdMap.set(45, '-'); cmdMap.set(62, '>'); cmdMap.set(60, '<'); cmdMap.set(46, '.'); cmdMap.set(44, ','); cmdMap.set(91, '['); cmdMap.set(93, ']');
Then, we can parse the source code command by command until EOF:
let index = 0; while (!eof()) { const code = source[index++].charCodeAt(); const kind = cmdMap.get(code); const cmd = { kind, children: [] }; ... }
When we encounter a beginning of a loop, we push a new branch to the stack:
if (cmd.kind === '[') { // push to the current branch branch.children.push(cmd); // push a new branch for loop ast.push(cmd); }
Each loop end must be paired with a beginning. We throw an error otherwise:
else if (cmd.kind === ']') { if (ast.pop().kind !== '[') { throw new SyntaxError('unmatched ]'); } }
Other commands are just pushed into the current branch:
else { branch.children.push(cmd); }
Further, we can perform some optimization of the source code like grouping commands of the same kind. For example, brainfuck code +++
will result in a single +
command with an amount of 3 rather than in three +
commands, and similar.
Compiling with Binaryen
Using Binaryen is not very different from programming in WebAssembly text format (Wat). You might want to get familiar with the basics of Wat programming before reading the next section.
Wasm Module
The root unit of Wasm is a module:
const module = new binaryen.Module();
We will use 1 page (64 KiB) of memory:
module.setMemory(1, 1);
And a global variable p
to represent the pointer to current memory cell:
Importing Functions
WebAssembly modules are sandboxed per default, we have to declare functions for input and output to be imported into the module via environment:
const t_void = binaryen.none; const t_i32 = binaryen.i32; module.addFunctionImport('input', 'main', 'input', t_void, t_i32); module.addFunctionImport('output', 'main', 'output', t_i32, t_void);
Exporting The Main Function
Now, we can recursively compile the AST root branch and close the resulting command into a code block afterward:
const body = compileBranch(ast); const block = module.block('exec', body);
The block forms the module’s main function to export:
module.addFunction('main', binaryen.none, binaryen.none, [], block); module.addFunctionExport('main', 'main');
Compiling Commands
Compiling a branch means to interpret each command in Binaryen:
compileBranch(branch) { const body = []; branch.children.forEach(cmd => { let c; switch (cmd.kind) { case '+': c = increment(); body.push(c); break; case '-': c = decrement(); body.push(c); break; case '>': c = moveRight(); body.push(c); break; case '<': c = moveLeft(); body.push(c); break; case '.': c = output(); body.push(c); break; case ',': c = input(); body.push(c); break; case '[': c = loop(cmd); body.push(c); break; } }); return body; }
Every command must be compiled into a set of Wasm instructions.
For instance, the increment command + is performed in the following steps:
- get the pointer’s value,
- load its byte value from the memory,
- add 1,
- store the result to the memory.
increment() { const p = module.global.get('p', binaryen.i32); const val = module.i32.load8_u(0, 1, p); const one = module.i32.const(1); const add = module.i32.add(val, one); const sto = module.i32.store8(0, 1, p, add); return sto; }
Analogous for other commands.
Compiling Loops
To compile a loop we must compile the closed commands first. We will put them into a block inside the loop together with a conditional break.
The Wasm break will repeat the loop if the condition is met. In our case, if the memory byte under the pointer’s current value is zero:
loop(branch) { // 1. compile commands inside the loop const commands = compileBranch(branch); const label = getLoopUid(); // 2. break if the current value is zero const p = module.global.get('p', binaryen.i32); const val = module.i32.load8_u(0, 1, p); const br = module.br_if(label, val); // 3. wrap the commands and break into a block const block = module.block(null, [...commands, br]); const loop = module.loop(label, block); return loop; }
Emitting Wasm
Finally, we can validate, optimize, and emit a compiled Wasm module:
if (!module.validate()) { throw new Error('Module invalid.'); } module.optimize(); const binary = module.emitBinary(); return binary;
Running Wasm
Now, when our Wasm binary is compiled, we can run it in Node.js. All we need is to provide the input and output functions in the import object:
WebAssembly .instantiate(fs.readFileSync(wasm), { main: { input: () => readChar(), output: c => writeChar(String.fromCharCode(c)) } }) .then(({ instance }) => instance.exports.main());
Source Code
That’s all, folks. Now, there is nothing to prevent you from creating your own programming language and compiling it to WebAssembly.
You can find the full compiler source code on my GitHub.
Happy compiling!