Compiling to WebAssembly with Binaryen

Compile your own programming language to WebAssembly binary instruction format (Wasm) using Binaryen’s JavaScript API.


If you (like me) don’t excel in any system language such as C/C++ or Rust you might never have thought about creating a compiler. You may have created an interpreter or a transpiler though.

For playing around, an interpreter is usually just fine. Moreover, a transpiler can produce a language that could be further compiled into executable binaries.

Both approaches have pros and cons, interpreters tend to be slow in execution and transpilers are usually suboptimal as they translate code into languages that were not meant to be a compilation target.

WebAssembly, on the other hand, is designed as a portable compilation target for programming languages. Nowadays, one can see Wasm compilers for almost every existing language and there is a good reason for that: it’s super easy!

Binaryen is, like LLVM, a compiler and toolchain infrastructure library. Binaryen is targeting WebAssembly and offers a simple C and JavaScript API. We will focus on the latter.

Brainfuck

Because designing a programming language is not the point of this text (compiling is), we will simply use brainfuck as an example.

Brainfuck is a popular Turing-complete esoteric language that consists of only eight single-character commands:

Brainfuck operates on an array of memory cells, each initially set to zero. Values are interpreted as ASCII characters.

Typical brainfuck code looks like follows:

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.

This program prints Hello.

To play around with brainfuck you can use an online interpreter.

Getting Started

Let’s create a new Node.js project and add Binaryen as a dependency:

cd brainfuck2wasm/
npm init -y
npm i binaryen

We need an entry point index.js to load the source file and trigger the compilation:

export function main(args) {
  const sourceFile = args[0];
  const input = fs.readFileSync(sourceFile, 'utf-8');
  
  const binary = compile(input);

  fs.writeFileSync(`${sourceFile}.wasm`, binary);
}

We will create a two-phase compiler: 1. paring the source code and potential optimizations, 2. compiling into Wasm using Binaryen.

The compile method from index.js looks like follows:

function compile(input) {
  const parser = new Parser();
  const ast = parser.parse(input);

  const compiler = new Compiler();
  return compiler.compile(ast);
}

Parsing Source Code

We need a map of brainfuck commands and their ASCII representation:

cmdMap.set(43, '+');
cmdMap.set(45, '-');
cmdMap.set(62, '>');
cmdMap.set(60, '<');
cmdMap.set(46, '.');
cmdMap.set(44, ',');
cmdMap.set(91, '[');
cmdMap.set(93, ']');

Then, we can parse the source code command by command until EOF:

let index = 0;
while (!eof()) {
 const code = source[index++].charCodeAt();
 const kind = cmdMap.get(code);
 const cmd = { kind, children: [] };
 ...
}

When we encounter a beginning of a loop, we push a new branch to the stack:

if (cmd.kind === '[') {
  // push to the current branch
  branch.children.push(cmd);

  // push a new branch for loop
  ast.push(cmd);
}

Each loop end must be paired with a beginning. We throw an error otherwise:

else if (cmd.kind === ']') {
  if (ast.pop().kind !== '[') {
    throw new SyntaxError('unmatched ]');
  }
}

Other commands are just pushed into the current branch:

else {
  branch.children.push(cmd);
}

Further, we can perform some optimization of the source code like grouping commands of the same kind. For example, brainfuck code +++ will result in a single + command with an amount of 3 rather than in three + commands, and similar.

Compiling with Binaryen

Using Binaryen is not very different from programming in WebAssembly text format (Wat). You might want to get familiar with the basics of Wat programming before reading the next section.

Wasm Module

The root unit of Wasm is a module:

const module = new binaryen.Module();

We will use 1 page (64 KiB) of memory:

module.setMemory(1, 1);

And a global variable p to represent the pointer to current memory cell:

Importing Functions

WebAssembly modules are sandboxed per default, we have to declare functions for input and output to be imported into the module via environment:

const t_void = binaryen.none;
const t_i32 = binaryen.i32;

module.addFunctionImport('input', 'main', 'input', t_void, t_i32);
module.addFunctionImport('output', 'main', 'output', t_i32, t_void);

Exporting The Main Function

Now, we can recursively compile the AST root branch and close the resulting command into a code block afterward:

const body = compileBranch(ast);

const block = module.block('exec', body);

The block forms the module’s main function to export:

module.addFunction('main', binaryen.none, binaryen.none, [], block);

module.addFunctionExport('main', 'main');

Compiling Commands

Compiling a branch means to interpret each command in Binaryen:

compileBranch(branch) {
  const body = [];
  branch.children.forEach(cmd => {
    let c;
    switch (cmd.kind) {
      case '+':
        c = increment();
        body.push(c);
        break;
      case '-':
        c = decrement();
        body.push(c);
        break;
      case '>':
        c = moveRight();
        body.push(c);
        break;
      case '<':
        c = moveLeft();
        body.push(c);
        break;
      case '.':
        c = output();
        body.push(c);
        break;
      case ',':
        c = input();
        body.push(c);
        break;
      case '[':
        c = loop(cmd);
        body.push(c);
        break;
      }
  });
  return body;
}

Every command must be compiled into a set of Wasm instructions.

For instance, the increment command + is performed in the following steps:

  1. get the pointer’s value,
  2. load its byte value from the memory,
  3. add 1,
  4. store the result to the memory.
increment() {
  const p = module.global.get('p', binaryen.i32);
  const val = module.i32.load8_u(0, 1, p);
  const one = module.i32.const(1);
  const add = module.i32.add(val, one);
  const sto = module.i32.store8(0, 1, p, add);
  return sto;
}

Analogous for other commands.

Compiling Loops

To compile a loop we must compile the closed commands first. We will put them into a block inside the loop together with a conditional break.

The Wasm break will repeat the loop if the condition is met. In our case, if the memory byte under the pointer’s current value is zero:

loop(branch) {
  // 1. compile commands inside the loop
  const commands = compileBranch(branch);

  const label = getLoopUid();

  // 2. break if the current value is zero
  const p = module.global.get('p', binaryen.i32);
  const val = module.i32.load8_u(0, 1, p);
  const br = module.br_if(label, val);

  // 3. wrap the commands and break into a block
  const block = module.block(null, [...commands, br]);
  const loop = module.loop(label, block);
  return loop;
}

Emitting Wasm

Finally, we can validate, optimize, and emit a compiled Wasm module:

if (!module.validate()) {
  throw new Error('Module invalid.');
}

module.optimize();

const binary = module.emitBinary();
return binary;

Running Wasm

Now, when our Wasm binary is compiled, we can run it in Node.js. All we need is to provide the input and output functions in the import object:

WebAssembly
.instantiate(fs.readFileSync(wasm), {
  main: {
    input: () => readChar(),
    output: c => writeChar(String.fromCharCode(c))
  }
})
.then(({ instance }) => instance.exports.main());

Source Code

That’s all, folks. Now, there is nothing to prevent you from creating your own programming language and compiling it to WebAssembly.

You can find the full compiler source code on my GitHub.

Happy compiling!