Part of my project involves running custom scripts (pre-existing game scripts). I previously implemented that in Java, where I wrote my own parser that compiles a script into an object tree that can then be executed (as such, the script is no longer interpreted). The whole parser is just over 1000 lines of code, which doesn't seem too bad. Now, a few years later, I want to run the scripts in the browser and need a JavaScript solution. Perhaps I could have used the same approach again, but I was looking for a better solution. It was clear that I didn't want to write a bespoke interpreter, as that would have been more difficult to test, slower and less portable. The logical solution seemed to be (and you may disagree) to compile the scripts to JavaScript.

Some of the main challenges:

  • Select a parser generator
  • Generate non-blocking JavaScript
  • Support labels

Selecting a Parser Generator

For the first challenge there is already a lot of help out there (e.g. on Stack Overflow). The main contenders seem to be ANTLR, Jison and PEG.js.

ANTLR seems great (and the website changed to a much cleaner design recently). However, I preferred a JavaScript-only solution (although that is not strictly necessary). I came across a blog post (which I can no longer find) that mentioned both Jison and PEG.js but recommended Jison, with some examples. As Jison is based on Bison (which has a longer history), it also seemed a good choice to start with. My grammar was based on a JavaScript grammar, which I then attempted to modify. This is where I had issues with Jison: I just couldn't figure out why it didn't work as intended. After a while I gave up and tried my luck with PEG.js. Fortunately that actually worked (although I had to get the latest version from source control). The comparison may not be fair, but it might be an indication that PEG.js is easier to integrate.

The source code of the compiler (using the generated parser) is now just over 500 lines long and most of it isn't very complicated (mostly recursive output generation based on the AST). The original JavaScript grammar needed only a few adjustments.
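
To give an idea of what "recursive output generation based on the AST" can look like, here is a minimal sketch. The node types and field names (Program, Assignment, Call and so on) are made up for illustration; the AST produced by the actual grammar will look different.

```javascript
// Hypothetical AST node shapes, invented for this sketch.
function generate(node) {
  switch (node.type) {
    case "Program":
      // Generate each statement and put one per line.
      return node.body.map(function (child) { return generate(child); }).join("\n");
    case "ExpressionStatement":
      return generate(node.expression) + ";";
    case "Assignment":
      return node.name + " = " + generate(node.value);
    case "Call":
      return node.callee + "(" +
        node.args.map(function (arg) { return generate(arg); }).join(", ") + ")";
    case "Identifier":
      return node.name;
    case "Number":
      return String(node.value);
    default:
      throw new Error("Unknown node type: " + node.type);
  }
}

// AST for the two statements "value1 = 123; display(value1);"
const ast = {
  type: "Program",
  body: [
    { type: "ExpressionStatement",
      expression: { type: "Assignment", name: "value1",
                    value: { type: "Number", value: 123 } } },
    { type: "ExpressionStatement",
      expression: { type: "Call", callee: "display",
                    args: [{ type: "Identifier", name: "value1" }] } }
  ]
};

const output = generate(ast);
```

Each node type only needs to know how to print itself and delegates to generate for its children, which is why most of the compiler stays simple.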

Generating Non-blocking JavaScript

The next challenge was to generate non-blocking JavaScript code. Imagine the following pseudo code:

value = readInput();
display(value);

Here, readInput may wait for the user to input something. The value would then be displayed by the following line. JavaScript (at least until now) is event-driven and generally single-threaded (apart from web workers). Therefore we need to convert the script into event-driven JavaScript. There are different approaches, but the promise pattern as implemented by JSDeferred (jQuery and other libraries implement something very similar) seems to be the best option:

readInput().next(function(value) {
  display(value);
});

Let's have a quick look at what multiple deferred calls would look like:

value1 = readInput();
display(value1);
value2 = readInput();
display(value2);

This could either result in nested calls:

readInput().next(function(value1) {
  display(value1);
  readInput().next(function(value2) {
    display(value2);
  });
});

Or chained calls:

readInput().next(function(value1) {
  display(value1);
  return readInput();
}).next(function(value2) {
  display(value2);
});
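
To see why the chained form works, here is a minimal sketch of a deferred object. MiniDeferred is a hypothetical stand-in written for this post, not the JSDeferred or jQuery API, and the pending queue simply simulates user input arriving later.

```javascript
// MiniDeferred: a hypothetical, stripped-down deferred (no error handling).
class MiniDeferred {
  constructor() {
    this.callback = null; // function registered via next()
    this.chained = null;  // deferred returned by next()
  }
  // Register a callback and return a new deferred for the following step.
  next(fn) {
    this.callback = fn;
    this.chained = new MiniDeferred();
    return this.chained;
  }
  // Resolve this deferred with a value and continue the chain.
  call(value) {
    const result = this.callback ? this.callback(value) : value;
    if (!this.chained) return;
    if (result instanceof MiniDeferred) {
      // The callback started another asynchronous step: wait for it first.
      const chained = this.chained;
      result.next(function (v) { chained.call(v); });
    } else {
      this.chained.call(result);
    }
  }
}

const pending = []; // simulated input requests waiting for the user
function readInput() {
  const d = new MiniDeferred();
  pending.push(d);
  return d;
}

const shown = [];   // records what display() was called with
function display(value) { shown.push(value); }

readInput().next(function (value1) {
  display(value1);
  return readInput();
}).next(function (value2) {
  display(value2);
});

// Later on, the "user" answers both requests in order.
pending.shift().call("first");
pending.shift().call("second");
```

The important part is the instanceof check: when a callback returns another deferred, the rest of the chain is put on hold until that one is resolved. That is what allows the flat, chained form instead of nesting.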

The compiler then just needs to know which functions need a deferred call and which do not (although you could of course decide to make all functions deferred).
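
As a rough sketch of that decision, the following function turns a flat list of statements into the chained form. The statement records ({ target, callee, args }) and the deferredFunctions set are assumptions made for this example; it ignores control flow and turns assignment targets into callback parameters, which a real compiler would have to handle more carefully.

```javascript
// Emit chained .next() calls for a flat statement list.
// statements: hypothetical records { target?, callee, args }
// deferredFunctions: Set of function names known to return a deferred
function emitChain(statements, deferredFunctions) {
  const lines = [];
  let inChain = false; // true once we are inside a .next(function(...) { body
  for (const s of statements) {
    const call = s.callee + "(" + s.args.join(", ") + ")";
    if (deferredFunctions.has(s.callee)) {
      const param = s.target || "";
      if (inChain) {
        // Return the deferred so the chain waits for it, then open the next step.
        lines.push("  return " + call + ";");
        lines.push("}).next(function(" + param + ") {");
      } else {
        lines.push(call + ".next(function(" + param + ") {");
        inChain = true;
      }
    } else {
      // Ordinary call: emit it as a plain statement.
      const stmt = (s.target ? s.target + " = " : "") + call + ";";
      lines.push((inChain ? "  " : "") + stmt);
    }
  }
  if (inChain) lines.push("});");
  return lines.join("\n");
}

const generated = emitChain(
  [
    { target: "value1", callee: "readInput", args: [] },
    { callee: "display", args: ["value1"] },
    { target: "value2", callee: "readInput", args: [] },
    { callee: "display", args: ["value2"] }
  ],
  new Set(["readInput"])
);
```

For the four statements above, generated contains the same chained code as the example with value1 and value2.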

Supporting Labels

So far everything builds on top of what other people have already done, and my browser was perhaps more active than my brain. But how can I add support for labels? And not only that: the source language supports jumping to labels across method calls. There are some suggestions on the internet, mainly using either a switch statement or a labelled break. But neither would work across method calls. Eventually I settled on using deferreds, which seem to do the job. For example, the following source file:

void f1() {
label1:
  display(value1);
}

void f2() {
  value1 = 123;
  goto label1;
}

This may result in:

label1 = new Deferred().next(function() {
  display(value1);
});

f1 = function() {
  return label1;
};

f2 = function() {
  value1 = 123;
  return label1;
};
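
The underlying idea is that a label becomes a value that a function can hand back instead of continuing itself. The sketch below illustrates that idea without any deferred library: it swaps in plain functions as continuations and a small driver loop, so it is an analogy and not the code my compiler generates. The labels table and the run function are hypothetical names.

```javascript
const shown = [];   // records what display() was called with
function display(value) { shown.push(value); }

let value1;

// Each label maps to the code that follows it in the source script.
const labels = {
  label1: function () {
    display(value1);
    return null;    // nothing left to run
  }
};

// Entering f1 falls straight through to label1.
function f1() {
  return labels.label1;
}

// "goto label1" becomes: hand the label back to the driver.
function f2() {
  value1 = 123;
  return labels.label1;
}

// Driver loop: keep running until a step returns no continuation.
function run(start) {
  let step = start;
  while (typeof step === "function") {
    step = step();
  }
}

run(f2);
```

Because the jump is expressed as a return value, it does not matter which function the label was originally declared in.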

I've also added an implicit label for the function entry point and return statement (although only if the script makes use of labels).

Conclusion

After following the above steps, I had a compiler for the source language (with some additional tweaks). Due to the way variable scope needs to be managed, variables are not simple JavaScript variables, but that's secondary.

To polish things off, a sourceURL comment naming the original script can be added to the end of the generated script (it didn't quite work at the top of the script). To be able to debug the source script itself, source maps would be a nice addition.
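
As a small sketch of the sourceURL part: the comment is appended as the last line of the generated code before it is evaluated. The helper name withSourceUrl and the script name game-script.js are placeholders for this example.

```javascript
// Append a sourceURL comment so developer tools list the evaluated code
// under the given name. scriptName is a hypothetical placeholder.
function withSourceUrl(generatedCode, scriptName) {
  return generatedCode + "\n//# sourceURL=" + scriptName;
}

const code = withSourceUrl("var answer = 6 * 7;\nanswer;", "game-script.js");

// Indirect eval runs the code in the global scope; the trailing comment
// does not affect the value of the last statement.
const result = (0, eval)(code);
```

Note that this only gives the generated JavaScript a name in the debugger. Mapping back to lines of the original script is what source maps would add.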