This is an automated archive made by the Lemmit Bot.

The original was posted on /r/programminglanguages by /u/tsikhe on 2023-12-15 04:44:02.


Something has been bothering me ever since I tried to make my first interpreter. Imagine you are trying to make an interpreted language in a type-safe host language. Let’s say Java, with ANTLR for the parser, to keep it simple.

So you design an AST, symbol tables, types, values, an eval visitor, some tests… super easy stuff. The language works. So far so good.

However, as you add more features, the AST undergoes more and more scrutiny from various “phases.” You add a “convert dotted namespace paths to canonical symbol lookups” phase, which eliminates all dot-operator AST nodes that reference namespaces (and so have no type). When you run the tests… they fail. You invoked the new AST visitor, but accidentally passed the old AST into the eval visitor instead of the new (modified) one. The host language’s compiler could have caught this if the two phases of the AST had different types.
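A minimal sketch of the idea in Java 17, assuming a toy expression language: the tree that comes out of the parser and the tree after namespace resolution are separate sealed hierarchies, and the resolved one has no dot node at all. All names here (`PostParseExpr`, `ResolvedExpr`, `Phases.resolve`, `Phases.eval`) are hypothetical, and the resolver naively treats every dot chain as a namespace path.

```java
import java.util.Map;

// Phase 1 (hypothetical): the tree straight out of the parser.
// Dotted namespace paths like "math.pi" are still ordinary Dot nodes.
sealed interface PostParseExpr {
    record Num(int value) implements PostParseExpr {}
    record Name(String id) implements PostParseExpr {}
    record Dot(PostParseExpr target, String member) implements PostParseExpr {}
    record Add(PostParseExpr left, PostParseExpr right) implements PostParseExpr {}
}

// Phase 2 (hypothetical): namespace paths are resolved to canonical symbols,
// so this hierarchy simply has no Dot variant.
sealed interface ResolvedExpr {
    record Num(int value) implements ResolvedExpr {}
    record Symbol(String canonicalName) implements ResolvedExpr {}
    record Add(ResolvedExpr left, ResolvedExpr right) implements ResolvedExpr {}
}

final class Phases {
    // Converts a phase-1 tree into a phase-2 tree. Assumption: every
    // Name/Dot chain is a namespace path; a real resolver would use the symbol table.
    static ResolvedExpr resolve(PostParseExpr e) {
        if (e instanceof PostParseExpr.Num n) return new ResolvedExpr.Num(n.value());
        if (e instanceof PostParseExpr.Add a) return new ResolvedExpr.Add(resolve(a.left()), resolve(a.right()));
        return new ResolvedExpr.Symbol(path(e));
    }

    // Flattens Name/Dot chains into a canonical name such as "math.pi".
    private static String path(PostParseExpr e) {
        if (e instanceof PostParseExpr.Name n) return n.id();
        if (e instanceof PostParseExpr.Dot d) return path(d.target()) + "." + d.member();
        throw new IllegalArgumentException("not a namespace path: " + e);
    }

    // eval only accepts the resolved tree, so passing a PostParseExpr
    // here is a compile-time error rather than a failing test.
    static int eval(ResolvedExpr e, Map<String, Integer> globals) {
        if (e instanceof ResolvedExpr.Num n) return n.value();
        if (e instanceof ResolvedExpr.Symbol s) {
            Integer v = globals.get(s.canonicalName());
            if (v == null) throw new IllegalStateException("unbound symbol: " + s.canonicalName());
            return v;
        }
        // The interface is sealed, so the only remaining case is Add.
        ResolvedExpr.Add a = (ResolvedExpr.Add) e;
        return eval(a.left(), globals) + eval(a.right(), globals);
    }
}
```

With this split, calling `Phases.eval(parsed, globals)` while `parsed` is still a `PostParseExpr` is rejected by `javac` with an incompatible-types error, which is exactly the mix-up described above.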

My own trivial solution to this problem is to give each phase its own AST type:

  1. PostParseAST
  2. DottedNamespacePathsRemovedAST
  3. TypesCheckedAST
  4. ExplicitTypeInstantiationsAST
  5. PrepForEvalAST
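One way to get the phase ordering checked too, sketched with `java.util.function.Function` and its `andThen` method: each phase is a function from one AST type to the next, so the composed pipeline only compiles when the phases are chained in the order listed. The records are hypothetical stand-ins that wrap a string instead of a real node hierarchy, and the phase bodies are stubs that just append a tag.

```java
import java.util.function.Function;

// Hypothetical stand-ins for the five phase types above. Real ones would be
// whole node hierarchies; each record here just wraps a string.
record PostParseAST(String tree) {}
record DottedNamespacePathsRemovedAST(String tree) {}
record TypesCheckedAST(String tree) {}
record ExplicitTypeInstantiationsAST(String tree) {}
record PrepForEvalAST(String tree) {}

final class Pipeline {
    // Each phase is a function from one AST type to the next.
    // The bodies are stubs that only append a tag to the wrapped string.
    static final Function<PostParseAST, DottedNamespacePathsRemovedAST> removeDottedPaths =
            ast -> new DottedNamespacePathsRemovedAST(ast.tree() + " |paths");
    static final Function<DottedNamespacePathsRemovedAST, TypesCheckedAST> checkTypes =
            ast -> new TypesCheckedAST(ast.tree() + " |types");
    static final Function<TypesCheckedAST, ExplicitTypeInstantiationsAST> instantiate =
            ast -> new ExplicitTypeInstantiationsAST(ast.tree() + " |inst");
    static final Function<ExplicitTypeInstantiationsAST, PrepForEvalAST> prepForEval =
            ast -> new PrepForEvalAST(ast.tree() + " |prep");

    // andThen only type-checks when each output type matches the next
    // input type, so the compiler enforces the phase order.
    static final Function<PostParseAST, PrepForEvalAST> compile =
            removeDottedPaths.andThen(checkTypes).andThen(instantiate).andThen(prepForEval);

    // The evaluator accepts only the final phase's type.
    static String eval(PrepForEvalAST ast) {
        return "eval(" + ast.tree() + ")";
    }
}
```

Because Java records are nominally typed, `TypesCheckedAST` and `PrepForEvalAST` stay distinct even though their fields are identical. Reordering the chain, e.g. `checkTypes.andThen(removeDottedPaths)`, fails to compile.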

You could use a code generator to output boilerplate for these AST formats, including visitor interfaces and converter visitors.
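A hand-written sketch of what that generated boilerplate might look like for two of the phases, assuming a classic visitor pattern with a result type parameter: one visitor interface per phase, a converter visitor from type-checked nodes to eval-ready nodes, and an eval visitor that exists only for the last phase. The node set (numbers and addition), the class names, and the `type` annotation that the conversion drops are all hypothetical.

```java
// ---- Hypothetical generated code for the TypesCheckedAST phase ----
interface CheckedVisitor<R> {
    R visitNum(CheckedNum n);
    R visitAdd(CheckedAdd n);
}
interface CheckedNode { <R> R accept(CheckedVisitor<R> v); }
record CheckedNum(int value, String type) implements CheckedNode {
    public <R> R accept(CheckedVisitor<R> v) { return v.visitNum(this); }
}
record CheckedAdd(CheckedNode left, CheckedNode right, String type) implements CheckedNode {
    public <R> R accept(CheckedVisitor<R> v) { return v.visitAdd(this); }
}

// ---- Hypothetical generated code for the PrepForEvalAST phase ----
interface PrepVisitor<R> {
    R visitNum(PrepNum n);
    R visitAdd(PrepAdd n);
}
interface PrepNode { <R> R accept(PrepVisitor<R> v); }
record PrepNum(int value) implements PrepNode {
    public <R> R accept(PrepVisitor<R> v) { return v.visitNum(this); }
}
record PrepAdd(PrepNode left, PrepNode right) implements PrepNode {
    public <R> R accept(PrepVisitor<R> v) { return v.visitAdd(this); }
}

// Converter visitor: walks a CheckedNode tree and builds a PrepNode tree,
// dropping the type annotations. Left non-final so a phase can override
// only the node methods it actually changes.
class CheckedToPrep implements CheckedVisitor<PrepNode> {
    public PrepNode visitNum(CheckedNum n) { return new PrepNum(n.value()); }
    public PrepNode visitAdd(CheckedAdd n) {
        return new PrepAdd(n.left().accept(this), n.right().accept(this));
    }
}

// The eval visitor is written against the final phase only.
final class Eval implements PrepVisitor<Integer> {
    public Integer visitNum(PrepNum n) { return n.value(); }
    public Integer visitAdd(PrepAdd n) { return n.left().accept(this) + n.right().accept(this); }
}
```

A converter that maps every node one-to-one, like `CheckedToPrep`, is the part a generator could plausibly emit as a default; a phase that actually rewrites the tree would subclass it and override only the methods for the nodes it changes.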

So my question: how would you add host-language type safety to your AST?