ANTLR is a lexer and parser generator for processing structured text. It takes as input a grammar file written in an extended Backus-Naur form (EBNF) notation, which also contains regex-like rules for tokens, and generates a lexer and parser in a target language (e.g., Java, C++, C#, Python).
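To show roughly what a generated lexer does with such token rules, here is a hand-written Python sketch. The token names and patterns are hypothetical, not ANTLR output. It follows the disambiguation policy ANTLR lexers use: the longest match wins, and when two rules match the same length, the rule listed first wins. Whitespace is matched and then discarded, similar to a rule marked -> skip.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    type: str
    text: str


# Hypothetical token rules in priority order, loosely mirroring ANTLR lexer
# rules such as  INT : [0-9]+ ;  and  WS : [ \t\r\n]+ -> skip ;
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("INT", re.compile(r"[0-9]+")),
    ("PLUS", re.compile(r"\+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]
SKIPPED = {"WS"}  # token types that are matched but not emitted


def tokenize(text: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        best_type, best_len = None, 0
        for token_type, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best, so on a
            # tie the rule listed earlier keeps priority.
            if m and len(m.group()) > best_len:
                best_type, best_len = token_type, len(m.group())
        if best_type is None:
            raise SyntaxError(f"no token rule matches {text[pos]!r} at offset {pos}")
        if best_type not in SKIPPED:
            tokens.append(Token(best_type, text[pos:pos + best_len]))
        pos += best_len
    tokens.append(Token("EOF", "<EOF>"))  # end-of-input marker for the parser
    return tokens
```

For example, tokenize("if iffy+42") yields IF, ID ("iffy"), PLUS, INT, EOF: "if" ties between IF and ID and goes to IF because it is listed first, while "iffy" is a longer match for ID.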

Internally, ANTLR 4 parses with ALL(*) (Adaptive LL(*)), an extension of the LL(*) algorithm used by ANTLR 3.
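LL parsing works top-down: each grammar rule becomes a procedure, and the parser looks ahead at upcoming tokens to decide which alternative to take or whether a loop repeats. The Python sketch below is a hand-written LL(1) parser (one token of lookahead) for a hypothetical expression grammar, shown in ANTLR-style notation in the docstring. It is deliberately simpler than ANTLR: ALL(*) can look arbitrarily far ahead and makes its decisions adaptively at parse time, whereas this sketch never looks past the next token.

```python
import re
from typing import List, Optional, Tuple, Union

# A parse-tree node is (rule_name, children); a leaf is the token text itself.
Tree = Union[str, Tuple[str, list]]
NUMBER = re.compile(r"[0-9]+")
SYMBOLS = {"+", "*", "(", ")"}


def tokenize(text: str) -> List[str]:
    """Split into integers and single-character symbols, skipping whitespace."""
    tokens = []
    for m in re.finditer(r"[0-9]+|\S", text):
        tok = m.group()
        if not (NUMBER.fullmatch(tok) or tok in SYMBOLS):
            raise SyntaxError(f"unexpected character {tok!r}")
        tokens.append(tok)
    return tokens + ["<EOF>"]


class Parser:
    """Hand-written LL(1) parser for the hypothetical grammar
        expr   : term ('+' term)* ;
        term   : factor ('*' factor)* ;
        factor : INT | '(' expr ')' ;
    """

    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def match(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self) -> Tree:
        tree = self.expr()
        self.match("<EOF>")  # the whole input must be consumed
        return tree

    def expr(self) -> Tree:
        children = [self.term()]
        while self.peek() == "+":  # lookahead decides whether ('+' term)* repeats
            children += [self.match("+"), self.term()]
        return ("expr", children)

    def term(self) -> Tree:
        children = [self.factor()]
        while self.peek() == "*":
            children += [self.match("*"), self.factor()]
        return ("term", children)

    def factor(self) -> Tree:
        tok = self.peek()
        if tok == "(":  # lookahead '(' selects the second alternative
            return ("factor", [self.match("("), self.expr(), self.match(")")])
        if NUMBER.fullmatch(tok):
            return ("factor", [self.match()])
        raise SyntaxError(f"unexpected {tok!r}")
```

Parser("1+2*3").parse() returns ('expr', [('term', [('factor', ['1'])]), '+', ('term', [('factor', ['2']), '*', ('factor', ['3'])])]); operator precedence falls out of the rule structure, since 2*3 ends up grouped under a single term.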

Output

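ANTLR outputs source files defining the lexer and parser classes (for C++, a header and an implementation file for each). It also generates listener and visitor classes for traversing the resulting parse tree (listeners by default, visitors when requested).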
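The difference between the two traversal styles can be sketched without the ANTLR runtime. In this hypothetical Python model (Node, Listener, walk, and Visitor are illustrative stand-ins, not ANTLR's generated classes), a listener is passive: a walker visits every node depth-first and calls enter_/exit_ hooks, much as ANTLR's ParseTreeWalker does. A visitor is active: each visit_ method decides which children to visit and returns a value. The ANTLR tool generates a listener by default and a visitor when run with the -visitor option.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    rule: str  # hypothetical rule name, e.g. "add", "mul", "num"
    children: List[Union["Node", str]] = field(default_factory=list)  # sub-nodes or token text


class Listener:
    """Passive traversal: override enter_<rule>/exit_<rule>; missing hooks are no-ops."""

    def enter(self, node: Node) -> None:
        getattr(self, f"enter_{node.rule}", lambda n: None)(node)

    def exit(self, node: Node) -> None:
        getattr(self, f"exit_{node.rule}", lambda n: None)(node)


def walk(listener: Listener, node: Node) -> None:
    """Depth-first walk: enter hook, then every child subtree, then exit hook."""
    listener.enter(node)
    for child in node.children:
        if isinstance(child, Node):
            walk(listener, child)
    listener.exit(node)


class Visitor:
    """Active traversal: visit_<rule> methods recurse explicitly and return values."""

    def visit(self, node: Node):
        return getattr(self, f"visit_{node.rule}")(node)


class NumberCollector(Listener):
    def __init__(self) -> None:
        self.numbers: List[int] = []

    def exit_num(self, node: Node) -> None:
        self.numbers.append(int(node.children[0]))


class Evaluator(Visitor):
    def visit_num(self, node: Node) -> int:
        return int(node.children[0])

    def visit_add(self, node: Node) -> int:  # children: left, '+', right
        return self.visit(node.children[0]) + self.visit(node.children[2])

    def visit_mul(self, node: Node) -> int:  # children: left, '*', right
        return self.visit(node.children[0]) * self.visit(node.children[2])


# Parse tree for "1 + 2 * 3"
tree = Node("add", [Node("num", ["1"]), "+",
                    Node("mul", [Node("num", ["2"]), "*", Node("num", ["3"])])])
```

Because listener hooks return nothing, the collector accumulates its results in a field; the visitor returns results directly, which suits tasks such as evaluation (Evaluator().visit(tree) gives 7).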

To use the generated code, we write our own main file. It reads an input stream, constructs a lexer over it to produce tokens, then constructs a parser over those tokens and calls a rule to obtain a parse tree. Finally, it traverses the tree with visitors or listeners to take actions. These visitors and listeners are our own classes, which inherit from the classes ANTLR generates.
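The driver steps can be sketched end to end in plain Python. The lexer, parser, base listener, and walk below are hypothetical stand-ins for what ANTLR would generate from a grammar like prog : stat* EOF ; stat : ID '=' INT ';' ; (names such as AssignLexer and exit_stat are illustrative, not ANTLR's generated API). The hand-written part is main and SymbolTableBuilder, which inherits from the generated-style base listener and overrides only the hook it needs. With the real Python runtime, a driver wires an InputStream, the generated lexer, a CommonTokenStream, the generated parser, and a ParseTreeWalker in the same order.

```python
import re
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (token type, text)

# ---- Hypothetical stand-ins for generated code ----

class AssignLexer:
    """Tokens: ID, INT, and the literals '=' and ';' (whitespace skipped)."""
    SPEC = re.compile(r"\s*(?:(?P<ID>[A-Za-z_]\w*)|(?P<INT>[0-9]+)|(?P<LIT>[=;]))")

    def __init__(self, text: str):
        self.text = text.rstrip()  # trailing whitespace would otherwise be unmatched

    def get_tokens(self) -> List[Token]:
        tokens, pos = [], 0
        while pos < len(self.text):
            m = self.SPEC.match(self.text, pos)
            if m is None:
                raise SyntaxError(f"bad input at offset {pos}")
            kind = m.lastgroup
            text = m.group(kind)
            tokens.append((text if kind == "LIT" else kind, text))  # literals use their text as type
            pos = m.end()
        return tokens + [("EOF", "")]


class AssignParser:
    """prog : stat* EOF ;  stat : ID '=' INT ';' ;  builds (rule, children) tuples."""

    def __init__(self, tokens: List[Token]):
        self.tokens, self.pos = tokens, 0

    def _match(self, ttype: str) -> Token:
        tok = self.tokens[self.pos]
        if tok[0] != ttype:
            raise SyntaxError(f"expected {ttype}, got {tok[0]}")
        self.pos += 1
        return tok

    def prog(self):
        stats = []
        while self.tokens[self.pos][0] == "ID":  # one token of lookahead starts a stat
            stats.append(self.stat())
        self._match("EOF")
        return ("prog", stats)

    def stat(self):
        return ("stat", [self._match("ID"), self._match("="), self._match("INT"), self._match(";")])


class AssignBaseListener:
    """No-op hooks, in the spirit of the base listener ANTLR generates."""
    def enter_stat(self, children): pass
    def exit_stat(self, children): pass


def walk(listener, node) -> None:
    rule, children = node
    getattr(listener, f"enter_{rule}", lambda c: None)(children)
    for child in children:
        if isinstance(child[1], list):  # subtrees hold a child list; tokens hold text
            walk(listener, child)
    getattr(listener, f"exit_{rule}", lambda c: None)(children)

# ---- Hand-written code ----

class SymbolTableBuilder(AssignBaseListener):
    def __init__(self) -> None:
        self.symbols: Dict[str, int] = {}

    def exit_stat(self, children) -> None:
        (_, name), _, (_, value), _ = children
        self.symbols[name] = int(value)


def main(text: str) -> Dict[str, int]:
    lexer = AssignLexer(text)        # 1. input characters -> lexer
    tokens = lexer.get_tokens()      # 2. lexer -> token stream
    parser = AssignParser(tokens)    # 3. tokens -> parser
    tree = parser.prog()             # 4. parse tree from the start rule
    listener = SymbolTableBuilder()
    walk(listener, tree)             # 5. walk the tree, firing our hook
    return listener.symbols
```

For instance, main("x = 1;\ny = 22;\n") returns {'x': 1, 'y': 22}, while malformed input such as "x = ;" raises SyntaxError in the parser.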

We can also embed target-language code directly in the grammar as actions enclosed in curly braces.
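An action placed among grammar elements ends up in the generated parser method at that position, so it runs in the target language when the parser reaches that point in the rule (ANTLR also rewrites attribute references such as $ID.text, which this sketch does not model). The toy Python generator below only illustrates the copying step. It handles a flat sequence of token names, quoted literals, and single-line actions, and it uses a made-up convention in which the texts of matched tokens are available in a list t. generate_rule, TokenCursor, and the t convention are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# One grammar element: {action}, 'literal', or TOKEN_NAME.
# Lowercase rule references are not supported and are silently ignored.
ELEMENT = re.compile(r"\{([^{}]*)\}|'([^']+)'|([A-Z][A-Z0-9_]*)")


def generate_rule(name: str, body: str) -> str:
    """Return Python source for a rule whose body is a flat sequence of elements."""
    lines = [f"def {name}(p, env):", "    t = []  # texts of matched tokens, in order"]
    for m in ELEMENT.finditer(body):
        action, literal, token = m.groups()
        if action is not None:
            lines.append("    " + action.strip())  # action text pasted in unchanged
        else:
            expected = literal if literal is not None else token
            lines.append(f"    t.append(p.match({expected!r}))")
    return "\n".join(lines)


class TokenCursor:
    """Minimal token source over (type, text) pairs; literals use their text as type."""

    def __init__(self, tokens: List[Tuple[str, str]]):
        self.tokens, self.pos = tokens, 0

    def match(self, ttype: str) -> str:
        kind, text = self.tokens[self.pos]
        if kind != ttype:
            raise SyntaxError(f"expected {ttype}, got {kind}")
        self.pos += 1
        return text


source = generate_rule("stat", "ID '=' INT ';' {env[t[0]] = int(t[2])}")
namespace: Dict[str, object] = {}
exec(source, namespace)  # load the generated function
env: Dict[str, int] = {}
namespace["stat"](TokenCursor([("ID", "x"), ("=", "="), ("INT", "42"), (";", ";")]), env)
```

Printing source shows the action's text sitting verbatim after the four match calls; running the generated function on the tokens for "x = 42;" leaves env equal to {'x': 42}.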

Specification

To specify a context-free grammar, we use EBNF, with a few variations required by ANTLR:

  • Every rule definition must be terminated by a semicolon.
  • An optional element, i.e. either A or the empty string, is written A?.
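To make the meaning of the suffix operators concrete, here is a small Python sketch of parser combinators over a list of token types: A? is "A or nothing", A* is "zero or more A", and A+ is "A followed by A*". The helper names (tok, seq, opt, star, plus) and the example rule decl : 'const'? ID (',' ID)* ';' ; are hypothetical. These combinators are greedy and never backtrack, which is not how ANTLR chooses between alternatives (it predicts using lookahead), so the sketch only illustrates what each operator matches.

```python
from typing import Callable, Optional, Sequence

# A matcher takes the token list and a start index and returns the index just
# past what it matched, or None if it cannot match there.
Matcher = Callable[[Sequence[str], int], Optional[int]]


def tok(expected: str) -> Matcher:
    return lambda ts, i: i + 1 if i < len(ts) and ts[i] == expected else None


def seq(*parts: Matcher) -> Matcher:
    def run(ts: Sequence[str], i: int) -> Optional[int]:
        for part in parts:
            i = part(ts, i)
            if i is None:
                return None
        return i
    return run


def opt(p: Matcher) -> Matcher:  # p?  ==  p | <empty>
    def run(ts: Sequence[str], i: int) -> Optional[int]:
        j = p(ts, i)
        return i if j is None else j
    return run


def star(p: Matcher) -> Matcher:  # p*  ==  <empty> | p p*
    def run(ts: Sequence[str], i: int) -> Optional[int]:
        while (j := p(ts, i)) is not None and j > i:  # j > i guards against empty loops
            i = j
        return i
    return run


def plus(p: Matcher) -> Matcher:  # p+  ==  p p*
    return seq(p, star(p))


# decl : 'const'? ID (',' ID)* ';' ;
decl = seq(opt(tok("const")), tok("ID"), star(seq(tok(","), tok("ID"))), tok(";"))
```

Here decl accepts both ["const", "ID", ",", "ID", ";"] and ["ID", ";"], because 'const'? may match nothing and (',' ID)* may repeat zero times, but rejects ["ID", ","].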

Resources

  • The Definitive ANTLR 4 Reference, by Terence Parr (Pragmatic Bookshelf)