ANTLR is a lexer and parser generator for processing structured text. It takes as input a grammar file, written in a notation based on extended Backus-Naur form (EBNF), which also includes regex-like rules for tokens. It produces a lexer and parser in a target language (C, C++, Java, C#, etc.).
Internally, it uses an LL(*) parsing algorithm (ANTLR 4 uses the adaptive variant, ALL(*)).
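As a rough illustration of what such a grammar file looks like, here is a minimal sketch in ANTLR 4 syntax. The grammar name Expr and all rule names are made up for this example. Parser rules start with a lowercase letter and token (lexer) rules with an uppercase letter.

```antlr
// Expr.g4: hypothetical grammar, for illustration only
grammar Expr;

// Parser rules (lowercase): the context-free structure
prog : stat+ EOF ;
stat : expr NEWLINE ;
expr : expr ('*' | '/') expr   // listed first, so it binds tighter
     | expr ('+' | '-') expr
     | INT
     | '(' expr ')'
     ;

// Lexer rules (uppercase): regex-like token definitions
INT     : [0-9]+ ;
NEWLINE : '\r'? '\n' ;
WS      : [ \t]+ -> skip ;     // discard spaces and tabs
```

Assuming the Java target, running the ANTLR 4 tool on this file should produce classes named ExprLexer and ExprParser. The directly left-recursive expr rule relies on ANTLR 4; earlier versions would need it rewritten.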
Output
ANTLR outputs definition/implementation files for a lexer class and a parser class. It also generates visitor and listener classes, which are used to traverse a parse tree.
To hook into the ANTLR output, we create our own main file, which takes an input stream, constructs a lexer object and gets tokens from it, then constructs a parser object and gets a parse tree from it. It then uses visitors/listeners to take actions as the tree is traversed. We write these visitors/listeners ourselves by inheriting from the classes that ANTLR generates.
We can also embed target-language code (actions) directly in the grammar file by enclosing it in curly brackets.
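A minimal sketch of embedded actions follows, assuming the Java target. The grammar name Count, the rule names, and the count field are all hypothetical. The code inside the curly brackets is copied into the generated parser and runs as the surrounding rule elements are matched.

```antlr
// Count.g4: hypothetical grammar, for illustration only
grammar Count;

// Code in @members is copied into the generated parser class.
// Here it declares a counter for the INT tokens matched so far.
@members {
    int count = 0;
}

// The first action runs after each INT; the second runs once, after EOF.
list : ( INT { count++; } )+ EOF
       { System.out.println("integers seen: " + count); }
     ;

INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```

Embedded actions tie the grammar to one target language, which is one reason to prefer separate visitors/listeners when the grammar needs to stay reusable.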
Specification
To specify a context-free grammar, we use EBNF. Note that ANTLR's notation differs from standard EBNF in a few ways:
- All definitions must be terminated by a semicolon.
- An empty string (as a possible option alongside another option A) is expressed as A?.
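Both points can be seen in a small sketch, written in ANTLR 4 syntax. The grammar name Decl and its rules are invented for this example.

```antlr
// Decl.g4: hypothetical grammar, for illustration only
grammar Decl;

// Each rule definition ends with a bare semicolon.
// The quoted ';' is a literal token in the input, not a rule terminator.
decl : type ID init? ';' ;   // init? matches either init or the empty string
init : '=' INT ;
type : 'int' | 'float' ;

ID  : [a-zA-Z_] [a-zA-Z_0-9]* ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```

Under these rules, both "int x;" and "int x = 3;" should parse as a decl, since the optional init may match nothing.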
Resources
- The Definitive ANTLR 4 Reference, by Terence Parr