In computer science, a parser is a program that converts the tokens produced by a lexer into a parse tree. There are two main parsing approaches: top-down, and bottom-up. All leaf nodes are tokens, and interior nodes are grammatical rules.
The core premise of parsing: given an input sentence, does there exist a derivation of this sentence from the start symbol?
Parse trees represent the tokens and how they relate to one another, according to the grammatical rules of the high-level language. It produces syntax errors when the tokens aren’t related properly according to the rules, and may attempt to “recover” from them.
Parsing approaches
Parsing approaches are expressed with three key characters. For LL(k), we have:
- L: left-to-right scan
- L: leftmost derivation
- k: tokens lookahead
Top-down parsers start with the start symbol . It determines which production to use to get the leftmost derivation of an input, rewrites using the productions, and repeats until the derivation of the input is complete or an error. It builds the tree top down. In this case, the grammar cannot be left-recursive.
Bottom-up parsers start with the input sentence. It determines which production to use to get the rightmost derivation, and it repeats until the start symbol is obtained. It builds the tree bottom-up.
Parsers can rely on parsing tables and a stack.
Bottom-up parsing
We start with the sentence, and examine the inputs to determine the production to re-write some terminals into a non-terminal. Repeat until left only with the start symbol.
Having more than one production used to reduce is possible, called a reduce-reduce conflict. It’s also possible to have a shift and reduce possible, where each leads to an accept or a reject, called a shift-reduce conflict.
Important functions
State closure
Given a set of items and a symbol , the closure of is defined by: