Context-free grammar

A grammar is context-free (CFG) if the rules of the grammar do not depend on the context in which the tokens appear. Context-free grammars consist of:

A set of terminal symbols $Σ$ .
- i.e., symbols that appear in the input stream (tokens).
- By convention, these are in lower-case letters, operators, digits, brackets.
A set of non-terminal symbols $N$ .
- i.e., a set of “intermediate” symbols that help us express the grammar. We substitute these out for tokens that eventually become terminals.
- By convention, these start with upper-case letters.
A start symbol $S \in N$ . It’s denoted $S$ .
- i.e., a special non-terminal symbol that refers to the whole language.
A set of productions $P$ .
- These express what the language is. We can re-write non-terminals in terms of non-terminals and terminals.
We also define $$$ as the end-of-input token.

We also define a derivation of a sentence as a possible conversion from a sentence to a production rule (also called a sentential form) used to describe it.

The set of CFGs and the grammars they accept are provably a decidable language. In addition, any context-free language is a member of the complexity class $P$ .

Productions

Productions can be described as followed. The start production is often the first one.

P : S \to A

There could be a recursive definition in our grammar. The below is a sentential form of $A$ .

A = a A b

The arrow means one can re-write or express the LHS of the arrow as the RHS of the arrow. Only non-terminals appear on the LHS (and only one for CFGs). Note that we can use the vertical pipe $∣$ to express alternatives.

A \to ϵ

or alternatively:

A \to a A b ∣ ϵ

These lines and this convention allows us to specify a grammar only with its productions.

Derivation

Given a more complicated grammar, we have two key questions: which production do we use? Which non-terminal in a production do we re-write?

The leftmost (rightmost) derivation is where we consistently re-write the leftmost (rightmost) non-terminal in a sentential form. We try to be consistent with being leftmost or rightmost. Ideally our grammar must be defined in a way that it’s possible to use a single consistent approach. We also define some important sets. Given a sentential form $s$ of grammar symbols (terminals and non-terminals), the first set $F i rs t (s)$ is the set of terminals that can begin any sentential from derived from $s$ .

Given a non-terminal symbol $s$ , the follow set $F o ll o w (s)$ is the set of terminals that can appear immediately after $s$ in some derivation. We can compute the follow set by repeating these steps until nothing can be added to any $F o ll o w$ set:

Add $\$$ to$ \mathcal{Follow}(S) $, w h ere$ S $i s t h es t a r t sy mb o l an d$ $$ is the end-of-input symbol.
For a production $A \to α Bβ$ , add all elements of $F i rs t (β)$ , except for $ϵ \in F i rs t (β)$ , add $F o ll o w (A)$ to $F o ll o w (B)$ .
If productions $A, B$ occur where $B$ follows $A$ in a production rule, and if $ϵ \in F i rs t (B)$ , then we add the elements of $F o ll o w (B)$ into $F o ll o w (A)$ .

jszhn

Recent Notes

Accounting method

Adjugate matrix

Algorithm

Algorithmic analysis

Alma Linux

Context-free grammar

Productions

Derivation

Graph View

Backlinks