A key component in modern compilers is the front-end, which reads in a source program and translates it to an intermediate, often language-agnostic, form that the rest of the compiler can optimise. Front-ends depend on the source language but not on the target machine.
A front-end typically has four stages, each feeding the next.
- Lexers read individual characters and group them into a stream of tokens. Tokens can be reserved words, names, operators, and punctuation symbols. In effect, the lexer reduces the program text to its basic vocabulary, typically discarding whitespace and comments along the way.
- Parsers take the token stream and check that it is syntactically correct. They then produce an “abstract syntax tree” (a condensed form of the parse tree), which represents the syntactic structure of the program. Many parsers also start populating a symbol table, which lists the symbols in the code along with relevant information about each.
- Semantic analysers take the abstract syntax tree and check for semantic correctness, i.e., that variables and types are properly declared, and that the types of operators and their operands match (type checking). A symbol table recording every named object and its attributes, such as type and scope, is usually built or completed at this stage to support type checking.
- Intermediate representation generators take the symbol table and abstract syntax tree, and generate the intermediate representation (the front-end’s output). In modern compilers, the intermediate representation typically assumes an unbounded supply of virtual registers, which later stages map onto the finite set of real registers.
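
The four stages above can be sketched end to end for a toy language. This is a minimal illustration under stated assumptions, not how any particular compiler does it: the language (integer `let` statements with `+`, `*`, and parentheses), the tuple-based tree, the `%tN` register names, and every function and class name (`lex`, `Parser`, `analyse`, `gen_ir`) are made up for this example. Because the toy has only one type, the IR generator here does not need to consult the symbol table.

```python
import re

TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[+*=])"
                      r"|(?P<PUNCT>[();])|(?P<SKIP>\s+)|(?P<BAD>.)")

def lex(src):
    """Lexer: characters -> list of (kind, text) tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "NAME" and text == "let":
            kind = "KW"  # the toy language's only reserved word
        if kind != "SKIP":
            tokens.append((kind, text))
    return tokens

class Parser:
    """Parser: tokens -> AST of nested tuples; rejects bad syntax."""
    def __init__(self, tokens):
        self.toks, self.i = tokens + [("EOF", "")], 0

    def eat(self, kind, text=None):
        k, t = self.toks[self.i]
        if k != kind or text not in (None, t):
            raise SyntaxError(f"expected {text or kind}, got {t!r}")
        self.i += 1
        return t

    def program(self):  # program := stmt*
        stmts = []
        while self.toks[self.i][0] != "EOF":
            stmts.append(self.stmt())
        return stmts

    def stmt(self):  # stmt := "let" NAME "=" expr ";"
        self.eat("KW", "let")
        name = self.eat("NAME")
        self.eat("OP", "=")
        value = self.expr()
        self.eat("PUNCT", ";")
        return ("let", name, value)

    def binary(self, op, operand):  # left-associative chain of `op`
        node = operand()
        while self.toks[self.i] == ("OP", op):
            self.i += 1
            node = ("bin", op, node, operand())
        return node

    def expr(self):  # expr := term ("+" term)*
        return self.binary("+", self.term)

    def term(self):  # term := atom ("*" atom)*
        return self.binary("*", self.atom)

    def atom(self):  # atom := NUM | NAME | "(" expr ")"
        kind, text = self.toks[self.i]
        if kind in ("NUM", "NAME"):
            self.i += 1
            return ("num", int(text)) if kind == "NUM" else ("var", text)
        self.eat("PUNCT", "(")
        node = self.expr()
        self.eat("PUNCT", ")")
        return node

def analyse(program):
    """Semantic analysis: each variable must be declared before use."""
    symbols = {}  # symbol table: name -> type (only "int" in this toy)
    def check(node):
        if node[0] == "var" and node[1] not in symbols:
            raise NameError(f"undeclared variable {node[1]!r}")
        for child in node[2:] if node[0] == "bin" else ():
            check(child)
    for _, name, value in program:
        check(value)
        symbols[name] = "int"
    return symbols

def gen_ir(program):
    """IR generation: AST -> three-address code on virtual registers."""
    code, counter = [], 0
    def emit(node):
        nonlocal counter
        if node[0] in ("num", "var"):
            return str(node[1])
        _, op, lhs, rhs = node
        a, b = emit(lhs), emit(rhs)
        counter += 1  # fresh virtual register; there is no upper limit
        code.append(f"%t{counter} = {a} {op} {b}")
        return f"%t{counter}"
    for _, name, value in program:
        code.append(f"{name} = {emit(value)}")
    return code

ast = Parser(lex("let x = 2 + 3 * 4; let y = x * (x + 1);")).program()
symbols, ir = analyse(ast), gen_ir(ast)
```

For this input, `ir` holds the lines `%t1 = 3 * 4`, `%t2 = 2 + %t1`, `x = %t2`, `%t3 = x + 1`, `%t4 = x * %t3`, `y = %t4`, and `symbols` maps both `x` and `y` to `"int"`. Every intermediate result gets its own `%t` register, which is the "unbounded virtual registers" idea in miniature: deciding which of them share a real register is left to a later stage. A program such as `let z = q;` passes the lexer and parser but is rejected by `analyse`, since `q` is never declared.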