A finite set of symbols $\Sigma$ is called an alphabet. We also define:

  • A string (or sentence) is a finite sequence of symbols from $\Sigma$.
  • $\Sigma^n$ is the set of all strings over $\Sigma$ that have length $n$.
  • $\Sigma^+$ is the set of all strings over $\Sigma$ with length at least 1 (i.e., excluding the empty string $\varepsilon$), called the positive closure.
  • $\Sigma^*$ is the set of all strings over $\Sigma$, called the Kleene closure. This includes the empty string $\varepsilon$. Informally, it captures repeating symbols of $\Sigma$ zero or more times.
    • For a language $L$, the Kleene star $L^*$ is the set of all strings formed by concatenating zero or more strings from $L$.
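
The definitions above can be made concrete with a small sketch. It assumes every symbol is a single character and, because $\Sigma^*$ and $\Sigma^+$ are infinite, only enumerates strings up to a chosen length. The function names (`strings_of_length`, `kleene_closure_up_to`, `positive_closure_up_to`) are illustrative, not standard.

```python
from itertools import product


def strings_of_length(sigma, n):
    """Sigma^n: all strings over sigma with exactly n symbols."""
    # product(..., repeat=0) yields one empty tuple, so Sigma^0 == {""}.
    return {"".join(p) for p in product(sorted(sigma), repeat=n)}


def kleene_closure_up_to(sigma, max_len):
    """Finite slice of Sigma^*: all strings of length 0..max_len."""
    result = set()
    for n in range(max_len + 1):
        result |= strings_of_length(sigma, n)
    return result


def positive_closure_up_to(sigma, max_len):
    """Finite slice of Sigma^+: same as above, minus the empty string."""
    return kleene_closure_up_to(sigma, max_len) - {""}


sigma = {"a", "b"}
print(sorted(strings_of_length(sigma, 2)))  # ['aa', 'ab', 'ba', 'bb']
print(len(kleene_closure_up_to(sigma, 2)))  # 7 (1 + 2 + 4 strings)
```

The empty string is represented here by `""`, so the only difference between the two closures is whether `""` is a member.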

In discrete mathematics and computer science, a formal language over the alphabet $\Sigma$ is any set $L \subseteq \Sigma^*$, i.e., any set of strings of characters of the alphabet. Defining formal languages allows us to explore regular expressions.

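The concatenation of two languages $L_1$ and $L_2$ is the set:

$$L_1 L_2 = \{\, xy \mid x \in L_1,\ y \in L_2 \,\}$$

The union of $L_1$ and $L_2$ is:

$$L_1 \cup L_2 = \{\, w \mid w \in L_1 \text{ or } w \in L_2 \,\}$$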
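
For finite languages, concatenation and union map directly onto set operations. The following is a minimal sketch under the assumption that a language is a Python `set` of `str`. The helper names are hypothetical, and `kleene_star_up_to` only approximates the (generally infinite) Kleene star by bounding the number of concatenated factors.

```python
def concat(l1, l2):
    """Concatenation: every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}


def union(l1, l2):
    """Union: strings that belong to l1 or to l2."""
    return l1 | l2


def kleene_star_up_to(lang, max_factors):
    """Strings formed by concatenating at most max_factors strings of lang."""
    result = {""}      # zero factors gives the empty string
    current = {""}
    for _ in range(max_factors):
        current = concat(current, lang)
        result |= current
    return result


l1 = {"a", "ab"}
l2 = {"b", ""}
print(sorted(concat(l1, l2)))               # ['a', 'ab', 'abb']
print(sorted(union(l1, l2)))                # ['', 'a', 'ab', 'b']
print(sorted(kleene_star_up_to({"ab"}, 2)))  # ['', 'ab', 'abab']
```

Note that `concat` yields only three strings here even though there are four pairs, because "a" + "b" and "ab" + "" produce the same string.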