A large language model (LLM) is a generative model of natural language, typically built from transformer-based neural networks.

Basics

The foundational architecture of an LLM is a stack of transformer layers. Input text is tokenised and each token is encoded into a vector; these vectors are stacked into a matrix, which is then passed through the transformer layers in turn.
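
A minimal sketch of the tokenise-and-embed step, assuming a toy character-level vocabulary and random embedding weights (real models use learned subword tokenisers and trained embedding tables):

    import numpy as np

    text = "hello"
    vocab = sorted(set(text))                      # toy vocabulary (assumption)
    token_ids = [vocab.index(ch) for ch in text]   # tokenisation: text -> ids

    d_model = 8                                    # embedding dimension (arbitrary)
    rng = np.random.default_rng(0)
    embedding_table = rng.normal(size=(len(vocab), d_model))

    # Each token id selects a row of the embedding table; stacking the rows
    # gives the input matrix (one row per token) that is then passed through
    # the stack of transformer layers.
    x = embedding_table[token_ids]                 # shape: (num_tokens, d_model)
    print(x.shape)                                 # (5, 8)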

The next token is predicted from the final position of the output matrix. The sampled token is appended to the input sequence via a feedback connection and the process repeats, so the model generates text autoregressively until it emits a special stop (end-of-sequence) token.
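
A sketch of that autoregressive decoding loop. The model here is a stand-in (an assumption) that maps the current token-id sequence to next-token probabilities; in a real LLM this would be the transformer stack itself, and eos_id is a hypothetical end-of-sequence token:

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size = 16
    eos_id = 0                                     # hypothetical stop token

    def model(token_ids):
        # Stub: return logits over the vocabulary for the final position.
        return rng.normal(size=vocab_size)

    def generate(prompt_ids, max_new_tokens=20):
        ids = list(prompt_ids)
        for _ in range(max_new_tokens):
            logits = model(ids)                    # predict from the final position
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            next_id = int(rng.choice(vocab_size, p=probs))   # sample the next token
            if next_id == eos_id:                  # stop on the end-of-sequence token
                break
            ids.append(next_id)                    # feed the new token back in
        return ids

    print(generate([3, 7, 1]))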

Concerns

LLMs require an enormous amount of data to train. Training them also takes an enormous amount of computation (and hence power consumption).

LLMs are fuelling the AI boom, one of the key drivers of the software crisis.

Resources

Sub-pages