A large language model (LLM) is a type of generative natural language model, typically built from transformer-based neural networks and trained on large text corpora.
Basics
The foundational architecture of an LLM is a stack of transformer layers. Text is tokenised and each token is encoded as a vector; these vectors are combined into a matrix, which is then passed through the successive transformer layers.
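A rough sketch of the tokenise-and-embed step is below. The vocabulary size, embedding width, token ids and embedding values are all invented for illustration, and tokens are stored as rows of the matrix here:

```python
import numpy as np

# Hypothetical vocabulary and embedding table (sizes and values invented for illustration).
vocab_size, d_model = 50_000, 768
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, d_model))

# Tokenisation maps text to integer ids; assume it has already happened.
token_ids = np.array([1042, 7, 311, 9000])  # e.g. ids for "Large language models are"

# Each id selects a row of the embedding table; stacking the rows gives the
# matrix that is passed through the transformer layers.
X = embedding_table[token_ids]              # shape: (sequence_length, d_model)
print(X.shape)                              # (4, 768)
```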
The next token is predicted from the output vector at the final token position. A feedback connection appends this token to the input sequence, and the process repeats until the model outputs a special stop token.
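A minimal sketch of that autoregressive loop follows. The `model` callable and the stop-token id are placeholders rather than a real API, and greedy argmax decoding is assumed instead of sampling:

```python
import numpy as np

def generate(model, token_ids, stop_id, max_new_tokens=50):
    """Greedy autoregressive decoding sketch.

    `model` is assumed to map a sequence of token ids to a matrix of logits
    with one row per input position (a placeholder, not a real library call).
    """
    token_ids = list(token_ids)
    for _ in range(max_new_tokens):
        logits = model(np.array(token_ids))   # shape: (len(token_ids), vocab_size)
        next_id = int(np.argmax(logits[-1]))  # prediction comes from the last position
        token_ids.append(next_id)             # feed the new token back into the input
        if next_id == stop_id:                # stop once the model emits the stop token
            break
    return token_ids
```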
Concerns
LLMs require an enormous amount of data to train, which in turn means training consumes an enormous amount of computational resources (and hence power).
LLMs are fuelling the AI boom, one of the key drivers of the software crisis.
Resources
- What is ChatGPT Doing, and Why Does it Work?, by Stephen Wolfram
- LLM University, from Cohere
- Large Language Models, by Prof Tanmoy Chakraborty at IIT Delhi
- Hands-On Large Language Models, by Jay Alammar and Maarten Grootendorst
Sub-pages
- Model architecture
- Model training
- Post-training
- Miscellaneous
- Model inference