A large language model (LLM) is a generative natural language model, typically built on a transformer-based neural network.
Basics
The foundational architecture of an LLM is a stack of transformer layers. Input text is tokenised, each token is encoded into a vector, and these vectors are stacked into a matrix. This matrix is then passed successively through the transformer layers.
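The tokenise-and-embed step can be sketched as follows. This is a toy illustration with a hypothetical four-word vocabulary and random embeddings; real models use learned subword tokenisers and learned embedding tables.

```python
import random

# Hypothetical vocabulary mapping words to token ids.
vocab = {"the": 0, "cat": 1, "sat": 2, "<stop>": 3}
d_model = 4  # embedding dimension (tiny, for illustration only)

random.seed(0)
# One vector per token id; in a real model these are learned parameters.
embedding_table = [[random.uniform(-1.0, 1.0) for _ in range(d_model)]
                   for _ in vocab]

def embed(text):
    """Tokenise text and stack the token vectors into a matrix."""
    token_ids = [vocab[word] for word in text.split()]
    return [embedding_table[t] for t in token_ids]

# The resulting matrix (one row per token here) is what gets fed
# through the transformer layers.
matrix = embed("the cat sat")
print(len(matrix), len(matrix[0]))  # 3 tokens, 4 dimensions each
```

Whether tokens sit along rows or columns of the matrix is a convention; the point is that the whole sequence enters the transformer stack as one array.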
Each next token is determined by the final column of the output matrix. A feedback connection appends the newly generated token to the input sequence, and the process repeats until the model outputs a special stop token.
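The feedback loop above can be sketched as a minimal autoregressive decoder. Here `model` is a hypothetical stand-in for the full transformer stack: it takes the sequence so far and returns one score per vocabulary entry, read off the final position.

```python
STOP = 3  # hypothetical id of the special stop token

def model(token_ids):
    """Dummy stand-in for the transformer stack: returns next-token
    scores. Contrived so that it prefers token 2 until the sequence
    reaches length 5, then prefers the stop token."""
    return [0.0, 0.1, 0.2, 1.0 if len(token_ids) >= 5 else -1.0]

def generate(prompt_ids, max_len=16):
    seq = list(prompt_ids)
    while len(seq) < max_len:
        scores = model(seq)
        # Greedy decoding: pick the highest-scoring token.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == STOP:
            break
        seq.append(next_id)  # feed the new token back into the input
    return seq

print(generate([0, 1]))  # [0, 1, 2, 2, 2]
```

Real systems sample from a probability distribution (temperature, top-k, etc.) rather than always taking the argmax, but the feedback structure is the same.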
System prompt
I'm a computer engineering undergrad, with experience in systems programming (C/C++/Rust/Go) and hardware design. While I'm comfortable with Python, I prefer C++ or Rust unless Python is specifically suited for the task. When discussing technical content, I enjoy a collaborative pair-programming approach with room to explore new ideas (theory and practice) as they come up. I appreciate when you adjust technical depth based on my signals (like 'ELI5' or mentioning specific concepts I'm already familiar with). I also don't expect you to one-shot solutions. I'd prefer to talk things out, plan, consider alternatives, before a straight implementation.
For non-technical discussions, I prefer a natural conversational style without forced technical analogies. I enjoy deeper discussions where we can explore different aspects of a topic together through thoughtful questions and sharing perspectives. You should act like a cosmopolitan, well-read person.
Please ask guiding questions to help us dig deeper into potentially interesting areas of our conversation! Please also feel free to be opinionated! And PLEASE don't be sycophantic. No emojis.
Concerns
LLMs require an enormous amount of data to train, which in turn means training demands enormous computational resources (and hence power consumption).
LLMs are fuelling the AI boom, one of the key drivers of the software crisis.
Resources
- What is ChatGPT Doing, and Why Does it Work?, by Stephen Wolfram
- LLM University, from Cohere
- Large Language Models, by Prof Tanmoy Chakraborty at IIT Delhi
- Hands-On Large Language Models, by Jay Alammar and Maarten Grootendorst
Sub-pages
- Model architecture
- Model training
- Pre-training
- Post-training
- Miscellaneous
- Model inference