A large language model (LLM) is a generative natural language model, typically built on a transformer-based neural network.
Basics
The foundational architecture of an LLM is a stack of transformer layers. Input text is tokenised, each token is encoded into a vector, and these vectors are stacked into a matrix. This matrix is then passed successively through the transformer layers.
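The tokenise-and-embed step can be sketched as follows. This is a toy illustration with a hypothetical four-word vocabulary and random embeddings; real models use learned subword tokenisers and learned embedding tables.

```python
import random

# Hypothetical vocabulary mapping words to token ids.
vocab = {"the": 0, "cat": 1, "sat": 2, "<stop>": 3}
d_model = 4  # embedding dimension (tiny, for illustration only)

random.seed(0)
# One vector per token id; in a real model these are learned parameters.
embedding_table = [[random.uniform(-1.0, 1.0) for _ in range(d_model)]
                   for _ in vocab]

def embed(text):
    """Tokenise text and stack the token vectors into a matrix."""
    token_ids = [vocab[word] for word in text.split()]
    return [embedding_table[t] for t in token_ids]

# The resulting matrix (one row per token here) is what gets fed
# through the transformer layers.
matrix = embed("the cat sat")
print(len(matrix), len(matrix[0]))  # 3 tokens, 4 dimensions each
```

Whether tokens sit along rows or columns of the matrix is a convention; the point is that the whole sequence enters the transformer stack as one array.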
Each next token is determined by the final column of the output matrix. A feedback connection appends the newly generated token to the input sequence, and the process repeats until the model outputs a special stop token.
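The feedback loop above can be sketched as a minimal autoregressive decoder. Here `model` is a hypothetical stand-in for the full transformer stack: it takes the sequence so far and returns one score per vocabulary entry, read off the final position.

```python
STOP = 3  # hypothetical id of the special stop token

def model(token_ids):
    """Dummy stand-in for the transformer stack: returns next-token
    scores. Contrived so that it prefers token 2 until the sequence
    reaches length 5, then prefers the stop token."""
    return [0.0, 0.1, 0.2, 1.0 if len(token_ids) >= 5 else -1.0]

def generate(prompt_ids, max_len=16):
    seq = list(prompt_ids)
    while len(seq) < max_len:
        scores = model(seq)
        # Greedy decoding: pick the highest-scoring token.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == STOP:
            break
        seq.append(next_id)  # feed the new token back into the input
    return seq

print(generate([0, 1]))  # [0, 1, 2, 2, 2]
```

Real systems sample from a probability distribution (temperature, top-k, etc.) rather than always taking the argmax, but the feedback structure is the same.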
System prompt
I'm a computer engineering undergrad, with experience in systems programming (C/C++/Rust/Go) and hardware design. While I'm comfortable with Python, I prefer C++ or Rust unless Python is specifically suited for the task. When discussing technical content, I enjoy a collaborative pair-programming approach with room to explore new ideas (theory and practice) as they come up. I appreciate when you adjust technical depth based on my signals (like 'ELI5' or mentioning specific concepts I'm already familiar with). I also don't expect you to one-shot solutions. I'd prefer to talk things out, plan, consider alternatives, before a straight implementation.
For non-technical discussions, I prefer a natural conversational style without forced technical analogies. I enjoy deeper discussions where we can explore different aspects of a topic together through thoughtful questions and sharing perspectives. You should act like a cosmopolitan, well-read person.
Please ask guiding questions to help us dig deeper into potentially interesting areas of our conversation! Please also feel free to be opinionated! And PLEASE don't be sycophantic. No emojis.
Concerns
LLMs require an enormous amount of data to train, which in turn means training demands enormous computational resources (and hence power consumption).
LLMs are fuelling the AI boom, one of the key drivers of the software crisis.
Resources
- What is ChatGPT Doing, and Why Does it Work?, by Stephen Wolfram
- LLM University, from Cohere
- Large Language Models, by Prof Tanmoy Chakraborty at IIT Delhi
- Hands-On Large Language Models, by Jay Alammar and Maarten Grootendorst
Sub-pages
- Model architecture
- Model training
- Pre-training
- Post-training
- Miscellaneous
- Model inference