Tokenisation is the process of breaking a long string into smaller substrings called tokens. In a natural-language context, tokenisation splits a string into a vector of words.
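As a minimal sketch, the following Python function splits a string into word tokens using a regular expression; the name `tokenise` and the exact token pattern are illustrative choices, not a standard API:

```python
import re

def tokenise(text):
    """Split a string into a list of word tokens.

    This is a simple sketch: it matches runs of word characters,
    optionally followed by an apostrophe-contraction. A production
    tokeniser would also handle hyphenation, numbers with separators,
    and Unicode word boundaries more carefully.
    """
    return re.findall(r"\w+(?:'\w+)?", text)

print(tokenise("Hello, world! Don't panic."))
# ['Hello', 'world', "Don't", 'panic']
```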
See also
- Lexical analysis in compiler design