Tokenisation is the process of breaking a long string into smaller substrings called tokens. In a natural-language context, tokenisation breaks a string into a vector of words.
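
As a minimal sketch, word-level tokenisation can be done with a regular expression that treats runs of letters, digits, and apostrophes as tokens; the `tokenise` helper below is a hypothetical name, and real tokenisers handle punctuation, hyphenation, and Unicode far more carefully.

```python
import re

def tokenise(text: str) -> list[str]:
    """Split a string into word tokens, discarding surrounding punctuation.

    Assumes tokens are runs of letters, digits, or apostrophes; anything
    else (spaces, commas, full stops) acts as a boundary.
    """
    return re.findall(r"[A-Za-z0-9']+", text)

# Example usage: the sentence is split into a list (vector) of words.
print(tokenise("Tokenisation breaks a string into a vector of words."))
# ['Tokenisation', 'breaks', 'a', 'string', 'into', 'a', 'vector', 'of', 'words']
```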

See also