Tokenisation is the process of breaking a long string into smaller substrings called tokens. In a natural-language context, tokenisation splits a string into a vector of words.
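As a minimal sketch, the following Python function splits a string into word tokens using a regular expression; the name `tokenise` and the exact token pattern are illustrative choices, not a standard API:

```python
import re

def tokenise(text):
    """Split a string into a list of word tokens.

    This is a simple sketch: it matches runs of word characters,
    optionally followed by an apostrophe-contraction. A production
    tokeniser would also handle hyphenation, numbers with separators,
    and Unicode word boundaries more carefully.
    """
    return re.findall(r"\w+(?:'\w+)?", text)

print(tokenise("Hello, world! Don't panic."))
# ['Hello', 'world', "Don't", 'panic']
```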
See also
- Lexical analysis in compiler design