What is Tokenization? Count how many tokens are present in the following statement:
I find that the harder I work, the more luck I seem to have.
Generated by claude-sonnet-4-6 · 2026-06-21 03:18 · grounding rag
Model Answer
Tokenization is a step in Text Normalisation in which each sentence is divided into smaller units called tokens. A token can be any word, number, or special character (including punctuation marks) occurring in the sentence.
Counting tokens in: I find that the harder I work, the more luck I seem to have.
Tokens: I / find / that / the / harder / I / work / , / the / more / luck / I / seem / to / have / .
Total number of tokens = 16
Source: Text Normalisation – Tokenization, Chapter 6
---
Explanation
- Define tokenization clearly (1 mark) — mention that tokens include words AND special characters/punctuation.
- Count carefully (1 mark) — the comma (,) and the full stop (.) are also separate tokens. Students often miss punctuation marks and lose the mark. Total = 16 tokens.
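
The count above can be checked with a short script. This is a minimal sketch, not a standard tokenizer: it assumes a simple rule where every run of letters or digits is one token and every punctuation mark is its own token, which matches the counting convention used in this answer. The regular expression is an illustrative choice; library tokenizers may split some text (for example contractions) differently.

```python
import re

sentence = "I find that the harder I work, the more luck I seem to have."

# Assumed rule for this sketch:
#   \w+      matches a run of letters/digits (a word or number)
#   [^\w\s]  matches a single character that is neither a word
#            character nor whitespace (a punctuation mark)
tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(" / ".join(tokens))
print("Total number of tokens =", len(tokens))  # 16
```

Running it lists 14 word tokens plus the comma and the full stop, giving 16 in total.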