Step 1 – Collect Data (Documents):
Doc 1: NLP is a domain of AI
Doc 2: NLP stands for Natural Language Processing
Step 2 – Tokenisation (create unique word list/vocabulary):
NLP, is, a, domain, of, AI, stands, for, Natural, Language, Processing
(Total 11 unique words)
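
The vocabulary in Step 2 can be reproduced with a minimal Python sketch. The two document strings are an assumption, inferred from the word counts in the table, and `build_vocabulary` is a hypothetical helper name. Tokenisation here is a plain whitespace split with case preserved, which matches the word list above but is simpler than what a real NLP pipeline would do.

```python
# Assumed documents, inferred from the frequency table (not quoted from the source).
docs = [
    "NLP is a domain of AI",
    "NLP stands for Natural Language Processing",
]


def build_vocabulary(documents):
    """Return the unique tokens in order of first appearance."""
    vocab = []
    seen = set()
    for doc in documents:
        # Whitespace tokenisation; case is kept as written.
        for token in doc.split():
            if token not in seen:
                seen.add(token)
                vocab.append(token)
    return vocab


vocabulary = build_vocabulary(docs)
print(vocabulary)
print(len(vocabulary))  # 11 unique words
```

Keeping first-appearance order is a design choice: it fixes the position of each word, so every document vector built later uses the same column order.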
Step 3 – Create Document Vectors (word frequency count):
| Word | Doc 1 | Doc 2 |
|------|-------|-------|
| NLP | 1 | 1 |
| is | 1 | 0 |
| a | 1 | 0 |
| domain | 1 | 0 |
| of | 1 | 0 |
| AI | 1 | 0 |
| stands | 0 | 1 |
| for | 0 | 1 |
| Natural | 0 | 1 |
| Language | 0 | 1 |
| Processing | 0 | 1 |
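
The counts in the table can be checked with a short Python sketch. As before, the document strings are an assumption inferred from the table, and `to_vector` is a hypothetical helper. It counts how often each vocabulary word occurs in a document and lists the counts in vocabulary order.

```python
from collections import Counter

# Vocabulary from Step 2, in the same order as the table rows.
vocabulary = ["NLP", "is", "a", "domain", "of", "AI",
              "stands", "for", "Natural", "Language", "Processing"]

# Assumed documents, inferred from the frequency table (not quoted from the source).
docs = {
    "Doc 1": "NLP is a domain of AI",
    "Doc 2": "NLP stands for Natural Language Processing",
}


def to_vector(text, vocab):
    """Count each vocabulary word in text; words not present count as 0."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]


vectors = {name: to_vector(text, vocabulary) for name, text in docs.items()}
for name, vec in vectors.items():
    print(name, vec)
# Doc 1 [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Doc 2 [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Each printed list corresponds to one column of the table, read from top to bottom.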
Step 4 – Apply BoW Model:
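Each document is represented as a vector of word frequencies, with one entry per vocabulary word in the order listed in Step 2:
Doc 1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
Doc 2 = [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]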
Source: AI Chapter, Natural Language Processing – Bag of Words
---
Examiners award 1 mark per step. Ensure all four steps are clearly labelled. The vocabulary must list unique words only. The document vector table (frequency count) is the core deliverable, so write it neatly. Writing the final vectors in bracket notation completes Step 4 and shows that you understand the BoW output format.