Consider the text of the following documents:
Document 1: Sahil likes to play cricket
Document 2: Sajal likes cricket too
Document 3: Sajal also likes to play basketball
Apply all four steps of the Bag of Words model of NLP to the documents given above and generate the output.
Generated by claude-sonnet-4-6 · 2026-06-21 03:18 · grounding rag
Model Answer
Step 1: Collect Data (Documents)
- Doc 1: Sahil likes to play cricket
- Doc 2: Sajal likes cricket too
- Doc 3: Sajal also likes to play basketball
Step 2: Create a List of Unique Words (Vocabulary)
{sahil, likes, to, play, cricket, sajal, too, also, basketball}
(9 unique words)
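The vocabulary step can be sketched in a few lines of Python. This is only an illustration: the helper name `build_vocabulary` is hypothetical, and it assumes lowercasing and whitespace splitting are enough for these three short documents.

```python
# The three documents from the question.
docs = [
    "Sahil likes to play cricket",
    "Sajal likes cricket too",
    "Sajal also likes to play basketball",
]

def build_vocabulary(documents):
    """Return the unique lowercased words in order of first appearance."""
    vocabulary = []
    for doc in documents:
        for word in doc.lower().split():
            if word not in vocabulary:
                vocabulary.append(word)
    return vocabulary

vocabulary = build_vocabulary(docs)
print(vocabulary)
# ['sahil', 'likes', 'to', 'play', 'cricket', 'sajal', 'too', 'also', 'basketball']
print(len(vocabulary))  # 9
```

A list is used instead of a set so that the word order stays stable and matches the order written in the answer.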
Step 3: Remove Stop Words (optional normalisation)
Stop words such as "to", "too" and "also" may be removed → Vocabulary: {sahil, likes, play, cricket, sajal, basketball} (6 words)
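A minimal sketch of the stop word filter follows. The `STOP_WORDS` set is an assumption taken from this answer only; it is not a standard stop word list, and real lists are much longer.

```python
# Full vocabulary from Step 2.
vocabulary = ["sahil", "likes", "to", "play", "cricket",
              "sajal", "too", "also", "basketball"]

# Assumption: only the three stop words named in this answer.
STOP_WORDS = {"to", "too", "also"}

# Keep every word that is not a stop word, preserving order.
filtered_vocabulary = [word for word in vocabulary if word not in STOP_WORDS]

print(filtered_vocabulary)
# ['sahil', 'likes', 'play', 'cricket', 'sajal', 'basketball']
```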
Step 4: Create Document Vectors (Frequency Table)
| Word | Doc1 | Doc2 | Doc3 |
|----------|------|------|------|
| sahil | 1 | 0 | 0 |
| likes | 1 | 1 | 1 |
| play | 1 | 0 | 1 |
| cricket | 1 | 1 | 0 |
| sajal | 0 | 1 | 1 |
| basketball | 0 | 0 | 1 |
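Each document is now represented as a numerical vector based on word frequency. Reading down each column in the vocabulary order (sahil, likes, play, cricket, sajal, basketball) gives Doc1 = [1, 1, 1, 1, 0, 0], Doc2 = [0, 1, 0, 1, 1, 0] and Doc3 = [0, 1, 1, 0, 1, 1].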
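The frequency table can be reproduced with a short Python sketch. The function name `to_vector` is hypothetical, and the code assumes the six-word vocabulary left after stop word removal; words outside that vocabulary are simply ignored.

```python
from collections import Counter

docs = {
    "Doc1": "Sahil likes to play cricket",
    "Doc2": "Sajal likes cricket too",
    "Doc3": "Sajal also likes to play basketball",
}

# Vocabulary after stop word removal, in the same order as the table rows.
vocabulary = ["sahil", "likes", "play", "cricket", "sajal", "basketball"]

def to_vector(text, vocab):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that never occur in the text.
    return [counts[word] for word in vocab]

vectors = {name: to_vector(text, vocabulary) for name, text in docs.items()}

for name, vector in vectors.items():
    print(name, vector)
# Doc1 [1, 1, 1, 1, 0, 0]
# Doc2 [0, 1, 0, 1, 1, 0]
# Doc3 [0, 1, 1, 0, 1, 1]
```

Each printed list is one column of the table read from top to bottom.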
Source: Chapter 6, Bag of Words Model
---
Explanation
- Examiners expect all four steps clearly labelled: data collection → vocabulary creation → stop word removal → document vector/frequency table.
- The frequency table is the key output; missing it will cost marks.
- You don't need to calculate TF-IDF here; the BoW frequency table alone is sufficient.
- Stop word removal is considered one of the steps in this model as taught in the chapter; include it even briefly.