You have two documents:
Document 1: CV is an upcoming field.
Document 2: Image Feature is an important part of CV.
Implement all four steps of the Bag of Words (BoW) model to create a document vector table. Depict the outcome of each step.
Generated by claude-sonnet-4-6 · 2026-06-21 03:20 · grounding stimulus
Model Answer
Step 1 – Collect Training Data (Documents):
- Doc 1: "CV is an upcoming field."
- Doc 2: "Image Feature is an important part of CV."
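As a rough illustration of Step 1, the Python sketch below stores the two documents and splits each into words. It assumes a simple regular-expression tokenizer (`tokenize` is a hypothetical helper, not part of the original answer) that drops the full stops and keeps the original capitalization, so that the tokens match the vocabulary used in Step 2.

```python
import re

# The two documents collected in Step 1.
documents = [
    "CV is an upcoming field.",
    "Image Feature is an important part of CV.",
]

def tokenize(text):
    # Hypothetical helper: keep runs of word characters, which drops the
    # trailing full stop. Case is preserved (no lowercasing).
    return re.findall(r"\w+", text)

tokens = [tokenize(doc) for doc in documents]
for i, toks in enumerate(tokens, start=1):
    print(f"Doc {i}: {toks}")
```

Lowercasing is a common extra normalization step; it is skipped here only because it would not change the counts for these two documents.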
Step 2 – Design the Vocabulary (unique words):
{ CV, is, an, upcoming, field, Image, Feature, important, part, of }
(Full stops removed; stop words retained; case preserved; total = 10 unique words)
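A minimal sketch of Step 2, assuming the same regex tokenization as above: it walks both documents in order and keeps each word the first time it appears, which reproduces the vocabulary order listed above. The function name `build_vocabulary` is hypothetical.

```python
import re

documents = [
    "CV is an upcoming field.",
    "Image Feature is an important part of CV.",
]

def build_vocabulary(docs):
    """Return the unique words across all documents, in order of first appearance."""
    vocab = []
    seen = set()
    for doc in docs:
        for word in re.findall(r"\w+", doc):
            # Case-sensitive check: 'CV' and 'cv' would count as different words.
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab

vocabulary = build_vocabulary(documents)
print(vocabulary)
print(len(vocabulary))
```

The repeated words "is", "an" and "CV" in Doc 2 are skipped because they were already added from Doc 1, which is why the total is 10 rather than 13.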
Step 3 – Create Document Vectors (word frequency count):
| Document | CV | is | an | upcoming | field | Image | Feature | important | part | of |
|---|---|---|---|---|---|---|---|---|---|---|
| Doc 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Doc 2 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
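The counting in Step 3 can be sketched in Python as follows, assuming the Step 2 vocabulary and the same regex tokenization. `collections.Counter` returns 0 for a word that never occurs, which fills in the zero cells of the table; `document_vector` is a hypothetical helper name.

```python
import re
from collections import Counter

documents = [
    "CV is an upcoming field.",
    "Image Feature is an important part of CV.",
]

# Vocabulary from Step 2, in the same column order as the table.
vocabulary = ["CV", "is", "an", "upcoming", "field",
              "Image", "Feature", "important", "part", "of"]

def document_vector(doc, vocab):
    # Count every token in the document, then read the counts off in
    # vocabulary order; missing words give 0.
    counts = Counter(re.findall(r"\w+", doc))
    return [counts[word] for word in vocab]

vectors = [document_vector(doc, vocabulary) for doc in documents]

# Print the document vector table row by row.
print("Document | " + " | ".join(vocabulary))
for i, vec in enumerate(vectors, start=1):
    print(f"Doc {i} | " + " | ".join(map(str, vec)))
```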
Step 4 – Use Vectors for ML Model:
Doc 1 → [1,1,1,1,1,0,0,0,0,0]
Doc 2 → [1,1,1,0,0,1,1,1,1,1]
These numerical vectors are fed into a Machine Learning model for text classification or analysis.
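The answer does not name a particular model. As one hedged illustration of "analysis", the sketch below compares the two vectors with cosine similarity, a standard measure swapped in here for illustration and not one of the four BoW steps. The shared words CV, is and an give a dot product of 3, and the vector lengths are √5 and √8.

```python
import math

# Document vectors from Step 4 (columns in Step 2 vocabulary order).
doc1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
doc2 = [1, 1, 1, 0, 0, 1, 1, 1, 1, 1]

def cosine_similarity(a, b):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim = cosine_similarity(doc1, doc2)
print(f"{sim:.3f}")  # 3 / sqrt(40), about 0.474
```

A classifier would instead take each vector as one input row, with the vocabulary words as its features.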
---
Explanation
- Examiners look for all four steps, clearly labelled, with the correct output at each step.
- The vocabulary must list only unique words across both documents.
- Frequency counts must be accurate: CV, is and an each appear once in both documents.
- The final vector representation ties the answer together; don't skip it.
- Roughly 1 mark per step; label each step clearly to ensure full credit.