Document 1: Data Science requires information.
Document 2: Information analysis requires data.
Implement all four steps of the Bag of Words (BoW) model to create a document vector table.
Generated by claude-sonnet-4-6 · 2026-06-21 03:20 · grounding stimulus
Model Answer
Step 1 – Collect Documents:
- Doc 1: "Data Science requires information."
- Doc 2: "Information analysis requires data."
Step 2 – Create Vocabulary (unique words, lowercase):
{data, science, requires, information, analysis}
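The vocabulary step can be sketched in Python as follows. This is a minimal illustration, assuming a simple regex tokenizer; the helper name `tokenize` is hypothetical and not part of the source answer. Words are collected in order of first appearance so the column order matches the listing above.

```python
import re

docs = [
    "Data Science requires information.",
    "Information analysis requires data.",
]

def tokenize(text):
    """Lowercase the text and keep only alphabetic runs, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())

# Collect unique words in order of first appearance across all documents.
vocabulary = []
for doc in docs:
    for word in tokenize(doc):
        if word not in vocabulary:
            vocabulary.append(word)

print(vocabulary)
# ['data', 'science', 'requires', 'information', 'analysis']
```

A list is used instead of a set here so that the word order stays fixed; a set would give the same words but in no guaranteed order.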
Step 3 – Create Word Frequency Vectors:
| | data | science | requires | information | analysis |
|---|---|---|---|---|---|
| Doc 1 | 1 | 1 | 1 | 1 | 0 |
| Doc 2 | 1 | 0 | 1 | 1 | 1 |
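Step 4 – Form the Document Vector Table:
The completed table in Step 3 is the document vector table: Doc 1 = [1, 1, 1, 1, 0] and Doc 2 = [1, 0, 1, 1, 1].
Each document is represented as a numeric vector of word frequencies over the vocabulary.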
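The counting and table-forming steps can be checked with a short Python sketch. It assumes the vocabulary order used in the answer and the same simple regex tokenization; the helper name `to_vector` is illustrative only.

```python
import re
from collections import Counter

docs = {
    "Doc 1": "Data Science requires information.",
    "Doc 2": "Information analysis requires data.",
}
vocabulary = ["data", "science", "requires", "information", "analysis"]

def to_vector(text, vocab):
    """Count each vocabulary word in the lowercased, punctuation-free text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # Counter returns 0 for words that do not occur in the document.
    return [counts[word] for word in vocab]

# One row per document, one column per vocabulary word.
table = {name: to_vector(text, vocabulary) for name, text in docs.items()}
for name, vector in table.items():
    print(name, vector)
# Doc 1 [1, 1, 1, 1, 0]
# Doc 2 [1, 0, 1, 1, 1]
```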
Source: Bag of Words Model, Document Representation
---
Explanation
- Examiners look for all four steps clearly labelled: collect text → build vocabulary → count frequencies → form vector table.
- The table is the most important part: ensure rows = documents and columns = vocabulary words.
- Treat words as case-insensitive and ignore punctuation.
- "requires" appears once in each document → frequency = 1 in both rows; the same holds for "data" and "information".
- Do not skip the vocabulary listing step; it earns separate marks.
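To see why case-insensitivity matters, the sketch below builds the vocabulary with and without lowercasing. It is only an illustration; the function name `build_vocab` and its `lowercase` flag are hypothetical.

```python
import re

docs = [
    "Data Science requires information.",
    "Information analysis requires data.",
]

def build_vocab(documents, lowercase):
    """Return unique words in first-appearance order, ignoring punctuation."""
    vocab = []
    for doc in documents:
        text = doc.lower() if lowercase else doc
        for word in re.findall(r"[A-Za-z]+", text):
            if word not in vocab:
                vocab.append(word)
    return vocab

print(len(build_vocab(docs, lowercase=True)))   # 5
print(len(build_vocab(docs, lowercase=False)))  # 7
```

Without lowercasing, "Data"/"data" and "information"/"Information" are counted as separate words, so the vocabulary grows from 5 to 7 entries and the table no longer matches the model answer.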