Step 1: Text Processing (Pre-processing)
Convert both documents to lowercase and remove the stop words (and, are, but, to). The words that remain are the tokens used in the following steps.
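A minimal sketch of this pre-processing step follows. It assumes that splitting on runs of letters is an acceptable tokenizer and uses only the four stop words listed above. The input sentence is hypothetical, assembled from the Doc 1 tokens for illustration only.

```python
import re

# Stop-word list taken from Step 1; a real pipeline would use a longer list.
STOP_WORDS = {"and", "are", "but", "to"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # Lowercasing first means "Sameera" and "sameera" become the same token.
    # Assumption: a token is any run of letters a-z (digits and punctuation are dropped).
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


# Hypothetical input sentence, built from the Doc 1 tokens for illustration.
print(preprocess("Sameera and Sanya are classmates."))
# ['sameera', 'sanya', 'classmates']
```

The letters-only regex is a simplification: it discards numbers and splits contractions such as "don't", which a fuller tokenizer would handle differently.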
Step 2: Create Dictionary (Vocabulary)
List every unique token from the two pre-processed documents, writing each word only once:
| sameera | sanya | classmates | likes | dancing | loves | study | mathematics |
|---------|-------|------------|-------|---------|-------|-------|-------------|
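The vocabulary can be built by walking through the documents and keeping each token the first time it is seen, as in this minimal sketch. The token lists are read off the Step 4 table; the word order inside Doc 2 is an assumption, and it only affects the column order, not the set of words.

```python
def build_vocabulary(documents: list[list[str]]) -> list[str]:
    """Return every unique token, in order of first appearance across documents."""
    vocabulary: dict[str, None] = {}
    for tokens in documents:
        for token in tokens:
            # dict keys keep insertion order (Python 3.7+), so repeated tokens
            # are skipped while the first-seen order is preserved.
            vocabulary.setdefault(token, None)
    return list(vocabulary)


# Pre-processed tokens; the order of words within Doc 2 is assumed.
doc1 = ["sameera", "sanya", "classmates"]
doc2 = ["sameera", "sanya", "likes", "dancing", "loves", "study", "mathematics"]
vocab = build_vocabulary([doc1, doc2])
print(vocab)
# ['sameera', 'sanya', 'classmates', 'likes', 'dancing', 'loves', 'study', 'mathematics']
```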
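Step 3: Create the Document Vector for Doc 1
For each word in the vocabulary, write the number of times it occurs in Doc 1; write 0 if it does not occur.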
| sameera | sanya | classmates | likes | dancing | loves | study | mathematics |
|---------|-------|------------|-------|---------|-------|-------|-------------|
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
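A minimal sketch of the counting rule for a single document follows, assuming the vocabulary from Step 2 and the Doc 1 tokens from Step 1. The function name `document_vector` is illustrative, not taken from the source.

```python
from collections import Counter

# Vocabulary from Step 2, in the same column order as the table.
VOCABULARY = ["sameera", "sanya", "classmates", "likes",
              "dancing", "loves", "study", "mathematics"]


def document_vector(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    # Counter returns 0 for a missing key, which produces the zeros in the vector.
    return [counts[word] for word in vocabulary]


doc1 = ["sameera", "sanya", "classmates"]
print(document_vector(doc1, VOCABULARY))
# [1, 1, 1, 0, 0, 0, 0, 0]
```

Every entry here is 0 or 1 only because no word repeats in Doc 1; a word that appeared twice would contribute a 2.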
Step 4: Create the Document Vector Table for All Documents
Repeat Step 3 for Doc 2 and stack the vectors, one row per document:
| Document | sameera | sanya | classmates | likes | dancing | loves | study | mathematics |
|----------|---------|-------|------------|-------|---------|-------|-------|-------------|
| Doc 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Doc 2 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
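The whole table can be produced by applying the same count to each document in turn, as in this minimal sketch. It assumes the pre-processed token lists read off the table above; the names `DOCUMENTS` and `vector_table` are hypothetical.

```python
from collections import Counter

# Vocabulary from Step 2, in the same column order as the table.
VOCABULARY = ["sameera", "sanya", "classmates", "likes",
              "dancing", "loves", "study", "mathematics"]

# Pre-processed tokens per document (word order within a document
# does not affect the counts).
DOCUMENTS = {
    "Doc 1": ["sameera", "sanya", "classmates"],
    "Doc 2": ["sameera", "sanya", "likes", "dancing", "loves", "study", "mathematics"],
}


def vector_table(documents: dict[str, list[str]], vocabulary: list[str]) -> list[list[int]]:
    """Build one count vector per document, in the vocabulary's column order."""
    table = []
    for tokens in documents.values():
        counts = Counter(tokens)
        table.append([counts[word] for word in vocabulary])
    return table


table = vector_table(DOCUMENTS, VOCABULARY)
print(" | ".join(["Document"] + VOCABULARY))
for name, row in zip(DOCUMENTS, table):
    print(" | ".join([name] + [str(count) for count in row]))
```

Each row has the same length as the vocabulary, so the documents can be compared column by column; a library tool such as scikit-learn's `CountVectorizer` builds the same kind of matrix but orders its columns alphabetically.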
Source: Chapter 6, Section 6.5 – Bag of Words
---