Term Frequency (TF):
Term Frequency measures how often a word appears in a document. It helps identify how important a word is within a single document.
Formula: TF = (Number of times word appears in document) / (Total words in document)
Example: In the document "Johny Johny Yes Papa", "Johny" appears 2 times out of 4 words, so TF of "Johny" = 2/4 = 0.5
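The TF formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `term_frequency` is made up for this note, and splitting on whitespace with case-sensitive matching is an assumption (real pipelines usually lowercase and strip punctuation first).

```python
from collections import Counter

def term_frequency(term, document):
    """TF = (times the term appears) / (total words in the document)."""
    # Assumption: words are separated by whitespace and matched case-sensitively.
    tokens = document.split()
    if not tokens:
        return 0.0  # an empty document has no terms to count
    return Counter(tokens)[term] / len(tokens)

tf_johny = term_frequency("Johny", "Johny Johny Yes Papa")
print(tf_johny)  # 2 occurrences / 4 words = 0.5
```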
---
Inverse Document Frequency (IDF):
IDF measures how rare or valuable a word is across all documents in the corpus. Common words (like "the") get a low IDF; rare words get a high IDF.
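Formula: IDF = log(Total documents / Documents containing the word)
Note: The base of the log changes the value but not the ranking of words. The example below uses base 10.
Example: If "Papa" appears in 3 out of 4 documents, IDF of "Papa" = log10(4/3) ≈ 0.125 (with the natural log it would be ≈ 0.288)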
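A short Python sketch of the IDF calculation, assuming a base-10 log (which is what gives 0.125 for a word found in 3 of 4 documents). The four-document corpus and the function name `inverse_document_frequency` are made up for illustration; the only thing taken from the notes is that "Papa" appears in 3 of 4 documents.

```python
import math

def inverse_document_frequency(term, corpus):
    """IDF = log10(total documents / documents containing the term)."""
    # Assumption: whitespace tokenization, case-sensitive whole-word match.
    n_containing = sum(1 for doc in corpus if term in doc.split())
    if n_containing == 0:
        # The plain formula divides by zero for unseen words, so fail loudly.
        raise ValueError(f"{term!r} does not appear in any document")
    return math.log10(len(corpus) / n_containing)

# Hypothetical corpus: "Papa" is in 3 of the 4 documents.
corpus = [
    "Johny Johny Yes Papa",
    "Eating sugar No Papa",
    "Telling lies No Papa",
    "Open your mouth",
]

idf_papa = inverse_document_frequency("Papa", corpus)
print(round(idf_papa, 3))  # log10(4/3), about 0.125
```

Libraries often use a smoothed variant of this formula to avoid the division-by-zero case, so their numbers may differ from the plain formula used here.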
TFIDF = TF × IDF. Words that are frequent in one document but rare across the corpus get the highest scores; words that appear in almost every document score close to zero.
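To see how the two factors combine, here is a self-contained sketch that scores two words from the same document. The corpus, the helper names (`tf`, `idf`, `tf_idf`), the base-10 log, and the choice to return 0.0 for a word found in no document are all assumptions made for this illustration.

```python
import math

def tf(term, doc):
    # Share of the document's words that are this term (whitespace tokens).
    tokens = doc.split()
    return tokens.count(term) / len(tokens) if tokens else 0.0

def idf(term, corpus):
    # Base-10 log of (total documents / documents containing the term).
    n_containing = sum(1 for d in corpus if term in d.split())
    if n_containing == 0:
        return 0.0  # assumption: treat unseen words as carrying no weight
    return math.log10(len(corpus) / n_containing)

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

# Hypothetical corpus: "Johny" is in 1 document, "Papa" is in 3.
corpus = [
    "Johny Johny Yes Papa",
    "Eating sugar No Papa",
    "Telling lies No Papa",
    "Open your mouth",
]
doc = corpus[0]

johny_score = tf_idf("Johny", doc, corpus)  # 0.5  * log10(4/1)
papa_score = tf_idf("Papa", doc, corpus)    # 0.25 * log10(4/3)
print(round(johny_score, 3), round(papa_score, 3))
```

In this made-up corpus "Johny" scores about 0.301 and "Papa" about 0.031: "Johny" is both more frequent in the document and rarer across the corpus, so it comes out roughly ten times more distinctive.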
Source: Chapter 6, Section 6 (TFIDF concept)
---