Note_Tech

All technological notes.


Project maintained by simonangel-fong. Hosted on GitHub Pages. Theme by mattgraham.

R - Information Retrieval



Vector Space Model
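In the vector space model, each document (and each query) is represented as a vector with one dimension per vocabulary term. The minimal sketch below assumes documents are already tokenized and builds raw term-count vectors over a shared vocabulary. The function names `build_vocabulary` and `to_count_vector` are illustrative, not from any library.

```python
from collections import Counter


def build_vocabulary(docs: list[list[str]]) -> list[str]:
    """Collect every distinct term in the collection, in sorted order."""
    return sorted({term for doc in docs for term in doc})


def to_count_vector(doc: list[str], vocab: list[str]) -> list[int]:
    """Represent one tokenized document as raw term counts over the vocabulary."""
    counts = Counter(doc)  # missing terms count as 0
    return [counts[term] for term in vocab]


docs = [
    ["information", "retrieval", "system"],
    ["retrieval", "models", "information", "information"],
]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(d, vocab) for d in docs]
```

Here `vocab` is `["information", "models", "retrieval", "system"]`, and the two documents become `[1, 0, 1, 1]` and `[2, 1, 1, 0]`. In practice the raw counts are usually replaced by weights such as tf-idf.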


TF-IDF
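One common formulation weights term t in document d by its term frequency tf(t,d) (occurrences of t in d) times its inverse document frequency, where N is the number of documents in the collection and df(t) is the number of documents containing t. The log base varies between sources; base 10 is assumed here.

```latex
w_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{idf}_t,
\qquad
\mathrm{idf}_t = \log_{10} \frac{N}{\mathrm{df}_t}
```

Rare terms get a large idf, while a term that appears in every document gets idf = log10(1) = 0 and carries no weight.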


Computing TF-IDF
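A minimal sketch of computing tf-idf weights, assuming tokenized documents, raw term frequency, and idf = log10(N / df). The function name `tf_idf` is illustrative; libraries such as scikit-learn use smoothed and normalized variants by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each document by raw tf times log10(N / df)."""
    n_docs = len(docs)
    # Document frequency: each term is counted at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({
            term: count * math.log10(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "cat"],
]
w = tf_idf(docs)
```

Because "the" occurs in all three documents, its weight is 0 everywhere; "dog" occurs in only one document, so it gets the highest idf, log10(3).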


TF-IDF variants

[Figure: TF-IDF variants]
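As an assumption about which variants are meant, the sketch below implements several widely used term-frequency and document-frequency weightings from the SMART notation: natural, logarithmic, augmented, and boolean tf, plus standard and probabilistic idf. Other sources may list a different set. Log base 10 and all function names are assumptions for illustration.

```python
import math


def tf_natural(tf: int) -> float:
    # Natural tf: the raw count.
    return float(tf)


def tf_log(tf: int) -> float:
    # Logarithmic tf: damps the effect of many repeated occurrences.
    return 1 + math.log10(tf) if tf > 0 else 0.0


def tf_augmented(tf: int, max_tf: int) -> float:
    # Augmented tf: scales by the most frequent term in the same document.
    return 0.5 + 0.5 * tf / max_tf


def tf_boolean(tf: int) -> float:
    # Boolean tf: 1 if the term occurs at all, else 0.
    return 1.0 if tf > 0 else 0.0


def idf_standard(n_docs: int, df: int) -> float:
    # Standard idf: log(N / df).
    return math.log10(n_docs / df)


def idf_prob(n_docs: int, df: int) -> float:
    # Probabilistic idf: max(0, log((N - df) / df)).
    # The guard avoids log10(0) when a term occurs in every document.
    ratio = (n_docs - df) / df
    return math.log10(ratio) if ratio > 1 else 0.0
```

For example, `tf_log(10)` is 2.0, so ten occurrences count only twice as much as one, and `idf_prob` drops to 0 for any term that appears in at least half of the documents.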


Cosine Similarity

[Figure: cosine similarity]
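Cosine similarity compares two vectors by the angle between them: cos(q, d) = (q · d) / (|q| |d|), the dot product divided by the product of the Euclidean lengths. The minimal sketch below works on plain Python lists of equal length; returning 0.0 for an all-zero vector is an assumed convention, not part of the definition.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: an all-zero vector matches nothing
    return dot / (norm_a * norm_b)
```

Because the lengths are divided out, a long document and a short one with the same term proportions score 1.0 against each other; with non-negative weights such as tf-idf, scores fall between 0 and 1, and documents can be ranked for a query by sorting on this value.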


Text Preprocessing

  1. Word separation and sentence splitting (tokenization)
  2. Normalize terms to a standard form (e.g., lowercasing)
  3. Eliminate stop words (e.g., a, and, of, to, at, is, the, …)
  4. Stem terms to their base form (e.g., by removing prefixes and suffixes)
  5. Construct a mapping between terms and the documents that contain them (indexing)
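The preprocessing steps can be sketched as a small pipeline. This is a toy under stated assumptions: tokenization keeps only ASCII letters and digits and skips sentence splitting, the stop-word list is just the examples given in the steps, and `naive_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's. The index maps each term to the set of document IDs containing it.

```python
import re
from collections import defaultdict

# Stop words taken from the examples in the list; real systems use longer lists.
STOP_WORDS = {"a", "and", "of", "to", "at", "is", "the"}
# Hypothetical, deliberately naive suffix list (checked in this order).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> list[str]:
    # Step 1: word separation on ASCII letters/digits (no sentence splitting).
    return re.findall(r"[A-Za-z0-9]+", text)


def naive_stem(term: str) -> str:
    # Step 4: strip the first matching suffix, keeping at least three characters.
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def preprocess(text: str) -> list[str]:
    tokens = [t.lower() for t in tokenize(text)]         # steps 1-2
    tokens = [t for t in tokens if t not in STOP_WORDS]  # step 3
    return [naive_stem(t) for t in tokens]               # step 4


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Step 5: map each term to the IDs of the documents that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return dict(index)
```

For example, `preprocess("The cats are jumping at the gate")` yields `["cat", "are", "jump", "gate"]`. A suffix stripper this simple also over-stems some words (e.g., "gates" becomes "gat"), which is why real systems use a proper stemming algorithm.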
