Number Representations & States

"how numbers are stored and used in computers"

Cosine Distance

Cosine distance measures how dissimilar two strings are by representing them as vectors and computing one minus the cosine of the angle between them. It is particularly useful for comparing documents or text strings where word frequency matters more than word order.

Mathematical Definition

The cosine distance between two strings $A$ and $B$ is defined as

$$d(A, B) = 1 - \cos(\theta) = 1 - \frac{A \cdot B}{\|A\| \, \|B\|}$$

The vectors $A$ and $B$ are representations of a string that can be created with an arbitrary strategy: character n-grams, word frequency counts, TF-IDF (Term Frequency-Inverse Document Frequency), or, more popular recently, word embeddings. The value $\theta$ is the angle between the vectors, and $\|A\|$ represents the magnitude of vector $A$.
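As an illustration of one such strategy, the sketch below turns two strings into aligned word-frequency vectors over their combined vocabulary. The function names `wordCounts` and `toVectors` and the simple lowercase alphanumeric tokenizer are assumptions made for this example, not part of any library.

```typescript
// Count how often each lowercase alphanumeric word occurs in a string.
// The tokenizer is deliberately simple (an assumption for this sketch).
function wordCounts(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  words.forEach((word) => {
    counts.set(word, (counts.get(word) || 0) + 1);
  });
  return counts;
}

// Build two frequency vectors that share the same dimensions:
// one dimension per word in the combined, sorted vocabulary.
function toVectors(a: string, b: string): [number[], number[]] {
  const countsA = wordCounts(a);
  const countsB = wordCounts(b);

  const vocabulary: string[] = [];
  countsA.forEach((_count, word) => vocabulary.push(word));
  countsB.forEach((_count, word) => {
    if (!countsA.has(word)) vocabulary.push(word);
  });
  vocabulary.sort();

  const vectorA = vocabulary.map((word) => countsA.get(word) || 0);
  const vectorB = vocabulary.map((word) => countsB.get(word) || 0);
  return [vectorA, vectorB];
}

// Sorted vocabulary: cat, dog, mat, on, sat, the
const [vectorA, vectorB] = toVectors("the cat sat on the mat", "the dog sat");
console.log(vectorA); // [1, 0, 1, 1, 1, 2]
console.log(vectorB); // [0, 1, 0, 0, 1, 1]
```

Both vectors have the same length because they are indexed by the same vocabulary, which is what a dot product requires.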

Properties

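  • Cosine distance produces a value ranging from $0$ to $1$ for vectors with non-negative components (such as frequency counts), where $0$ indicates identical strings and $1$ indicates completely different strings. With components of mixed sign, as in some embeddings, the value can reach $2$.
  • Cosine distance is symmetric, meaning the distance from string $A$ to $B$ is the same as from $B$ to $A$.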
  • Cosine distance does not satisfy the triangle inequality, making it a non-metric distance measure.
  • Cosine distance is invariant to vector magnitude: scaling a vector by a positive constant does not change the result. This makes it particularly useful for comparing documents of different sizes.
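The properties above can be checked numerically. The following is a minimal sketch, assuming a helper named `cosineDistance` (defined here for illustration) and small hand-picked vectors. Results are compared with a tolerance where floating-point rounding may apply.

```typescript
// Cosine distance as one minus cosine similarity (illustrative helper).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must be of the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for zero vectors");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const east = [1, 0];
const diagonal = [1, 1]; // 45 degrees from both east and north
const north = [0, 1];

// Same direction gives a distance of (approximately) 0.
const identical = cosineDistance([1, 2, 3], [1, 2, 3]);

// Vectors with nothing in common are orthogonal: distance 1.
const orthogonal = cosineDistance(east, north);

// Symmetry: argument order does not matter.
const symmetric = cosineDistance(east, diagonal) === cosineDistance(diagonal, east);

// Magnitude invariance: scaling one vector leaves the distance near 0.
const scaled = cosineDistance([1, 2, 3], [10, 20, 30]);

// Triangle inequality fails: going directly from east to north (1)
// is longer than the detour via the diagonal (about 0.293 + 0.293).
const direct = cosineDistance(east, north);
const detour = cosineDistance(east, diagonal) + cosineDistance(diagonal, north);
const triangleViolated = direct > detour;

console.log({ identical, orthogonal, symmetric, scaled, triangleViolated });
```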

Applications

  • In document similarity analysis, it helps identify similar documents regardless of their length.
  • For text classification tasks, it provides an effective way to categorize documents based on their content.
  • In information retrieval systems, it helps rank search results by relevance.
  • In plagiarism detection, it can identify similar text passages across different documents.
  • In search engines, it helps match user queries with relevant documents in the index.
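To make the ranking use case concrete, here is a small sketch that orders documents by their cosine similarity to a query, using raw term counts stored in maps. The names `termCounts`, `sparseCosineSimilarity`, and `rankByRelevance`, the sample documents, and the choice to score empty text as 0 are all assumptions for illustration. A production system would more likely use TF-IDF weights or embeddings.

```typescript
type TermCounts = Map<string, number>;

// Tokenize into lowercase alphanumeric terms and count them.
function termCounts(text: string): TermCounts {
  const counts: TermCounts = new Map<string, number>();
  const terms: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  terms.forEach((term) => counts.set(term, (counts.get(term) || 0) + 1));
  return counts;
}

// Cosine similarity over sparse vectors: only shared terms add to the dot product.
function sparseCosineSimilarity(a: TermCounts, b: TermCounts): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((count, term) => {
    normA += count * count;
    dot += count * (b.get(term) || 0);
  });
  b.forEach((count) => {
    normB += count * count;
  });
  // Assumption: text with no terms is treated as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Score every document against the query, highest similarity first.
function rankByRelevance(
  query: string,
  documents: string[]
): { document: string; score: number }[] {
  const queryCounts = termCounts(query);
  return documents
    .map((document) => ({
      document,
      score: sparseCosineSimilarity(queryCounts, termCounts(document)),
    }))
    .sort((x, y) => y.score - x.score);
}

const ranked = rankByRelevance("cosine distance", [
  "Edit distance counts character edits",
  "Cosine distance compares vector angles",
  "Bread recipes for beginners",
]);
ranked.forEach((r) => console.log(r.score.toFixed(3), r.document));
```

The document sharing both query terms ranks first, the one sharing a single term ranks second, and the unrelated one scores 0.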

Implementation

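Cosine similarity is a measure of similarity between two vectors; cosine distance is one minus this value. The function below computes the cosine similarity of two vectors of equal length.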

code.ts
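```typescript
// Computes the cosine similarity of two equal-length numeric vectors.
function cosineSimilarity(A: number[], B: number[]): number {
  if (A.length !== B.length) {
    throw new Error('Vectors must be of the same length');
  }

  let dotProduct = 0;
  let magnitudeA = 0; // accumulates the squared magnitude of A
  let magnitudeB = 0; // accumulates the squared magnitude of B

  for (let i = 0; i < A.length; i++) {
    dotProduct += A[i] * B[i];
    magnitudeA += A[i] * A[i];
    magnitudeB += B[i] * B[i];
  }

  // A zero vector has no direction, so the angle is undefined.
  if (magnitudeA === 0 || magnitudeB === 0) {
    throw new Error('Cosine similarity is undefined for zero vectors');
  }

  return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
```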
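To go from raw strings to a distance in one step, the strings first have to become vectors. The sketch below does this with character n-grams (bigrams by default) and returns one minus the cosine similarity. The names `ngramCounts` and `ngramCosineDistance` are hypothetical, and throwing on strings shorter than `n` is a design choice made for this example.

```typescript
// Count overlapping character n-grams, e.g. "night" -> ni, ig, gh, ht.
function ngramCounts(text: string, n: number = 2): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i + n <= text.length; i++) {
    const gram = text.slice(i, i + n);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Cosine distance between two strings based on their n-gram counts.
function ngramCosineDistance(a: string, b: string, n: number = 2): number {
  const countsA = ngramCounts(a, n);
  const countsB = ngramCounts(b, n);

  let dot = 0;
  let normA = 0;
  let normB = 0;
  countsA.forEach((count, gram) => {
    normA += count * count;
    dot += count * (countsB.get(gram) || 0);
  });
  countsB.forEach((count) => {
    normB += count * count;
  });

  // A string shorter than n yields no n-grams, i.e. a zero vector.
  if (normA === 0 || normB === 0) {
    throw new Error("Each string must contain at least one n-gram");
  }
  return 1 - dot / Math.sqrt(normA * normB);
}

const same = ngramCosineDistance("hello", "hello");     // no difference
const related = ngramCosineDistance("night", "nacht");  // share only "ht"
const unrelated = ngramCosineDistance("abc", "xyz");    // no shared bigrams
console.log({ same, related, unrelated });
```

Unlike word counts, character n-grams are sensitive to spelling, so near-duplicates and misspellings end up closer together than unrelated words.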