"how numbers are stored and used in computers"

Levenshtein Distance

The Levenshtein distance is a string metric for measuring the difference between two sequences. It is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another.

O(m·n)

space

O(1)

O(n)·O(n)

Cosine distance

O(n)·O(1)

Hamming distance

O(m+n)·O(m+n)

Jaccard distance

Sorensen-Dice coefficient

Overlap coefficient

O(m·n)·O(m·n)

Levenshtein distance

Damerau-Levenshtein distance

O(m·n)·O(1)

Jaro-Winkler distance

O(n)

time

O(m·n)

History

The Levenshtein distance is named after the Soviet mathematician Vladimir Levenshtein, who introduced the concept in a 1966 paper. The distance is also known as the edit distance, which is the minimum number of edits (insertions, deletions, or substitutions) required to transform one string into another.

Definition

The Levenshtein distance between two strings and is defined recursively as:

where denotes the length of string , denotes the first character of , and denotes the substring of starting from index .

Change the value of the two strings to see how the Levenshtein distance is computed.

Delay between steps: 100 ms

The Levenshtein distance produces a non-negative value that ranges from 0 to infinity, where 0 indicates identical strings. The distance is symmetric, meaning the distance from string A to B is the same as from B to A. Most importantly, it satisfies the triangle inequality, making it a proper metric. This means that for any three strings A, B, and C, the distance from A to C will never be greater than the sum of the distances from A to B and B to C.

Applications

The Levenshtein distance finds widespread use in various text processing applications. In spell checking, it helps identify and correct misspelled words by finding the closest matching words in a dictionary. For DNA sequence alignment, it helps identify genetic similarities and differences between sequences. In natural language processing, it's used for text normalization and fuzzy matching. The distance is also valuable in plagiarism detection systems, where it helps identify similar text passages. Additionally, it plays a crucial role in speech recognition systems, helping to match spoken words with their written forms.

Implementation

Here is an implementation of the Levenshtein distance in JavaScript.

code.js
1function levenshteinDistance(str1, str2) {
2    const m = str1.length;
3    const n = str2.length;
4    const dp = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0));
5
6    for (let i = 0; i <= m; i++) {
7        for (let j = 0; j <= n; j++) {
8            if (i === 0) {
9                dp[i][j] = j;
10            } else if (j === 0) {
11                dp[i][j] = i;
12            } else if (str1[i - 1] === str2[j - 1]) {
13                dp[i][j] = dp[i - 1][j - 1];
14            } else {
15                dp[i][j] = 1 + Math.min(
16                    dp[i - 1][j],      // deletion
17                    dp[i][j - 1],      // insertion 
18                    dp[i - 1][j - 1]   // substitution
19                );
20            }
21        }
22    }
23
24    return dp[m][n];
25}
26
27const distance = levenshteinDistance("kitten", "sitting");

Space efficiency

The space complexity of the Levenshtein distance algorithm is , where and are the lengths of the two strings. This is because the algorithm uses a 2D array to store the distances between all pairs of characters in the two strings.

Time efficiency

The time complexity of the Levenshtein distance algorithm is , where and are the lengths of the two strings. This is because the algorithm must populate an entire 2D array to store the distances between all pairs of characters in the two strings.

Output

code.js
1console.log(distance); // 3

References

Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), 707-710.
Wagner, R. A., & Fischer, M. J. (1974). The string-to-string correction problem. Journal of the ACM, 21(1), 168-173.