The Hamming distance is a metric for comparing two equal-length strings by counting the number of positions at which the corresponding characters are different. It is named after Richard Hamming, who introduced it in his paper on error-detecting and error-correcting codes.
The Hamming distance between two equal-length strings $x$ and $y$ is defined as:
$$d(x, y) = \sum_{i=1}^{n} [x_i \neq y_i]$$
where $n$ is the length of the strings, $[P]$ is the Iverson bracket (1 if $P$ is true, 0 if $P$ is false), and $x_i$ denotes the character at position $i$ in string $x$.
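As a worked instance of the definition, take the 7-bit strings $x = 1011101$ and $y = 1001001$. This pair is an illustrative choice, not taken from the source. Evaluating the Iverson bracket $[x_i \neq y_i]$ at each of the seven positions gives:

```latex
\begin{aligned}
d(1011101,\ 1001001) &= [1 \neq 1] + [0 \neq 0] + [1 \neq 0] + [1 \neq 1] + [1 \neq 0] + [0 \neq 0] + [1 \neq 1] \\
&= 0 + 0 + 1 + 0 + 1 + 0 + 0 \\
&= 2
\end{aligned}
```

Only positions 3 and 5 differ, so the distance is 2.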
Properties
The Hamming distance is a metric on the set of words of fixed length $n$, known as a Hamming space. It satisfies non-negativity, symmetry, and the identity of indiscernibles: the Hamming distance between two words is zero if and only if the words are identical. It also satisfies the triangle inequality: for any three words $x$, $y$, and $z$, $d(x, z) \le d(x, y) + d(y, z)$. This holds because any difference between the $i$-th letters of $x$ and $z$ implies a difference between the $i$-th letters of $x$ and $y$, or between those of $y$ and $z$. Summing over all positions then gives the inequality. The Hamming distance can also be interpreted as the Hamming weight (the number of nonzero symbols) of $x - y$ for a suitable choice of the subtraction operator, much as the difference between two integers can be read as a distance from zero on the number line.
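The metric properties can be sanity-checked by brute force on a small Hamming space. The sketch below enumerates every binary word of a given length and tests non-negativity, symmetry, the identity of indiscernibles, and the triangle inequality for all pairs and triples. The helper names `binaryWords` and `checkMetricAxioms` are illustrative. An exhaustive check for one small $n$ is a spot check, not a proof.

```typescript
// Hamming distance between two equal-length strings.
function hammingDistance(x: string, y: string): number {
  if (x.length !== y.length) {
    throw new Error("Strings must be of equal length");
  }
  let d = 0;
  for (let i = 0; i < x.length; i++) {
    if (x[i] !== y[i]) d++;
  }
  return d;
}

// Enumerate every binary word of length n (the Hamming space {0,1}^n).
function binaryWords(n: number): string[] {
  const words: string[] = [];
  for (let v = 0; v < 1 << n; v++) {
    let w = "";
    for (let bit = n - 1; bit >= 0; bit--) {
      w += (v >> bit) & 1 ? "1" : "0";
    }
    words.push(w);
  }
  return words;
}

// Check the metric axioms for every pair and triple of words of length n.
// Returns true only if all of them hold.
function checkMetricAxioms(n: number): boolean {
  const words = binaryWords(n);
  for (const x of words) {
    for (const y of words) {
      const dxy = hammingDistance(x, y);
      if (dxy < 0) return false; // non-negativity
      if (dxy !== hammingDistance(y, x)) return false; // symmetry
      if ((dxy === 0) !== (x === y)) return false; // identity of indiscernibles
      for (const z of words) {
        // triangle inequality: d(x, z) <= d(x, y) + d(y, z)
        if (hammingDistance(x, z) > dxy + hammingDistance(y, z)) return false;
      }
    }
  }
  return true;
}

console.log(checkMetricAxioms(3)); // true
```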
For binary strings $a$ and $b$, the Hamming distance equals the number of ones (the population count) in $a \oplus b$, where $\oplus$ denotes the XOR operation. The metric space of length-$n$ binary strings with the Hamming distance is known as the Hamming cube. As a metric space, it is equivalent to the set of distances between vertices in a hypercube graph. A binary string of length $n$ can also be viewed as a vector in $\mathbb{R}^n$ by treating each symbol as a real coordinate. Under this embedding, the strings form the vertices of an $n$-dimensional hypercube, and the Hamming distance between two strings equals the Manhattan distance between the corresponding vertices.
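A minimal TypeScript sketch of the XOR identity follows. It assumes the strings are packed into non-negative integers of at most 32 bits, because JavaScript bitwise operators act on 32-bit integers. The function names `hammingDistanceBits` and `manhattan` are illustrative. Clearing the lowest set bit with `x & (x - 1)` (Kernighan's trick) is one of several ways to count ones. The second function checks the Manhattan-distance view on the same pair written as 0/1 vectors.

```typescript
// Hamming distance between two non-negative integers (up to 32 bits) viewed
// as bit strings: count the 1 bits of a XOR b.
function hammingDistanceBits(a: number, b: number): number {
  let x = (a ^ b) >>> 0; // XOR sets exactly the differing bits; >>> 0 keeps it unsigned
  let count = 0;
  while (x !== 0) {
    x = (x & (x - 1)) >>> 0; // clear the lowest set bit
    count++;
  }
  return count;
}

// Manhattan (L1) distance between two equal-length real vectors.
function manhattan(u: number[], v: number[]): number {
  let s = 0;
  for (let i = 0; i < u.length; i++) s += Math.abs(u[i] - v[i]);
  return s;
}

console.log(hammingDistanceBits(0b1011101, 0b1001001)); // 2
console.log(manhattan([1, 0, 1, 1, 1, 0, 1], [1, 0, 0, 1, 0, 0, 1])); // 2
```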
Error Detection and Correction
The minimum Hamming distance between the codewords of a code, often denoted $d$, is a central concept in coding theory, particularly in the design of error-detecting and error-correcting codes. A code $C$ is said to be $k$-error detecting if the minimum Hamming distance between any two of its codewords is at least $k + 1$. For instance, consider a code with the two codewords "000" and "111". The Hamming distance between these two words is 3, which makes it a 2-error detecting code. If one or two bits are flipped, the received word is not a codeword, so the error is detected. However, if all three bits are flipped, "000" becomes "111", which is another valid codeword, and the error goes undetected.
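The detection argument can be sketched in a few lines. The code below computes the minimum distance of a code given as a list of equal-length strings. It treats "the received word is not a codeword" as the error-detection test. The names `minimumDistance` and `errorDetected` are illustrative, not part of any library.

```typescript
// Hamming distance between two equal-length strings.
function hammingDistance(x: string, y: string): number {
  if (x.length !== y.length) throw new Error("Strings must be of equal length");
  let d = 0;
  for (let i = 0; i < x.length; i++) if (x[i] !== y[i]) d++;
  return d;
}

// Minimum Hamming distance over all pairs of distinct codewords.
function minimumDistance(code: string[]): number {
  let best = Infinity;
  for (let i = 0; i < code.length; i++) {
    for (let j = i + 1; j < code.length; j++) {
      best = Math.min(best, hammingDistance(code[i], code[j]));
    }
  }
  return best;
}

// Error detection: a received word that is not a codeword signals an error.
function errorDetected(code: string[], received: string): boolean {
  return code.indexOf(received) === -1;
}

const code = ["000", "111"];
console.log(minimumDistance(code)); // 3, so up to 2 errors are detectable
console.log(errorDetected(code, "010")); // true: one bit of "000" flipped
console.log(errorDetected(code, "111")); // false: three flips turn "000" into "111"
```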
A code $C$ is $k$-error correcting if, for every word $w$ in the underlying Hamming space $H$, there exists at most one codeword $c \in C$ such that the Hamming distance between $w$ and $c$ is at most $k$. Equivalently, a code is $k$-error correcting if and only if the minimum Hamming distance between any two of its codewords is at least $2k + 1$. Geometrically, this means that the closed balls of radius $k$ centered on distinct codewords, often called Hamming spheres, are disjoint.
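The geometric condition can be tested directly on a small binary code. The sketch enumerates every word of the Hamming space and counts how many codewords lie within distance $k$ of it. The balls are disjoint exactly when no word has more than one such codeword. It assumes a non-empty binary code of equal-length words. The helper names `binaryWords` and `ballsDisjoint` are illustrative.

```typescript
// Hamming distance between two equal-length strings.
function hammingDistance(x: string, y: string): number {
  if (x.length !== y.length) throw new Error("Strings must be of equal length");
  let d = 0;
  for (let i = 0; i < x.length; i++) if (x[i] !== y[i]) d++;
  return d;
}

// Enumerate every binary word of length n.
function binaryWords(n: number): string[] {
  const words: string[] = [];
  for (let v = 0; v < 1 << n; v++) {
    let w = "";
    for (let bit = n - 1; bit >= 0; bit--) w += (v >> bit) & 1 ? "1" : "0";
    words.push(w);
  }
  return words;
}

// True if no word lies within distance k of two different codewords,
// i.e. the closed Hamming balls of radius k around the codewords are disjoint.
function ballsDisjoint(code: string[], k: number): boolean {
  const n = code[0].length;
  for (const w of binaryWords(n)) {
    let nearby = 0;
    for (const c of code) {
      if (hammingDistance(w, c) <= k) nearby++;
    }
    if (nearby > 1) return false;
  }
  return true;
}

console.log(ballsDisjoint(["000", "111"], 1)); // true: 1-error correcting
console.log(ballsDisjoint(["000", "111"], 2)); // false: "001" is within 2 of both
```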
For example, consider the same 3-bit code with codewords "000" and "111". The Hamming space consists of the eight words 000, 001, 010, 011, 100, 101, 110, and 111. The codeword "000" and its single-bit error words "001", "010", and "100" are all within Hamming distance 1 of "000". Similarly, the codeword "111" and its single-bit error words "110", "101", and "011" are within Hamming distance 1 of "111". These two balls of radius 1 are disjoint and together cover the whole space, so any single-bit error can be corrected by decoding to the nearest codeword. This makes the code 1-error correcting. It agrees with the general condition: the minimum Hamming distance between "000" and "111" is 3, which satisfies $d \ge 2k + 1$ with $d = 3$ and $k = 1$.
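Correction by nearest codeword (minimum-distance decoding) can be sketched as follows. The function name `decode` is illustrative. Ties, which cannot occur for this code, are resolved in favour of the codeword listed first.

```typescript
// Hamming distance between two equal-length strings.
function hammingDistance(x: string, y: string): number {
  if (x.length !== y.length) throw new Error("Strings must be of equal length");
  let d = 0;
  for (let i = 0; i < x.length; i++) if (x[i] !== y[i]) d++;
  return d;
}

// Minimum-distance decoding: return the codeword closest to the received word.
function decode(code: string[], received: string): string {
  let best = code[0];
  let bestDistance = hammingDistance(received, best);
  for (let i = 1; i < code.length; i++) {
    const d = hammingDistance(received, code[i]);
    if (d < bestDistance) {
      best = code[i];
      bestDistance = d;
    }
  }
  return best;
}

const repetition = ["000", "111"];
for (const w of ["000", "001", "010", "100", "011", "101", "110", "111"]) {
  console.log(w, "->", decode(repetition, w));
}
```

The first four words decode to "000" and the last four to "111". A two-bit error on "000", such as "011", is therefore miscorrected to "111". This is consistent with the code correcting only one error.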
In summary, a code whose codewords have minimum Hamming distance $d$ can detect up to $d - 1$ errors and can correct up to $\lfloor (d - 1)/2 \rfloor$ errors. The latter quantity is known as the packing radius or the error-correcting capability of the code.
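The two bounds are simple arithmetic on $d$. The sketch below assumes $d \ge 1$ and uses the illustrative name `codeCapability`. It tabulates both bounds for a few values.

```typescript
// Error-handling capability of a code with minimum distance d (assumes d >= 1).
function codeCapability(d: number): { detect: number; correct: number } {
  return {
    detect: d - 1, // up to d - 1 errors can be detected
    correct: Math.floor((d - 1) / 2), // up to floor((d - 1) / 2) errors can be corrected
  };
}

for (const d of [1, 2, 3, 4, 5, 7]) {
  const c = codeCapability(d);
  console.log(`d = ${d}: detects ${c.detect}, corrects ${c.correct}`);
}
```

For the code {"000", "111"} with $d = 3$, this gives 2 detectable and 1 correctable error, matching the examples above.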
Applications
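In telecommunications, the Hamming distance is used as an error metric for data transmission. Counting the bits that differ between a transmitted and a received fixed-length word gives an estimate of the number of transmission errors, and for this reason the distance is sometimes called the signal distance.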
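As an illustration, the sketch below counts flipped bits between a transmitted and a received frame. It also reports the fraction of bits in error. It assumes both frames are available as equal-length byte arrays. The function name `bitErrors` and the sample bytes are hypothetical.

```typescript
// Count the bit positions at which two equal-length byte buffers differ,
// i.e. the Hamming distance between them viewed as bit strings.
function bitErrors(sent: Uint8Array, received: Uint8Array): number {
  if (sent.length !== received.length) {
    throw new Error("Buffers must be of equal length");
  }
  let errors = 0;
  for (let i = 0; i < sent.length; i++) {
    let x = sent[i] ^ received[i]; // 1 bits mark flipped positions in this byte
    while (x !== 0) {
      errors += x & 1;
      x >>= 1;
    }
  }
  return errors;
}

const sent = new Uint8Array([0b10110010, 0b01100001]);
const received = new Uint8Array([0b10100010, 0b01100011]);
const flipped = bitErrors(sent, received);
console.log(flipped); // 2
console.log(flipped / (sent.length * 8)); // 0.125, the fraction of bits in error
```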
In DNA sequence analysis, it provides a simple way to measure genetic differences between sequences of equal length. In cryptography, it helps assess encryption algorithms, for example by counting how many ciphertext bits change when a single plaintext bit is flipped (the avalanche effect). The distance is also used in information theory for analyzing codes and in network coding for designing data transmission schemes.
Implementation
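```typescript
// Returns the number of positions at which s1 and s2 differ.
// The distance is only defined for strings of equal length.
function hammingDistance(s1: string, s2: string): number {
  if (s1.length !== s2.length) {
    throw new Error('Strings must be of equal length');
  }

  let distance = 0;
  for (let i = 0; i < s1.length; i++) {
    // Compare the characters (UTF-16 code units) at index i.
    if (s1[i] !== s2[i]) {
      distance++;
    }
  }
  return distance;
}
```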
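Indexing a JavaScript string with `s[i]` and using `s.length` both work on UTF-16 code units. A character outside the Basic Multilingual Plane, such as an emoji, therefore counts as two positions. The variant below is a sketch under the assumption that the intended alphabet is Unicode code points. It splits both strings with `Array.from` first. `hammingDistanceCodePoints` is an illustrative name. Combining sequences such as an accent plus a base letter are still counted as separate code points.

```typescript
// Variant that compares Unicode code points rather than UTF-16 code units.
function hammingDistanceCodePoints(s1: string, s2: string): number {
  const a = Array.from(s1); // splits by code point, so a surrogate pair stays one element
  const b = Array.from(s2);
  if (a.length !== b.length) {
    throw new Error("Strings must have the same number of code points");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

console.log(hammingDistanceCodePoints("karolin", "kathrin")); // 3
console.log(hammingDistanceCodePoints("a😀c", "abc")); // 1
```

The code-unit version would instead throw for the second call, because `"a😀c".length` is 4.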
References
Hamming, R. W. (1950). Error detecting and error correcting codes. The Bell System Technical Journal, 29(2), 147-160.
Hamming, R. W. (1986). Coding and Information Theory (2nd ed.). Prentice-Hall.