What is Hamming Distance? Applications and Operations

What is Hamming Distance? Applications and Operations

Before going through the blog, let’s take a quick look at the prerequisites. You should have a foundational knowledge of binary representation and basic mathematical operations. It will be beneficial if you have a firm grasp on the concept of strings and sets.

While Hamming distance coding is possible in any programming language, we will utilize C++ 23 in this context. This decision stems from the 2022 survey conducted by HackerEarth, where C++ was identified as the foremost general programming language. Out of the respondents, 53.3% favored C++, followed by Java (26.7%) and Python (10%).

Table of Contents

Master the art of computer programming through our YouTube video on

Video Thumbnail

What is Hamming Distance?

What is Hamming Distance

By definition, Hamming distance is a metric in information theory that is used for comparing two binary data strings of equal length at corresponding positions. The crux of Hamming distance is to find the minimum number of substitutions needed to change one string into another.

Note: In the blog, we will consider two strings, 'strOne' and 'strTwo' of equal length, 'len', and represented by d(strOne, strTwo)'.

Hamming distance follows multiple conditions. A few are mentioned below:

  • The length of the two strings involved in Hamming distance should be equal.
  • If two characters are identical, then the Hamming distance is zero. 
  • Hamming distance follows the concept of symmetry. Meaning the number of characters in both strings is equal.
  • Hamming distance also satisfies the concept of triangle inequality. 
  • For binary strings, Hamming distance is equal to the number of ones in d(strOne XOR strTwo).

Here are a few of the important terminologies that you might encounter during your learning journey:

  • Hamming Weight: The “Hamming weight,” within the context of computer science and information theory, signifies the count of non-zero and distinguishable elements within a sequence, commonly expressed in binary form. It quantifies the number of ‘1’ bits present, offering insight into the data’s density and aiding error detection algorithms.
  • Population Count: The population count refers to the quantitative assessment of the disparity between two strings of equal length by calculating the total count of positions at which their corresponding elements differ.
  • Hamming Space: A Hamming space is a mathematical concept in which binary strings of equal length serve as points. It measures the dissimilarity between two strings by calculating the number of differing bits. Hamming spaces find applications in error detection, data clustering, and information retrieval.

Brief History of Hamming Distance

The American mathematician Richard Hamming first coined the concept of Hamming distance. While working at Bell Labs, he introduced Hamming distance in his “Hamming codes, Error detection, and error correction codes” paper in 1950. 

He was working on developing a method for the identification and correction of bits in data transmission. He concluded that by counting the number of bits that differed between two binary sequences, he could determine how many errors had occurred.

How is Hamming Distance Calculated?

Calculating the Hamming Distance involves measuring the dissimilarity between two strings of equal length. Here are some processes:

  • Compare Corresponding Symbols: Compare the symbols (characters or bits) at each position in the two strings, starting from the first position.
  • Count Differences: For every position where the symbols are different, increment a counter by one. This counter keeps track of the number of differences between the strings.
  • Summation: After comparing all positions, the counter’s value represents the Hamming Distance. It signifies the total number of positions at which the two strings differ.

Mathematically, Hamming distance is expressed as:

Hamming Distance = Σ (strOne[i] ≠ strTwo[i])

here,
Hamming Distance denotes the disparity between the strings.
Σ symbolizes the summation of all positions.
strOne[i] represents the element at index ‘i’ in the first string.
strTwo[i] represents the element at index ‘i’ in the second string.
≠ signifies “not equal.”

Hamming Distance Formula

The Hamming distance formula is as follows:

hamming_distance(strOne, strTwo) = number of bits in strOne that are different from bits in strTwo

where:

  • strOne is the first binary string
  • strTwo is the second binary string

For example, the Hamming distance between the strings 101010 and 111010 is 1. There is one bit position where the strings are different: the first bit.

Application of Hamming Distance

Application of Hamming Distance

Below mentioned are the various fields in which hamming distance is applied: 

  • Hamming Distance in Computer Science: Hamming distance is primarily utilized in computer science. It employs error detection and correction of the data packets in computer networking. It ensures data integrity during data transmission. Also, it is used to enhance data reliability and prevent data corruption. 
  • Hamming Distance in Artificial Intelligence: Hamming distance applies to clustering algorithms and pattern recognition. It measures the similarity or dissimilarity between feature vectors, aiding in grouping similar data points. It integrates image segmentation, document classification, and recommendation systems.
  • Hamming Distance in Cryptography: Hamming distance is widely used in the domain of cryptography. It provides cryptographic protocols for encryption security and resilience against unauthorized access. 
  • Hamming Distance in Telecommunications: Hamming distance helps to detect and correct errors in data transmission over noisy channels.
  • Hamming Distance in Biology: In genetics and bioinformatics, Hamming distance aids in comparing DNA sequences. It measures the dissimilarity between genetic sequences, facilitating tasks such as sequence alignment, identifying mutations, and evolutionary analysis. 

It is also seen that Hamming distance is also used for digital forensics to check for alterations and barcoding identifiers for enhanced readability, even when the barcode is distorted. 

Get 100% Hike!

Master Most in Demand Skills Now!

Hamming Distance Vs. Euclidean Distance

Hamming Distance Vs Euclidean Distance

It would be best to have a glimpse of Euclidean distance to have better clarity about Hamming distance. Below, we have mentioned the major differences between Hamming distance and Euclidean distance.

ParametersHamming Distance Euclidean Distance
DefinitionMeasures dissimilarity in binary dataMeasures geometric distance in vectors
Data TypeUsed for binary or categorical dataApplicable to numerical data
CalculationCounts positions with differing valuesCalculates vector magnitude
DimensionalityWorks well with equal-length stringsSuits multi-dimensional data
Scale SensitivityInsensitive to the scale of attributesSensitive to attribute scales
Zero ValuesConsiders zero values as meaningfulTreats zero values as origin
ApplicabilitySuitable for text analysis, error checksApplied in spatial and numeric contexts
InterpretabilityProvides insights into symbolic dataOffers geometric interpretation
Distance RangeRange is [0, string length]Range varies based on data
Weighted FeaturesWeighting is not applicableWeighted Euclidean Distance can be used

Let’s Code: Hamming Distance

Here is the implementation of Hamming distance:

Pseudo Code:

  • Read the first string, ‘strOne’, and the second string, 'strTwo', from the user.
  • If the length of ‘strOne’ is not equal to the length of ‘strTwo’, throw an exception.
  • Initialize a variable 'distance' to 0.
  • Loop through each character at index ‘i’ from 0 to the length of ‘strOne’.
    • If strOne[i] is not equal to strTwo[i], increment 'distance'.
  • Print the value of 'distance' as the Hamming Distance.

Algorithm:

  • Read two input strings: 'strOne', and 'strTwo'.
  • Check if the lengths of 'strOne', and 'strTwo' are the same. If not, throw an exception.
  • Initialize a variable 'distance' to 0 to store the Hamming Distance.
  • Iterate through each character at the same index in both strings.
  • If the string characters at the current index are not equal, increment the distance.
  • Print the value of distance as the Hamming Distance.

Hamming Code C++:

#include <iostream>
#include <string>
#include <algorithm>
int hammingDistance(const std::string& strOne, const std::string& strTwo) {
    if (strOne.length() != strTwo.length()) {
        throw std::invalid_argument("Strings must have equal lengths");
    }
    int distance = 0;
    for (size_t i = 0; i < strOne.length(); ++i) {
        if (strOne[i] != strTwo[i]) {
            distance++;
        }
    }
    return distance;
}
int main() {
    std::string strOne, strTwo;
    std::cout << "Enter the first string: ";
    std::cin >> strOne;
    std::cout << "Enter the second string: ";
    std::cin >> strTwo;
    try {
        int distance = hammingDistance(strOne, strTwo);
        std::cout << "Hamming Distance: " << distance << std::endl;
    } catch (const std::invalid_argument& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }
    return 0;

Output:

Enter the first string: 101010
Enter the second string: 111000
Hamming Distance: 3

Hamming Code Python:

def hamming_distance(strOne, strTwo):
    if len(strOne) != len(strTwo):
        raise ValueError("Strings must have equal length for Hamming Distance calculation")
    distance = 0
    for char1, char2 in zip(strOne, strTwo):
        if char1 != char2:
            distance += 1
    return distance
# Test Cases
try:
    strOne = "101010"
    strTwo = "100110"
    distance = hamming_distance(strOne, strTwo)
    print(f"Hamming Distance between '{strOne}' and '{strTwo}' is: {distance}")
except ValueError as ve:
    print(ve)
try:
    strOne = "101010"
    strTwo = "10011"
    distance = hamming_distance(strOne, strTwo)
    print(f"Hamming Distance between '{strOne}' and '{strTwo}' is: {distance}"
except ValueError as ve:
    print(ve)

Output:

Hamming Distance between ‘101010’ and ‘100110’ is: 2
Strings must have equal length for Hamming Distance calculation

Hamming Code Java:

import java.util.Scanner;
public class IntellipaatHammingDistance {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        System.out.print("Enter the first string: ");
        String string1 = scanner.nextLine();
        System.out.print("Enter the second string: ");
        String string2 = scanner.nextLine();
        if (string1.length() != string2.length()) {
            System.out.println("Error: The two strings must have equal length.");
        } else {
            int hammingDistance = calculateHammingDistance(string1, string2);
            System.out.println("Hamming Distance: " + hammingDistance);
        }
        scanner.close();
    }
    public static int calculateHammingDistance(String str1, String str2) {
        int distance = 0;
        for (int i = 0; i < str1.length(); i++) {
            if (str1.charAt(i) != str2.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }
}

Output:

Enter the first string: 101010
Enter the second string: 111110
Hamming Distance: 1

Operations on Hamming Distance

Here are a few of the operations that you can perform using Hamming distance:

Hamming Distance Between Two Integers

Problem Statement: Write a program to find the hamming distance between two integers. Considering the inputs as intOne = 10 (1010), intTwo = 14 (1110), and output: 3. 

Implementation:

#include <bits/stdc++.h>
using namespace std;
int hammingDistance(int intOne, int intTwo)
{
int d = intOne ^ intTwo;
int setBits = 0;
while (d > 0) {
setBits += d & 1;
d >>= 1;
}
return setBits;
}
// Driver code
int main()
{
int intOne = 10, intTwo = 14;
cout << hammingDistance(10, 14) << "/n";
return 0;
}

Time Complexity: O(log2x)
Auxiliary Space: O(1)

We can also solve this problem using two more approaches:

  • Checking the maximum of both numbers first, then checking the bits at each position.
  • Using the concept of string manipulation. 

Reduce Hamming Distance by Swapping Two Characters

Problem Statement: Write a program to find the positions of two characters to be swapped in the strings strOne and strTwo using Hamming distance in such a way that the distance between the two strings is the smallest. Taking inputs as strOne = “Intellipaat”, and strTwo = “Inllitepata”, and output: 6 3.

Implementation:

#include <bits/stdc++.h>
using namespace std;#define MAX 26
void Swap(string one, string two, int n)
{
int dp[MAX][MAX];
memset(dp, -1, sizeof dp);
int t = 0;
for (int i = 0; i < n; i++)
if (one[i] != two[i])
t++;
for (int i = 0; i < n; i++) {
int a = one[i] - 'a';
int b = two[i] - 'a';
if (a == b)
continue;
if (dp[a][b] != -1) {
cout << i + 1 << " " << dp[a][b] + 1 << endl;
return;
}
dp[b][a] = i;
}
int A[MAX], B[MAX];
memset(A, -1, sizeof A);
memset(B, -1, sizeof B);
for (int i = 0; i < n; i++) {
int a = one[i] - 'a';
int b = two[i] - 'a';
if (a == b)
continue;
if (A[b] != -1) {
cout << i + 1 << " " << A[b] + 1 << endl;
return;
}
if (B[a] != -1) {
cout << i + 1 << " " << B[a] + 1 << endl;
return;
}
A[a] = i;
B[b] = i;
}
cout << -1 << endl;
}
// Driver code
int main()
{
string strOne = "Intellipaat";
string strTwo = "Inllitepata";
int n = strOne.length();
if (strOne == "" || strTwo == "")
cout << "Required string is empty.";
else
Swap(strOne, strTwo, n);
return 0;
}

Time Complexity: O(n)
Auxiliary Space: O(MAX * MAX)

Finding Rotation with Maximum Hamming Distance

Problem Statement: Write a program to find a rotation with the maximum Hamming distance. Input an array of size ‘N’ and another array that contains the rotation of the first array and print the maximum distance between them. Take N = 3, arr = {1,4,1}, and output: 2.

Implementation:

#include <bits/stdc++.h>
using namespace std;
int maxHamming(int arrOne[], int N)
{
int arrTwo[2 * N + 1];
for (int i = 0; i < N; i++) {
arrTwo[i] = arrOne[i];
arrTwo[N + i] = arrOne[i];
}
int maxHam = 0;
for (int i = 1; i < N; i++) {
int Ham = 0;
for (int j = i, k = 0; j < (i + N); j++, k++)
if (arrTwo[j] != arrOne[k])
Ham++;
if (Ham == N)
return N;
maxHam = max(maxHam, Ham);
}
return maxHam;
}
// Driver program
int main()
{
int arrOne[] = { 1, 4, 1 };
int N = sizeof(arrOne) / sizeof(arrOne[0]);
cout << maxHamming(arrOne, N);
return 0;
}

Time Complexity: O(n2)
Auxiliary Space: O(n)

We have used a naive approach to solve the problem. But this problem can also be solved using two other approaches:

  • Using list comprehension
  • Constant Space

Conclusion

In the blog, ‘What is Hamming Distance? Operations and Applications’, we briefly discussed Hamming distance, its applications, and various operations you can perform using it. Hamming distance is helpful for error correction, computer programming, and computer networks. 

To expand your knowledge, we recommend exploring related topics like error-correcting codes and information theory. To solidify your understanding, practicing more problems involving Hamming Distance will help you master its applications. Computer science is like an ocean, and nobody has seen it all. Hence, consistent efforts are required to explore various untouched concepts. 

About the Author

Technical Research Analyst - Full Stack Development

Kislay is a Technical Research Analyst and Full Stack Developer with expertise in crafting Mobile applications from inception to deployment. Proficient in Android development, IOS development, HTML, CSS, JavaScript, React, Angular, MySQL, and MongoDB, he’s committed to enhancing user experiences through intuitive websites and advanced mobile applications.