In our last technology explainer, we discussed encryption and the Indian legal framework around it. Next up is hashing, which plays an intrinsic role in ‘authenticating’ and ‘securing’ electronic records. So, here’s our question for today: what is hashing?
Three concepts come up again and again in cryptography: encoding, encrypting, and hashing. Hashing means taking an input of any size (any data: a Word document, audio file, video file, executable file, etc.) and running it through a hashing algorithm. The algorithm produces a seemingly random string of output called the ‘hash’ or ‘hash value’. This hash value is also known as a message digest.
The algorithm used to produce the hash value is called a hash function. It is deterministic: the same input produces the same hash value every time the function is run. This holds for both conventional and cryptographic hash functions.
There are two important things to note, however. First, hashing is a one-way function. It can be used on any input data to generate a hash value, but there is no practical way to work backwards from a hash value to the input data. Second, a given hash function always produces a hash value of the same fixed length, irrespective of the length or size of the input.
A Few Examples
Input: My name is Ankita.
Input: My name is Ankita. I am writing this article for MyLawrd- Technology Law and Policy.
In the above examples, the hash is different for different inputs. Also, the input length varies, but the size of the hash value remains the same.
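If you want to try this yourself, here is a minimal sketch in Python using the standard `hashlib` module. It assumes SHA-256 as the hash function (the article's examples do not say which one was used), and the helper name `sha256_hex` is just made up for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short_input = "My name is Ankita."
long_input = ("My name is Ankita. I am writing this article for "
              "MyLawrd- Technology Law and Policy.")

short_hash = sha256_hex(short_input)
long_hash = sha256_hex(long_input)

print(short_hash)
print(long_hash)

# Both hashes are 64 hexadecimal characters (256 bits) long,
# even though the inputs have different lengths.
print(len(short_hash), len(long_hash))

# Hashing the same input again gives the same value (deterministic).
print(sha256_hex(short_input) == short_hash)
```

Running it a second time prints exactly the same two hashes, which is the deterministic behaviour described earlier.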
What’s hashing used for?
The average user encounters hashing on a daily basis, albeit unknowingly. A well-run service provider does not save your password; it saves a hash value of it. And since a hash function is one-way, even if an attacker gets access to the hash value, they can’t run it backwards through the hash function to get the plain text. So, if somebody does store passwords in plain text, they do not have a very convincing cyber security policy.
Pro Tip: If a service emails you your actual password when you click on the ‘forgot password’ option, better stop using it.
So, the next time you log on to Instagram, do know that Instagram hashes the password you enter and compares this hash to the hash it has saved in its database. Only when the two hash values match are you authorized to access your account (remember? Same input = same hash output?).
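Here is a rough sketch of that store-a-hash, compare-on-login flow. It is not how Instagram or any particular service does it. The function names are hypothetical, and two details are assumptions on my part that go beyond the description above: a random per-user salt, and PBKDF2 (a deliberately slow construction built on SHA-256, available as `hashlib.pbkdf2_hmac`) instead of a single plain hash.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password and salt using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(password: str) -> tuple[bytes, bytes]:
    """What the service stores at sign-up: a salt and a hash, never the password."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)

def verify(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """At login, hash the entered password and compare it with the stored hash."""
    candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_hash)

salt, stored = register("correct horse")
print(verify("correct horse", salt, stored))  # True
print(verify("wrong guess", salt, stored))    # False
```

Notice that the service never needs the original password after sign-up. All it keeps is the salt and the hash.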
Okay. That was about service providers.
Do YOU need it?
Yes. Remember, this is all about cyber security. While we use encryption to maintain the confidentiality of data, we use hashing to check the integrity of the data. So, if your friend sent you an encrypted email, how do you know nobody in between fiddled with it? You use hash functions. Before sending the email, your smart friend computes the hash value and sends it to you. Once you get the email, you decrypt it and compute the hash value yourself. If the two hash values match, you know the message was not altered along the way!
A very common example is downloading a file, especially software. The company which distributes the software also publishes the hash value of the trusted file. If the file you downloaded produces the same hash output, it is exactly the file the company published, and nobody has slipped malware into it along the way!
Okay. Another example. Last one. What if you have the responsibility to securely store a very large dataset? The smart thing to do would be to compute its hash value when you store it, and compare that with the hash value you compute just before you use the data next. Imagine instead keeping two copies of an Excel sheet and comparing them to check if the data is intact!
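The download check can be sketched in a few lines of Python. This assumes the publisher lists a SHA-256 value as a hexadecimal string. The function names are hypothetical, and a small temporary file stands in for the downloaded installer.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the file's hash with the value the publisher lists."""
    return file_sha256(path) == published_hex.strip().lower()

# Demo: a temporary file stands in for the downloaded software.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend this is an installer")
    demo_path = tmp.name

published = file_sha256(demo_path)  # stands in for the value on the vendor's site
print(matches_published(demo_path, published))
os.remove(demo_path)
```

The same two functions work for the large-dataset case: record the hash when you store the data, and call `matches_published` with that recorded value before you use it again.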
A Few Examples of Hash Functions
Now, the above image might make you wonder: what’s SHA256sum? Well, there are several hash algorithms. The common ones, with the length of the hash each one produces, are:
- MD5: 128 bits
- SHA-1: 160 bits
- SHA-224: 224 bits
- SHA-256: 256 bits
- SHA-384: 384 bits
- SHA-512: 512 bits
A hash value is often called a checksum too. That’s why it’s SHA256sum. You can compute the hash value of data using software such as ‘Hash My Files’ or ‘Hasher’. The software gives you an option to choose the hash function you want to run your data through.
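If you would rather not install anything, Python's standard `hashlib` module lets you pick the hash function by name, and you can use it to confirm the output sizes listed above. This is a small sketch, and it assumes your Python build has all six algorithms enabled (some locked-down builds disable MD5).

```python
import hashlib

# hashlib's names for the algorithms listed above.
names = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]

# digest_size is reported in bytes, so multiply by 8 to get bits.
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in names}

for name, bits in digest_bits.items():
    print(f"{name}: {bits} bits")
```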
Can’t Hackers Hack Hash Functions Too?
Not easily. Cryptographic hash functions are designed to be collision resistant: it should be practically impossible to find two different messages that produce the same hash. Researchers at Google and CWI Amsterdam tried it with SHA-1 in 2017, and they succeeded. They called the attack ‘SHAttered’. What did they need to achieve this feat? The equivalent of 110 GPUs working around the clock for 1 year, and that was with a specially crafted algorithm rather than plain brute force. Here’s the scale of the computation:
- Nine quintillion (9,223,372,036,854,775,808) SHA-1 computations in total
- 6,500 years of CPU computation to complete the first phase of the attack
- 110 years of GPU computation to complete the second phase
That’s it! 🙂 Be advised, MD5 and SHA-1 should no longer be used where security matters. Practical hash collisions have been found for both. Cracking the other hash functions listed above is not computationally feasible at the moment. You may want to recall from your elementary education that a bit can be either a 0 or a 1. So, a 256-bit SHA-256 hash has 2^256 possible values. A brute force attack would take vastly longer than hundreds of years, even with thousands of GPUs.
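To get a feel for these numbers, here is a back-of-the-envelope calculation in Python. The hardware figures are purely hypothetical (1,000 GPUs at ten billion hashes per second each), so treat the results as orders of magnitude and nothing more. The 2^128 figure is the usual birthday-bound estimate of the work needed to find a collision in a 256-bit hash by brute force.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Hypothetical hardware, chosen only for illustration.
HASHES_PER_SECOND_PER_GPU = 10**10
GPU_COUNT = 1000

def years_to_try(attempts: int) -> float:
    """Years needed to compute `attempts` hashes on the hypothetical hardware."""
    per_year = HASHES_PER_SECOND_PER_GPU * GPU_COUNT * SECONDS_PER_YEAR
    return attempts / per_year

# The SHA-1 attack's total work is 2^63, the nine-quintillion figure.
shattered_work = 2**63
print(f"{shattered_work:,}")

preimage_space = 2**256  # every possible SHA-256 output
collision_work = 2**128  # birthday-bound estimate for a brute-force collision

print(f"All 2^256 values: about {years_to_try(preimage_space):.1e} years")
print(f"Brute-force collision: about {years_to_try(collision_work):.1e} years")
```

Even the cheaper of the two, the collision search, comes out at a number of years many orders of magnitude beyond anything practical on this assumed hardware.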
Geeking Out for Our Nerds
Every cryptographic hash function is a hash function. But not every hash function is a cryptographic hash.
Cryptographic hash functions are what we have discussed so far. Apart from being deterministic, one-way, and collision resistant, they have some other properties too. One is the ‘avalanche effect’: changing even a single bit of the input changes the hash value so thoroughly that the two hashes look unrelated. See this example:
Input: My name is Ankita.
Input: My name is Ankita P.
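You can measure the avalanche effect by counting how many bits differ between the two hashes. The sketch below assumes SHA-256, and the helper names are made up for illustration. For a good cryptographic hash you would expect roughly half of the 256 bits to flip.

```python
import hashlib

def sha256_bits(text: str) -> int:
    """Return the SHA-256 hash of a string as one large integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Count the bit positions at which the two hashes differ."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

changed = differing_bits("My name is Ankita.", "My name is Ankita P.")
print(f"{changed} of 256 bits differ")
```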
Non-cryptographic hash functions just try to avoid collisions for non-malicious input and do not guarantee the other security properties of a cryptographic hash function. Some just aim to detect accidental changes in data (Cyclic Redundancy Checks, for example, which check for file system errors).
This is all you need for a sound understanding of the technique. In our next article, we will discuss the legal mandate around hashing. Stay tuned!
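As a small illustration of a non-cryptographic check, here is CRC-32 from Python's standard `zlib` module catching a single flipped bit. The sample data is invented. Keep in mind that a CRC is only 32 bits long and is built to catch accidents, not to resist someone deliberately crafting a matching file.

```python
import zlib

original = b"quarterly report: 1000 units"

# Flip a single bit, as a faulty disk or noisy link might.
damaged = bytearray(original)
damaged[0] ^= 0b00000001
corrupted = bytes(damaged)

crc_original = zlib.crc32(original)
crc_corrupted = zlib.crc32(corrupted)

print(f"{crc_original:08x} {crc_corrupted:08x}")
print(crc_original != crc_corrupted)  # True: the accidental change is detected
```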