What is hashing: For Dummies

In our last technology explainer, we discussed encryption and the Indian legal system around it. Next up is hashing. After all, hashing plays an intrinsic role in ‘authenticating’ and ‘securing’ electronic records. So, here’s our question for today: What is hashing?

Discussions of cryptography usually involve three basic concepts: encoding, encryption, and hashing. Hashing means taking an arbitrary amount of input data (any data: a Word document, audio file, video file, executable file, etc.) and applying a hashing algorithm to it. The algorithm generates a seemingly random output called the ‘hash’ or ‘hash value’. This hash value is also known as a message digest.

The algorithm used to produce the hash value is called a hash function. It is deterministic: the same input produces the same hash value every time the function is run. Both conventional and cryptographic hash functions are deterministic.

There are two important things to note, however. First, hashing is a one-way function. It can be used on any input data to generate a hash value, but there is no practical way to work backwards from a hash value to the input data. Second, a given hash function always produces a hash value of the same fixed length, irrespective of the length or size of the input.

A Few Examples

Input: My name is Ankita.

Hash: 6795D462DE738ED46BD0323B951651735A327007

Input: My name is Ankita. I am writing this article for MyLawrd- Technology Law and Policy.

Hash: 9525DGF5654DG54JF2GER6RD4XZ7V465Y6HI1J

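In the above examples, the hash is different for different inputs. Also, the input length varies, but the size of the hash value remains the same. Real hash values are usually written in hexadecimal (the digits 0-9 and the letters A-F), so treat the values shown here as illustrative.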
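If you would like to try this yourself, here is a minimal Python sketch using the standard hashlib module. It assumes SHA-1 (chosen only because it produces 40 hexadecimal characters) and UTF-8 text; the helper name sha1_hex is made up for this illustration, and the values it prints are not claimed to match the ones shown above.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of `text` as an uppercase hexadecimal string."""
    # Assumption: the text is encoded as UTF-8 before hashing.
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


short_input = "My name is Ankita."
long_input = ("My name is Ankita. I am writing this article for "
              "MyLawrd- Technology Law and Policy.")

short_hash = sha1_hex(short_input)
long_hash = sha1_hex(long_input)

# The inputs differ in length, but both hashes have the same length.
print(short_hash, len(short_hash))
print(long_hash, len(long_hash))
```

Run it twice and the same hashes come out each time, which is the deterministic behaviour described earlier.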

What’s hashing used for?

The average user encounters hashing on a daily basis, albeit unknowingly. A well-run service provider does not save our password; it saves a hash value instead. And since a hash function is one-way, an attacker who gets access to the hash value can’t simply reverse it to get the plain text. So, if somebody does store passwords in plain text, they do not have a very convincing cyber security policy.

Pro Tip: If a service sends you your existing password when you click on the forgot password option, better stop using it. It means the service can read your password, so it is not storing only a hash.

So, the next time you log on to a service like Instagram, do know that it hashes the password you enter and compares this hash to the hash it has saved in its database. Only when the two hash values match are you authorized to access your account (remember? Same input = same hash output?).
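The login check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how Instagram or any particular service does it: the function names register and login, the iteration count, and the use of a random salt with PBKDF2 (a standard way of hashing passwords slowly) are all choices made for this example and are not part of the article.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for this sketch


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, ITERATIONS)


def register(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash); this pair is stored, never the password."""
    salt = os.urandom(16)  # random salt, an addition beyond the article
    return salt, hash_password(password, salt)


def login(attempt: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the attempt the same way and compare with the stored hash."""
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)


salt, stored_hash = register("correct horse battery staple")
print(login("correct horse battery staple", salt, stored_hash))
print(login("wrong guess", salt, stored_hash))
```

The service only ever keeps the salt and the hash, so there is no plain-text password for it to send back to you.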

Okay. That was about service providers.

Do YOU need it?

Yes. Remember, this is all about cyber security. While we use encryption to maintain the confidentiality of data, we use hashing to check its integrity. So, if your friend sent you an encrypted email, how do you know nobody in between fiddled with it? You use hash functions. Before sending the email, your smart friend computes its hash value and sends that to you as well. Once you get the email, you decrypt it and compute the hash value yourself. If both hash values match, the message reached you unaltered!

A very common example is downloading a file, especially software. The company which distributes the software also publishes the hash value of the trusted file. If the file you downloaded produces the same hash output, it is the same file the company published, and nobody swapped it or slipped malware into it along the way.
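Checking a download against a published hash could look something like the sketch below. It assumes the publisher used SHA-256 and gave the hash as a hexadecimal string; the function names are hypothetical, and the file is read in chunks so that large downloads do not need to fit in memory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hash of the file at `path` as lowercase hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read the file piece by piece until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's hash with the one the publisher listed."""
    # Ignore surrounding whitespace and letter case in the published value.
    return sha256_of_file(path) == published_hex.strip().lower()
```

You would call matches_published_hash with the path of the downloaded file and the hash copied from the publisher's site. A result of False means the file is not the one that was published.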

Okay. Another example. Last one. What if you have the responsibility to securely store a very large dataset? The smart thing to do would be to compute its hash value when you store it, and compare that with the hash value you compute just before you use it next. Imagine instead keeping two copies of an Excel sheet and comparing them to check if the data is intact!

A Few Examples of Hash Functions

If you have ever seen the term SHA256sum next to a software download, you might wonder what it means. Well, there are a few hash algorithms. The common ones, with the length of the hash value each produces, are:

  • MD5: 128 bits

  • SHA-1: 160 bits

  • SHA-224: 224 bits

  • SHA-256: 256 bits

  • SHA-384: 384 bits

  • SHA-512: 512 bits

A hash value is often called a checksum too. That’s why it is called SHA256sum. You can compute the hash value of data using software like ‘Hash My Files’ or ‘Hasher’. The software gives you an option to choose the hash function you want to run your data through.
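If you prefer code to a desktop tool, Python's standard hashlib module offers the same choice of hash function. The sketch below looks up each algorithm by name and reports its output length; the helper names digest_bits and checksum are made up for this example.

```python
import hashlib

ALGORITHMS = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]


def digest_bits(name: str) -> int:
    """Return the output length of the named hash function in bits."""
    return hashlib.new(name).digest_size * 8  # digest_size is in bytes


def checksum(data: bytes, name: str = "sha256") -> str:
    """Hash `data` with the chosen algorithm and return hex text."""
    return hashlib.new(name, data).hexdigest()


sizes = {name: digest_bits(name) for name in ALGORITHMS}
for name, bits in sizes.items():
    print(f"{name}: {bits} bits")

print(checksum(b"example data", "sha256"))
```

Each hexadecimal character carries 4 bits, so a 256-bit hash is printed as 64 characters.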

Can’t Hackers Hack Hash Functions Too?

Not easily. Good hash functions are collision resistant: it is almost impossible to find two different messages that produce the same hash. Google, working with CWI Amsterdam, tried it with SHA-1 in 2017, and they succeeded. They called the attack ‘SHAttered’. What did they need to achieve this feat? For one phase alone, computation equivalent to 110 GPUs working around the clock for a year, and that was with a specially crafted algorithm rather than plain brute force. Here’s the scale of the computation:

  • Nine quintillion (9,223,372,036,854,775,808) SHA1 computations in total

  • 6,500 years of CPU computation to complete the attack first phase

  • 110 years of GPU computation to complete the second phase

That’s it! 🙂 Be advised, MD5 and SHA-1 should not be used anymore. They are susceptible to hash collisions. Cracking the other hash functions listed above is not computationally feasible at the moment. You may want to recall from your elementary education that a bit can be either a 0 or a 1. So, a 256-bit SHA-256 hash has 2^256 possible values. A brute force attack over that many possibilities would not finish in any practical timeframe, even with thousands of GPUs.
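To get a feel for these numbers, here is a rough back-of-the-envelope calculation in Python. The number of GPUs and the hashing speed per GPU are hypothetical round figures picked for illustration, not measurements.

```python
# Number of possible SHA-256 hash values: one for every 256-bit pattern.
possible_hashes = 2 ** 256

# The SHA-1 computation count quoted above is exactly 2 to the power 63.
shattered_computations = 2 ** 63

# Hypothetical attacker, deliberately generous (assumed figures).
GPUS = 1_000
HASHES_PER_SECOND_PER_GPU = 10 ** 12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_try(count: int) -> int:
    """Whole years needed to compute `count` hashes at the assumed rate."""
    return count // (GPUS * HASHES_PER_SECOND_PER_GPU * SECONDS_PER_YEAR)


print(shattered_computations)
print(len(str(possible_hashes)))  # 2**256 has 78 decimal digits
print(years_to_try(possible_hashes))
```

Even at this assumed speed, trying every possible SHA-256 value takes a number of years that itself runs to dozens of digits.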

Geeking Out for Our Nerds

Every cryptographic hash function is a hash function. But not every hash function is a cryptographic hash. 

Everything we have discussed till now was about cryptographic hash functions. Apart from being deterministic, one-way, and collision-resistant, cryptographic hash functions have some other properties too. They have an ‘avalanche effect’: a change in even a single bit of the input changes the hash value beyond recognition. See this illustrative example:

Input: My name is Ankita. 

Hash: 6795D462DE738ED46BD0323B951651735A327007 

Input: My name is Ankita P. 

Hash: 26365DTG385H7G6HD8G76D1H6FH54SG6V3Z8H 
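The avalanche effect can be measured by counting how many bits differ between two hashes. The sketch below assumes SHA-256 and UTF-8 text, and the function names are made up for this example.

```python
import hashlib


def sha256_as_int(text: str) -> int:
    """Return the SHA-256 hash of `text` as one large integer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def differing_bits(first: str, second: str) -> int:
    """Count the bit positions where the two hashes disagree."""
    # XOR leaves a 1 wherever the two hashes differ.
    return bin(sha256_as_int(first) ^ sha256_as_int(second)).count("1")


changed = differing_bits("My name is Ankita.", "My name is Ankita P.")
print(f"{changed} of 256 bits differ")
```

For a well-behaved cryptographic hash, you would expect roughly half of the 256 bits to differ, even though the inputs are almost identical.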

Non-cryptographic hash functions just try to avoid collisions for non-malicious input and do not guarantee the other security properties of a cryptographic hash function. Some just aim to detect accidental changes in data (Cyclic Redundancy Checks, for example, which catch accidental corruption in stored or transmitted data).
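As a small illustration of the difference, Python's standard zlib module provides CRC-32, a non-cryptographic checksum. The sketch below flips a single bit in some sample data (made up for this example) to mimic accidental corruption, and shows that both CRC-32 and SHA-256 notice the change, although CRC-32 offers only 32 bits and no protection against deliberate tampering.

```python
import hashlib
import zlib

original = b"Quarterly report: revenue 1000, expenses 400"

# Mimic accidental corruption by flipping the lowest bit of the first byte.
corrupted = bytes([original[0] ^ 0b00000001]) + original[1:]

crc_original = zlib.crc32(original)    # 32-bit unsigned integer
crc_corrupted = zlib.crc32(corrupted)

sha_original = hashlib.sha256(original).hexdigest()
sha_corrupted = hashlib.sha256(corrupted).hexdigest()

print(f"CRC-32:  {crc_original:08x} vs {crc_corrupted:08x}")
print(f"SHA-256: {sha_original[:16]}... vs {sha_corrupted[:16]}...")
```

With only about four billion possible CRC-32 values, someone acting deliberately can find another input with the same checksum, which is why CRCs are used for catching accidents and not for security.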

That is all you need for a sound understanding of the technique. In our next article, we will discuss the legal mandate around hashing. Stay tuned!

Ankita Parida

Ankita is a lawyer and a software engineer from IIIT Bhubaneswar. She is a technology enthusiast and is dedicated to exploring legal angles of technological progress and use. Her areas of interest include Technology law, IPR, and Commercial Contracts.