Cryptographic Hash Function

Cryptographic Hash Function Definition

A cryptographic hash function is an algorithm that can be run on a piece of data, like an individual file or a password, to produce a fixed-length value called a checksum.

The main use of a cryptographic hash function is to verify the authenticity of a piece of data. Two files can be considered identical only if the checksums generated from each file, using the same cryptographic hash function, are identical.
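As a minimal sketch of that comparison, the Python snippet below uses the standard hashlib module. The choice of SHA-256 and the helper name checksum are assumptions made for illustration, not something the article prescribes.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


original = b"The quick brown fox jumps over the lazy dog"
exact_copy = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"

# Identical data always produces identical checksums.
print(checksum(original) == checksum(exact_copy))  # True

# Changing a single letter produces a different checksum.
print(checksum(original) == checksum(altered))  # False
```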

Some commonly used cryptographic hash functions include MD5 and SHA-1, though many others also exist.

Note: Cryptographic hash functions are often just referred to as hash functions for short, but that's not technically correct. Hash function is a more generic term that covers cryptographic hash functions along with other sorts of algorithms, like cyclic redundancy checks.
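To make the distinction concrete, here is a small sketch that runs the same data through a cyclic redundancy check (CRC-32, from Python's zlib module) and through a cryptographic hash function (SHA-1, from hashlib). The sample input is arbitrary.

```python
import hashlib
import zlib

data = b"12345"

# CRC-32 is a hash function built to catch accidental errors, not to resist attackers.
crc = zlib.crc32(data)
print(f"CRC-32: {crc:08x}")

# SHA-1 is a cryptographic hash function and produces a much longer value.
sha1 = hashlib.sha1(data).hexdigest()
print(f"SHA-1:  {sha1}")
```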

Cryptographic Hash Functions: A Use Case

Let's say you download the latest version of the Firefox browser. For whatever reason, you need to download it from a site other than Mozilla's. Since the file isn't hosted on a site you've learned to trust, you'd like to make sure that the installation file you just downloaded is exactly the same thing Mozilla offers.

Using a checksum calculator, you compute a checksum using a particular cryptographic hash function (say SHA-2) and then compare that to the one published on Mozilla's site.

If they're equal, then you can be reasonably sure that the download you have is the one Mozilla intended you to have.
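A checksum calculator of this kind can be sketched in a few lines of Python. The function names file_checksum and matches_published are hypothetical, and SHA-256 (a member of the SHA-2 family) is assumed as the default algorithm.

```python
import hashlib
import hmac


def file_checksum(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so that large downloads don't need to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path: str, published: str, algorithm: str = "sha256") -> bool:
    """Compare the file's checksum with the one copied from the publisher's site."""
    actual = file_checksum(path, algorithm)
    # Normalize whitespace and letter case before comparing.
    return hmac.compare_digest(actual, published.strip().lower())
```

You would call matches_published with the path to the downloaded installer and the checksum string copied from the publisher's page. A result of False means the file differs from the one that was published.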

See What Is a Checksum? for more on these special calculators, plus more examples on using checksums to make sure files you download really are what you expected them to be.

Can Cryptographic Hash Functions Be Reversed?

Cryptographic hash functions are designed so that the checksums they create can't be reversed back into the original texts.

However, even though they are virtually impossible to reverse, it doesn't mean they're 100% guaranteed to safeguard data.

Something called a rainbow table can be used to quickly figure out the plaintext of a checksum. Rainbow tables are basically dictionaries that list out thousands, millions, or even billions of checksums alongside their corresponding plaintext values.

While this isn't technically reversing the cryptographic hash algorithm, it might as well be since it's so simple to do. In reality, since no rainbow table can list out every possible checksum in existence, they're usually only "helpful" for simple phrases... like weak passwords.

Here's a simplified version of a rainbow table to show how one would work when using the SHA-1 cryptographic hash function:

Plaintext     SHA-1 Checksum
12345         8cb2237d0679ca88db6464eac60da96345513964
password1     e38ad214943daad1d64c102faec29de4afe9da3d
ilovemydog    a25fb3505406c9ac761c8428692fbf5d5ddf1316
Jenny400      7d5eb0173008fe55275d12e9629eef8bdb408c1f
dallas1984    c1ebe6d80f4c7c087ad29d2c0dc3e059fc919da2

For these values to be figured out from the checksums, the hacker would need to know which cryptographic hash algorithm was used to generate them.
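The simplified table can be sketched in Python as a precomputed dictionary that maps checksums back to plaintext. This is a plain lookup table that mirrors the article's simplification. Real rainbow tables use a more compact structure, and the stolen value here is just the SHA-1 checksum of 12345.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 checksum of text as a hexadecimal string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Precompute checksums for a list of likely passwords.
candidates = ["12345", "password1", "ilovemydog", "Jenny400", "dallas1984"]
lookup = {sha1_hex(plaintext): plaintext for plaintext in candidates}

# A checksum taken from a stolen database can now be looked up directly.
stolen = "8cb2237d0679ca88db6464eac60da96345513964"
print(lookup.get(stolen))  # 12345

# A checksum that was never precomputed simply isn't found.
print(lookup.get(sha1_hex("12@34$5")))  # None
```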

For added protection, some websites that store user passwords perform additional functions on the checksum after it's generated but before it's stored.

This produces a new value that only the web server understands and that doesn't exactly match the original checksum.

For example, after a password is entered and the checksum generated, it might be separated into several parts and rearranged before it's stored in the password database, or certain characters might be swapped with others. When the user attempts to authenticate the next time they sign on, this additional function would then be reversed by the web server and the original checksum generated again, to verify that a user's password is valid.

Doing this helps limit the usefulness of a hack where all the checksums are stolen.

Again, the idea here is to perform a function that is unknown so that if the hacker knows the cryptographic hash algorithm but not this custom one, then knowing the password checksums is unhelpful.
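Purely as an illustration of the rearranging idea described above, the sketch below splits a SHA-1 checksum into four parts and stores them in a shuffled order. The ORDER list, the part length, and the function names are all hypothetical, and this is not meant as a recommended scheme.

```python
import hashlib

# Hypothetical secret ordering known only to the server:
# stored part i is taken from original part ORDER[i].
ORDER = [2, 0, 3, 1]
PART_LEN = 10  # a SHA-1 hex checksum has 40 characters, so 4 parts of 10


def split_parts(value: str) -> list:
    return [value[i:i + PART_LEN] for i in range(0, len(value), PART_LEN)]


def scramble(checksum: str) -> str:
    """Rearrange the checksum's parts before storing it."""
    parts = split_parts(checksum)
    return "".join(parts[i] for i in ORDER)


def unscramble(stored: str) -> str:
    """Put the parts back in their original order."""
    parts = split_parts(stored)
    original = [""] * len(ORDER)
    for position, source_index in enumerate(ORDER):
        original[source_index] = parts[position]
    return "".join(original)


checksum = hashlib.sha1(b"12345").hexdigest()
stored = scramble(checksum)

print(stored != checksum)              # True: the stored value no longer matches
print(unscramble(stored) == checksum)  # True: the server can recover the checksum
```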

Passwords and Cryptographic Hash Functions

A database saves user passwords in a way that resembles a rainbow table: it keeps each password's checksum on record alongside the username. When your password is entered, its checksum is generated and compared with the one on record for your username. You're then granted access if the two are identical.
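That login check might look something like the following sketch. The users dictionary stands in for a real database, the username alice is made up, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
import hmac


def checksum(password: str) -> str:
    """Return the SHA-256 checksum of a password as a hexadecimal string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical user table: username -> stored checksum (never the password itself).
users = {"alice": checksum("12@34$5")}


def login(username: str, password: str) -> bool:
    """Grant access only if the entered password hashes to the stored checksum."""
    stored = users.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored, checksum(password))


print(login("alice", "12@34$5"))  # True
print(login("alice", "12345"))    # False
```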

Given that a cryptographic hash function produces a non-reversible checksum, does that mean you can make your password as simple as 12345, instead of 12@34$5, simply because the checksums themselves can't be understood? It definitely does not, and here's why...

As you can see, these two passwords are both impossible to decipher just by looking at the checksum:

MD5 for 12345: 827ccb0eea8a706c4c34a16891f84e7b

MD5 for 12@34$5: a4d3cc004f487b18b2ccd4853053818b

So, at first glance you may think that it's absolutely fine to use either of these passwords. This is true if an attacker tries to work out your password from the MD5 checksum alone (which nobody does), but not if a brute force or dictionary attack is performed (which is a common tactic).

A brute force attack is when an attacker tries a huge number of possible passwords, one after another, until one works. In this case, it would be very easy to guess "12345," but pretty difficult to stumble on the other one. A dictionary attack is similar in that the attacker can try every word, number, or phrase from a list of common (and less commonly used) passwords, "12345" definitely being one that would be tried.
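A dictionary attack against a stolen checksum can be sketched as a simple loop. The short common_passwords list is a stand-in for the much larger lists attackers actually use, and the function names are hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 checksum of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_checksum, wordlist):
    """Hash each guess and return the one whose checksum matches, if any."""
    for guess in wordlist:
        if md5_hex(guess) == target_checksum:
            return guess
    return None


# A tiny stand-in for a real list of common passwords.
common_passwords = ["password", "123456", "12345", "qwerty", "letmein"]

weak = md5_hex("12345")
strong = md5_hex("12@34$5")

print(dictionary_attack(weak, common_passwords))    # 12345
print(dictionary_attack(strong, common_passwords))  # None
```

The attacker never reverses the checksum. They only hash guesses and compare, which is why a password that appears on such a list offers no protection.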

So, even though cryptographic hash functions produce checksums that are difficult or impossible to reverse, you should still use a complex password for all your online and local user accounts.

Tip: See Examples of Weak and Strong Passwords if you're not sure whether yours is considered a strong password.

More Information on Cryptographic Hash Functions

It might seem like cryptographic hash functions are related to encryption but the two work in very different ways.

Encryption is a two-way process where something is encrypted to become unreadable, but then decrypted later to be used normally again. You might encrypt files you've stored so that anyone who accesses them will be unable to use them, or you can utilize file transfer encryption to encrypt files that are moving over a network, like ones you upload or download online.

As described above, cryptographic hash functions work differently: the checksums are not meant to be reversed with a special de-hashing password the way encrypted files are read with a special decryption password. The only purpose cryptographic hash functions serve is to compare two pieces of data, like when downloading files, storing passwords, pulling data from a database, etc.

It's possible for a cryptographic hash function to produce the same checksum for different pieces of data. When this happens, it's called a collision. Clearly, this is a huge problem considering the entire point of a cryptographic hash function is to make a unique checksum for every piece of data input into it.

The reason collisions can occur is that each cryptographic hash function produces a value of a fixed length regardless of the input data. For example, the MD5 cryptographic hash function generates 827ccb0eea8a706c4c34a16891f84e7b, 1f633b2909b9c1addf32302c7a497983, and e10adc3949ba59abbe56e057f20f883e for three totally different blocks of data.

The first checksum is from 12345, the second was generated from over 700 letters and numbers, and the third is from 123456. All three inputs are of different lengths but the results are always just 32 characters long since MD5 was used.
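The fixed output length is easy to confirm with a short sketch. The 700-character input below is a made-up stand-in, not the block of data the article used, so its checksum will differ from the second one quoted above.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 checksum of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Three inputs of very different lengths (the last one is an arbitrary stand-in).
inputs = ["12345", "123456", "a" * 700]

for text in inputs:
    digest = md5_hex(text)
    # Every MD5 checksum is 32 hexadecimal characters, whatever the input length.
    print(len(text), len(digest), digest)
```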

As you can see, there is virtually no limit to the number of different inputs that could be hashed, and each tiny change in the input is supposed to produce a completely different checksum. However, because there is a limit to the number of checksums one cryptographic hash function can produce, there's always the possibility that you'll encounter a collision.
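To see why a limited number of checksums makes collisions unavoidable, the sketch below deliberately shrinks MD5 down to its first two hexadecimal characters. This truncation is an assumption made only so that a collision shows up quickly. With just 256 possible values, any 257 different inputs must include at least two that share a checksum.

```python
import hashlib


def tiny_hash(text: str, hex_chars: int = 2) -> str:
    """Keep only the first few hex characters of MD5 to simulate a very short checksum."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 2):
    """Hash "0", "1", "2", ... until two different inputs share a checksum."""
    seen = {}
    counter = 0
    while True:
        text = str(counter)
        digest = tiny_hash(text, hex_chars)
        if digest in seen:
            return seen[digest], text, digest
        seen[digest] = text
        counter += 1


first, second, shared = find_collision()
print(f"{first!r} and {second!r} both produce {shared!r}")
```

The same reasoning applies to full-length checksums. The number of possible values is just so large that finding two matching inputs by chance is extremely unlikely.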

This is why other cryptographic hash functions have been created. While MD5 generates a 32-character value, SHA-1 generates 40 characters and SHA-2 (512) generates 128. The more characters the checksum has, the less likely it is that a collision will occur, because there is more room for unique values.
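These lengths can be checked with a short sketch. Each hexadecimal character carries 4 bits, so the character count also gives the number of possible checksums. The digest_lengths helper is a hypothetical name, and SHA-256 is included only for comparison.

```python
import hashlib


def digest_lengths(names=("md5", "sha1", "sha256", "sha512")) -> dict:
    """Map each algorithm name to the length of its hexadecimal checksum."""
    return {name: len(hashlib.new(name, b"12345").hexdigest()) for name in names}


for name, hex_chars in digest_lengths().items():
    bits = hex_chars * 4  # each hex character encodes 4 bits
    print(f"{name}: {hex_chars} characters, {bits} bits, 2**{bits} possible checksums")
```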