Today’s post is about cryptographic hash functions and the role they play in cryptography. To start, let’s define what a hash function is.
Hash functions are a special kind of function that takes an input of arbitrary length and produces a short, fixed-length value. In other words, they map a long input string to a short output string, which is sometimes called a digest. For example, if you feed in a string of letters like “Hello”, you get one specific output, and the same input always gives the same output.
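To make this concrete, here is a small Python sketch using the standard hashlib module. Picking SHA-256 and the helper name sha256_hex are my own choices for illustration; other hash functions behave the same way, just with a different digest length.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# A 5-character input and a 500,000-character input.
short_digest = sha256_hex("Hello")
long_digest = sha256_hex("Hello" * 100_000)

print(short_digest)
print(long_digest)

# Both digests are 64 hex characters (256 bits), whatever the input length.
print(len(short_digest), len(long_digest))
```

Running it twice prints the same digests, because the same input always maps to the same output.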
So the question now is: where can we use hash functions?
Hash functions play a big role in cryptography, and a typical use is in digital signatures. Given a message M, you compute h(M) and sign that short digest instead of the whole message; the digest acts as a fingerprint that should, in practice, be different for every message (we will come back to this issue later on). You could run the public-key operations over the entire message instead, but that would be expensive. Most importantly, a hash function doesn’t require a key to operate; in other words, the function takes a single parameter, and that is the message itself.
Typical output sizes range from 128 to 1024 bits. There might be a limit on the input size, but let’s say for the sake of this argument that hash functions can take an input of arbitrary length.
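If you want to check the output size of a particular algorithm yourself, Python’s hashlib exposes it as digest_size (in bytes). The list of algorithms below is just a sample I picked; it is not meant to be complete.

```python
import hashlib

# Map each algorithm name to its output size in bits.
digest_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha1", "sha256", "sha384", "sha512")
}

for name, bits in digest_bits.items():
    print(f"{name}: {bits} bits")
```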
Hash functions are one-way functions. Given a message m, it is easy to compute its hash, but given only the hash you cannot feasibly recover the message: with h(m) = x, given m you can compute x, but given x you can’t compute m back.
Properties of good hash functions :
There are many properties that describe a good hash function, but I’ll talk about some of them:
1- The first property is that the hash function should be fast: when you give it an input, it should compute the hash in a reasonable time. It should not be too quick, though, because that makes brute-force guessing easier (this matters most when hashing passwords), but if it is too slow no one will bother to use it.
2- The function should go through the whole file bit by bit before generating the hash, and if any bit or byte is flipped, in the middle or anywhere else in the file, the hash should be completely different. This is called the avalanche effect; if you are interested in the subject, I suggest looking it up.
3- An important requirement for a hash function is collision resistance. A collision means that two different messages M1 and M2 map to the same output, h(M1) = h(M2). Of course every hash function has collisions, because there are more possible inputs than outputs, but even though they exist, it should be infeasible to find them. The typical attack on hash functions is the birthday attack, named after the birthday paradox. Collision resistance is a really important property, and it deserves a separate post to dive into the details. If two messages map to the same output, that is bad because it lets anyone forge a message. For example, if I download a file from the internet, I should have a hash to verify that it really is what I meant to download. But what if an attacker takes a virus and manipulates it so that it produces the same hash as the good file? Then two files have the same hash, and we will think we have the file we chose when in fact we got the virus.
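Properties 2 and 3 are easy to play with in a few lines of Python. The sketch below first flips a single input bit and counts how many SHA-256 output bits change, then runs a toy birthday-style search. To keep the search fast I deliberately weaken the hash by keeping only the first 3 bytes (24 bits) of the digest; that truncation, and the helper names, are my own assumptions for the demo and say nothing about the strength of full SHA-256.

```python
import hashlib
from itertools import count

def sha256_bytes(data: bytes) -> bytes:
    """Return the raw 32-byte SHA-256 digest of data."""
    return hashlib.sha256(data).digest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Avalanche effect: flip the lowest bit of the first byte of the message.
original = b"Hello"
flipped = bytes([original[0] ^ 0x01]) + original[1:]
changed = differing_bits(sha256_bytes(original), sha256_bytes(flipped))
print(f"{changed} of 256 output bits changed")

def find_truncated_collision(n_bytes: int = 3):
    """Find two distinct messages whose digests share the first n_bytes bytes."""
    seen = {}  # truncated digest -> message that produced it
    for i in count():
        message = str(i).encode("ascii")
        prefix = sha256_bytes(message)[:n_bytes]
        if prefix in seen:
            return seen[prefix], message
        seen[prefix] = message

m1, m2 = find_truncated_collision()
print(m1, m2, sha256_bytes(m1)[:3].hex())
```

With a 24-bit output the search usually succeeds after only a few thousand messages, far fewer than the roughly 16.7 million possible outputs. That gap is the birthday paradox at work, and it is why real hash functions need long digests.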
Real hash functions :
There are many hash functions out there, but few of them qualify as good hash functions at the moment, so you’re pretty much stuck with the existing algorithms like the SHA family: SHA-1, SHA-224, SHA-256, SHA-384, SHA-512 (of these, SHA-1 has known practical collisions, so prefer the SHA-2 variants for anything new). You could use other algorithms as well, but my advice is this: if the project is for your own understanding, use whatever you like, but if the project is something really important, stick with the standard. One quick note: please don’t use MD5! Using MD5 is a really bad choice. It has many known attacks and collisions, and there are even lookup tables for it, so you can often crack an MD5 hash by simply using Google.
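To connect this with the download example from earlier, here is a minimal sketch of checking a file against a published SHA-256 hash. The function names and the chunk size are my own choices, and the file path and published hash are whatever the download site gives you.

```python
import hashlib
import hmac

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's SHA-256 digest with the published hex string."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

You would call matches_published_hash with the path of the downloaded file and the hex string from the site, and only trust the file if it returns True. Keep in mind this only helps if the published hash itself comes from a source the attacker cannot change.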
I hope this post gave you a quick understanding of how hash functions work.