Why People Think the MD5 Algorithm Is Badly Broken

Today I’m going to continue our discussion of hash functions. Before I start: are you one of the people who believe MD5 is insecure only because you have heard others say so, without knowing the reason? In this post I’ll explain why, and hopefully by the end you will be able to say for yourself why MD5 is insecure.

I highly advise you to take a look at the previous post, titled “Hash Functions in Cryptography and How They Operate”. If you already know what hash functions are, just keep reading 😀

MD5 is a hash function that takes an input of arbitrary length and produces a 128-bit output. It was designed in 1991 by Ron Rivest [1]. MD5 was believed to be collision resistant for many years, but unfortunately that is no longer the case. MD5 is really weak against collision attacks; in fact, you can now find collisions in under a minute on a normal PC.

If you want easy tools to “crack” MD5, a quick search on Google will turn up many websites that look a digest up in precomputed tables such as rainbow tables (tables built from many strings and their corresponding digests, so that a known digest can be mapped back to its string quickly). Note that these lookups only work for strings someone has already hashed, such as common passwords; they do not invert MD5 in general.

The practical attacks against MD5 are collision attacks, not pre-image attacks. This means an attacker can produce two files with the same hash if he has control over both of them, but he can’t match the hash of an existing file he didn’t influence.
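
To make the 128-bit digest and the lookup-table idea concrete, here is a minimal Python sketch using the standard hashlib module. The word list and the helper names (md5_hex, build_lookup_table, lookup) are my own assumptions for illustration, and the table is a plain dictionary, which is much simpler than a real rainbow table.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of a UTF-8 string as 32 hex characters (128 bits)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def build_lookup_table(candidates):
    """Map digest -> original string for every candidate string."""
    return {md5_hex(word): word for word in candidates}

# Hypothetical word list; real tables cover vastly more strings.
COMMON_STRINGS = ["hello", "password", "123456", "letmein"]
TABLE = build_lookup_table(COMMON_STRINGS)

def lookup(digest: str):
    """Return the known string for this digest, or None if it was never hashed."""
    return TABLE.get(digest)

if __name__ == "__main__":
    digest = md5_hex("hello")
    print(digest)                                # 5d41402abc4b2a76b9719d911017c592
    print(len(digest) * 4, "bits")               # 128 bits
    print(lookup(digest))                        # hello
    print(lookup(md5_hex("not in the table")))   # None
```

The last line shows the limit of this kind of “cracking”: a digest whose input was never put in the table simply cannot be found.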

Recent cryptanalysis, beginning with Wang and Yu [2], has shown that it is actually possible to find collisions for the full MD5 using far fewer than the 2^64 MD5 computations that a generic birthday attack on a 128-bit hash would need.

There is one last concern that I have to warn you about: don’t ever use salted MD5 to store passwords (I have seen many people do that). The reason is simply that MD5 is so fast: an adversary who can try billions of candidate passwords per second will very likely recover your password, salt or no salt. Use a deliberately slow password-hashing function instead.
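
As a rough illustration of what a slower alternative to salted MD5 can look like, here is a sketch using PBKDF2 from Python’s standard hashlib. The iteration count and the function names are my own assumptions, not a recommendation for any particular system; treat it as a starting point only.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware

def hash_password(password: str, salt: bytes = None):
    """Derive a slow, salted hash with PBKDF2-HMAC-SHA256.
    Returns (salt, derived_key); store both."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

if __name__ == "__main__":
    salt, key = hash_password("correct horse")
    print(verify_password("correct horse", salt, key))  # True
    print(verify_password("wrong guess", salt, key))    # False
```

Each guess now costs the attacker 200,000 hash computations instead of one, which is the whole point of choosing a slow function here.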

I just wanted to write about MD5 because many people are still using it, and I strongly advise them to stop.

[1] : https://en.wikipedia.org/wiki/MD5
[2] : http://www.infosec.sdu.edu.cn/uploadfile/papers/How%20to%20Break%20MD5%20and%20Other%20Hash%20Functions.pdf

Hash Functions in Cryptography and How They Operate

Today’s post is about cryptographic hash functions and the role they play in cryptography. To start, let’s define what a hash function is.

Hash functions are a special kind of function that takes an input of arbitrary length and generates a short, fixed-length value; in other words, they map a long input string to a short output string, which is sometimes called a digest. For example, if you put in a string of letters like “Hello”, it will generate a specific output, and the same input always produces the same output.
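
Here is a small sketch of that mapping, assuming SHA-256 from Python’s hashlib as the example hash function (the digest helper is just my own wrapper). Whatever the input length, the digest is always 256 bits.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (64 hex chars = 256 bits)."""
    return hashlib.sha256(data).hexdigest()

if __name__ == "__main__":
    # Inputs of very different sizes all map to a digest of the same size.
    for message in [b"", b"Hello", b"Hello" * 100_000]:
        d = digest(message)
        print(len(message), "bytes ->", len(d) * 4, "bits:", d[:16], "...")
```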

So the question now is: where can we use hash functions?

Hash functions play a big role in cryptography, and a typical use is in digital signatures. Given a message M, you first compute its hash h(M) and then sign that short digest instead of the whole message; ideally every message gets a different digest (we will talk about this issue later on). You could sign the entire message directly with public-key operations, but that would be expensive. Most importantly, hash functions don’t require a key to operate; in other words, the function takes one parameter, and that is the message itself.

Typical output sizes are 128 to 1024 bits. There might be a limit on the input size, but let’s say for the sake of argument that hash functions can take an input of arbitrary length.

Hash functions are one-way functions. Given a message M, it is easy to compute its hash, but given only the hash you cannot feasibly recover the message. With h(m) = x: given m you can compute x, but given x you can’t compute m back.

Properties of good hash functions:

There are many properties that describe a good hash function, but I’ll talk about some of them:

1- The first property is that the hash function should be fast: when you give it an input, it should compute the hash in a reasonable time. It should not be too fast where brute-force guessing is a concern (as with password hashing), because that makes it easier to break, but if it is too slow no one will bother to use it.

2- The function should process the whole file, and if any bit or byte is flipped in the middle or anywhere else in the file, the hash should be completely different. This is called the Avalanche Effect; if you are interested in the subject, I suggest you look it up.

3- An important requirement for a hash function is Collision Resistance. A collision means that two messages M1 and M2 are mapped to the same output, h(M1) = h(M2). Of course, every hash function has collisions, because there are far more possible inputs than outputs, but even though they exist it should not be feasible to find them. The typical attack on hash functions is the birthday attack, which is based on the birthday paradox. Collision Resistance is a really important property, and we should talk about it in a separate post to dive into the details. Having two messages that map to the same output is bad because it lets someone forge a message. For example, if I download a file from the internet, I should have a hash to verify that it is really what I meant to download. But what if an attacker can take a virus and manipulate it so that it has the same hash as the good file? Then we have two files with the same hash, and we will think we have the file we chose when in fact we only got the virus.
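
Properties 2 and 3 can be tried out with a short Python sketch. I am assuming SHA-256 as the hash, and to make a birthday search finish quickly I only compare the first few bytes of the digest; finding a collision on the full 256 bits this way is not feasible. All helper names are my own.

```python
import hashlib
from itertools import count

def sha256_bytes(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with a single bit flipped."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def find_truncated_collision(prefix_bytes: int = 3):
    """Birthday search: hash distinct messages until two of them share the same
    first prefix_bytes of SHA-256. Expected work is roughly
    2 ** (4 * prefix_bytes) hashes, the square root of the output space."""
    seen = {}
    for i in count():
        message = b"message-%d" % i
        tag = sha256_bytes(message)[:prefix_bytes]
        if tag in seen:
            return seen[tag], message
        seen[tag] = message

if __name__ == "__main__":
    # Avalanche effect: one flipped input bit changes about half the output bits.
    original = b"The quick brown fox jumps over the lazy dog"
    modified = flip_bit(original, 0)
    changed = differing_bits(sha256_bytes(original), sha256_bytes(modified))
    print(changed, "of 256 output bits changed")

    # Birthday attack on a 24-bit truncation of the hash.
    m1, m2 = find_truncated_collision()
    print(m1, m2, sha256_bytes(m1)[:3].hex(), sha256_bytes(m2)[:3].hex())
```

The birthday search needs only a few thousand hashes for a 24-bit tag, which is why the output of a real hash function has to be long.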

Real hash functions:

There are many hash functions out there, but few of them qualify as good hash functions at the moment, so you’re pretty much stuck with the existing standard algorithms like the SHA family: SHA-1, SHA-224, SHA-256, SHA-384, SHA-512 (of these, SHA-1 also has known collision attacks by now, so prefer the SHA-2 variants). You could use other algorithms too. My advice: if the project is for your own understanding, use whatever you like, but if the project is something really important, stick with the standards. One quick note: please don’t use MD5! Using MD5 is a really bad choice: there are many attacks on it, collisions can be found, and there are even lookup tables for it, so common digests can be cracked by simply using Google.

 

I hope this post gave you a quick understanding of how hash functions work.

 

 

 

Crypto Ghost – File Encryption for Android OS


Crypto Ghost is a file encryption application that runs on the Android platform. Its only job is protecting your files from unauthorized access by encrypting them. Crypto Ghost uses modern cryptography (there is a published paper based on this project), and you can check the application’s specifications and how it operates on the official website. Crypto Ghost uses the AES algorithm in GCM mode with a 256-bit key. It is free software with no ads, and no Internet connectivity is required to run the app: encryption and decryption run locally in the app without the help of a server. Crypto Ghost provides a simple interface so that even non-technical people can use it.

App website : www.cryptoghost.com

App Documentation : cryptoghost.com/eng/documentation.html

App Paper : https://cryptoghost.com/eng/crypto_ghost_paper.pdf

Download from Google Play :

https://play.google.com/store/apps/details?id=net.almorabea.cryptoghost

Twitter: crypto_ghost

 

 

 

How Pseudo Random Number Generators Work

In any system that we develop, we sometimes need to generate random numbers or random values, and we usually just use whatever random generator is available without asking whether it really provides randomness. When we deal with cryptography we also need a random generator, but this time we have to check whether the function produces output that is really unpredictable, and we examine the generated values to confirm that. Why do we do that? Because we are dealing with critical information, and we really need uniformly distributed data to work with.

So what is a Pseudo Random Number Generator?

Pseudo Random Number Generators (PRGs/PRNGs)

A PRG is an efficient deterministic algorithm that expands a short, uniform seed into a longer pseudo random output.

This is useful whenever you have a small number of truly random bits and want to expand them into lots of “random looking” bits.
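
To show what “deterministic expansion of a seed” means, here is a toy sketch in Python. It hashes the seed together with a counter using SHA-256; this construction and the name toy_prg are my own assumptions for illustration, not a vetted generator, so please don’t use it for real keys.

```python
import hashlib

def toy_prg(seed: bytes, n_bytes: int) -> bytes:
    """Deterministically expand seed into n_bytes of output by hashing
    seed || counter with SHA-256 (illustration only, not a vetted design)."""
    output = bytearray()
    counter = 0
    while len(output) < n_bytes:
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        output.extend(block)
        counter += 1
    return bytes(output[:n_bytes])

if __name__ == "__main__":
    seed = b"16 bytes of seed"
    stream = toy_prg(seed, 1000)
    print(len(seed), "seed bytes ->", len(stream), "output bytes")
    print(toy_prg(seed, 1000) == stream)  # True: same seed, same output
```

Because the algorithm is deterministic, everything depends on the seed being truly random and secret.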

Okay, did you ask yourself where we get these random values from?

Generating random values is difficult for a computer, because computers are deterministic machines and can’t produce random values on their own. So scientists found a way to collect unpredictable events (mouse movements, CPU clock cycles, hard drive heat, and so on).

This data is stored in something called the “pool”, which consists of high-entropy data. Whenever you want random data, you extract values from the pool.

The second step takes this high-entropy data and processes it to yield a sequence of nearly independent and unbiased bits. This step is necessary because high-entropy data is not necessarily uniform.
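
The two steps (collect into a pool, then process the pool into uniform-looking bits) can be sketched roughly like this in Python. The ToyEntropyPool class and the use of timer readings as “events” are my own assumptions for illustration; a real operating system does this far more carefully, and in practice you should simply ask it for random bytes, for example with the standard secrets module.

```python
import hashlib
import secrets
import time

class ToyEntropyPool:
    """Collects noisy events and extracts uniform-looking bytes by hashing."""

    def __init__(self):
        self._pool = hashlib.sha256()

    def add_event(self, event: bytes) -> None:
        """Step 1: mix one raw event (e.g. a timestamp) into the pool."""
        self._pool.update(event)

    def extract(self) -> bytes:
        """Step 2: return 32 bytes derived from everything collected so far."""
        return self._pool.copy().digest()

def collect_timing_events(pool: ToyEntropyPool, samples: int = 256) -> None:
    """Use nanosecond timer readings as a stand-in for hardware events."""
    for _ in range(samples):
        pool.add_event(time.perf_counter_ns().to_bytes(8, "big"))

if __name__ == "__main__":
    pool = ToyEntropyPool()
    collect_timing_events(pool)
    print(pool.extract().hex())
    # In practice, ask the operating system's generator instead:
    print(secrets.token_bytes(32).hex())
```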

I hope this post gave you a glimpse of how pseudo random number generators work and why they are important.

 

Why Does Using Non-Random IVs in CBC Mode Count as a Vulnerability?

Cipher Block Chaining “CBC”

IBM invented the Cipher Block Chaining (CBC) mode of operation in 1976. In CBC mode, each block of plaintext is XORed with the previous ciphertext block before being encrypted. This way, each ciphertext block depends on all plaintext blocks processed up to that point. To make each message unique, an initialization vector must be used in the first block.[1]

[Figure: CBC mode encryption]

[Figure: CBC mode decryption]

So if you encrypt two messages that start with the same block using the same key and the same IV, the first ciphertext block will be the same for both, which reveals that the messages start identically. In that case the encryption is not secure!
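
Here is a small Python sketch of that problem. Python’s standard library has no AES, so I am assuming a toy stand-in for the block cipher Ek built from HMAC-SHA256 truncated to 16 bytes. It is deterministic under a key, which is all this demonstration needs, but it is not a real block cipher and cannot be decrypted. The function names are my own.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for E_k: a deterministic keyed function on one 16-byte block.
    NOT a real block cipher (it cannot be decrypted)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> list:
    """CBC chaining: C_i = E_k(P_i xor C_(i-1)), with C_0 = IV.
    Returns the list of ciphertext blocks."""
    if len(iv) != BLOCK or len(plaintext) % BLOCK != 0:
        raise ValueError("IV must be 16 bytes and plaintext a multiple of 16 bytes")
    previous = iv
    blocks = []
    for i in range(0, len(plaintext), BLOCK):
        previous = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], previous))
        blocks.append(previous)
    return blocks

if __name__ == "__main__":
    key = b"k" * 16
    iv = bytes(16)  # the same IV reused for both messages
    m1 = b"ATTACK AT DAWN!!" + b"first message..."
    m2 = b"ATTACK AT DAWN!!" + b"second message.."
    c1 = cbc_encrypt(key, iv, m1)
    c2 = cbc_encrypt(key, iv, m2)
    print(c1[0] == c2[0])  # True: identical first blocks leak through
    print(c1[1] == c2[1])  # False: the messages differ from the second block on
    c3 = cbc_encrypt(key, b"\x01" * 16, m1)
    print(c1[0] == c3[0])  # False: a different IV hides the repetition
```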

But why can’t we use a sequential IV, for example? Did you think about that? The IV will be different every time, right? That is true, but it is still not enough, because the next IV can be predicted. Let me demonstrate the issue.

Consider a questionnaire website. The application has a lot of check boxes, and each completed questionnaire is sent to a third-party organization. The data is encrypted before it is stored, so even the admin of the website only has access to the ciphertext.

In CBC, the IV is XORed (denoted by “⊕” below) with the first plaintext block, and the result is run through the block cipher: C1 = Ek(IV ⊕ P1).

Since the admin can access the ciphertext and the IVs are predictable, he knows the IV that was used for the customer’s questionnaire (IVcustomer) and can predict the IV that will be used for his own next submission (IVadmin). He can then fill in a questionnaire of his own and choose its plaintext like this: Padmin = IVadmin ⊕ IVcustomer ⊕ “false”

The system encrypts this plaintext like this:

Cadmin = Ek(IVadmin ⊕ Padmin) = Ek(IVadmin ⊕ (IVadmin ⊕ IVcustomer ⊕ “false”))

The IVadmin ⊕ IVadmin cancels out, which means that Cadmin = Ek(IVcustomer ⊕ “false”)

Now the admin can compare Cadmin and Ccustomer. If they are equal, the customer entered “false”; if they are different, he knows that the customer must have entered “true” for that particular question.

But if the system uses a random IV every time, this attack is useless, because the admin can’t predict the IV that will be used for his own submission and therefore can’t test his guess about the answer.
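
The whole attack fits in a few lines of Python. As an assumption for this sketch, Ek is replaced by a toy keyed function (HMAC-SHA256 truncated to 16 bytes), because the standard library has no AES; it is not a real block cipher, but the attack only needs Ek to be deterministic. The answers are padded to one 16-byte block with zero bytes, and all names are hypothetical.

```python
import hashlib
import hmac
import os

BLOCK = 16

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for E_k (deterministic keyed function, NOT a real block cipher)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pad_block(text: str) -> bytes:
    """Pad an answer such as "true" to one 16-byte block with zero bytes."""
    return text.encode("utf-8").ljust(BLOCK, b"\x00")

def encrypt_first_block(key: bytes, iv: bytes, plaintext_block: bytes) -> bytes:
    """What the system does: C1 = E_k(IV xor P1)."""
    return toy_block_encrypt(key, xor(iv, plaintext_block))

def admin_test_guess(key, iv_customer, c_customer,
                     predicted_iv_admin, actual_iv_admin, guess: bytes) -> bool:
    """The admin submits P_admin = IV_admin(predicted) xor IV_customer xor guess.
    The system (which holds the key) encrypts it under the IV it actually uses.
    The admin then compares the result with the customer's ciphertext."""
    p_admin = xor(xor(predicted_iv_admin, iv_customer), guess)
    c_admin = encrypt_first_block(key, actual_iv_admin, p_admin)
    return c_admin == c_customer

if __name__ == "__main__":
    key = os.urandom(16)               # known only to the system
    iv_customer = (41).to_bytes(BLOCK, "big")
    iv_next = (42).to_bytes(BLOCK, "big")  # sequential, so the admin can predict it
    c_customer = encrypt_first_block(key, iv_customer, pad_block("true"))

    # Predictable IV: the comparison reveals the customer's answer.
    print(admin_test_guess(key, iv_customer, c_customer, iv_next, iv_next, pad_block("false")))  # False
    print(admin_test_guess(key, iv_customer, c_customer, iv_next, iv_next, pad_block("true")))   # True

    # Random IV: the prediction is wrong, so even the right guess never matches.
    random_iv = os.urandom(BLOCK)
    print(admin_test_guess(key, iv_customer, c_customer, iv_next, random_iv, pad_block("true")))  # False
```

With the predictable IV the two outputs differ depending on the guess, so the admin learns the answer; with the random IV the result no longer depends on the guess, so the comparison tells him nothing.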

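You can see how much IV handling matters in real protocols: WEP, for example, was vulnerable in part because of how it chose its IVs (in WEP’s case short, repeating IVs used with the RC4 stream cipher rather than CBC), and it has since been replaced by protocols such as WPA2.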

I hope that this post was helpful for you.