I've read, over and over again, various questions and seemingly authoritative statements about the security of various hashing algorithms. I've gotten kind of tired of reading misinformation, so here's some detail that you can trust.
- US-CERT of the U. S. Department of Homeland Security said MD5 "should be considered cryptographically broken and unsuitable for further use."
- The document that this was stated in is titled, "Vulnerability Note VU#836068: MD5 vulnerable to collision attacks"
- A collision attack is where an attacker, given access to a hashed password (or other plain text), crafts a password that yields the same result when hashed. Thus to a password authentication system, the crafted "collision" seems to be the correct password.
- If your password hashing scheme does not use a salt, none of this is interesting to you, as you have little or no security to speak of given an attacker (internal or external) who has access to your hashed passwords.
OK, so what does any of this mean to you?
Is MD5 secure? Well, not really. It is possible, with moderate hardware investment and access to the hashed password to generate a "skeleton key." No one can "crack" the original password in a reasonable amount of time that I know of or that I've read about, but access to an equivalent password solves many problems for an attacker, even if they can't then take that password and use it against other services (since those services would not be using the same "salt" which prevents the same password hashing the same way on two different sites or services).
The question you have to ask yourself is this: why are you hashing passwords? Is it to protect them, should someone gain access to your systems from the outside? Is it to protect them from those who have access to the data store? In these cases, md5 is at best a weak protection, but it is significantly better than some of the alternatives (DES, etc.) which are breakable in practically no time.
But MD5 is used in many places besides password hashing. Should we stop using it there? Probably not.
For example many backup and data validation tools use MD5 to make sure that data has not been modified (either to initiate a backup/copy or to safeguard against accidental local change). These purposes are still served just as well now as they were when MD5 was introduced, and the fact that MD5 has been proven to have possible collision attacks does not really impact the data integrity aspect of the algorithm. Of course, there are cases where MD5 will identify a block as unchanged when it has, in fact, changed. This is true for all hashing algorithms, but the reason that MD5 was initially considered acceptable for this purpose was that the chance of that happening without malicious intent is astronomically small (that malicious intent was not believed to be as much of a factor then is not interesting to us, now). It would be a bit like dropping a penny down into one of those boxes with water where the goal is to land it on a small platform, and just as you dropped it, an earthquake struck, causing the penny to bounce off the platform, jump back up through the slot and blind you. Just as I don't recommend avoiding such games because of the risk of blindness, I don't think you need to stay away from hashing algorithms (including MD5) in order to avoid missing a data update. If you think someone might be waiting for you to drop the penny so they can set off some dynamite, then you have a different kind of problem, and MD5 might not be the best choice (e.g. if you're performing MD5 checksums in order to verify that a system's software has not been compromised).
Now, that changes as your risk profile changes. There are times, I believe, where it makes sense to take extra precautions. For example, if you're making constant backups of large amounts of rapidly changing data whose integrity in original and backup form has a high risk associated (e.g. medical data), then I might use two hashing algorithms to perform the verification. MD5 might be a fine choice for one of them, but I'd use SHA-1 or something similar on top of it. It's still astoundingly unlikely to be an issue, but there's a time an place for being stupidly extra-certain and if you can afford the extra CPU cycles, why not compute two hashes while you're looking at the data?
What about SHA-1? Hasn't that been broken too? No,
SHA-1 has known weaknesses which will likely yield security-impacting attacks in the future, but as of now, these weaknesses have yet to be translated into actual attack vectors. It's certainly worth staying on top of, and keeping a flexible hashing scheme (ala
the OpenBSD/LDAP schema) in your application in order to upgrade to SHA-3 when it becomes available and has been thoroughly tested, but for now SHA-1 is an excellent choice for anything short of military/state-secret sorts of crypto-hashing needs.
Bruce Schneier, who is recognized around the world as an authority on cryptographic security,
had this to say about the news regarding SHA-1:
They can find collisions in SHA-1 in 269 calculations, about 2,000 times faster than brute force. Right now, that is just on the far edge of feasibility with current technology.
Jon Callas, PGP's CTO, put it best: "It's time to walk, but not run, to the fire exits. You don't see smoke, but the fire alarms have gone off." That's basically what I said last August.
It's time for us all to migrate away from SHA-1.
Most of the hash functions we have, and all the ones in widespread use, are based on the general principles of MD4. Clearly we've learned a lot about hash functions in the past decade, and I think we can start applying that knowledge to create something even more secure.
Hash functions are the least-well-understood cryptographic primitive, and hashing techniques are much less developed than encryption techniques. Regularly there are surprising cryptographic results in hashing ... we still have a lot to learn about hashing.