Thursday, June 24, 2010

Collatz graphs and Perl 6

I wanted to play around a bit with numbers in Perl 6, so the Collatz Conjecture seemed like a good place to start.

The Collatz Conjecture is a simple statement which has some massive implications:

f(n) = if n is even n/2, otherwise (n x 3)+1
Given that n is a positive integer, repeated applications will eventually yield 1.
So, for example, n=10. On the first step, we get 5 because 10 is even, so we divide by two. On the second step, we get 16 because 5 is odd, so we multiply by 3 and add 1. Because 16 is a power of 2, we know that continuing to divide it by 2 will eventually give us 1, and all results until we reach 1 will be even so we're done. In fact, so far every number tested works this way but no one can, as yet, mathematically prove that there isn't an exception.

My code is fairly simple. It works backwards from 1. You can only get to 1 from 2 because there's no odd, positive integer (0 doesn't count, here) that when multiplied by 3 and incremented by 1, is 1. So then we work on 2, which can only be arrived at from 4. 4 can be arrived at from 1, but 1 is the end of the process, so we ignore that. It can also be arrived at from 8. 8 Can be arrived at from 16, but as we know from above 16 can be arrived at from 5 as well as 32, so that's our first branch in the tree. This process can be repeated for as long as you have enough computer memory to store all of the current branch-points.

That's exactly what I do, and I configured it so that you simply input a depth of tree you want to output and it outputs every value in a tree of that depth.

If you would like to see the code, check it out on github. The output svg file is also in the same directory for the tree-like output and the radial output from the twopi filter, along with the Graphviz source. I generated the following graph with the help of my good friends Graphviz and rsvg. The whole depth-20 graph looks like this:
But, of course, it's hard to make anything out there, so here's a close-up:
There are some interesting things that jump out at me in this. Because you can only ever get to a multiple of 3 from its double (since n-1 is never divisible by 3 if n is), once you hit any number that's divisible by 3, you embark on a straight line, like the one you see above at 213. The other thing that becomes obvious is that even fairly low numbers (like 19) don't appear until very late in the process. That there's no duplication in this graph is really fascinating.

The most interesting part of the Collatz Conjecture, I think, is that it can be expressed so easily that anyone with a rudimentary amount of math skills can understand it. This makes it accessible to a wider audience than any other unsolved problem in math that I can think of, and yet it's never been solved. Interesting stuff.

Anyway, enjoy your own explorations of the Collatz Conjecture and don't get too engrossed, or you might find that you're a bit too much like this XKCD comic: http://xkcd.com/710/

Thursday, June 10, 2010

Are your passwords safe in MD5 or SHA-1 formats?

I've read, over and over again, various questions and seemingly authoritative statements about the security of various hashing algorithms. I've gotten kind of tired of reading misinformation, so here's some detail that you can trust.

  • US-CERT of the U. S. Department of Homeland Security said MD5 "should be considered cryptographically broken and unsuitable for further use."
  • The document that this was stated in is titled, "Vulnerability Note VU#836068: MD5 vulnerable to collision attacks"
  • A collision attack is where an attacker, given access to a hashed password (or other plain text), crafts a password that yields the same result when hashed. Thus to a password authentication system, the crafted "collision" seems to be the correct password.
  • If your password hashing scheme does not use a salt, none of this is interesting to you, as you have little or no security to speak of given an attacker (internal or external) who has access to your hashed passwords.

OK, so what does any of this mean to you? Is MD5 secure? Well, not really. It is possible, with moderate hardware investment and access to the hashed password to generate a "skeleton key." No one can "crack" the original password in a reasonable amount of time that I know of or that I've read about, but access to an equivalent password solves many problems for an attacker, even if they can't then take that password and use it against other services (since those services would not be using the same "salt" which prevents the same password hashing the same way on two different sites or services).

The question you have to ask yourself is this: why are you hashing passwords? Is it to protect them, should someone gain access to your systems from the outside? Is it to protect them from those who have access to the data store? In these cases, md5 is at best a weak protection, but it is significantly better than some of the alternatives (DES, etc.) which are breakable in practically no time.

But MD5 is used in many places besides password hashing. Should we stop using it there? Probably not.

For example many backup and data validation tools use MD5 to make sure that data has not been modified (either to initiate a backup/copy or to safeguard against accidental local change). These purposes are still served just as well now as they were when MD5 was introduced, and the fact that MD5 has been proven to have possible collision attacks does not really impact the data integrity aspect of the algorithm. Of course, there are cases where MD5 will identify a block as unchanged when it has, in fact, changed. This is true for all hashing algorithms, but the reason that MD5 was initially considered acceptable for this purpose was that the chance of that happening without malicious intent is astronomically small (that malicious intent was not believed to be as much of a factor then is not interesting to us, now). It would be a bit like dropping a penny down into one of those boxes with water where the goal is to land it on a small platform, and just as you dropped it, an earthquake struck, causing the penny to bounce off the platform, jump back up through the slot and blind you. Just as I don't recommend avoiding such games because of the risk of blindness, I don't think you need to stay away from hashing algorithms (including MD5) in order to avoid missing a data update. If you think someone might be waiting for you to drop the penny so they can set off some dynamite, then you have a different kind of problem, and MD5 might not be the best choice (e.g. if you're performing MD5 checksums in order to verify that a system's software has not been compromised).

Now, that changes as your risk profile changes. There are times, I believe, where it makes sense to take extra precautions. For example, if you're making constant backups of large amounts of rapidly changing data whose integrity in original and backup form has a high risk associated (e.g. medical data), then I might use two hashing algorithms to perform the verification. MD5 might be a fine choice for one of them, but I'd use SHA-1 or something similar on top of it. It's still astoundingly unlikely to be an issue, but there's a time an place for being stupidly extra-certain and if you can afford the extra CPU cycles, why not compute two hashes while you're looking at the data?

What about SHA-1? Hasn't that been broken too? No, SHA-1 has known weaknesses which will likely yield security-impacting attacks in the future, but as of now, these weaknesses have yet to be translated into actual attack vectors. It's certainly worth staying on top of, and keeping a flexible hashing scheme (ala the OpenBSD/LDAP schema) in your application in order to upgrade to SHA-3 when it becomes available and has been thoroughly tested, but for now SHA-1 is an excellent choice for anything short of military/state-secret sorts of crypto-hashing needs.

Bruce Schneier, who is recognized around the world as an authority on cryptographic security, had this to say about the news regarding SHA-1:
They can find collisions in SHA-1 in 269 calculations, about 2,000 times faster than brute force. Right now, that is just on the far edge of feasibility with current technology.
Jon Callas, PGP's CTO, put it best: "It's time to walk, but not run, to the fire exits. You don't see smoke, but the fire alarms have gone off." That's basically what I said last August.
It's time for us all to migrate away from SHA-1.
Most of the hash functions we have, and all the ones in widespread use, are based on the general principles of MD4. Clearly we've learned a lot about hash functions in the past decade, and I think we can start applying that knowledge to create something even more secure.
Hash functions are the least-well-understood cryptographic primitive, and hashing techniques are much less developed than encryption techniques. Regularly there are surprising cryptographic results in hashing ... we still have a lot to learn about hashing.

Wednesday, June 9, 2010

Perl 6, Python, hyperoperators and list comprehensions

I use Python every day at work, and I do like the language. There are things about it that annoy me, but I don't think that's ever not been true of any language I've used. One of Python's best features is its list comprehensions. These short snippets of code can embody so much work that it often feels like Python is writing your code for you.

At night, I go home and work on Perl 6, the upcoming update to the decades-old programming language which adds features from nearly every programming language you've ever heard of (and some you haven't). The direct equivalent of the list comprehension in Perl 6 is the same as it was in Perl 5: map. Here's how you use map in Perl 6:

 map {$_ + 10}, (1,2,3,4)

This yields the same list as the Python:

 [ x + 10 for x in 1,2,3,4 ]

Ah, but the astute among you are noticing that the Perl uses a variable name that's always the same, thus making nested map statements painful due to the need to create temporary variable names manually. In Perl 5, this was true, but we can name those temporaries quite easily now:

 map { $^x + 10 }, (1,2,3,4)

Perl sees these temporaries from left to right and considers them positional parameters in the order that they appear to the current block (which is also a closure).

But Perl 6 gives us something more than map. In fact, map will be used much less frequently in Perl 6 because of hyperoperators. A hyperoperator is an operator that takes another operator as a parameter and augments its behavior. In Perl 6, hyperoperators can do this:

 (1,2,3,4) <<+>> 10


or the Unicode equivalent:


 (1,2,3,4) «+» 10

This takes the + operator and makes it work on the list given on the right side, adding the value on the left to each item and returning the newly created list of results. So, we never need to create a closure in order to ask Perl to do some particular binary operation on all elements of a list. Instead, we just pass the list, the operator and the right hand side to a hyperoperator and it does all the heavy lifting. We don't even need to see the temporary variable that's being used.

List comprehensions like Python's [ ... for ... ] and Perl's map are extremely valuable things for doing complex operations, but when what you really want is just to perform a simple operation on the elements of a list, hyperoperators give you what you need without the trappings you don't care about.

Note: There are actually multiple forms of hyperoperator depending on how "DWIMy" (Do What I Mean) you want it to be and on which of its arguments. See Synopsis 3's section on Hyperoperators for more.

Thursday, June 3, 2010

5 things you can do with Lists in Perl 6, Python and Ruby

I think practical examples of doing the same sorts of tasks in different programming languages can be wonderful tools. Recently, an IT student in Poland named Konrad posted a followup on his blog to the 2007 Ruby blog, "5 things you can do with a Ruby array in one line (PLUS A FREE BONUS!!)" by drewolson. He updated this for Python. Of course, having worked with Perl 6 quite a lot recently (see my Google Buzz posts titled "Your daily dose of Perl 6"), I was compelled to do the same for that language. See below for the results. Notice that Konrad chose temporary variable names that were much shorter than drewolson's, so the visual comparison between Ruby and Pyhthon in terms of code size is somewhat unfair, but I'll go with the original names where I need temporaries, just to be fair to Ruby.

Summing elements


Here, the original Ruby example printed the result, but I've trimmed that out for consistency with the rest of the examples.


Ruby:
  my_array.inject(0){|sum,item| sum + item}
Python:
  sum(my_list)
Perl 6:
  [+] @my_array

Double every item

Ruby:
  my_array.map{|item| item*2 }
Python:
  [2 * x for x in my_list]
Perl 6:
  @my_array <<*>> 2

Finding all items that meet your criteria (such as being divisible by 3)

Ruby:
  my_array.find_all{|item| item % 3 == 0 }
Python:
  [x for x in my_list if x % 3 == 0]
Perl 6:
  grep {$^item %% 3}, @my_array
  # or...
  grep * %% 3, @my_array

Combine techniques

Ruby:
  my_array.find_all{|item| item % 3 == 0 }.inject(0){|sum,item| sum + item }
Python:
  sum(x for x in my_list if x % 3 == 0)
Perl 6:
  [+] grep * %% 3, @my_array

Sorting

For more information on how Perl 6 sorting works and what the *-autoclosure syntax that I've used above does, see carl's excellent Perl 6 Advent Calendar post.


Ruby:
  my_array.sort
  my_array.sort_by{|item| item*-1}
Python:
  sorted(my_list)
  sorted(my_list, reverse=True)
Perl 6:
  @my_array.sort;
  @my_array.sort: -*;

And there you have it. Enjoy the many choices you have in programming languages!