Monday, June 1, 2009

The Power of Perl's Data::Dumper

Update: sorry about the code formatting. I can't get Blogger to play ball with me right now, so use your imagination, especially with the Python examples, which depend on indentation that may not have survived.

I've been doing a fair amount of work in Python recently, and there are some things I really like about the language. However, being an old Perl programmer, I find myself desperately wanting a few features of Perl in Python, and one of them is Data::Dumper. This module lets you print out the contents of a variable as Perl code. Now, Python has some similar features such as pprint and pickle, but neither of them quite gives you what Data::Dumper does.

For example, here's a class definition:
  package Someclass;
  sub new {
      my ($class, $param1, $param2) = @_;
      return bless { param1 => $param1, param2 => $param2 }, $class;
  }

When we dump out an instance of this class, we get:
  $VAR1 = bless( {
                   'param2' => 2,
                   'param1' => 1
                 }, 'Someclass' );

This is exactly the code you need in order to re-create the object (which is also what pickle gives you), and at the same time it's a human-readable copy of all of the state contained within the object.
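For plain built-in data, Python already gets close to this. Here's a minimal sketch contrasting the two outputs; the `state` dict is made-up sample data, not something from a real program:

```python
import pickle
import pprint

# Hypothetical sample data, standing in for an object's state.
state = {'param1': 1, 'param2': 2}

as_code = pprint.pformat(state)   # human-readable and valid Python source
as_pickle = pickle.dumps(state)   # bytes aimed at the unpickler, not at people

print(as_code)
print(repr(as_pickle))

# Both forms round-trip back to an equal dict.
assert eval(as_code) == state
assert pickle.loads(as_pickle) == state
```

The pformat string can be read, edited and pasted back into a program, while the pickle is only good for feeding to pickle.loads. The trouble starts with instances of user-defined classes.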

In Python, you might write:
  class Someclass(object):
      def __init__(self, param1, param2):
          self.param1 = param1
          self.param2 = param2

But pprint just calls repr on the object, and you get:
  <__main__.Someclass object at 0x...>

This is because Python relies on each object to provide its own string representation through a method called __repr__. If a class doesn't define it (and sadly, many don't define anything useful here), there's no way to know just what is going on under the hood other than by writing your own introspection code. You could, for example, look at the object's attribute dict:
  >>> import pprint
  >>> x = Someclass(1, 2)
  >>> pprint.pprint(x.__dict__)
  {'param1': 1, 'param2': 2}

but there are limitations to such an approach, especially when it comes to encapsulation.
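To get something closer to Data::Dumper for a class you control, you can have __repr__ emit a constructor call. This is only a sketch, and it rests on an assumption that won't hold in general: every instance attribute must also be a keyword argument of __init__.

```python
class Someclass(object):
    def __init__(self, param1, param2):
        self.param1 = param1
        self.param2 = param2

    def __repr__(self):
        # Assumption: each attribute name is also an __init__ keyword argument.
        args = ', '.join('%s=%r' % (name, value)
                         for name, value in sorted(vars(self).items()))
        return '%s(%s)' % (type(self).__name__, args)


x = Someclass(1, 2)
print(repr(x))

# The repr is now code that rebuilds an equivalent object.
y = eval(repr(x))
```

With this in place pprint shows the object's state instead of an address, but you have to write (or mix in) the method yourself for every class, which is exactly the chore Data::Dumper spares you.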

Anyway, the point is that this is one area in which Lisp, Perl and other languages that can represent arbitrary data as code have an ease-of-use advantage over languages that cannot. Hopefully this will be addressed in a future version of Python (I'm not using 3.x yet).