Monday, November 30, 2009

Python: Adopting Perl's given and Smart Matching

Python doesn't have a switch statement. This makes it relatively unusual among modern languages, but it's not terribly shocking. It has never been entirely clear what the useful equivalent of C's very efficient and elegant switch should be in high-level languages. However, one useful signpost is Perl's late addition: given.

In Perl 5 (released around the dawn of Python) there was no switch equivalent. Many of the same hacks that are used in Python to work around this were suggested to Perl users. However, when the spec for Perl 6 was being crafted, a switch statement was high on the list of user requests, so "given" was introduced. Later, as features of the Perl 6 prototype implementation were scrutinized for back-porting to Perl 5, given was selected as a useful bit of low-hanging fruit that didn't require massive changes to the language. In Perl 5.10, given is now available once it is enabled with the feature pragma (use feature 'switch';). Presumably, this pragma will no longer be required in future versions.

So, back to Python. Is given the right way to go? Perhaps. given relies on a lower-level mechanism called smart matching (Perl's ~~ operator), and Python currently has no such mechanism. Introducing smart matching could be disruptive to the language if done poorly. Great care should therefore be taken, but a minimal approach should be acceptable.


Here's one way to introduce smart-matching and given to Python:

The proposed "ismatch" method is provided by object, and uses "==" to compare its "self" to its single parameter, "target", like so:

  def ismatch(self, target):
    return self == target 

However, classes are expected to provide useful ismatch methods of their own. For example, the ismatch method of compiled regular expression objects from the "re" module should apply the regular expression to a target string, or perform basic equality checking if the target is itself a compiled regular expression.
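
Python code can't add methods to object or to compiled patterns, so the following is only a rough sketch of the protocol in today's Python 3 (3.8 or later assumed). A hypothetical free function, ismatch(pattern, target), stands in for the method: functools.singledispatch picks an implementation by the type of the first argument, the default falls back to a user-defined ismatch method or ==, and compiled regular expressions search string targets.

```python
import re
from functools import singledispatch


@singledispatch
def ismatch(pattern, target):
    """Hypothetical stand-in for object.ismatch: defer to a user-defined
    ismatch method if the pattern has one, otherwise compare with ==."""
    method = getattr(pattern, "ismatch", None)
    if callable(method):
        return method(target)
    return pattern == target


@ismatch.register(re.Pattern)
def _(pattern, target):
    """Compiled regexes compare for equality against another regex and
    otherwise search a string target (returning a match object or None)."""
    if isinstance(target, re.Pattern):
        return pattern == target
    if isinstance(target, str):
        return pattern.search(target)
    return None  # non-string targets never match a regex


class Even:
    """Hypothetical user class supplying its own ismatch method."""

    def ismatch(self, target):
        return isinstance(target, int) and target % 2 == 0


print(ismatch("Jason Voorhees", "Jason Voorhees"))          # True
print(bool(ismatch(re.compile(r"^Ma?c[A-Z]"), "MacLeod")))  # True
print(ismatch(Even(), 7))                                    # False
```

Returning the match object rather than True keeps the regex result available to the caller while still behaving as a true value.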

The given/when statement is thus a wrapper around calls to ismatch, which any particular implementation may or may not optimize for trivial cases. Like if, given takes a single expression. It then requires an indented block containing two kinds of statements: "when" and "default". Each when statement also takes one expression and introduces its own indented block. The default statement is like "else" and takes no expression.

The when clauses are tried in order, calling each clause's object's ismatch method with the given object as its parameter. The first ismatch that returns a true value (not necessarily True itself) stops the matching, and its block runs; if none matches, the default block runs. Again, suitable optimizations may be performed when all of the when expressions are simple constant values.
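
The dispatch rule can be approximated with an ordinary function. In this sketch, given() and its (pattern, action) clause pairs are hypothetical names, not a proposed API: clauses are tried in order, the first one whose smart match returns a true value runs its action with that result, and the default action runs only when nothing matches. The tiny ismatch() here covers just regexes and ==.

```python
import re


def ismatch(pattern, target):
    """Minimal smart match: regexes search strings, everything else uses ==."""
    if isinstance(pattern, re.Pattern) and isinstance(target, str):
        return pattern.search(target)
    return pattern == target


def given(value, clauses, default=None):
    """Try each (pattern, action) clause in order and run the first match.

    The action receives the smart-match result (e.g. a match object);
    default, if supplied, is called with no arguments when nothing matches.
    """
    for pattern, action in clauses:
        result = ismatch(pattern, value)
        if result:  # any true value counts, not just True
            return action(result)
    if default is not None:
        return default()
    return None


def classify(n):
    return given(n, [
        (0, lambda _: "zero"),
        (1, lambda _: "one"),
    ], default=lambda: "many")


print(classify(0), classify(1), classify(5))  # zero one many
```

When every when expression is a hashable constant, as in classify(), an implementation could plausibly replace this linear scan with a dictionary lookup; that is the kind of optimization the proposal leaves open.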

Here's an example:

  given name:
    when re.compile(r'^Ma?c[A-Z]'):
      print "Hey mac!"
    when "Jason Voorhees":
      print "It's OK, he just wanted his machete"
    default:
      print "Who are you?"
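
Under the semantics described above, the name example behaves roughly like the following if/elif/else chain in current Python, with "Who are you?" as the default case. The greet() function is hypothetical and returns the message instead of printing it so the behavior is easy to check; ismatch() is a minimal stand-in for the proposed method.

```python
import re


def ismatch(pattern, target):
    """Minimal stand-in for the proposed smart match."""
    if isinstance(pattern, re.Pattern) and isinstance(target, str):
        return pattern.search(target)
    return pattern == target


MAC = re.compile(r"^Ma?c[A-Z]")  # compiled once rather than on every call


def greet(name):
    # given name:
    if ismatch(MAC, name):                  # when re.compile(r'^Ma?c[A-Z]'):
        return "Hey mac!"
    elif ismatch("Jason Voorhees", name):   # when "Jason Voorhees":
        return "It's OK, he just wanted his machete"
    else:                                   # default:
        return "Who are you?"


for name in ["McCoy", "Jason Voorhees", "Alice"]:
    print(greet(name))
```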

Perl treats when as just another kind of statement: it need not occur only inside a given block, and it need not be the only kind of statement within one. I don't think Python needs or wants this much syntactic flexibility. It's not that it doesn't make sense in Perl, but Python simply isn't that forgiving a language. However, this restriction adds some complexity when you want several possible matches to share one block of code.

This is a very special case, since the body would be shared between when clauses. Another unusual case is C's fall-through, where one case runs on into the next. In Python, each when block would be self-contained, and the default action at its end is to exit the given, so instead of falling through, the next block must be invoked explicitly. Here's how that might look:

  when item_a:
    code ...

    continue when
  when item_b:
    code ...

By using "continue when" and not just a bare "continue" we avoid co-opting the flow control of an enclosing loop.
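
One way to model continue when outside the language is a sentinel that an action returns to request fall-through. This sketch assumes C-like behavior, where continue when runs the next when block's body without testing its match; CONTINUE_WHEN and given() are hypothetical names, and matching here is plain == to keep the example short.

```python
CONTINUE_WHEN = object()  # hypothetical sentinel: "fall into the next block"


def given(value, clauses, default=None):
    """Run the first clause whose pattern equals value (plain == here).

    If an action returns CONTINUE_WHEN, the next clause's action also runs
    without testing its pattern, mimicking C fall-through; falling past the
    last clause reaches the default.
    """
    falling_through = False
    for pattern, action in clauses:
        if falling_through or pattern == value:
            result = action()
            if result is not CONTINUE_WHEN:
                return result
            falling_through = True
    if default is not None:
        return default()
    return None


log = []


def handle_a():
    log.append("a")
    return CONTINUE_WHEN  # continue when


def handle_b():
    log.append("b")
    return "done"


print(given("a", [("a", handle_a), ("b", handle_b)]))  # done
print(log)                                             # ['a', 'b']
```

Because the sentinel is an explicit return value, an enclosing loop's continue is left alone, which mirrors the reason for spelling it "continue when".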

In Perl, magical variables (such as $1 after a regular expression match) are often populated with side-effect data. In Python this is frowned on, so a page can be taken out of the exception handling book to address the need for when clauses that have useful return values:

  given foo:
    when re.compile(r'^(a(b(c))?)'), match:
      print "Found", match.group(1)

Notice the comma followed by a variable name. This is very similar to try/except (as in except ValueError, e:), except that instead of an exception, the named variable stores the return value of the smart match. For regular expressions, this is the match object or None, and since the block only executes if the return value is true, we know that it's not None.
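
Python 3.8's assignment expressions (:=) give a close approximation of this binding form today: the smart-match result is bound to a name, and the block runs only if that result is true. The find() function and the minimal ismatch() below are hypothetical.

```python
import re


def ismatch(pattern, target):
    """Minimal stand-in for the proposed smart match."""
    if isinstance(pattern, re.Pattern) and isinstance(target, str):
        return pattern.search(target)
    return pattern == target


ABC = re.compile(r"^(a(b(c))?)")


def find(foo):
    # Roughly: given foo: when re.compile(r'^(a(b(c))?)'), match: ...
    if (match := ismatch(ABC, foo)):
        # A match object is returned only on success, so match is not None here.
        return "Found " + match.group(1)
    return None


print(find("abcd"))  # Found abc
print(find("ax"))    # Found a
print(find("xyz"))   # None
```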

So why use "given"/"when" instead of C's "switch"/"case"? Either pair of names would work, but the motivation behind the naming in Perl was to indicate the kind of operation being performed. You're not simply jumping into an indexed bit of code the way C does; you're performing a set of operations with the given value as their context (in Perl, the given value becomes the topic, $_). In this sense, the construct is more like Pascal's "with" statement than C's switch.

I know that the Perl and Python communities have not always relished sharing features because of their fundamentally different design philosophies. However, it seems to me that given and smart matching represent a case where Perl has introduced a useful mechanic from which Python can equally benefit (just as Python's parameter passing semantics were adopted by Perl 6).