Sunday, February 21, 2010

Perl and Python documentation or Pydoc considered harmful

If you've read my programming posts before you know that I've transitioned from being mostly a Perl programmer to mostly a Python programmer (and from C to Perl before that). For the most part, this has been a painless transition. Python tends to take itself too seriously (which leaves me thinking: odd choice of name), but other than that it's a good language. Perl has its strengths, to be sure, but so does Python. One area, however, that causes me no end of frustration is reading Python module and program documentation. Coming from Perl, as I do, I'm used to flowing, descriptive documentation, the crafting of which is as much a part of the authoring of a piece of public code as the source.

In the Python world, it's another story. There are projects with excellent documentation like django, but I've never seen one that used the built-in documentation system in Python that was worth the time it took to create. (more...)



In Perl, the WWW::Mechanize module is used for Web scraping, automation and testing. Its documentation is nearly perfect, walking the user through its use by example and as a reference document at the same time.

In Python, I just came across scrape.py which is meant as a replacement for Beautiful Soup. Beautiful Soup's documentation is excellent, though not as good a reference document as WWW::Mechanize's, it's still a very good document for the primary purpose: explaining how to use the library. scrape.py's documentation, on the other hand is terrible. It documents everything with equal weight and has no flow at all. What's the difference between Beautiful Soup and scrape.py? scrape uses Python's built-in documentation system (source code download link) while Beautiful Soup uses HTML.

So, do Perl's POD documentation and HTML have some hidden advantage over Pydoc or is this just a cherry-picked comparison of unequal documentation styles? In my experience, thus far, it's the former. Python documentation comes in two flavors: pydoc and useful. I've yet to see the exception, though I'm sure someone, somewhere has used Pydoc to create some beautiful and useful documentation (just as someone used staples to create beautiful art on a wall).

What makes Pydoc so bad? A few things. First of all, it's not free-form at all. It's a tool for documenting code for maintenance programmers to read (not shocking, given the motivation behind writing Python in the first place). This means that the author never has to stop to transition from "coding mode" to "documentation mode" and documentation ends up reading like source. Frankly, if I wanted to read source code, I would. What I really need when I'm going to the docs is for the programmer to put the code down, take a deep breath, and think about what their software means to the user. Perhaps I'm just a bad programmer, but I can't do that while I'm coding.

OK, so I'm advocating for going back and writing the documentation once the code is done. I can do that in Pydoc, right? Well... yes, but then you get to the second problem: flow. Read WWW::Mechanize's documentation. It begins with some examples, then gets into how to instantiate the core class using new() then touches on a related startup method and then talks about a stand-alone helper function that you can use in conjunction with that method. This is documenting the use of the library, not the use of a specific class. The code might be organized very differently, but the documentation is laid out as it makes sense for the audience, not the source.

In Pydoc, documentation is data that lives in the code. It is fundamentally tied to the source in a way that prevents you from making any structural changes to the resulting document without also making those changes to the source. This means that you have to risk damaging the functionality of the program (and, of course, damage much of the source control history of the source) in order to change the flow of the documentation. That results in a high degree of documentation inertia.

You also cannot easily transition from reference style to descriptive style of documentation. If, for example, you wanted to have a section of a security module that discussed the statistics behind its choices, you might want that to be introduced after the general documentation about the module's use and its primary classes or functions. However, after that, you might then delve into deeper topics like helper functions/methods and other topics that are important, but certainly less weighty than the first two sections. In Pydoc you cannot do this. Documentation is sectioned in a way that the programmer cannot control.

Pydoc has one thing going for it: everything ends up in the documentation because source code auto-generates the framework for the documentation. This is a good thing, and I'd like to see a way to bring the two styles together, but as it stands I always dread having to look at the documentation for anything in Python unless I already know that it's been done using an external tool. That really should not be the case.

On a side note, here's a fun trick to try. To get documentation on a specific builtin function in Perl, do this:
perldoc -f sort
Try the equivalent in Python:
pydoc -k sorted
I'm sure I'm missing some trick, but on my system the latter results in a traceback which kindly explains that "'NoneType' object has no attribute 'get'" which, while technically true, is just about the worst excuse for user feedback I've ever seen. I'll probably need to write a followup article at some point on why tracebacks are a destructive and rainforest clear-cutting.