Tuesday, October 12, 2010

A Google App Engine failure

Long ago, I wrote a Perl script that generates random names for my roleplaying games. It's a small script, but it can take an input list of names from any language and spit out similar-sounding made-up names. It's a powerful yet simple tool, and it seemed a natural fit for my first exploration of Google App Engine. Sadly, it didn't work out that way, and I thought my experience might serve as a useful caution to others planning the same sort of work.

The fundamental problem is that my app is IO-hungry. Every time someone asks for a made-up word, it reads in the entire source list and crunches it down into first-parts, mid-parts and end-parts (2-3 letter segments rooted at the beginning of a word, at the end, or at neither). It then sorts the lists of parts by frequency of occurrence and makes a weighted random pick of a first part. Each subsequent part is chosen the same way, but from the subset of parts that overlap the previous segment. The combination of weighted choice and overlapping leads to words which tend to be pronounceable in the source language of the input list.
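
To make the steps above concrete, here is a minimal Python sketch of the idea. It is not the original Perl script: the fixed segment length of 3, the two-letter overlap, the cap on mid-parts, and every function name are assumptions chosen for illustration.

```python
import random
from collections import Counter

SEG = 3      # segment length (the post says 2-3 letters; 3 is assumed here)
OVERLAP = 2  # letters shared between neighbouring segments (assumed)

def digest(words):
    """Count first-, mid- and end-segments across the source list."""
    first, mid, end = Counter(), Counter(), Counter()
    for word in words:
        w = word.lower()
        if len(w) < SEG + 1:
            continue  # too short to have distinct first and end segments
        first[w[:SEG]] += 1
        end[w[-SEG:]] += 1
        for i in range(1, len(w) - SEG):
            mid[w[i:i + SEG]] += 1
    return first, mid, end

def weighted_pick(counter, rng):
    """Pick one segment with probability proportional to its count."""
    parts = list(counter)
    return rng.choices(parts, weights=[counter[p] for p in parts], k=1)[0]

def overlapping(counter, prev):
    """Subset of segments whose head matches the tail of the previous one."""
    tail = prev[-OVERLAP:]
    return Counter({p: n for p, n in counter.items() if p.startswith(tail)})

def make_name(first, mid, end, rng, max_mid=3):
    """Chain a first-part, up to max_mid mid-parts, and an end-part."""
    name = weighted_pick(first, rng)
    prev = name
    for _ in range(rng.randint(0, max_mid)):
        candidates = overlapping(mid, prev)
        if not candidates:
            break
        prev = weighted_pick(candidates, rng)
        name += prev[OVERLAP:]  # append only the letters not already present
    candidates = overlapping(end, prev)
    if candidates:
        name += weighted_pick(candidates, rng)[OVERLAP:]
    return name.capitalize()
```

Calling digest() once on a name list and then make_name() repeatedly with a random.Random() instance yields one made-up name per call. Note that digest() is the expensive part, and it is exactly the step that gets repeated on every request in the setup described above.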

This process of reading and processing all of the words every time wasn't something I was going to be able to do in Google App Engine, however, since costs are tied to resource consumption. So, I set out to store pre-digested versions of the input lists as sorted word-segments in the Google App Engine datastore. This is where my problems began. While it's entirely possible to store the data this way, I found that fetching so many records from the datastore during my random walk down the lists of word-parts left GAE gasping for breath. In practical terms, I'd created the world's slowest tool for producing babble. Of this, I'm sure my mother feels proud.

Frankly, I'm not sure what I can do about this. GAE just doesn't seem to have been designed for this sort of thing. A shame, really. Of course, I could pre-compute a queue of results for each source namelist and keep re-populating them with a periodic job, but that really seems like a cheesy way to solve a problem that takes my original Perl script a few seconds.
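
For what it's worth, the pre-computed queue idea might look roughly like the sketch below. This is plain in-memory Python with hypothetical names (NamePool, refill, take), not App Engine code: on App Engine the pool would presumably have to live in the datastore or a cache, and refill() would be triggered by a scheduled job. Those parts are left out because they are assumptions about a setup that was never built.

```python
from collections import deque

class NamePool:
    """Holds ready-made names for one source list (hypothetical helper)."""

    def __init__(self, generate, target=100):
        self.generate = generate  # zero-argument callable returning one name
        self.target = target      # how many names to keep on hand (assumed)
        self.names = deque()

    def refill(self):
        """Run by the periodic job: top the pool back up to `target`."""
        while len(self.names) < self.target:
            self.names.append(self.generate())

    def take(self):
        """Run per request: a cheap pop, falling back to live generation."""
        if self.names:
            return self.names.popleft()
        return self.generate()
```

The trade-off is the one complained about above: each request becomes a single cheap read, but only because the expensive work has been moved into a background job that runs whether or not anyone is asking for names.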