Fifty State Project Activity

There is something a little bit dirty about screen scraping. In this modern day and age, isn’t there some sort of RESTful, JSON-based API for all of the important data? Well, no. But the guys at the Fifty State Project are trying their best to change that for as much state legislation as they can get.

I spent some time over the last three weeks working on the legislation scraper for the state of Montana. I picked up where another programmer left off, and most of the heavy lifting was done in a few days. James really laid down a good foundation, and I was glad that he was happy to see the work continued, even if it wasn’t by him. The bulk of my time was spent working out all of the little special cases that crop up in systems like these. For instance, since the data is stored in a directory on a web server, folks can put other stuff out there in addition to legislation, like letters from the governor. I didn’t think to put a “governor_letter_check” method in my code until I had scraped 90% of the available bills and run into this one rogue case.
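
To give a feel for that kind of special case, here is a minimal Python sketch of filtering a directory listing. It is not the actual scraper code: the file names, the bill-name pattern, and the bill_filenames helper are all assumptions made up for illustration. Only the governor_letter_check name comes from my code.

```python
import re

# Assumption: bill files start with a chamber prefix and a four-digit number,
# e.g. "HB0001.htm". The real Montana naming scheme may differ.
BILL_PREFIX = re.compile(r"^(HB|SB)\d{4}", re.IGNORECASE)


def governor_letter_check(filename):
    """Return True if the file name looks like a governor's letter.

    Hypothetical rule: the name contains the word "governor".
    """
    return "governor" in filename.lower()


def bill_filenames(listing):
    """Keep names that look like bills and are not governor's letters."""
    return [
        name
        for name in listing
        if BILL_PREFIX.match(name) and not governor_letter_check(name)
    ]


if __name__ == "__main__":
    # Made-up directory listing with one rogue file mixed in.
    listing = [
        "HB0001.htm",
        "SB0123.htm",
        "HB0002_governor_letter.htm",
        "index.html",
    ]
    print(bill_filenames(listing))
```

The rogue file here passes the bill-name pattern because it sits next to the bill it refers to, which is why a separate check is needed on top of the pattern.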

The work isn’t done yet. Special sessions aren’t parsed yet because they seem to use an entirely different system, and some metadata about bills that are published only in PDF format gets missed. I’m taking a little bit of a break while the guys bang out the ‘newapi’ branch of the project, but I hope to get back at it soon.