The (Nearly) Immediate Gratification of Playing with Geospatial Data

November 15, 2011

Makers and Coders McGill (MC²) is one of the new Digital Humanities initiatives that we’ve started this year. It’s a complement to the Digital Humanities Reading Group, which is best thought of as a book club (or, more accurately, an article and blog club) for DH enthusiasts and the DH-curious (a term that has resonated a lot in the group). Whereas the DH Reading Group is about reading and discussion, MC² is much more about doing stuff, running the gamut from coding to fabrication. Attendance in both groups has been strong, and the diversity of perspectives (traditional humanities, libraries, social sciences, music, etc.) has been very stimulating.

During the last MC² meeting we agreed that we would experiment with data aggregated by Montréal Ouvert, an initiative to promote open access to a range of municipal data from Montreal (similar open data initiatives exist elsewhere in Canada and beyond). My usual research doesn’t much involve the use of geographical data, and I was keen to get my hands dirty and learn some new stuff; did I ever. I knew there were a variety of APIs and web services that would help us, though I hadn’t anticipated how quickly we would be able to create and play with a map, especially given data that wasn’t really intended for mapping purposes. Much of the geo-coding magic was accomplished by BatchGeo, a service suggested by fellow MC² participant Renee Sieber.

The first step was to consult a list of data sources from the city of Montreal, compiled on the Actions page of Montréal Ouvert. The formats vary, but it seemed preferable to start with something in XML. I was tempted by the bike-sharing Bixi data, but those were already geo-encoded (with latitude and longitude values), and I wanted more of a challenge. I opted for the “Health Inspection Infractions” from 2010, thinking it might be interesting to see which neighbourhoods had the most restaurants and other establishments that had been fined (for this quick and dirty experiment I didn’t correlate population, total number of restaurants, income levels, or any other data that would probably be relevant if I were doing anything more than a quick experiment).

Once the source was chosen, I downloaded the XML file. I knew that BatchGeo required tabular data as presented in a spreadsheet, and since the XML file had a simple structure, a quick search turned up a free, web-based XML to CSV converter like the one at Luxon Software. I uploaded the XML file to this service, and presto! downloaded a CSV file that I could then import into a spreadsheet program like Google Spreadsheets.

On the left is the source XML document (as viewed in a browser) and on the right is the converted spreadsheet data imported from comma-separated values (as viewed in Google Spreadsheets).
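
Incidentally, the conversion itself is conceptually simple. Here is a minimal sketch in Python of what a converter like that does, assuming a flat file of repeated record elements whose children map one-to-one to columns (the file and element names below are made up for illustration):

    # Sketch of XML-to-CSV conversion for a flat file of repeated records.
    # The file name and structure are placeholders, not the city's actual ones.
    import csv
    import xml.etree.ElementTree as ET

    root = ET.parse("infractions.xml").getroot()

    with open("infractions.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow([child.tag for child in root[0]])  # header row from first record
        for record in root:
            writer.writerow([child.text for child in record])

Anything with a more deeply nested structure would need real thought, but for tabular data dressed up as XML this is about all there is to it.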

Once in the spreadsheet, I could select all the content, copy it to the clipboard, and paste it into the box in BatchGeo. Like magic, the table redraws itself with nicely formatted data. I still needed to set the relevant options, including specifying which columns held the city, address, theme, and title. The final step was to hit the “Make Google Map” button and watch as the geo-coding was performed (BatchGeo assigns latitude and longitude values based on the addresses provided). After about a minute, ding! there was a fully baked map:

The map generated by BatchGeo using the Health Infractions data in the spreadsheet. Click on the image to view the live map.
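
Behind the scenes, the geo-coding step amounts to sending each address to a geocoding web service and reading back coordinates. I have no idea what BatchGeo actually does internally, but here is a rough sketch of the idea using the Google Maps geocoding service (the sample address is just Montreal’s city hall):

    # Rough sketch of geo-coding an address via the Google Maps geocoding
    # service; this is the general idea, not BatchGeo's actual internals.
    import json
    import urllib.parse
    import urllib.request

    def geocode(address):
        params = urllib.parse.urlencode({"address": address, "sensor": "false"})
        url = "http://maps.googleapis.com/maps/api/geocode/json?" + params
        with urllib.request.urlopen(url) as response:
            result = json.load(response)
        location = result["results"][0]["geometry"]["location"]
        return location["lat"], location["lng"]

    print(geocode("275 rue Notre-Dame Est, Montréal, QC"))

Run something like that over a column of addresses and you have the latitude and longitude values a map needs.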

So, XML to CSV, CSV to BatchGeo to add geo-location data, and there we have a map. An amazing transformation from static XML data to an interactive map. Yes, it’s simplistic, but that’s the point: you can easily create and play with maps.

Given how unexpectedly quickly this went, I started looking for alternatives. Again Renee introduced me to something new: the ability to import XML into a Google Spreadsheet, using a custom XPath query to define the values of each column. My first attempt at this actually failed, I suspect because the server wrongly declares the source XML document to be HTML, not XML. Posting a copy of the XML file on my own server allowed me to continue. Such are the joys of working with data in the wild: one must often use duct tape to make things work properly. In fact, I think that recognizing little problems along the way and figuring out how to resolve them is the essence of digital humanities; very little of interest works properly out of the box.
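
If you hit the same snag, the mismatch is easy to verify. Here is a quick check with Python’s standard library (the URL is a placeholder):

    # Check what Content-Type the server declares for the file; an XML
    # import expects an XML type, not text/html.
    import urllib.request

    with urllib.request.urlopen("http://example.com/infractions.xml") as response:
        print(response.headers.get("Content-Type"))
    # "text/html" here, rather than "text/xml" or "application/xml",
    # means the server is mislabelling the document.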

Anyway, now I could proceed with importing the data into a new worksheet, using importXML.

The importXML function allows me to specify a URL source and an XPath query for each column.
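
For example, a column of addresses might be populated with a formula along these lines (the URL and element names here are invented for illustration; substitute the tags that actually appear in the file):

    =importXML("http://example.com/infractions.xml", "//infraction/adresse")

One formula per column, each with its own XPath expression, and the whole file unfolds into the worksheet.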

I like my scripting languages as much as the next DHer, but Look Ma! No programming! Now that was a fun MC² meeting.
