Tool APIs

In preparation for the upcoming API workshop, organized by Bill Turkel, I thought I’d try to assemble a few thoughts on APIs. This is the fruit of work on several text analysis projects, including TAPoR, HyperPo, Voyeur, BonPatron and MONK (I hesitate to associate ideas with specific people without their consent, but of course this is also the fruit of working with several talented people in digital humanities).

  1. Use REST and keep it simple. The universal KISS principle is certainly valid for APIs: the simpler things are, the more likely they’ll be properly understood and adopted. The TAPoR Portal supports both SOAP tools and REST tools, but REST tools have been far less of a headache (some of the problems related specifically to Ruby’s SOAP library, but even beyond that, for our purposes REST tools provide everything we need with less hassle). Part of keeping the syntax of the API simple is to plan for a wide range of calls; this doesn’t mean that all the calls should be implemented and documented, but listing them at the beginning helps to define the purpose and scope of the tool and helps prevent the overly complex syntax that’s usually the product of afterthought.
  2. Document the APIs (preferably automatically). Documentation goes without saying (sometimes it even goes without doing). When tools get compared and evaluated, one of the main criteria is always the extent and quality of the documentation. Besides, good documentation usually heads off support questions. Of course there may be cases where you want to keep some aspects of the API undocumented if they’re too much in flux: a documented API should be respected by both developer and user, even as the tool evolves. One of the best ways to ensure up-to-date documentation is to find a way of having tools document themselves (as Javadoc does for Java code). This is one reason why HyperPo used Cocoon and XForms: to have self-documenting tools.
  3. Provide XML and JSON output. Providing two forms of output is a bit contradictory to the KISS principle, but there are good reasons for providing both: 1) XML because it’s still a powerful interchange language and can be infinitely transformed with XSL; 2) JSON because results are usually easier and faster to work with for client-side JavaScript libraries (not to mention that they use less bandwidth because results are more compact). Part of a well-documented API is of course explaining the results format.
  4. Provide paging functionality. It’s a pain when you really want 5 results but the tool gives you 5,000: it’s an unnecessary performance burden in terms of bandwidth, memory, and computation. There are rare exceptions, but most tools should provide paging functionality to ensure they’re scalable (even if the paging doesn’t seem immediately useful). Things get trickier when you need to combine paging with sorting or grouping, but that’s where clear API documentation helps.
  5. Create a proxy to channel traffic. For many client-side web applications, having a proxy channel requests to other tools can help avoid some constraints imposed by cross-domain JavaScript security. But even beyond that, proxies can serve a useful purpose as a centralized broker of communication with other tools – there is a good chance that parts of the proxy code can be reused for different types of tool requests, even when direct requests to the tools are possible (for instance, caching results or handling connection errors). One of the main benefits that we’ve found from having a proxy layer goes beyond APIs: it decouples development schedules of the interface (client-side) group and the backend (server-side) group. For instance, it’s possible for the proxy to provide fake data to the interface until the backend is ready to provide real data – and the interface code is oblivious to the difference.
  6. For rich client-side tools, create embeddable objects. We usually think of APIs as providing data-centric content that is transformed and presented to the user in a different format. However, there are some tools where the server-side and client-side components work together and it’s actually the bundled combination that’s desired. These are often called widgets or badges, and they provide stand-alone functionality (like an embedded YouTube video or a Twitter timeline). A text-analysis example of this is Voyeur panels, like on the Day of Digital Humanities. Again, because of cross-domain security constraints, it can be easiest to embed these panels in an IFRAME (though of course they won’t be allowed to interact with the rest of the page).
  7. Coordinated redundancy of services would be nice. I’m talking here primarily about academic projects, not commercial services: our servers and services go down for a variety of reasons and there’s rarely staff available 24/7 to make sure things are restored immediately. Furthermore, we’re more likely in an academic context to deploy an experimental version of something that could inadvertently break functionality required elsewhere. The problem is that if Project 1 depends on services from Project 2 but Project 2 is unavailable for some time, Project 1 may be partly or completely compromised. Projects that want to do the right thing and integrate existing remote services instead of re-inventing every wheel or having local installations of every service (that individually need to be maintained) face a network challenge. One possibility (again that’s fairly specific to the academic context) is to have a mechanism for coordinating fail-over sites for certain services. This isn’t quite as easy as it sounds since you need to maintain and distribute (presumably again through an API) a list of current installations with versioning information included. One benefit, if there really is collaboration between sites, is that you get a form of mirroring that can provide load-balancing as well as improve network latency by calling services that are closer to you. I don’t think we have any good examples of tools that are widely used by several digital humanities projects, but that’s not entirely the fault of the existing tools; it’s that we haven’t focused enough on APIs and distributed services…
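
To make points 3 and 4 a little more concrete, here is a minimal Python sketch of paging a result set and returning the same page as either JSON or XML. None of this is taken from TAPoR, HyperPo or Voyeur: the function names, the `start`/`limit` parameters, the XML element names and the word counts are all invented for illustration.

```python
import json
import xml.etree.ElementTree as ET


def paginate(items, start=0, limit=10):
    """Return one page of items plus the totals a client needs to ask for the next page."""
    start = max(start, 0)
    limit = max(limit, 0)
    return {
        "total": len(items),
        "start": start,
        "limit": limit,
        "results": items[start:start + limit],
    }


def to_json(page):
    """Compact form, convenient for client-side JavaScript."""
    return json.dumps(page)


def to_xml(page):
    """The same page as XML, which can then be transformed with XSL."""
    root = ET.Element(
        "results",
        total=str(page["total"]),
        start=str(page["start"]),
        limit=str(page["limit"]),
    )
    for term, count in page["results"]:
        ET.SubElement(root, "term", count=str(count)).text = term
    return ET.tostring(root, encoding="unicode")


# Made-up word frequencies. Sort first, then page, so that consecutive
# pages are slices of one consistent ordering.
frequencies = [("freedom", 21), ("rights", 60), ("everyone", 30), ("law", 12)]
ranked = sorted(frequencies, key=lambda pair: pair[1], reverse=True)
page = paginate(ranked, start=1, limit=2)

print(to_json(page))
print(to_xml(page))
```

Returning the total alongside the page is one way of letting the client know whether there is more to ask for; whatever convention a tool chooses, it belongs in the API documentation.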

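The proxy idea in point 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any of the projects mentioned: `ToolProxy`, `ToolUnavailable`, the `backend` callable and the canned data are all hypothetical names. It shows the three jobs described above: serving fake data until the backend exists, caching results, and handling connection errors in one place.

```python
class ToolUnavailable(Exception):
    """Single error type the interface code has to handle (hypothetical name)."""


class ToolProxy:
    """Broker between interface code and a backend tool."""

    def __init__(self, backend=None, fake_data=None):
        # backend: a callable taking (tool_name, params_dict), or None
        # while the server-side group is still building it.
        self.backend = backend
        self.fake_data = fake_data or {}
        self.cache = {}

    def request(self, tool, **params):
        # Parameter values are assumed to be hashable (strings, numbers).
        key = (tool, tuple(sorted(params.items())))
        if key in self.cache:
            return self.cache[key]
        if self.backend is None:
            # No backend yet: hand back canned data, and do not cache it,
            # so real results take over as soon as a backend is plugged in.
            return self.fake_data.get(tool, [])
        try:
            result = self.backend(tool, params)
        except ConnectionError as err:
            raise ToolUnavailable(tool) from err
        self.cache[key] = result
        return result


# The interface group develops against canned data...
proxy = ToolProxy(fake_data={"frequencies": [["rights", 60]]})
early = proxy.request("frequencies", text="some text")

# ...and later the real backend is plugged in without touching interface code.
proxy.backend = lambda tool, params: [["rights", 58], ["freedom", 21]]
later = proxy.request("frequencies", text="some text")
```

The interface code calls `proxy.request(...)` in both cases and never needs to know which kind of data it received.
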
Although HyperPo has many faults (not very scalable, not to mention the fact that its development has been superseded by Voyeur), it does provide a decent API. To see it in action, you can view the list of modular tools in the HyperPoets Gallery, click on one of the tools, scroll down to near the bottom of the page and click the API link, and submit some values (please don’t be a bully – use shorter texts :-). Some tools provide alternate output formats – you’ll find those in the options section if applicable. For instance:

Some similar calls are currently possible with Voyeur (http://voyeur.hermeneuti.ca/?input=http://www.un.org/Overview/rights.html), but there’s a long way to go yet…
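
A REST call like the Voyeur one above is just a base URL plus a query string, so it can be composed programmatically. The small Python sketch below only builds the URL and does not contact the server. The `build_call` helper is a made-up name, and `input` is the only parameter taken from the example above; nothing here should be read as documentation of Voyeur’s actual API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_call(base_url, **params):
    """Compose a REST-style GET URL, percent-encoding each parameter value."""
    return base_url + "?" + urlencode(params)


url = build_call(
    "http://voyeur.hermeneuti.ca/",
    input="http://www.un.org/Overview/rights.html",
)
print(url)

# Decoding the query string gives back the original value.
decoded = parse_qs(urlsplit(url).query)
```

The example in the text passes the input URL unencoded; encoding it is the safer habit whenever the input carries a query string of its own, since otherwise its `&` and `=` characters would be read as part of the outer call.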