Some Thoughts on the Digital Humanities Conference

The Digital Humanities 2010 programme committee sent out messages yesterday about acceptances and rejections. My motivation for this post has mostly to do with the impact of this process on newer members of the digital humanities community, but more on that a bit later.

I had some success, but more failures, which more or less corresponds with the competitive nature of the conference: about 34% of paper proposals were accepted. It may appear to be a strategy to send out several proposals and hope for the best, but it actually happens more naturally than that since I’m fortunate enough to be a part of several collaborative teams that tend to submit proposals.

I’ve participated in the conference for well over a decade now, both as a presenter and reviewer (nearly every year), as well as a member of the programme committee on several occasions. Despite the frustration of rejections – and the almost inevitable feeling that someone “didn’t get it” – I can say that both reviewers and programme committee members work very hard. This year several reviewers were assigned five proposals (of about 1000 words) and sometimes more, which is a significant voluntary work-load. The programme committee spends an unbelievably busy time at the end of December, into January and sometimes February. The programme committee chair deserves special praise and gratitude (every year) for the enormous amount of work needed to pull things together.

I honestly feel that the quality of papers generally increases every year, which I think is related to the steadily increasing competitiveness of the process. Another way of formulating this is to say that a substantial number of very good proposals are rejected (again, I say this as someone who has had access to all of the proposals several times). This is similar to my experience on some grant panels: I’m always dismayed to see high quality research not supported because of quotas rather than quality. Yes, this is a reality of academia (and life in general), but it’s good to remind ourselves (when rejected) that there are many other measures of academic worth (acceptance to grad schools, completion of degrees, hiring into positions, respect of colleagues, etc.). A rejection of a paper to DH is not a rejection of a digital humanities scholar. In fact, it may not even be a rejection of the paper as much as an admission that the conference has difficulty supporting a growing, diverse community. On the plus side, constructive criticism on the proposal and/or project can be valuable for the author(s) in other contexts.

Although we’d like to think that the review process is smooth and completely fair, it’s not; it may just be the worst solution, except for all the others. DH has experimented in the past with various formats and strategies, including blind and double-blind reviews, but I think it’s safe to say that the current practice reflects lessons learned from the past (though of course circumstances can change). It’s worth noting that the Alliance of Digital Humanities Organizations (ADHO) has a dedicated standing committee for the conference, and much discussion happens around how to make things work optimally.

Given the competitiveness, just one weak review is often enough to dip a paper below the acceptance threshold (though, to be fair, the programme committee often looks closely for discrepancies and outliers). Since this is a multidisciplinary conference, the assignment of reviewers can be very challenging, and it’s almost impossible to find a reviewer whose expertise aligns perfectly with a multidisciplinary proposal. I think it’s well worth recognizing some of the factors that come into play:

  • with an international team of reviewers, the ratings mean different things culturally (8/10 may be outstanding for one person and just very good for another); the impact of the numerical system is somewhat dampened by the qualitative descriptions for each number (and the basic guidelines provided to reviewers), but there’s still clearly a distinction between “hard markers” and “easy markers” (and, frankly, I think this is the biggest challenge we have)
  • given the multidisciplinary reviewers, there’s probably a general phenomenon of pushing inwards toward the centre of recognized digital humanities research and practices – reviewers are more likely to be supportive when they fully understand a topic (and perhaps recognize it from previous years), which means that more experimental, novel or fringe topics sometimes suffer (I mean fringe to “traditional DH”, not necessarily fringe in academia)
  • reviewers know who the authors are and there’s a risk of reputation trumping the quality of the proposal itself; I’ve thought this was more of an issue in the past, but I know of many central and long-standing figures in DH who have had proposals rejected – moreover, as the community continues to expand and more people are involved, I think this phenomenon weakens
  • digital humanities (aka humanities computing) is a process, not a static discipline – knowing what it is and how to assess quality is an ongoing exercise and a moving target (which is a large part of what I enjoy so much about it)

The DH review process is very good, but it’s not perfect. It may never be perfect, but that doesn’t mean we can’t keep thinking about how it might be improved.

As I see the digital humanities gaining momentum in various places (cf. funding programmes, international collaboration initiatives, training, increased prominence in more traditional settings, etc.), one of my concerns is that the digital humanities conference actually serves to stifle growth and innovation, particularly for new scholars. Despite valuable initiatives to encourage new scholars, rejecting a good (but not top) proposal sends a certain message about how welcoming we are as a community of scholars (this despite the fact that I think the digital humanities have a well-deserved reputation as being a friendly and welcoming bunch, partly because for so long it has been a haven for scholars working at the fringe of their own disciplines). I’m not suggesting that we lower our standards to spare people’s feelings; I’m suggesting that we may be going through growing pains and we need to be sensitive to the message being sent to potential new colleagues. Let’s not forget that humanities computing has almost always been preoccupied with reaching new audiences and expanding its reach, especially into more “traditional” corners (though one wonders if, after at least 60 years of practice, we’re not traditional ourselves). One solution might be to increase the capacity of the conference by allowing more parallel sessions, as frustrating as that can be for conference delegates (and schedulers). Personally, I’d rather have the difficult task of choosing between multiple interesting sessions than risk turning away high quality research and the potential to draw more colleagues into our community.

Fortunately, the digital humanities conference isn’t the only vehicle for welcoming scholars into our community (though it is the one that many encounter very early on). I think the proliferation of many excellent conferences throughout the year (real conferences, unconferences, surreal conferences, etc.) has served to de-centre the digital humanities conference (in a good way), and contributes to the vibrancy of the community throughout the year. Social media like Twitter and Facebook have complemented the Humanist listserv in providing a constant flow of information about digital humanities, for those who care to listen and/or to become involved.

I must admit that I have mixed feelings about the growth and apparent increase in prominence of digital humanities (we should probably be careful what we wish for), but there seems to be no doubt that the digital humanities is a discipline (or set of disciplines) that’s healthy and growing. I hope to see you in London!

Digital Humanities Now

Real-time Voyeur wordcloud of this DH Twitter list

The good folks at the Center for History and New Media have initiated (yet another) fantastic resource in the form of Digital Humanities Now, a real-time, crowdsourced publication. It takes the pulse of the digital humanities community and tries to discern what articles, blog posts, projects, tools, collections, and announcements are worthy of greater attention. I’m especially happy to hear that part of the motivation for DHNow is to explore how, as Dan Cohen puts it, “some version of this idea could serve as a rather decent new form of publication that focuses the attention of those in a particular field on important new developments and scholarly products”. Though perhaps not itself a quantifiable object in terms of hiring, tenure and promotion, it can certainly function effectively in that ecology to promote noteworthy content and projects. I see this as akin to participating at conferences: there’s a kind of intangible value ascribed to it by committees wanting to judge the scholarly activity of an individual (that can enhance the perception of other work). We can’t just wait for administrators to “get it”; we need to be more proactive by providing them with other tools by which to assess the value of digital humanities scholarship. It’s partly for this reason that I think it’s so important to think, as a community, about how we can get DHNow (and similar initiatives) right.

One of my first reactions was to ask how the algorithms could get a good diversity of languages if – as I’d falsely assumed – there was linguistic analysis happening in the filtering and grouping process. It turns out that DHNow uses a much simpler and more elegant mechanism for gathering content: it (via Twittertim.es) analyzes common URLs, which really makes it language and context independent (though still likely very relevant if several DHers mention the same URL).
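
To make the idea concrete, here is a minimal sketch of URL-based convergence. It is not DHNow's or Twittertim.es' actual code (which I haven't seen); the function name `shared_urls`, the `(user, text)` input shape and the sample tweets are all made up for illustration. The point is simply that nothing in it looks at the words of the tweet, only at which URLs several different people mention.

```python
import re
from collections import defaultdict

# Deliberately simple URL matcher (an assumption; real tweets may need more care).
URL_PATTERN = re.compile(r"https?://\S+")


def shared_urls(tweets, min_users=2):
    """Return (url, user_count) pairs for URLs mentioned by at least
    `min_users` distinct users, most widely shared first."""
    users_by_url = defaultdict(set)
    for user, text in tweets:
        for url in URL_PATTERN.findall(text):
            # Drop punctuation that often trails a URL in running text.
            users_by_url[url.rstrip(".,;:!?)")].add(user)
    ranked = sorted(users_by_url.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(url, len(users)) for url, users in ranked if len(users) >= min_users]


# Hypothetical tweets in two languages pointing at the same resource.
tweets = [
    ("alice", "Worth reading: http://example.org/dh-post"),
    ("bob", "Un article intéressant http://example.org/dh-post !"),
    ("carol", "Lunch time http://example.org/other"),
]
print(shared_urls(tweets))
```

The English and French tweets are grouped together because they share a URL, which is the language independence described above. A real service would presumably also need to resolve shortened URLs to their targets, which this sketch ignores.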

The URL-centric approach is useful for converging on a unique resource, but there’s a lot of discussion on Twitter that’s not oriented towards URLs (and of course there’s also a lot of DH discussion that’s not on Twitter). One huge advantage of the URL approach is that it usually produces a nice, coherent title to represent a set of tweets – it may be difficult to generate an expressive title from a tweet in the absence of a URL. In any case, what could be some possible (relatively simple) strategies for capturing a broader range of topics and discussions?

  • identify tweets that exhibit interrelatedness (within DHers) without necessarily containing URLs – replies to and retweets of DHers can suggest something of greater interest
  • identify tweets that contain common terms that are distinctive from a larger corpus of DH tweets
  • use trends
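
The second strategy in the list (distinctive terms) could be sketched roughly as follows. This is only one of many possible scoring schemes, a smoothed ratio of relative frequencies, and the names `tokenize` and `distinctive_terms`, the thresholds and the sample texts are all assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def distinctive_terms(recent, background, top_n=5, min_count=2):
    """Rank terms that are relatively more frequent in `recent` tweets
    than in a larger `background` corpus of tweets."""
    recent_counts = Counter(word for text in recent for word in tokenize(text))
    background_counts = Counter(word for text in background for word in tokenize(text))
    recent_total = sum(recent_counts.values())
    background_total = sum(background_counts.values())
    vocabulary = len(set(recent_counts) | set(background_counts))

    scores = {}
    for term, count in recent_counts.items():
        if count < min_count:
            continue  # ignore terms seen too rarely to be meaningful
        # Add-one smoothing so unseen background terms don't divide by zero.
        recent_rate = (count + 1) / (recent_total + vocabulary)
        background_rate = (background_counts[term] + 1) / (background_total + vocabulary)
        scores[term] = recent_rate / background_rate
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_n]


# Made-up example texts.
recent = ["paging apis workshop", "apis workshop notes", "the workshop"]
background = ["the conference", "the review", "the proposal the paper"]
print(distinctive_terms(recent, background))
```

Common words like "the" score low because they are just as frequent in the background corpus. Unlike the URL approach, this one is language dependent, so it would complement the URL mechanism and not replace it.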

Other ideas? Tweet me @sgsinclair (with this URL http://tr.im/FnOI ;-)

I have another motivation for wanting DHNow to work optimally: I’m already overwhelmed by digital information from email, blogs, Twitter, and so on. I’m not especially keen to add yet another source of information, unless I have confidence that it will allow me to drop something else. Any filtering and aggregation obviously means compromise and loss – but if anyone should be up to the challenge of a technical and social problem, it’s the digital humanities community.

Tool APIs

In preparation for the upcoming API workshop, organized by Bill Turkel, I thought I’d try to assemble a few thoughts on APIs. This is the fruit of work on several text analysis projects, including TAPoR, HyperPo, Voyeur, BonPatron and MONK (I hesitate to associate ideas with specific people without their consent, but of course this is also the fruit of working with several talented people in digital humanities).

  1. Use REST and keep it simple. The universal KISS principle is certainly valid for APIs: the simpler things are, the more likely they’ll be properly understood and adopted. The TAPoR Portal supports both SOAP tools and REST tools, but REST tools have been far less of a headache (some of the problems related specifically to Ruby’s SOAP library, but even beyond that, for our purposes REST tools provide everything we need with less hassle). Part of keeping the syntax of the API simple is to plan for a wide range of calls; this doesn’t mean that all the calls should be implemented and documented, but listing them at the beginning helps to define the purpose and scope of the tool and helps prevent overly complex syntax that’s usually the product of afterthought.
  2. Document the APIs (preferably automatically). Documentation goes without saying (sometimes it even goes without doing). When tools get compared and evaluated, one of the main criteria is always the extent and quality of the documentation. Besides, good documentation usually avoids more support questions. Of course there may be cases where you want to keep some aspects of the API undocumented if they’re too much in flux: a documented API should be respected by both developer and user, even as the tool evolves. One of the best ways to ensure up-to-date documentation is to find a way of having tools document themselves (like JavaDocs). This is one reason why HyperPo used Cocoon and XForms in order to have self-documenting tools.
  3. Provide XML and JSON output. Providing two forms of output is a bit contradictory to the KISS principle, but there are good reasons for providing both: 1) XML because it’s still a powerful interchange language and can be infinitely transformed with XSL; 2) JSON because results are usually easier and faster to work with for client-side Javascript libraries (not to mention less bandwidth because results are more compact). Part of a well-documented API is of course explaining the results format.
  4. Provide paging functionality. It’s a pain when you really want 5 results but the tool gives you 5,000: it’s an unnecessary performance burden in terms of bandwidth, memory, and computation. There are rare exceptions, but most tools should provide paging functionality to ensure they’re scalable (even if the paging doesn’t seem immediately useful). Things get trickier when you need to combine paging and sorting or grouping, but that’s where clear API documentation helps.
  5. Create a proxy to channel traffic. For many client-side web applications, having a proxy channel requests to other tools can help avoid some constraints imposed by cross-domain Javascript security. But even beyond that, proxies can serve a useful purpose as a centralized broker of communication with other tools – there are good chances that parts of proxy code can be reusable for different types of tool requests, even when direct requests to the tools are possible (for instance, caching results or handling connection errors). One of the main benefits that we’ve found from having a proxy layer goes beyond APIs: it decouples development schedules of the interface (client-side) group and the backend (server-side) group. For instance, it’s possible for the proxy to provide fake data to the interface until the backend is ready to provide real data – but the interface code is oblivious to the difference.
  6. For rich client-side tools, create embeddable objects. We usually think of APIs as providing data-centric content that is transformed and presented to the user in a different format. However, there are some tools where the server-side and client-side components work together and it’s actually the bundled combination that’s desired. These are often called widgets or badges, and they provide stand-alone functionality (like an embedded YouTube video or a Twitter timeline). A text-analysis example of this is Voyeur panels, like on the Day of Digital Humanities. Again, because of cross-domain security constraints, it can be easiest to embed these panels in an IFRAME (though of course they won’t be allowed to interact with the rest of the page).
  7. Coordinated redundancy of services would be nice. I’m talking here primarily about academic projects, not commercial services: our servers and services go down for a variety of reasons and there’s rarely staff available 24/7 to make sure things are restored immediately. Furthermore, we’re more likely in an academic context to deploy an experimental version of something that could inadvertently break functionality required elsewhere. The problem is that if Project 1 depends on services from Project 2 but Project 2 is unavailable for some time, Project 1 may be partly or completely compromised. Projects that want to do the right thing and integrate existing remote services instead of re-inventing every wheel or having local installations of every service (that individually need to be maintained) face a network challenge. One possibility (again that’s fairly specific to the academic context) is to have a mechanism for coordinating fail-over sites for certain services. This isn’t quite as easy as it sounds since you need to maintain and distribute (presumably again through an API) a list of current installations with versioning information included. One benefit, if really there’s collaboration between sites, is that you get a form of mirroring that can provide load-balancing as well as improve network latency by calling services that are closer to you. I don’t think we have any good examples of tools that are widely used by several digital humanities projects, but that’s not entirely the fault of the existing tools, it’s that we haven’t focused enough on APIs and distributed services….
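
Points 3 and 4 in the list above (two output formats, plus paging) can be sketched together in a few lines. This is not code from TAPoR, HyperPo or Voyeur; the function names, the `start`/`limit` parameters, the element and attribute names and the frequency values are all hypothetical, chosen only to show the shape of a paged response that can be serialized either way.

```python
import json
import xml.etree.ElementTree as ET


def page_results(results, start=0, limit=10):
    """Return one page of results, plus the metadata a client needs
    in order to ask for the next page."""
    start = max(start, 0)
    limit = max(limit, 0)
    return {
        "total": len(results),
        "start": start,
        "limit": limit,
        "items": results[start:start + limit],
    }


def to_json(page):
    """Serialize a page as compact JSON (handy for client-side scripts)."""
    return json.dumps(page, ensure_ascii=False)


def to_xml(page):
    """Serialize the same page as XML (handy for XSL transformations)."""
    root = ET.Element(
        "results",
        total=str(page["total"]),
        start=str(page["start"]),
        limit=str(page["limit"]),
    )
    for term, count in page["items"]:
        item = ET.SubElement(root, "term", count=str(count))
        item.text = term
    return ET.tostring(root, encoding="unicode")


# Made-up word frequencies, already sorted by count.
frequencies = [("rights", 60), ("freedom", 30), ("law", 10), ("person", 8)]
page = page_results(frequencies, start=1, limit=2)
print(to_json(page))
print(to_xml(page))
```

Returning `total` alongside the items means the client can build its own "next page" request without a second call. Note that sorting has to happen before slicing, which is exactly where paging and sorting start to interact and where the documentation needs to be explicit.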

Although HyperPo has many faults (not very scalable, not to mention the fact that its development has been superseded by Voyeur), it does provide a decent API. To see it in action, you can view the list of modular tools in the HyperPoets Gallery, click on one of the tools, scroll down to near the bottom of the page and click the API link, and submit some values (please don’t be a bully – use shorter texts:-). Some tools provide alternate output formats – you’ll find those in the options section if applicable. For instance:

Some similar calls are currently possible with Voyeur (http://voyeur.hermeneuti.ca/?input=http://www.un.org/Overview/rights.html), but there’s a long way to go yet…
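
For anyone scripting calls like the one above, a small sketch of building the request URL may help. The base address and the `input` parameter come from the example URL; the helper name `voyeur_url` is made up, and whether the server needs the value to be percent-encoded is an assumption on my part (encoding it is simply the conventional safe form, especially when the input URL has a query string of its own).

```python
from urllib.parse import urlencode

# Base address taken from the example URL above.
VOYEUR_BASE = "http://voyeur.hermeneuti.ca/"


def voyeur_url(input_url, base=VOYEUR_BASE):
    """Build a Voyeur request URL, percent-encoding the input URL so that
    its own '?' and '&' characters can't be mistaken for Voyeur parameters."""
    return base + "?" + urlencode({"input": input_url})


print(voyeur_url("http://www.un.org/Overview/rights.html"))
```

The function only builds the string and does not contact the server.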

Postdoctoral Fellowship in Digital Humanities and High Performance Computing (HPC)

Applications are invited for a one-year Postdoctoral Fellowship in Digital Humanities and High Performance Computing (HPC), under the supervision of Dr. Stéfan Sinclair from Communication Studies and Multimedia at McMaster University. The focus of the research will be large-scale, on-demand text analysis, and especially the development of HPC modules that can operate in a web-based context. McMaster University is internationally recognized as a leader in digital humanities scholarship and tool development.

This position is made possible in large part by Sharcnet, an HPC consortium in Ontario, as well as McMaster Libraries. The postdoctoral fellow will work closely with the supervisor (Sinclair), Sharcnet, and the Libraries.

Successful candidates will have experience working on textually oriented projects, as well as strong Java and system administration skills. We are seeking an individual who can bring strong interest and enthusiasm to an area of research ripe for innovation, and someone who will be able to integrate well into a larger team.

Salary: $45,000 plus benefits

By July 31, 2009, applicants should send a full Curriculum Vitae, letters from two referees and a cover letter highlighting their prior achievements and including a brief statement of their interest and experience in this area. Electronic submissions will be accepted. Applicants are strongly encouraged to contact Sinclair as early as possible to express interest and to ask any questions.

McMaster is committed to Employment Equity and welcomes applications from all qualified applicants, including women, members of visible minorities, Aboriginal persons, members of sexual minorities, and persons with disabilities.

Dr. Stéfan Sinclair (sgs [at] mcmaster.ca)
Communication Studies & Multimedia
McMaster University
1280 Main Street West
Hamilton, ON, L8S 4M2, Canada

Twitter

I’ve finally taken the plunge into Twitter. I have to confess that I do so more out of academic curiosity than real interest, but I have a sneaking suspicion that I’ll enjoy it, at least for a while. I’m not sure I’ll ever get into the groove of divulging details of my personal life, but I think it might be an interesting medium for exchanging interesting nuggets about research and teaching activities. My first instinct was certainly to look up colleagues whose work interests me, rather than looking up friends and family.

Soon after creating my account I found a very simple Quicksilver ActionScript for posting tweets. I also found an updated script for Growl notifications, but what I really wanted was to be warned when tweets were too long (over 140 characters). After trying a few variants with more or less success, I settled on this script (though I made the failed Growl message a bit more noticeable).
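
The length check itself is simple enough to sketch. What follows is not the Quicksilver script mentioned above, just a hypothetical Python equivalent of the "warn me before posting" logic; the function name and the message wording are made up, and the actual posting and Growl notification steps are left out.

```python
TWEET_LIMIT = 140  # the character limit mentioned above


def check_tweet(text, limit=TWEET_LIMIT):
    """Return (ok, message): refuse tweets that exceed the limit and
    say by how many characters, so they can be trimmed before posting."""
    overflow = len(text) - limit
    if overflow > 0:
        return False, f"Too long by {overflow} character(s); not posted."
    return True, f"OK ({-overflow} character(s) to spare)."


ok, message = check_tweet("Trying out Twitter from Quicksilver")
print(ok, message)
```

Note that `len` counts Unicode code points, which may not match exactly how Twitter counts characters.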

Text Analysis in the News

A neighbour and friend said he thought of me when he read an article about researchers doing text analysis to study the possible effects of Alzheimer’s on the vocabulary richness of authors. I asked to see the article and was very pleasantly surprised to see our TAPoR colleague Ian Lancashire prominently featured in a recent Maclean’s article (Ian has been a wonderful pioneer and leader for the text analysis community in Canada and beyond, earning him an Outstanding Achievement Award for Computing in the Arts and Humanities). The study was looking at longitudinal trends in the writings of Agatha Christie. Among other notable findings, the study identified a 30 per cent drop in vocabulary leading into Christie’s penultimate novel Elephants Can Remember. The Maclean’s article is a wonderful example of the potential for text analysis to be accessible and broadly relevant.
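
For readers wondering what "vocabulary richness" might look like in practice, here is a minimal sketch of one common measure, the type-token ratio (distinct words divided by total words). I'm not claiming this is the measure the study used; the function name, the tokenization rule and the `sample_size` parameter are my own assumptions. Comparing equal-sized samples matters because the ratio naturally falls as texts get longer.

```python
import re


def type_token_ratio(text, sample_size=None):
    """Distinct word forms (types) divided by total words (tokens),
    optionally computed on only the first `sample_size` tokens so that
    texts of different lengths can be compared fairly."""
    tokens = re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text.lower())
    if sample_size is not None:
        tokens = tokens[:sample_size]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


print(type_token_ratio("the cat and the dog"))  # 4 types / 5 tokens
```

Running the same function over equal-sized samples from each novel in publication order would give a rough longitudinal curve of the kind the article describes.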

Day of Digital Humanities

Along with almost 100 other colleagues, I participated in the Day of Digital Humanities, a community publication project to bring together digital humanists from around the world to document what they did today. I think this was a super initiative, in part because it offers such an unusual glimpse at what so many of our colleagues do (beyond what they might present in a more polished form in conference presentations and scholarly articles).

I spent a good part of my day working on adapting Voyeur for use with RSS feeds (like the ones being produced by the Day of Digital Humanities). Here are some glimpses (this highlights Voyeur’s ability to be embedded in remote sites, like this blog – this should be considered a modest preview release of Voyeur):

  • a summary of all posts (currently submitted – I’ll update this tomorrow to catch the last ones):
  • the top types (words) grouped in documents by author

Among the countless things to do on Voyeur, I need to better display results when there are hundreds of documents (like when each post is a separate document), but the full Voyeur interface is fairly usable for the second arrangement of documents (one document per author).

Citing Software

Geoffrey Rockwell and I have been giving considerable thought recently to how we might facilitate the integration of text analysis tools and results into (mostly scholarly) writing. Scholars feel compelled to cite ideas and texts that come from other authors, but they are much less likely to recognize tools that have contributed to their work (and we would probably not want every scholar to cite search engines such as Google that have been used during research). We feel strongly that text analysis tools can represent a significant contributor to digital research, whether they were used to help confirm hunches or to lead the researcher into completely unanticipated realms. Whether or not scholars do make it more of a habit to cite tools is beyond our control, but we want to design our upcoming tools to make it easier for them to do so. At the very least this includes:

  • providing a preferred general citation for the tool suite
  • providing preferred citations for specific results including references to the tool and the source text(s)
  • making it easier for users to extract static or dynamic results and include them elsewhere (a web-based blog editor, an HTML editor, a word processor article, etc.), with a reference

An important component of academic knowledge is reproducibility, and providing scholars with more information on the processes followed during research – including the text analysis tools and digital texts used – is sure to be important.
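
As a rough illustration of the second item in the list above (a preferred citation for a specific result), a tool could keep a small record for each result and format it on request. Everything here is hypothetical: the `ToolResult` fields, the citation layout and the placeholder values are assumptions, not a format we have settled on.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ToolResult:
    """The minimum a tool would need to remember to make a result citable."""
    authors: str
    tool: str
    version: str
    source_text: str
    url: str
    accessed: date


def cite(result):
    """Format a citation naming the tool, its version, the source text
    and a URL from which the result can be regenerated."""
    return (
        f"{result.authors}. {result.tool} (version {result.version}). "
        f"Analysis of {result.source_text}. {result.url} "
        f"(accessed {result.accessed.isoformat()})."
    )


# Placeholder values only, not a real tool or result.
example = ToolResult(
    authors="A. Author",
    tool="ExampleTool",
    version="0.1",
    source_text="Example Text",
    url="http://example.org/result/1",
    accessed=date(2009, 1, 1),
)
print(cite(example))
```

Recording the tool version and the source text alongside the URL is what would let another scholar attempt to reproduce the result later.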

I was prompted to write this post by a recent notice in a Globe and Mail article that provided several statistics:

These figures have been compiled by Patrick Brethour, the Globe and Mail’s British Columbia editor, drawing from the 2006 census with the help of special software from Tetrad Computer Applications Inc.

The figures referred to are mostly present in the text of the article as well, but I wonder if the editor would have been as likely to include this notice if there hadn’t been the inset with the concentrated statistics. The distinction is important because it’s about recognizing what contributed to the research regardless of how the results are presented (though ironically, journalism tends to have very different standards of citation than academic writing, and yet it’s in a newspaper article that we find a software tool cited). Will standards for citing digital tools in the humanities shift in the coming years?

TREX 2008 Winners Announced

TADA (the Text Analysis Developers’ Alliance, of which I’m the unofficial future former director) has announced winners of the 2008 T-REX Competition (for text analysis tools development and usage). The panel of judges reviewed the many submissions received and has recognized winners in five categories:

  • Best New Tool
    • Degrees of Connection by Susan Brown, Jeffery Antoniuk, Sharon Balazs, Patricia Clements, Isobel Grundy, Stan Ruecker
    • Ripper Browser by Alejandro Giacometti, Stan Ruecker, Ian Craig, Gerry Derksen
  • Best Idea for a New Tool
    • Magic Circle by Carlos Fiorentino, Stan Ruecker, Milena Radzikowska, Piotr Michura
  • Best Idea for Improving a Current Tool
    • Collocate Cloud by Dave Beavan
    • Throwing Bones by Kirsten C. Uszkalo
  • Best Idea for Improving the Interface of the TAPoR Portal
    • Bookmarklet for Immediate Text Analysis by Peter Organisciak
  • Best Experiment of Text Analysis Using High Performance Computing
    • Back-of-the-Book Index Generation by Patrick Juola

Congratulations to all winners and thanks to all participants! Watch this space for upcoming TADA events, including the next TREX Competition.

Johnny Rodgers on Digital Texts 2.0

Johnny Rodgers, lead developer of Digital Texts 2.0, is getting some media love from the School of Interactive Arts & Technology, where he’s just started an MA this fall. Johnny will be presenting our work on Digital Texts 2.0 in a couple of weeks at CaSTA 2008.
