Dr. Stéfan Sinclair, Associate Professor of Multimedia at McMaster University.
The good folks at the Center for History and New Media have initiated (yet another) fantastic resource in the form of Digital Humanities Now, a real-time, crowdsourced publication. It takes the pulse of the digital humanities community and tries to discern what articles, blog posts, projects, tools, collections, and announcements are worthy of greater attention. I’m especially happy to hear that part of the motivation for DHNow is to explore how, as Dan Cohen puts it, “some version of this idea could serve as a rather decent new form of publication that focuses the attention of those in a particular field on important new developments and scholarly products”. Though perhaps not itself a quantifiable object in terms of hiring, tenure and promotion, it can certainly function effectively in that ecology to promote noteworthy content and projects. I see this as akin to participating at conferences: there’s a kind of intangible value ascribed to it by committees wanting to judge the scholarly activity of an individual (that can enhance the perception of other work). We can’t just wait for administrators to “get it”; we need to be more proactive by providing them with other tools by which to assess the value of digital humanities scholarship. It’s partly for this reason that I think it’s so important to think, as a community, about how we can get DHNow (and similar initiatives) right.
One of my first reactions was to ask how the algorithms could get a good diversity of languages if – as I’d falsely assumed – there was linguistic analysis happening in the filtering and grouping process. It turns out that DHNow uses a much simpler and more elegant mechanism for gathering content: it (via Twittertim.es) analyzes common URLs, which really makes it language and context independent (though still likely very relevant if several DHers mention the same URL).
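To make the idea concrete, here is a minimal sketch of URL-based convergence. It is not DHNow's or Twittertim.es's actual code: the `shared_urls` function, the `(user, text)` input shape and the sample tweets are all assumptions made for illustration. It simply counts how many distinct users mention each URL, without looking at the language of the surrounding text.

```python
import re
from collections import defaultdict

# Rough URL matcher; good enough for a sketch, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://\S+")


def shared_urls(tweets, min_users=2):
    """Return (url, user_count) pairs for URLs mentioned by at least
    `min_users` distinct users, most widely shared first.

    `tweets` is an iterable of (user, text) pairs (a hypothetical shape).
    """
    users_by_url = defaultdict(set)
    for user, text in tweets:
        for url in URL_PATTERN.findall(text):
            # Drop trailing punctuation that often clings to a URL in prose.
            users_by_url[url.rstrip(".,;:!?)")].add(user)
    ranked = [
        (url, len(users))
        for url, users in users_by_url.items()
        if len(users) >= min_users
    ]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


sample = [
    ("alice", "Great read on DH publishing http://example.org/a"),
    ("bob", "À lire: http://example.org/a"),
    ("carol", "New tool release http://example.org/b"),
]
print(shared_urls(sample))
```

Note that the English and French tweets converge on the same resource. A real version would presumably also need to resolve shortened URLs before counting, which this sketch does not attempt.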
The URL-centric approach is useful for converging on a unique resource, but there’s a lot of discussion on Twitter that’s not oriented towards URLs (and of course there’s also a lot of DH discussion that’s not on Twitter). One huge advantage of the URL approach is that it usually produces a nice, coherent title to represent a set of tweets – it may be difficult to generate an expressive title from a tweet in the absence of a URL. In any case, what could be some possible (relatively simple) strategies for capturing a broader range of topics and discussions?
Other ideas? Tweet me @sgsinclair (with this URL http://tr.im/FnOI ;-)
I have another motivation for wanting DHNow to work optimally: I’m already overwhelmed by digital information from email, blogs, Twitter, and so on. I’m not especially keen to add yet another source of information, unless I have confidence that it will allow me to drop something else. Any filtering and aggregation obviously means compromise and loss – but if anyone should be up to the challenge of a technical and social problem, it’s the digital humanities community.
In preparation for the upcoming API workshop, organized by Bill Turkel, I thought I’d try to assemble a few thoughts on APIs. This is the fruit of work on several text analysis projects, including TAPoR, HyperPo, Voyeur, BonPatron and MONK (I hesitate to associate ideas with specific people without their consent, but of course this is also the fruit of working with several talented people in digital humanities).
Although HyperPo has many faults (it is not very scalable, and its development has been superseded by Voyeur), it does provide a decent API. To see it in action, you can view the list of modular tools in the HyperPoets Gallery, click on one of the tools, scroll down to near the bottom of the page and click the API link, and submit some values (please don’t be a bully – use shorter texts:-). Some tools provide alternate output formats – you’ll find those in the options section if applicable. For instance:
Some similar calls are currently possible with Voyeur (http://voyeur.hermeneuti.ca/?input=http://www.un.org/Overview/rights.html), but there’s a long way to go yet…
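As a small illustration of this URL-as-API style, the sketch below assembles the Voyeur call shown above from its parts. The `voyeur_url` helper is hypothetical, and `input` is the only parameter it uses because it is the only one that appears in the example URL. The code only builds the address and makes no request.

```python
from urllib.parse import urlencode

# Base address taken from the example call above.
VOYEUR_BASE = "http://voyeur.hermeneuti.ca/"


def voyeur_url(source_url):
    """Build a Voyeur URL that asks the tool to load `source_url`.

    `safe=":/"` keeps the nested URL readable instead of percent-encoding
    its colon and slashes; other reserved characters are still escaped.
    """
    return VOYEUR_BASE + "?" + urlencode({"input": source_url}, safe=":/")


print(voyeur_url("http://www.un.org/Overview/rights.html"))
```

The appeal of this design is that any page, script or bookmark that can construct a URL can call the tool.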
Applications are invited for a one-year Postdoctoral Fellowship in Digital Humanities and High Performance Computing (HPC), under the supervision of Dr. Stéfan Sinclair from Communication Studies and Multimedia at McMaster University. The focus of the research will be large-scale, on-demand text analysis, and especially the development of HPC modules that can operate in a web-based context. McMaster University is internationally recognized as a leader in digital humanities scholarship and tool development.
This position is made possible in large part by Sharcnet, an HPC consortium in Ontario, as well as McMaster Libraries. The postdoctoral fellow will work closely with the supervisor (Sinclair), Sharcnet, and the Libraries.
Successful candidates will have experience working on textually oriented projects, as well as strong Java and system administration skills. We are seeking an individual who can bring strong interest and enthusiasm to an area of research ripe for innovation, and someone who will be able to integrate well into a larger team.
Salary: $45,000 plus benefits
By July 31, 2009, applicants should send a full Curriculum Vitae, letters from two referees, and a cover letter highlighting their prior achievements and briefly summarizing their interest and experience in this area. Electronic submissions will be accepted. Applicants are strongly encouraged to contact Sinclair as early as possible to express interest and to ask any questions.
McMaster is committed to Employment Equity and welcomes applications from all qualified applicants, including women, members of visible minorities, Aboriginal persons, members of sexual minorities, and persons with disabilities.
Dr. Stéfan Sinclair (sgs [at] mcmaster.ca)
Communication Studies & Multimedia
McMaster University
1280 Main Street West
Hamilton, ON, L8S 4M2, Canada
I’ve finally taken the plunge into Twitter. I have to confess that I do so more out of academic curiosity than real interest, but I have a sneaking suspicion that I’ll enjoy it, at least for a while. I’m not sure I’ll ever get into the groove of divulging details of my personal life, but I think it might be an interesting medium for exchanging nuggets about research and teaching activities. My first instinct was certainly to look up colleagues whose work interests me, rather than looking up friends and family.
Soon after creating my account I found a very simple Quicksilver ActionScript for posting tweets. I also found an updated script for Growl notifications, but what I really wanted was to be warned when tweets were too long (over 140 characters). After trying a few variants with more or less success, I settled on this script (though I made the failed Growl message a bit more noticeable).
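The length check itself is simple. Here is a minimal sketch of the logic in Python rather than in the scripting language of the script mentioned above; the `check_tweet` name and the message wording are my own, and the notification step (Growl in my case) is left out.

```python
TWEET_LIMIT = 140  # the character limit mentioned above


def check_tweet(text, limit=TWEET_LIMIT):
    """Return (ok, message) describing whether `text` fits within `limit`.

    Length is counted with len(), i.e. in Unicode code points, which is an
    assumption about how the limit should be measured.
    """
    length = len(text)
    if length > limit:
        return False, f"Too long by {length - limit} characters ({length}/{limit})"
    return True, f"OK ({length}/{limit})"


ok, message = check_tweet("Trying out a Quicksilver script for posting tweets")
print(ok, message)
```

A posting script would call this before sending and show the failure message prominently instead of posting.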
A neighbour and friend said he thought of me when he read an article about researchers doing text analysis to study the possible effects of Alzheimer’s on the vocabulary richness of authors. I asked to see the article and was very pleasantly surprised to see our TAPoR colleague Ian Lancashire prominently featured in a recent Maclean’s article (Ian has been a wonderful pioneer and leader for the text analysis community in Canada and beyond, earning him an Outstanding Achievement Award for Computing in the Arts and Humanities). The study was looking at longitudinal trends in the writings of Agatha Christie. Among other notable findings, the study identified a 30 per cent drop in vocabulary leading into Christie’s penultimate novel Elephants Can Remember. The Maclean’s article is a wonderful example of the potential for text analysis to be accessible and broadly relevant.
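For readers curious about what measuring vocabulary richness might look like, here is a minimal sketch using the type-token ratio (distinct words divided by total words). This is one common measure and an assumption on my part; the article summary above does not say which measure the study used. The function names and the crude tokenizer are illustrative only.

```python
import re


def type_token_ratio(text, window=None):
    """Distinct word forms divided by total words.

    If `window` is given, only the first `window` tokens are used, so that
    texts of different lengths can be compared on an equal footing (longer
    texts naturally repeat more words).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if window is not None:
        tokens = tokens[:window]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def percent_drop(earlier, later):
    """Relative decrease from `earlier` to `later`, as a percentage."""
    return (earlier - later) / earlier * 100


print(type_token_ratio("the cat and the dog"))  # 4 distinct words out of 5
```

Comparing novels across a career would mean computing such a score for the same-sized sample of each book and then looking at the trend over time.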
Along with almost 100 other colleagues, I participated in the Day of Digital Humanities, a community publication project to bring together digital humanists from around the world to document what they did today. I think this was a super initiative, in part because it offers such an unusual glimpse at what so many of our colleagues do (beyond what they might present in a more polished form in conference presentations and scholarly articles).
I spent a good part of my day working on adapting Voyeur for use with RSS feeds (like the ones being produced by the Day of Digital Humanities). Here are some glimpses (this highlights Voyeur’s ability to be embedded in remote sites, like this blog – this should be considered a modest preview release of Voyeur):
Among the countless things to do on Voyeur, I need to better display results when there are hundreds of documents (like when each post is a separate document), but the full Voyeur interface is fairly usable for the second arrangement of documents (one document per author).
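The two arrangements of documents can be sketched as follows. This is not Voyeur's code (Voyeur is written in Java); it is a minimal Python illustration that assumes an RSS 2.0 feed whose items carry a `dc:creator` or `author` element, and the `documents_from_rss` function and the sample feed are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Dublin Core creator element, commonly used for post authors in blog feeds.
DC_CREATOR = "{http://purl.org/dc/elements/1.1/}creator"


def documents_from_rss(rss_xml, by_author=False):
    """Turn an RSS feed into a {document_name: text} mapping.

    by_author=False: one document per post (first arrangement).
    by_author=True:  one document per author, posts concatenated (second).
    """
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iter("item"):
        author = item.findtext(DC_CREATOR) or item.findtext("author") or "unknown"
        title = item.findtext("title", default="")
        body = item.findtext("description", default="")
        posts.append((author, title, body))
    if not by_author:
        # Prefix with a running number so identical titles do not collide.
        return {f"{i:03d} {title}": body for i, (_, title, body) in enumerate(posts, 1)}
    merged = defaultdict(list)
    for author, _, body in posts:
        merged[author].append(body)
    return {author: "\n\n".join(bodies) for author, bodies in merged.items()}


SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<item><title>Morning</title><dc:creator>Ana</dc:creator><description>Email and coffee.</description></item>
<item><title>Afternoon</title><dc:creator>Ana</dc:creator><description>Coding.</description></item>
<item><title>Teaching</title><dc:creator>Ben</dc:creator><description>Seminar prep.</description></item>
</channel></rss>"""

print(len(documents_from_rss(SAMPLE)))
print(sorted(documents_from_rss(SAMPLE, by_author=True)))
```

Grouping by author keeps the number of documents manageable, which is why the second arrangement is easier to display.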
Geoffrey Rockwell and I have been giving considerable thought recently to how we might facilitate the integration of text analysis tools and results into (mostly scholarly) writing. Scholars feel compelled to cite ideas and texts that come from other authors, but they are much less likely to recognize tools that have contributed to their work (and we would probably not want every scholar to cite search engines such as Google that have been used during research). We feel strongly that text analysis tools can represent a significant contributor to digital research, whether they were used to help confirm hunches or to lead the researcher into completely unanticipated realms. Whether or not scholars do make it more of a habit to cite tools is beyond our control, but we want to design our upcoming tools to make it easier for them to do so. At the very least this includes:
An important component of academic knowledge is reproducibility, and providing scholars with more information on the processes followed during research – including the text analysis tools and digital texts used – is sure to be important.
I was prompted to write this post by a recent notice in a Globe and Mail article that provided several statistics:
These figures have been compiled by Patrick Brethour, the Globe and Mail’s British Columbia editor, drawing from the 2006 census with the help of special software from Tetrad Computer Applications Inc.
The figures referred to are mostly present in the text of the article as well, but I wonder if the editor would have been as likely to include this notice if there hadn’t been the inset with the concentrated statistics. The distinction is important because it’s about recognizing what contributed to the research regardless of how the results are presented (though ironically, journalism tends to have very different standards of citation than academic writing, and yet it’s in a newspaper article that we find a software tool cited). Will standards for citing digital tools in the humanities shift in the coming years?
TADA (the Text Analysis Developers’ Alliance, of which I’m the unofficial future former director) has announced winners of the 2008 T-REX Competition (for text analysis tools development and usage). The panel of judges reviewed the many submissions received and has recognized winners in five categories:
Congratulations to all winners and thanks to all participants! Watch this space for upcoming TADA events, including the next T-REX Competition.
Johnny Rodgers, lead developer of Digital Texts 2.0, is getting some media love from the School of Interactive Arts & Technology, where he’s just started an MA this fall. Johnny will be presenting our work on Digital Texts 2.0 in a couple of weeks at CaSTA 2008.
We’ve made available a preview release of Digital Texts 2.0, an attempt to experiment with social networking practices in the context of interacting with electronic texts. Although we have a fairly detailed scholarly agenda for this project, one of the things I’m most curious about is whether or not students would be interested in using a Facebook application to interact with texts, whether it be for pleasure or for course work. Similarly, can instructors find innovative ways to incorporate such tools into the classroom?
Some key features currently available:
Some upcoming features (probably by the end of the summer):
Are you planning on using Digital Texts 2.0? Please let me know!
Thanks to the Digital Texts 2.0 team and especially to the heroic efforts of Johnny Rodgers, the programmer and designer, and Shawn Day, who has provided outstanding feedback.