By Will Rourk
Today’s social networks give everyone the ability to contribute to a vast body of networked data. In April, the Library of Congress (LOC) announced that it would begin an ambitious program to tap into this digital corpus by archiving all of the messages produced via the social micro-blog site, Twitter.
The critical component of Twitter is the “tweet,” a message of up to 140 characters that friends and followers receive and can pass on. Tweets are often forgettable and prosaic, but they have also served as advertising vehicles and, at times, as useful updates. As you might expect, the public tends to regard these curt missives as trivial drivel.
But to librarians and scholars, Twitter is about the sum of its tweet parts, which collectively stand as a rich resource of data derived from a multitude of first-person narratives and correspondence. “In the past, the role of libraries may have been to curate the information they preserved, judging for the rest of society what information ‘deserved’ to be saved, cataloged, and made accessible,” notes Rebecca Cooper, Architecture Librarian for the University of Virginia. “As scholars, we mourn what was not saved because it was not deemed ‘important enough’ at the time. Twitter [offers] precisely this kind of material.”
As part of a new data genre, tweets have become important to the LOC’s National Digital Information Infrastructure and Preservation Program. The initiative, started in December 2000, seeks to preserve blogs, digital photos, websites, and E-mails at all levels, from the government on down. “The Library has been collecting materials from the web since it began harvesting congressional and presidential campaign websites,” notes LOC blogger Matt Raymond. As a result, LOC digital archives reportedly hold more than 167 terabytes of web-based information. Down the line, LOC server storage will need to expand substantially to keep pace with the output of social blogs like Twitter, which currently produces more than 50 million tweets per day.
Why should we care about archiving social network data? Supporters contend that tweets, E-mails, and the like are akin to scrapbooks or correspondence from, say, the eighteenth century. As historian and technologist Loren Moulds points out, “Few people actually write things down anymore and, instead, use electronic media to correspond and document their thoughts.” Good for historians, then, but a sizable challenge for custodial bodies like the LOC. “Looking for insights into people’s daily lives provides historical narratives with a critical granularity,” says Moulds, “and a comprehensive collection of Tweets represents an unprecedented repository of primary sources for future historians.”
To benefit from this comprehensive collection, new tools will need to be developed to help achieve the granularity that historians seek. One solution might be Google’s recently unveiled Realtime Search, which displays information from Twitter feeds in a way that registers query spikes. Another tool, which allows deep data mining and visualization of tweets, is the Archivist at http://archivist.visitmix.com. Punch in a Twitter hashtag search like #greendesign, or even a Twitter username like @librarycongress, and the Archivist will return a grid of graphical results and the most recent tweets matching the search term, along with the option to download the results for your own analysis.
Ultimately, these tools will make researchers more efficient. Digital document files (including tweets) can be parsed and collated much more easily than information derived from traditional hard copies. Historians, or anyone else, can link directly to historical data, both qualitative and quantitative, about the nascent days of social media. As Librarian of Congress James H. Billington is quoted as saying on the LOC’s website, “The Library looks at this as an opportunity to add new kinds of information without subtracting from our responsibility to manage our overall collection. Working with the Twitter archive will also help the Library extend its capability to provide stewardship for very large sets of born-digital materials.”