February 26, 2014
Amherst College is a private liberal arts college in Massachusetts that enrolls about 1,800 undergraduates. Its Archives & Special Collections houses rare books, literary manuscripts, and other unique materials documenting the College and its history, including many of Emily Dickinson’s original poems and letters. The Amherst College Library has been working to digitize images, manuscripts, and rare books in the Archives, and to improve access to a large collection of digital images used in the teaching of art and architecture. The digital collections currently hold 140,000 objects, and up to 10,000 new objects are added each month.
Fedora, the underlying digital asset management system (also used by many other colleges), is used to archive, store, and manage these documents. Although it can support the number of objects being stored, Fedora tends to favor object fixity checks (checksums) and XML schema validation over fast response times. It has served Amherst well for digital preservation and metadata support, but it has struggled with high levels of concurrency (such as when Bon Appétit Magazine directed users to an Emily Dickinson manuscript featuring a recipe for doughnuts: acdc.amherst.edu/view/asc:17832). Amherst uses Riak as the intermediary layer between Fedora and the web, and as a large cache for all of its data.
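The intermediary-layer idea can be illustrated with a minimal read-through cache sketch. This is not Amherst's code: the article does not say whether the cache is filled on demand or ahead of time, so on-demand filling is an assumption here. A plain dict stands in for Riak, and the `ReadThroughCache` class and `slow_repository_lookup` function are hypothetical names standing in for the cache layer and a request to Fedora.

```python
import json
from typing import Callable, Dict


class ReadThroughCache:
    """Serve records from a fast key/value store, falling back to a slow origin."""

    def __init__(self, fetch_from_origin: Callable[[str], dict]) -> None:
        # A plain dict stands in for the key/value store (Riak in the article).
        self._store: Dict[str, str] = {}
        self._fetch_from_origin = fetch_from_origin
        self.origin_calls = 0  # counts how often the slow origin was asked

    def get(self, key: str) -> dict:
        cached = self._store.get(key)
        if cached is not None:
            # Cache hit: the origin is never contacted.
            return json.loads(cached)
        # Cache miss: ask the slow origin once, then keep the result as JSON.
        self.origin_calls += 1
        record = self._fetch_from_origin(key)
        self._store[key] = json.dumps(record)
        return record


def slow_repository_lookup(object_id: str) -> dict:
    # Hypothetical stand-in for a request to the repository (Fedora in the article).
    return {"id": object_id, "title": "placeholder title"}


cache = ReadThroughCache(slow_repository_lookup)
first = cache.get("asc:17832")   # miss: goes to the origin
second = cache.get("asc:17832")  # hit: served from the store
```

With a layer like this, a burst of requests for one popular object reaches the slow repository only once, which is the kind of concurrency relief described above.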
Previously, they used a PHP app that accessed Fedora directly. This solution worked, but it was resource intensive and too slow for most purposes, and it would not have allowed the repository to grow at the rate needed. They evaluated a few different systems (including CouchDB and MongoDB), but found that Riak, which does not require manual sharding, was extremely easy to scale and offered better fault tolerance than the others.
Amherst brought Riak into production earlier this year and is storing around one million objects across four nodes. In Riak, all of the XML- and RDF-based metadata about each digitized object (such as structural metadata in RDF and descriptive metadata in MODS) is unified and stored in a single JSON structure. Queries typically use a plain key/value lookup or run MapReduce jobs. Since moving to Riak, the entire system has become an order of magnitude faster.
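As a rough illustration of what unifying MODS and RDF metadata into one JSON structure might look like, here is a small sketch using only the Python standard library. The sample XML, the object ID `example:1`, the `unify` function, and the JSON field names (`descriptive`, `structural`, `collections`) are all assumptions made for the example; the article does not describe Amherst's actual schema. A dict again stands in for a Riak bucket.

```python
import json
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
RDF_NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rel": "info:fedora/fedora-system:def/relations-external#",
}

# Invented sample records, for illustration only.
MODS_XML = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Sample manuscript</mods:title></mods:titleInfo>
  <mods:typeOfResource>text</mods:typeOfResource>
</mods:mods>"""

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/example:1">
    <rel:isMemberOfCollection rdf:resource="info:fedora/example:collection"/>
  </rdf:Description>
</rdf:RDF>"""


def unify(object_id: str, mods_xml: str, rdf_xml: str) -> dict:
    """Merge descriptive (MODS) and structural (RDF) metadata into one dict."""
    mods = ET.fromstring(mods_xml)
    rdf = ET.fromstring(rdf_xml)
    # Namespaced attributes are addressed as "{namespace-uri}localname".
    rdf_resource = "{%s}resource" % RDF_NS["rdf"]
    collections = [
        el.get(rdf_resource)
        for el in rdf.findall("rdf:Description/rel:isMemberOfCollection", RDF_NS)
    ]
    return {
        "id": object_id,
        "descriptive": {
            "title": mods.findtext("mods:titleInfo/mods:title", namespaces=MODS_NS),
            "type": mods.findtext("mods:typeOfResource", namespaces=MODS_NS),
        },
        "structural": {"collections": collections},
    }


# A dict stands in for a key/value bucket: one JSON document per object ID.
store = {}
store["example:1"] = json.dumps(unify("example:1", MODS_XML, RDF_XML))

# A plain key/value lookup returns everything known about the object at once.
record = json.loads(store["example:1"])
```

Keeping one document per object means a page view needs a single lookup by key instead of separate requests for each metadata stream.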
“We have been extremely happy with Riak and what it provides,” says Aaron Coburn, Systems Administrator at Amherst College. “While most of the objects stored aren’t publicly available, Riak still allows us to make over 2,000 manuscripts available to the world.”