April 27, 2010
because you needed another local key/value store
One aspect of Riak that has helped development to move so quickly is pluggable per-node storage. By allowing nearly anything k/v-shaped to be used for actual persistence, progress on storage engines can occur in parallel with progress on the higher-level parts of the system.
Many such local key/value stores already exist, such as Berkeley DB, Tokyo Cabinet, and Innostore.
There are many goals we sought when evaluating which storage engines to use in Riak, including:
- low latency per item read or written
- high throughput, especially when writing an incoming stream of random items
- ability to handle datasets much larger than RAM w/o degradation
- crash friendliness, both in terms of fast recovery and not losing data
- ease of backup and restore
- a relatively simple, understandable (and thus supportable) code
structure and data format - predictable behavior under heavy access load or large volume
- a license that allowed for easy default use in Riak
Achieving some of these is easy. Achieving them all is less so.
None of the local key/value storage systems available (including but not limited to those written by us) were ideal with regard to all of the above goals. We were discussing this issue with Eric Brewer when he had a key insight about hash table log merging: that doing so could potentially be made as fast or faster than LSM-trees.
This led us to explore some of the techniques used in the log-structured file systems first developed in the 1980s and 1990s in a new light. That exploration led to the development of bitcask, a storage system that meets all of the above goals very well. While bitcask was originally developed with a goal of being used under Riak, it was also built to be generic and can serve as a local key/value store for other applications as well.
If you would like to read a bit about how it works, we’ve produced a short note describing bitcask’s design that should give you a taste. Very soon you should be able to expect a Riak backend for bitcask, some improvements around startup speed, information on tuning the timing of merge and fsync operations, detailed performance analysis, and more.
In the meantime, please feel free to give it a try!