January 1, 2011
Happy New Year, Happy Wednesday, and Happy Riak 0.14! It’s a new year and it’s time for a new version of Riak. We’ve been putting the final touches on the latest release candidate for the last few days and we are now ready to call Riak 0.14, a.k.a “Dakota,” official.
Here’s the rundown of the large improvements (for those of you who just want the release notes, stop reading and click here):
MapReduce Enhancements
As promised, we put some significant development time towards the robustness, performance, and stability of MapReduce in 0.14. There are three primary areas worth mentioning:
- Key Filtering – Until now, MapReduce jobs were only able to run over entire buckets or a specific set of user-supplied keys. This approach can result in some performance hiccups as your buckets and inputs grow. Key Filtering, which is new in this release, enables you to build meaningful data into your keys and then filter for a given set of keys before processing, thus focusing the inputs for job and increasing performance. Key filtering will support boolean operators (and, or, not), url decoding, string tokenizing, regular expressions, and various string to numeric conversions.
- MapReduce Query Planner – Scheduling functions is hard. Our approach to several components in the scheduling process in previous Riak releases was less than optimal, so we’ve done a lot of work to refine the process. The new query planner batches each set of 50 bucket/key pairs that are then analyzed and scheduled around the cluster to maximize vnode coverage. This yielded a nice reduction in cluster chattiness while improving throughput. Win Win™.
- Segregated Javascript VM Pools – 0.14 will support three separate pools of Javascript VMs to reduce overall contention. Why three separate pools? For the three different JS calls: map functions, reduce functions, and pre-commit hooks. This fine-grained level of tweaking will let you better allocate resources and improve cluster performance.
This slide deck talks more about these three enhancements. And, there is a lengthier blog post coming out tomorrow dedicated to these MapReduce improvements…
Cluster and Node Debugging
The ability to monitor and debug a running Riak cluster received some substantial enhancements in 0.14. This is because Riak is now shipping with two new applications: riak_err and cluster_info. Riak hacker Scott Fritchie posted a blog back in November with an extensive overview of what these two applications will make possible when running Riak. Read that for all the details. The short version is that a) riak_err improves Riak’s runtime robustness by strictly limiting the amount of RAM that is used while processing event log messages and b) cluster_info assists troubleshooting by automatically gathering lots of environment, configuration, and runtime statistics data into a single file.
We’ve also added some new documentation to the wiki on the Command Line Tools page (at the bottom) with some more details on what cluster_info is all about and how use it.
Windowed Merges for Bitcask
The default storage backend for Riak is Bitcask, and we are increasingly seeing users select this for their production clusters thanks to (among other things) its low, predictable latencies and high throughput. Bitcask saw numerous enhancements and bug fixes in 0.14, the most significant of which is something called “windowed merges.” Bitcask performs periodic merges over all non-active files to compact the space being occupied by old versions of stored data. In certain situations this can cause some memory and CPU spikes on the Riak node where the merge is taking place. To that end, we’ve added the ability to specify when Bitcask will perform merges. So, for instance, if you know that you typically see the lowest load on your cluster between 2 and 4 AM, you can set this time frame as your acceptable start and stop time for merges. This is set in your bitcask.app file.
Other Noteworthy Enhancements
Other noteworthy enhancements include support for HTTPS and multiple HTTP IPs, packaging scripts for building debs, rpms and Solaris packages, and the ability to list buckets through the REST API. Check out the release notes for a complete list of new features and bug fixes.
Contributors for 0.14
Aside from the core Riak Devs, here is the list[1] of people (in no particular order) who contributed code, bug fixes and other bits between 0.13 and 0.14 (across all the OTP apps that come bundled with Riak):
Tuncer Ayaz, Jebu Ittiachen, Ben Black, Jesper Louis Andersen, Fernando Benavides, Magnus Klaar, Mihai Balea, Joseph Wayne Norton, Anthony Ramine, David Reid, Benjamin Nortier, Alexey Romanov, Adam Kocoloski, Juhani Rankimies, Andrew Thompson, Misha Gorodnitzky, Daniel Néri, andekar, Kostis Sagonas, Phil Pirozhkov, Benjamin Bock, Peter Lemenkov.
Thanks for your contributions! Keep them coming.
1 – If I forgot or misspelled your name, email mark@riak.com and we’ll add/fix it ASAP.
Hey, what about Riak Search?!
We’ve got a few release-related loose ends to tie up with Riak Search. But don’t worry. This release was very significant for Search, and we’re shooting to have it tagged and released next week.
So what should you do now?
- New to Riak? The Riak Fast Track will get you up to speed faster than you can say, “Just pipe it to dev/null.”
- Download Riak 0.14
- Build 0.14 on your platform of choice
- Check out the Riak code on GitHub
We’re already hard at work on the next release. We’re calling it “Elgin.” (Bonus Riak T shirt for anyone who can find the pattern behind the naming scheme; Dakota and Elgin should be enough info to go on.) If you want to get involved with Riak, join the mailing list or come hang out in the Riak channel on IRC to get your feet wet.
Other than that, thanks for using Riak!