November 30, 2012
Riak Cloud Storage is an S3-compatible, multi-tenant storage platform built on Riak. It combines the availability and fault tolerance of Riak with the ability to store large objects, an S3-compatible API, user administration, and usage reporting. It can be used for public and private clouds or as reliable storage for applications. Today we’re announcing multi-datacenter replication support in Riak CS. Increasingly, global enterprises and apps require multi-site storage replication to achieve data locality, stay available in disaster scenarios, or maintain active backups, so we’re very excited to provide these features in the latest release of Riak CS.
You can read more about multi-datacenter replication for Riak CS in the public docs, or sign up for an upcoming webcast on Thursday, December 6, which will give a technical overview of Riak CS and a discussion of the new features. If you want something more hands-on, get a developer trial of Riak CS and take it for a test drive.
Technical Details
Multi-datacenter replication in Riak CS provides two modes of object replication: full sync and real-time sync. Data is streamed over a TCP connection, and multi-datacenter replication in Riak CS has support for SSL so data can be securely replicated between sites.
In Riak CS, large objects are broken into blocks and streamed to the underlying Riak cluster on write, where they are replicated for high availability (3 replicas by default). A manifest for each object is maintained so that blocks can be retrieved from the cluster and the full object presented to clients. For multi-site replication in Riak CS, global user information, bucket information, and manifests are streamed in real time from a primary implementation to a secondary site so global state is maintained across locations. Objects can then be replicated in either full sync or real-time sync mode.
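To make the block-and-manifest idea concrete, here is a minimal Python sketch. It is an illustration only, not Riak CS code: the function names, the manifest fields, the in-memory dict standing in for the Riak cluster, and the 1 MiB block size are all assumptions made for the example.

```python
import hashlib
import uuid

# Assumed block size for illustration only.
BLOCK_SIZE = 1024 * 1024


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut an object into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def write_object(store: dict, bucket: str, key: str, data: bytes,
                 block_size: int = BLOCK_SIZE) -> dict:
    """Store each block under (object_id, index) and return a manifest.

    `store` is a plain dict standing in for the underlying cluster.
    """
    object_id = uuid.uuid4().hex
    blocks = split_into_blocks(data, block_size)
    for index, block in enumerate(blocks):
        store[(object_id, index)] = block
    # Hypothetical manifest layout: just enough to find and verify the blocks.
    return {
        "bucket": bucket,
        "key": key,
        "object_id": object_id,
        "block_count": len(blocks),
        "content_length": len(data),
        "content_md5": hashlib.md5(data).hexdigest(),
    }


def read_object(store: dict, manifest: dict) -> bytes:
    """Use the manifest to fetch the blocks in order and reassemble the object."""
    return b"".join(
        store[(manifest["object_id"], index)]
        for index in range(manifest["block_count"])
    )
```

In this sketch the manifest is the only thing a reader needs in order to locate every block, which is why keeping manifests in sync across sites matters before the blocks themselves arrive.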
In full sync, objects are replicated from a primary Riak CS implementation to a secondary site on a configurable interval (the default is 6 hours). In full-sync replication, each cluster computes a hash for each key’s block value. Key/block pairs are compared, and the primary site streams any missing or updated blocks to the secondary site.
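The comparison step described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: each cluster is a plain dict of key to block bytes, SHA-256 is an arbitrary choice of hash, and `block_hashes` and `full_sync` are hypothetical names, not part of any Riak API.

```python
import hashlib


def block_hashes(cluster: dict) -> dict:
    """Compute a hash for each key's block value."""
    return {key: hashlib.sha256(value).hexdigest()
            for key, value in cluster.items()}


def full_sync(primary: dict, secondary: dict) -> list:
    """Copy missing or differing blocks from primary to secondary.

    Returns the keys that had to be streamed.
    """
    primary_hashes = block_hashes(primary)
    secondary_hashes = block_hashes(secondary)
    streamed = []
    for key, digest in primary_hashes.items():
        # A missing key yields None, which never equals a real digest.
        if secondary_hashes.get(key) != digest:
            secondary[key] = primary[key]
            streamed.append(key)
    return streamed
```

Comparing hashes rather than block contents means only the digests need to be exchanged to find the differences; the blocks themselves travel only when they are missing or stale. A second run right after a successful sync streams nothing.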
Real-time sync is triggered when an update is sent from a client to a primary Riak CS implementation. Once the update has been replicated in the first location, it is streamed in real time to the secondary site. But what happens if a client requests an object from the secondary cluster and not all of its blocks have been replicated to that cluster? With Riak multi-site replication, the secondary cluster will request any missing blocks from the primary cluster so that the client can be served.
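The fallback for blocks that have not yet arrived might look like the following Python sketch. The `Cluster` class, its methods, and the `fetched_from_primary` list are hypothetical and exist only to illustrate the read path; whether the secondary keeps a local copy of a block it fetched this way is not stated here, so the sketch does not store one.

```python
class Cluster:
    """Toy model of a site: a dict of blocks plus an optional primary to ask."""

    def __init__(self, name, primary=None):
        self.name = name
        self.blocks = {}
        self.primary = primary
        self.fetched_from_primary = []  # bookkeeping for the example only

    def put_block(self, key, value):
        self.blocks[key] = value

    def get_block(self, key):
        # Serve locally when the block has already been replicated here.
        if key in self.blocks:
            return self.blocks[key]
        # With no primary to fall back on, the block is simply missing.
        if self.primary is None:
            raise KeyError(key)
        # Otherwise request the missing block from the primary cluster.
        value = self.primary.get_block(key)
        self.fetched_from_primary.append(key)
        return value

    def get_object(self, block_keys):
        """Reassemble an object from its blocks, in manifest order."""
        return b"".join(self.get_block(key) for key in block_keys)
```

For example, if the primary holds two blocks of an object and only the first has reached the secondary, a read against the secondary still returns the whole object, with the second block pulled from the primary on demand.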
Try It Out
We’ve got two ways for you to try out Riak CS software. First, we can give you access to a hosted version where you can upload files, test out the API, and try s3cmd or other clients against it. If you want to try Riak CS on your own hardware, we also have a developer trial that gives you access to the Riak CS code and a little bit of our help to get you up and running. So check out the docs and then sign up to start.