With a key/value design that delivers powerful yet simple data models for storing massive amounts of unstructured data, Riak KV is built to handle a variety of challenges facing Big Data applications, including tracking user or session information, storing connected-device data, and replicating data across the globe.
Riak KV automates data distribution across the cluster to achieve fast performance and robust business continuity. Its masterless architecture ensures high availability, and it scales near-linearly on commodity hardware, so you can add capacity without a large operational burden.
Riak stores data as objects, each a key/value pair, in buckets. A bucket is a flat namespace with a few configuration properties, such as the replication factor. Riak is content-agnostic, so values can be of any content type, and your application's needs should drive how you structure data. With that in mind, these are some common approaches to typical use cases.
| Application Type | Key | Value |
|---|---|---|
| Session | User/Session ID | Session Data |
| Content | Title, Integer | Documents, Images, Posts, Videos, Texts, JSON/HTML, etc. |
| Advertising | Campaign ID | Ad Content |
| User Data | Login, Email, UUID | User Attributes |
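As a rough illustration of the bucket/key/value model, the sketch below builds (but does not send) requests against Riak's HTTP API, which addresses an object as /buckets/<bucket>/keys/<key> and bucket properties as /buckets/<bucket>/props. The node address, the bucket name "sessions", the key "user-42", and the helper function names are all assumptions made for this example.

```python
import json
from urllib.parse import quote
from urllib.request import Request

RIAK_URL = "http://127.0.0.1:8098"  # assumption: local node on the default HTTP port


def object_url(bucket, key, base=RIAK_URL):
    """URL of one object in the HTTP API: /buckets/<bucket>/keys/<key>."""
    return f"{base}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def store_request(bucket, key, value, content_type):
    """Build a PUT that stores `value` (bytes) under bucket/key."""
    return Request(object_url(bucket, key), data=value, method="PUT",
                   headers={"Content-Type": content_type})


def set_replication_request(bucket, n_val, base=RIAK_URL):
    """Build a PUT that sets the bucket's replication factor (n_val)."""
    body = json.dumps({"props": {"n_val": n_val}}).encode("utf-8")
    return Request(f"{base}/buckets/{quote(bucket, safe='')}/props", data=body,
                   method="PUT", headers={"Content-Type": "application/json"})


# Session use case from the table: key = user/session ID, value = session data.
session = store_request("sessions", "user-42",
                        json.dumps({"cart": ["sku-1"]}).encode("utf-8"),
                        "application/json")
# To send against a running node: urllib.request.urlopen(session)
```

Because Riak is content-agnostic, the same helper could carry an image or HTML page by changing only the bytes and the Content-Type value.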
Riak works well as a document store thanks to two more recently added features, Riak Search and Riak Data Types, which make stored data easier to query.
Riak Search gives you a variety of ways to implement a document store in Riak. For example, you can store JSON or XML objects and retrieve them later via Solr queries, or you can store data in Riak maps, index that data using Riak Search, and run Solr queries against the stored objects.
It helps to think of these Search indexes as collections, with each indexed document having an ID generated automatically by Search. Since you're not running key/value queries on these objects, you can let Riak assign their keys automatically as well.
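The document-store workflow described above can be sketched as four HTTP requests: create an index, attach it to a bucket, store a document without naming a key, and run a Solr query. The sketch only builds the requests. The node address, the index name "articles", the bucket name "posts", and the field name title_s (which relies on the default schema's dynamic string-field suffix) are assumptions for illustration, as are the helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

RIAK_URL = "http://127.0.0.1:8098"  # assumption: local node, default HTTP port
JSON_HEADERS = {"Content-Type": "application/json"}


def create_index_request(index, schema="_yz_default"):
    """PUT /search/index/<index>: create a Search index using the given schema."""
    body = json.dumps({"schema": schema}).encode("utf-8")
    return Request(f"{RIAK_URL}/search/index/{index}", data=body,
                   method="PUT", headers=JSON_HEADERS)


def attach_index_request(bucket, index):
    """PUT /buckets/<bucket>/props: index objects written to the bucket."""
    body = json.dumps({"props": {"search_index": index}}).encode("utf-8")
    return Request(f"{RIAK_URL}/buckets/{bucket}/props", data=body,
                   method="PUT", headers=JSON_HEADERS)


def store_document_request(bucket, document):
    """POST /buckets/<bucket>/keys: no key in the URL, so Riak assigns one."""
    return Request(f"{RIAK_URL}/buckets/{bucket}/keys",
                   data=json.dumps(document).encode("utf-8"),
                   method="POST", headers=JSON_HEADERS)


def search_request(index, query):
    """GET /search/query/<index>: run a Solr query, asking for JSON results."""
    params = urlencode({"wt": "json", "q": query})
    return Request(f"{RIAK_URL}/search/query/{index}?{params}")


steps = [
    create_index_request("articles"),
    attach_index_request("posts", "articles"),
    store_document_request("posts", {"title_s": "Hello"}),
    search_request("articles", "title_s:Hello"),
]
# To run against a live node, pass each request to urllib.request.urlopen in order.
```

Storing with POST rather than PUT is what lets Riak pick the key, which fits the case where documents are only ever found through Search.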
Riak reports its current operating status as counters and histograms, available through the HTTP API at the /stats endpoint or through the riak-admin interface, i.e., the stat and status commands. Graphing the throughput stats relevant to your use case is often useful for capacity planning and usage trend analysis, and it helps establish an expected baseline so you can investigate unexpected spikes or dips in throughput. Riak also integrates with many open source, self-hosted and service-based solutions, such as New Relic, Nagios and Zabbix, that aggregate and analyze statistics and logging data for monitoring, alerting and trend analysis on a Riak cluster.
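A minimal sketch of baseline monitoring against the /stats endpoint might look like the following. It assumes a node reachable at 127.0.0.1:8098 and uses the node_gets and node_puts stats, which count operations coordinated by the node over the last minute. The 50% tolerance and the helper names are arbitrary choices for illustration.

```python
import json
from urllib.request import urlopen


def fetch_stats(base_url="http://127.0.0.1:8098"):
    """Fetch the JSON document served by the /stats endpoint (needs a live node)."""
    with urlopen(f"{base_url}/stats") as resp:
        return json.load(resp)


def throughput_per_second(stats):
    """Convert the one-minute GET/PUT counts into per-second rates."""
    return {
        "gets_per_s": stats["node_gets"] / 60,
        "puts_per_s": stats["node_puts"] / 60,
    }


def is_anomalous(current, baseline, tolerance=0.5):
    """Flag a value that deviates from the baseline by more than `tolerance`."""
    if baseline == 0:
        return current > 0
    return abs(current - baseline) / baseline > tolerance


# Offline example with made-up numbers; replace with fetch_stats() on a real node.
sample = {"node_gets": 1200, "node_puts": 300}
rates = throughput_per_second(sample)
```

In practice the rates would be sampled on a schedule and fed to a graphing or alerting tool rather than compared by hand.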
Riak is explicitly supported on several cloud infrastructure providers, including AWS and Azure, and is designed to run in production on commodity hardware across a number of service architectures, on both private and public infrastructure. It scales horizontally, so you can add capacity by joining additional nodes to your cluster. Base your choice of hardware on how many objects you plan to store and on the replication factor, along with other considerations such as intra-cluster bandwidth and IO capacity, especially for heavy write loads.
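The sizing reasoning above can be turned into back-of-the-envelope arithmetic: raw storage is roughly object count times average object size times replication factor. The sketch below is only a starting point. It ignores storage-backend overhead, bandwidth and IO, and the function names, the 70% fill ceiling, and the example figures are hypothetical.

```python
import math


def raw_storage_bytes(num_objects, avg_object_bytes, n_val=3):
    """Bytes needed cluster-wide when every object is stored n_val times."""
    return num_objects * avg_object_bytes * n_val


def nodes_needed(num_objects, avg_object_bytes, disk_per_node_bytes,
                 n_val=3, max_fill=0.7):
    """Smallest node count that keeps each disk below the max_fill fraction."""
    usable_per_node = disk_per_node_bytes * max_fill
    total = raw_storage_bytes(num_objects, avg_object_bytes, n_val)
    return math.ceil(total / usable_per_node)


# Hypothetical workload: one billion 2 KiB objects, n_val of 3, 1 TB disks.
total = raw_storage_bytes(1_000_000_000, 2048)
nodes = nodes_needed(1_000_000_000, 2048, 1_000_000_000_000)
```

Rerunning the estimate with a different n_val shows how directly the replication factor drives hardware requirements.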