To reveal valuable patterns, trends, and associations, Big Data applications need to process data in real time. With the Riak KV Spark Connector, you can move data from Riak KV to Apache Spark for enhanced in-memory analytics, and then store the results in Riak KV for future data processing.
Riak KV with the Spark Connector combines the real-time operational analytics of Spark with the availability and scalability of Riak KV.
Big Data means Big Analytics. Apache Spark is an in-memory analytics framework, and Riak KV is built to store massive amounts of unstructured data. Adding Spark to Riak KV boosts the power of your analytics and lets you run real-time operational analytics on the data you already store.
The Riak KV Spark Connector supports both batch and streaming analysis, so you can use a single framework for batch processing and for real-time analytics on operational data.
The Spark Connector lets you expose data stored in Riak KV as Spark Resilient Distributed Datasets (RDDs) or DataFrames, and write data from Spark RDDs or DataFrames back into Riak KV.
Spark Connector features:
Loading Data from Riak KV into Spark
The example below shows a full-bucket read using a single command.
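import com.basho.riak.client.core.query.Namespace
import com.basho.riak.spark._ // adds riakBucket to SparkContext
val data = sc.riakBucket[String](new Namespace("bucket-full-of-data"))
  .queryAll()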
If you want specific results and know your keys by name, you can pass them in directly:
val rdd = sc.riakBucket(new Namespace("FOO"))
  .queryBucketKeys("mister X", "miss Y", "dog Z")
The example below queries a range of values (e.g. 1 to 5000) on a numeric secondary index (2i), where the bucket is named “BAR” and the index is “myIndex”:
val rdd = sc.riakBucket(new Namespace("BAR"))
  .query2iRange("myIndex", 1L, 5000L)
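The read examples above cover only half of the round trip described earlier, where results are stored back in Riak KV. The following is a minimal sketch of the full loop, not a definitive implementation. It assumes the connector is on the classpath, a Riak KV node is reachable at 127.0.0.1:8087, and the stored values are plain strings. The object name RiakRoundTrip, the helper lengthBucket, and the bucket "BAR-results" are hypothetical names chosen for illustration, and the exact way values are serialized on write may differ between connector versions.

```scala
import com.basho.riak.client.core.query.Namespace
import com.basho.riak.spark._
import org.apache.spark.{SparkConf, SparkContext}

object RiakRoundTrip {
  // Hypothetical helper: turn one stored value into a (length, 1) pair
  // so that values can be counted by their length.
  def lengthBucket(value: String): (Int, Long) = (value.length, 1L)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("riak-round-trip")
      .setIfMissing("spark.master", "local[*]")
      // Assumption: a single local Riak KV node on the protocol buffers port.
      .set("spark.riak.connection.host", "127.0.0.1:8087")
    val sc = new SparkContext(conf)

    // Load: values whose "myIndex" 2i entry lies between 1 and 5000.
    val values = sc.riakBucket[String](new Namespace("BAR"))
      .query2iRange("myIndex", 1L, 5000L)

    // Analyze in Spark: count how many values share each length.
    val histogram = values.map(lengthBucket).reduceByKey(_ + _)

    // Store: write one "length:count" string per result.
    // "BAR-results" is a hypothetical bucket name.
    histogram
      .map { case (length, count) => s"$length:$count" }
      .saveToRiak(new Namespace("BAR-results"))

    sc.stop()
  }
}
```

Keeping the per-value logic in a small pure function such as lengthBucket means it can be checked without a running cluster, while the Spark and Riak KV calls stay in main.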
Big Data applications require fast analytics that scale as the data grows. Riak KV with the Spark Connector gives you high availability, scalability, and real-time analytics.
Make real-time decisions
Whether you make on-demand recommendations, or get automated alerts and analysis of events as they happen, advanced analytics is key to driving and guiding your business. Riak KV with the Spark Connector lets you integrate analytics into every business decision by providing fast, large-scale data analysis.
Increase performance and scale
As Big Data applications grow, you need a solution that not only analyzes data sets quickly, but also scales easily on demand. Riak KV with the Spark Connector provides high-performance analytics and near-linear scale on commodity hardware.
Faster time to market
Big Data applications require complex analytics. The Riak KV Spark Connector simplifies working with Riak KV and Spark. Developers get a broad set of APIs to write complex aggregations. This means you can do more complex processing with less effort, allowing you to complete your applications faster and get to market sooner.