Today I am pleased to announce that Riak TS (Time Series) 1.3 is generally available (GA) and is now officially open source (OSS) under the Apache License, Version 2.0. We believe Riak TS 1.3 addresses the needs of our customers and of the broader Time Series market. The major features added in this release include:
- Data aggregation and arithmetic operations inside Riak TS
- Seamless integration with Apache Spark via our Spark Connector
- Extremely fast write and query performance for time series data
- High-performance clients released for Java, Erlang, and Python
- SQL-based query system (not our own flavor, but standard SQL)
- Riak TS 1.3 EE (Enterprise Edition) now supports Multi-cluster Replication
The above list is only a subset of the features in the Riak TS 1.3 release. For the full list, please see the Riak TS 1.3 release notes.
We began working on Riak TS in December 2014 and first released it to customers as a beta last year, which generated a great deal of feedback. Based on that feedback, we iterated on improvements, bug fixes, and features with the goal of making Riak TS the first enterprise-grade time series solution on the market. Throughout, we set out to address common problems with the time series databases currently available, whether open source or closed source. These problems include:
- Not being able to cluster without sharding, or being limited to a small number of nodes per cluster
- Lack of support for Spark or poor performance when interacting with Spark
- Being required to run Spark on every database node rather than scaling Spark independently
- Poor write performance under high volumes of data
- Poor query performance
- Non-standard, custom SQL dialects
- For enterprise deployments, no Multi-Datacenter or Multi-cluster Replication for high-availability scenarios
We built Riak TS on the experience and technology we gained from creating Riak Core and Riak KV. That foundation brings resiliency, powerful clustering, and no need for manual sharding, out of the box.
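A rough sketch of why the application does not need its own sharding scheme: Riak TS groups rows by a partition key that includes a time quantum (a table's primary key can declare, for example, `QUANTUM(time, 15, 'm')`), and Riak Core places each key on a consistent-hashing ring. The Python below is a simplified model under stated assumptions, not Riak's code. It hashes a `repr` of the key with SHA-1 onto a 160-bit ring of 64 partitions (Riak Core's hash space and default ring size), and it deals partitions to nodes round-robin, whereas Riak's real claim algorithm is more involved. The function names (`quantum`, `partition_for`, `owner`, `place`) are hypothetical.

```python
import hashlib
from collections import defaultdict

RING_TOP = 2 ** 160            # SHA-1 output space, as on Riak Core's ring
RING_SIZE = 64                 # Riak Core's default ring size
QUANTUM_MS = 15 * 60 * 1000    # assumed 15-minute quantum, in milliseconds


def quantum(ts_ms: int, size_ms: int = QUANTUM_MS) -> int:
    """Floor a millisecond timestamp to the start of its quantum."""
    return ts_ms - ts_ms % size_ms


def partition_for(partition_key: tuple, ring_size: int = RING_SIZE) -> int:
    """Hash a partition key onto the ring; return a partition index in [0, ring_size)."""
    # Simplification: Riak hashes an Erlang-encoded key, not a Python repr.
    digest = hashlib.sha1(repr(partition_key).encode("utf-8")).digest()
    point = int.from_bytes(digest, "big")
    return point // (RING_TOP // ring_size)


def owner(partition: int, nodes: list) -> str:
    """Simplified claim (assumption): partitions are dealt round-robin across nodes."""
    return nodes[partition % len(nodes)]


def place(rows, nodes):
    """Group (family, series, ts_ms) rows by the node that owns their partition."""
    placement = defaultdict(list)
    for family, series, ts in rows:
        key = (family, series, quantum(ts))  # rows in the same quantum share a key
        placement[owner(partition_for(key), nodes)].append((family, series, ts))
    return dict(placement)


if __name__ == "__main__":
    start = 1_467_331_200_000  # 2016-07-01T00:00:00Z, on a quantum boundary
    rows = [("farm-1", "sensor-7", start + i * 60_000) for i in range(30)]
    # 30 one-minute readings span two 15-minute quanta, so at most two partitions.
    print(sorted(place(rows, ["riak1", "riak2", "riak3"]).keys()))
```

Because the quantum is part of the key, a query over one series and a narrow time range touches only a few partitions, and the application never decides which node holds which rows.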
Several NoSQL solutions support Spark; however, most are held back either by a poorly implemented Spark connector or by a requirement to run Spark on every cluster node. In our experience, this is a major disadvantage when there is a large amount of data analysis to be done or when efficient use of hardware is a requirement. The co-located approach is problematic because it artificially forces the NoSQL cluster, with a Spark instance on every node, to be disproportionately large. This was costing our customers money and reducing efficiency. In contrast, our Spark Connector lets you run Spark decoupled from Riak TS (and, very soon, Riak KV). This is a significant advantage: your Spark cluster can be sized independently of your Riak cluster.
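Decoupling means the degree of read parallelism is chosen on the Spark side. As a hypothetical sketch, not the Spark Connector's actual implementation, a connector could split a query's time range into quantum-aligned sub-ranges and spread them across however many Spark executors you provision. `split_time_range` and `assign` are illustrative names.

```python
def split_time_range(start_ms: int, end_ms: int, step_ms: int) -> list:
    """Split [start_ms, end_ms) into half-open sub-ranges whose interior
    boundaries fall on multiples of step_ms (e.g. the table's quantum)."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    ranges = []
    lo = start_ms
    while lo < end_ms:
        hi = min((lo // step_ms + 1) * step_ms, end_ms)  # next boundary, capped
        ranges.append((lo, hi))
        lo = hi
    return ranges


def assign(ranges: list, executors: int) -> list:
    """Deal sub-ranges round-robin to Spark executors. The executor count is
    a free parameter, independent of how many Riak nodes exist."""
    buckets = [[] for _ in range(executors)]
    for i, r in enumerate(ranges):
        buckets[i % executors].append(r)
    return buckets


if __name__ == "__main__":
    quantum_ms = 15 * 60 * 1000
    day = split_time_range(0, 24 * 60 * 60 * 1000, quantum_ms)  # 96 quanta
    # Whether Riak runs on 3 nodes or 30, Spark can use 8 executors, or 80.
    print(len(day), [len(b) for b in assign(day, 8)])
```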
Write performance under heavy write loads has traditionally been a weak spot for Riak. Over the past year, we have implemented numerous performance improvements, and Riak TS reflects that work. We recently worked with a customer to run a series of benchmarks against their data set, and the results were remarkable. They achieved more than 10 times the performance of their current columnar time series solution on fewer than one quarter as many machines! Importantly, these performance gains did not come at the cost of data availability; the resiliency Riak is known for carries over to Riak TS.
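Those benchmark figures imply an even larger per-machine gain. Taking "performance" as throughput $T$ and writing $N$ for the number of machines, the figures above say $T_{\text{new}} > 10\,T_{\text{old}}$ and $N_{\text{new}} < N_{\text{old}}/4$, so:

$$
\frac{T_{\text{new}}/N_{\text{new}}}{T_{\text{old}}/N_{\text{old}}}
= \frac{T_{\text{new}}}{T_{\text{old}}}\cdot\frac{N_{\text{old}}}{N_{\text{new}}}
> 10 \times 4 = 40
$$

In other words, each machine in that test delivered more than 40 times the throughput of a machine running the previous solution.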
I’d like to share some details about one area where we improved Riak. Profiling quantified how much encoding and decoding cost our overall performance, so we optimized those paths. That work also led us to add support for native Erlang encoding, which increased throughput further, even compared with plain Protocol Buffers. To get this performance boost, you will need to use one of our new client libraries: native Erlang encoding is currently supported in our Java, Erlang, and Python clients, and we will add support to the other clients soon.
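Erlang's native serialization is the External Term Format, the byte layout produced by `term_to_binary/1`; we read "native Erlang encoding" above as referring to this format. The minimal Python encoder below is a sketch, not code from our clients. It handles only integers, floats, binaries, atoms (modeled here as Python `str`), tuples, and lists, which is enough to show the shape of the format: each value is a one-byte tag followed by fixed-width or length-prefixed data that an Erlang node can decode directly. `encode_term` is a hypothetical name.

```python
import struct

VERSION = 131  # every External Term Format blob starts with this byte


def encode_term(term) -> bytes:
    """Encode a small subset of Python values in Erlang's External Term Format."""
    return bytes([VERSION]) + _encode(term)


def _atom(name: str) -> bytes:
    data = name.encode("utf-8")
    if len(data) > 255:
        raise ValueError("atom too long for SMALL_ATOM_UTF8_EXT")
    return struct.pack(">BB", 119, len(data)) + data  # SMALL_ATOM_UTF8_EXT


def _encode(term) -> bytes:
    if isinstance(term, bool):  # check before int: bool is an int subclass
        return _atom("true" if term else "false")  # Erlang booleans are atoms
    if isinstance(term, int):
        if 0 <= term <= 255:
            return struct.pack(">BB", 97, term)  # SMALL_INTEGER_EXT
        if -2**31 <= term < 2**31:
            return struct.pack(">Bi", 98, term)  # INTEGER_EXT, 32-bit signed
        magnitude = abs(term)  # e.g. millisecond timestamps land here
        digits = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "little")
        if len(digits) > 255:
            raise ValueError("integer too large for SMALL_BIG_EXT")
        sign = 1 if term < 0 else 0
        return struct.pack(">BBB", 110, len(digits), sign) + digits  # SMALL_BIG_EXT
    if isinstance(term, float):
        return struct.pack(">Bd", 70, term)  # NEW_FLOAT_EXT, IEEE 754 double
    if isinstance(term, bytes):
        return struct.pack(">BI", 109, len(term)) + term  # BINARY_EXT
    if isinstance(term, str):
        return _atom(term)
    if isinstance(term, tuple):
        if len(term) <= 255:
            head = struct.pack(">BB", 104, len(term))  # SMALL_TUPLE_EXT
        else:
            head = struct.pack(">BI", 105, len(term))  # LARGE_TUPLE_EXT
        return head + b"".join(_encode(t) for t in term)
    if isinstance(term, list):
        if not term:
            return bytes([106])  # NIL_EXT, the empty list
        body = b"".join(_encode(t) for t in term)
        return struct.pack(">BI", 108, len(term)) + body + bytes([106])  # LIST_EXT, [] tail
    raise TypeError(f"unsupported type: {type(term).__name__}")
```

For instance, a time series row such as `(b"farm-1", b"sensor-7", 1467331200000, 21.5)` encodes as a tuple header followed by two length-prefixed binaries, a `SMALL_BIG_EXT` timestamp, and an 8-byte float.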
As part of our research, we investigated the SQL variants used by other NoSQL projects and found them all unsuitable. Every customer we spoke to wanted, or preferred, standard SQL. Riak TS 1.3 delivers just that, with a shell that accepts standard SQL commands. Over time, we aim to support as much of standard SQL as possible.
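To illustrate the kind of standard SQL involved, including the aggregates and arithmetic listed among the new features, the sketch below runs a query through Python's built-in `sqlite3` module, which stands in for Riak TS here. The `GeoCheckin` table, its columns, and the `summarize` helper are illustrative. Riak TS's own `CREATE TABLE` additionally declares a quantum in its partition key, and its query subset has restrictions of its own, so consult the Riak TS SQL reference for exactly what 1.3 accepts.

```python
import sqlite3


def summarize(rows, region, state, start_ms, end_ms):
    """Aggregate temperatures for one region/state over [start_ms, end_ms)."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Riak TS
    try:
        conn.execute(
            """CREATE TABLE GeoCheckin (
                   region      VARCHAR NOT NULL,
                   state       VARCHAR NOT NULL,
                   time        TIMESTAMP NOT NULL,
                   temperature DOUBLE,
                   PRIMARY KEY (region, state, time))"""
        )
        conn.executemany("INSERT INTO GeoCheckin VALUES (?, ?, ?, ?)", rows)
        # Aggregates plus arithmetic on their results, all in plain SQL.
        return conn.execute(
            """SELECT COUNT(*), AVG(temperature), MIN(temperature), MAX(temperature),
                      MAX(temperature) - MIN(temperature),
                      AVG(temperature) * 9.0 / 5.0 + 32.0
               FROM GeoCheckin
               WHERE region = ? AND state = ? AND time >= ? AND time < ?""",
            (region, state, start_ms, end_ms),
        ).fetchone()
    finally:
        conn.close()
```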
Last but not least, for our Enterprise customers we have added support for Multi-cluster Replication (MCR). It provides the same characteristics customers have come to rely on in Riak KV, where the feature has been known as Multi-datacenter Replication (MDC). Multi-cluster Replication is the more accurate name, since there is no requirement that the secondary clusters be housed in separate datacenters, and we will standardize on it across products going forward. For those not familiar with MCR, it serves two purposes: resiliency and availability, and scenarios where high-volume writes land on one cluster while analytics run against the same data on a separate cluster. MCR gives customers a great deal of flexibility in how they use replication.
I’m really excited about what we’ve been able to deliver in Riak TS 1.3. However, this is just the beginning. The Time Series market is broad, and it has a number of interesting needs that we intend to address. Beyond making Riak TS a fantastic general-purpose Time Series solution, we have chosen to focus specifically on IoT. IoT has a number of requirements that fall outside what a general Time Series database provides. For example, IoT applications may need to track device status and state, resolve state conflicts, and manage the commands to be issued to devices. To address this, we are planning something very exciting.
We’ve never forgotten our roots here at Riak. Riak KV was our groundbreaking entry into the world of distributed systems, and it remains a flagship product with ever-expanding adoption, especially in the enterprise. To push Riak KV forward in a big way, we have already begun merging the work on Riak TS 1.3, along with other innovations, into our upcoming Riak KV 2.2 release! Yes, we are getting the band back together to create an even better and more versatile solution. This work is just beginning, but it reflects what our customers have been asking for:
- More great work around CRDTs
- Better performance than Cassandra
- Improved operational simplicity
- Faster replication
- Option to apply schemas to buckets
We have also heard the community loud and clear: there are quite a few outstanding updates and pull requests to address, and we are working on those as well. I hope that you find Riak TS 1.3 to be a powerful and valuable tool. We’re very proud of the work we’ve done on it thus far, and we are confident that the best is yet to come. We have a number of other projects in progress that will keep moving Riak forward quickly as the best NoSQL choice. I look forward to telling you more about them soon.
Dave McCrory
Chief Technology Officer
@mccrory