RiskTech Forum

Wolters Kluwer: Big Data Moves From Batch Analytics to Real-Time Analytics

Posted: 1 July 2014  |  Author: Steven Lindo   |  Source: Wolters Kluwer

The Volume, Velocity, and Variety of Big Data

By now you have heard about the three V's of Big Data Analytics: volume, velocity and variety of data. By now you have also heard that there are solutions in place to handle these problems. The solution is basically anything built on top of Apache Hadoop 2.0, or a competing product. Hadoop is now packaged by Cloudera, Hortonworks, and MapR.

These software solutions provide the scalability, flexibility and architecture that enable enterprises to address these issues. How did they solve it so quickly? Competition. I love the competition in this space. It takes yesterday's problems and accelerates solutions. The overall winners are the consumers of these products. A little over a year ago, you may have struggled through hours of configuration and installation steps to set up Apache Hadoop, but today you can download a Cloudera/Hadoop VM and be up and running in less than an hour.

Volume, Velocity, Variety, [✔] check. So what's next? The V's gave way to interpretation and visualization tools. They gave birth to tools like QlikView, Scout Analytics, Tableau and Dundas, to name a few. These tools interpret and expose the value in the data as "information," and they make that information easy for the end-user to consume. The challenge for developers using these tools is to think about the visualization and the data together, and build them so that they lead to action. That is the key: creating actionable data and delivering it at the point of use, when you need it. Not before, and certainly not after. My point here is that these visualization tools are only as useful as the value created by the underlying analytics platform.

The Next Big Thing in Big Data Analytics: Real-Time Analytics

Next on the horizon for Big Data is the demand for real-time analytics. These are not terms that normally go together. Traditionally, Hadoop big data analytics is batch analytics: it takes hours to run, to create the information, then to do the analysis, then to create value (or useful information). But the demand is now on for "time-to-value". I love this space. So what you will see is that these Big Data platforms will now include tools like Apache Spark (for in-memory analytics) and Spark Streaming. The architecture stack will change again, for the better in my opinion.
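The batch-versus-streaming distinction can be made concrete with a minimal sketch in plain Python (not actual Spark code, and using made-up event values for illustration): a batch job must wait for the complete dataset before it can answer, while a streaming computation updates its answer incrementally as each event arrives, which is where the "time-to-value" comes from.

```python
def batch_average(events):
    # Batch analytics: wait until the full dataset is collected,
    # then compute the answer in one pass.
    return sum(events) / len(events)

class StreamingAverage:
    # Streaming analytics: maintain a running aggregate and
    # refresh the answer as each event arrives.
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        # The current answer is available immediately, at the
        # point of use, without waiting for the stream to end.
        return self.total / self.count

# Hypothetical event stream (e.g. transaction amounts).
events = [120.0, 80.0, 100.0]

stream = StreamingAverage()
for e in events:
    current = stream.update(e)  # actionable after every event

# Once the stream is exhausted, both approaches converge
# on the same result; only the time-to-value differs.
print(batch_average(events) == current)
```

Frameworks like Spark Streaming apply the same idea at scale, computing incremental aggregates over micro-batches of incoming data instead of a single offline pass.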

Share your successes and/or insights with real-time analytics and with tools like Hadoop 2.0 with YARN, as well as the likes of QlikView, Scout Analytics and Tableau.