With an eye toward enhancing its Internet of Things (IoT) analytics platform with advanced machine learning and real-time analytics capabilities, machine data analytics specialist Glassbeam today released a new version of the platform that tightly integrates it with Apache Spark.
Spark is a cluster computing framework designed to sit on top of the Hadoop Distributed File System (HDFS) in place of Hadoop MapReduce. With support for in-memory cluster computing, Spark can run up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk, making it well-suited to iterative workloads such as machine learning algorithms.
"We're seeing growing demand for real-time analytics as organizations seek to deliver richer insights to decision-makers and their partners and customers, faster," says Jason Stamper, analyst, Data Platforms & Analytics, 451 Research. "While the Internet of Things may still be in its infancy, that too will require rapid analytics and machine learning capabilities. Since it is already tracking 1.2 billion sensor readings per day, Glassbeam has some expertise in this field, and we see its integration with the Apache Spark data processing engine as another step in the right direction."
Puneet Pandit, CEO and co-founder of Glassbeam, notes that the Glassbeam SCALAR cloud-based analytics platform was architected with Cassandra as a distributed data processing architecture that scales both linearly and horizontally across thousands of nodes. Spark, on the other hand, is a purpose-built scalable and distributed in-memory compute architecture. Together, Pandit says, you get the best of both worlds: a super-fast, scalable IoT analytics solution for large-scale data processing.
The integration of Apache Spark's MLlib library -- a scalable machine learning library consisting of algorithms and utilities, including classification, regression, clustering, etc. -- gives SCALAR machine learning algorithms to perform predictive analytics on large sets of machine data in the cloud. And implementing Apache Spark SQL directly on Cassandra will allow real-time analytics on data as it is streaming in and getting parsed and transformed through the SCALAR platform. Finally, the integration of Spark Streaming means that streaming applications can be built the same way as batch jobs using Spark's API, which supports both Java and Scala.
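To make the "same way as batch jobs" point concrete, here is a minimal sketch in plain Python. It does not use Spark's API and is not Glassbeam's code; the record fields, device names and function names are all hypothetical. It only illustrates the general idea behind micro-batch streaming: the same processing function runs once over a complete dataset, or repeatedly over small chunks of an incoming stream.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_errors_by_device(records: Iterable[dict]) -> Counter:
    """Core logic, written once: count ERROR-level readings per device."""
    counts = Counter()
    for rec in records:
        if rec["level"] == "ERROR":
            counts[rec["device"]] += 1
    return counts


def micro_batches(stream: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group an incoming stream into small lists of at most `size` records."""
    batch: List[dict] = []
    for rec in stream:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def run_batch(records: Iterable[dict]) -> Counter:
    """Batch mode: apply the core logic to the whole dataset at once."""
    return count_errors_by_device(records)


def run_streaming(stream: Iterable[dict], batch_size: int = 2) -> Counter:
    """Streaming mode: apply the same core logic to each micro-batch
    and fold the partial results into a running total."""
    running = Counter()
    for batch in micro_batches(stream, batch_size):
        running.update(count_errors_by_device(batch))
    return running


# Hypothetical machine-log records, invented for illustration only.
SAMPLE = [
    {"device": "pump-1", "level": "ERROR"},
    {"device": "pump-2", "level": "INFO"},
    {"device": "pump-1", "level": "ERROR"},
    {"device": "pump-3", "level": "ERROR"},
    {"device": "pump-2", "level": "INFO"},
]

if __name__ == "__main__":
    print(run_batch(SAMPLE))
    print(run_streaming(iter(SAMPLE)))
```

Both modes return the same per-device counts for the same input, because only the way records are fed to `count_errors_by_device` changes, not the logic itself.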
"Glassbeam goes beyond analytics that narrowly focus on index, search and analysis of simple data formats from IT assets locked away in data centers," Pandit says. "Built for the Internet of Complex Things, the platform processes large amounts of data with extreme speed, employing advanced machine learning algorithms and real-time analytics. This means our customers can crunch years of data in a very short time to produce rich intelligence that helps avoid problems and totally optimizes business operations."