At the DataWorks Summit/Hadoop Summit in Munich today, Hortonworks unveiled the latest version of its Apache Hadoop distribution, with features that focus on boosting the performance of interactive SQL queries and optimizing existing enterprise data warehouse (EDW) investments.
"It's all about performance, flexibility and choice," says Scott Gnau, CTO of Hortonworks. "We've made some significant improvements with the community to Apache Hive 2.0 operational and tactical query response times."
"One of the biggest requests we've gotten is EDW optimization," he adds. "Customers are just being inundated with data. They have an existing toolset in place for delivering analytics to their executives, but they want better integration from us."
The release also delivers on the company's cloud-first strategy: HDP 2.6 was delivered first on Microsoft Azure HDInsight and Hortonworks Data Cloud for AWS. Hortonworks hasn't neglected on-premises workloads either. The company noted that its partnership with IBM has made HDP 2.6 available on IBM Power Systems.
A boost for SQL queries
Hortonworks Data Platform (HDP) 2.6 supports Apache Hive 2.0 with LLAP functionality for intelligent in-memory caching, which Gnau says provides a dramatic performance boost for SQL interactive queries. The new release also introduces ACID merge functionality, which enables incremental data maintenance through Upsert support. This, Gnau says, allows for additional use cases for optimizing existing EDW investments without requiring all data to be reloaded.
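To illustrate what "upsert" means in practice, here is a minimal Python sketch of the merge semantics: rows whose key already exists are updated in place, and rows with new keys are inserted, so the untouched rows never need to be reloaded. This is an in-memory illustration only, not Hive's implementation or syntax; the `upsert` function, the `id` key and the sample rows are all hypothetical.

```python
def upsert(target, updates, key="id"):
    """Merge `updates` into `target` in place, matching rows on `key`.

    Hypothetical helper for illustration; it mimics upsert semantics
    on a list of dicts and is unrelated to Hive's actual code.
    """
    # Map each existing key to its position so matches are found quickly.
    index = {row[key]: pos for pos, row in enumerate(target)}
    for row in updates:
        pos = index.get(row[key])
        if pos is None:
            # No match: insert the row as a new record.
            index[row[key]] = len(target)
            target.append(dict(row))
        else:
            # Match: update the existing record's columns.
            target[pos].update(row)
    return target


# Existing warehouse table (sample data, invented for this sketch).
warehouse = [
    {"id": 1, "region": "EMEA", "revenue": 100},
    {"id": 2, "region": "APAC", "revenue": 250},
]

# Incremental batch: one changed row (id 2) and one new row (id 3).
daily_changes = [
    {"id": 2, "region": "APAC", "revenue": 300},
    {"id": 3, "region": "AMER", "revenue": 75},
]

upsert(warehouse, daily_changes)
```

After the call, the row with `id` 1 is unchanged, the row with `id` 2 carries the new revenue figure, and the row with `id` 3 has been appended. Only the two rows in the incremental batch were touched.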
"There is a need to improve SQL performance and support, along with Spark adoption in Hadoop-related workloads," Tony Baer, principal analyst, Ovum, said in a statement Tuesday. "A key enhancement is the addition of Upsert support, which is essential for building confidence in data currency and making Hadoop BI-ready. Backing Hive with LLAP and Spark 2.1 should produce the kinds of service levels that BI users expect. With the 'cloud-first' strategy, Hortonworks is going where more and more new Hadoop workloads are heading."
"HDP 2.6 showcases the advantages of the open source community," Gnau adds. "A significant amount of innovation is coming out of the Apache community, and because of our commitment to deliver an open platform, we are uniquely able to deliver these value-creating capabilities to customers. HDP 2.6 introduces key new enterprise features which will benefit our customers immediately — no application rewrite required."
There's more in HDP 2.6
Other new capabilities in HDP 2.6 include the following:
- Data science at scale. HDP 2.6 features an improved user experience for data scientists with Apache Spark 2.1 and the latest version of Apache Zeppelin.
- Enterprise-grade security. Enhancements to Apache Ranger and Apache Atlas include reduced sync time for customers with large user bases and enhanced bulk addition of policies from one environment to the other via expanded tag-based policy support for Spark, Zeppelin, HDFS, Apache Kafka and Apache HBase.
- Streamlined and proactive operations. The latest version of Apache Ambari provides simpler configuration of services and components when a cluster node restarts. Additionally, SmartSense has been enhanced to automate the application of the recommendations for cluster improvement.