Hadoop and Data Virtualization - Datapocalypse

I recently tuned into a Hortonworks webinar and was astounded to hear their opening statement that by 2020 there will be more than 40 zettabytes of data in the world. To put this in perspective, 40 zettabytes is 40 trillion gigabytes, estimated to be 57 times the number of grains of sand on all the beaches on Earth. Additionally, the variety of data sources is rapidly increasing: it's estimated that more than 7 billion devices are connected to the internet, creating billions of connections between people, processes and things in hundreds of different formats.

The emergence and explosion of these new types of data have put tremendous pressure on data systems within the enterprise. This is exacerbated by the fact that incoming data often has no structure, or structure that changes frequently. Worse, much of the incoming data has little or no value, which often buries the goldmine of data that businesses are sitting on.

So why has Hadoop become important? And what does that have to do with data virtualization?

Hadoop provides a low-cost, scale-out approach to data storage and processing. It also helps optimize the value of the enterprise data warehouse by acting as a cheaper storage tier to which customers can archive older data. Combine this with data virtualization and you can deliver a single, unified and coherent view of all enterprise data assets, regardless of whether they sit in Hadoop, an enterprise data warehouse, an analytical appliance or a NoSQL store.

Many people use data virtualization to facilitate data warehouse offloading, exposing a single point of access to the reporting tools and seamlessly federating queries over Hadoop and the data warehouse. Another popular use is the Logical Data Lake: a virtual layer over Hadoop and other analytical processing nodes that exploits the processing power of those specialized stores to achieve better performance than a physical data lake.
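To make the federation idea concrete, here is a minimal, purely illustrative Python sketch. The source names and sample rows are invented, not taken from any real product: one list stands in for clickstream detail archived in a Hadoop/Hive table, a dictionary stands in for a customer dimension in the data warehouse, and a small function produces a single unified view over both without copying either data set. A real data virtualization layer would do this with pushed-down SQL against each store rather than an in-memory join.

```python
# Conceptual sketch of query federation in a virtual data layer.
# The "sources" below are hypothetical stand-ins: a detail table archived
# in Hadoop and a dimension table kept in the enterprise data warehouse.

# Clickstream detail archived in Hadoop (hypothetical sample rows)
hadoop_clicks = [
    {"customer_id": 1, "page": "/pricing", "ts": "2016-08-01"},
    {"customer_id": 2, "page": "/docs",    "ts": "2016-08-02"},
    {"customer_id": 1, "page": "/signup",  "ts": "2016-08-03"},
]

# Customer dimension living in the data warehouse (hypothetical sample rows)
warehouse_customers = {
    1: {"name": "Acme Corp", "segment": "Enterprise"},
    2: {"name": "Globex",    "segment": "SMB"},
}

def federated_click_report(clicks, customers):
    """Join detail rows from one store with dimensions from another,
    yielding one unified result set without moving either data set."""
    for row in clicks:
        dim = customers.get(row["customer_id"], {})
        yield {
            "customer": dim.get("name", "unknown"),
            "segment": dim.get("segment", "unknown"),
            "page": row["page"],
            "ts": row["ts"],
        }

if __name__ == "__main__":
    # The reporting tool sees only this single "view", not the two stores.
    for record in federated_click_report(hadoop_clicks, warehouse_customers):
        print(record)
```

The point of the sketch is the shape of the access pattern: the consumer queries one logical view, while the virtualization layer decides which physical store answers which part of the question.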

The technology partnership between Hortonworks and Denodo is a classic example of how Hadoop and data virtualization complement each other to deliver low-cost data storage while enabling on-demand data access. Hortonworks provides an open and stable foundation for enterprises to build and deploy big data initiatives, and Denodo provides a common enterprise data layer that enables organizations to better harness all of their data and deliver information to the business faster and more flexibly.

This strategic partnership has resulted in a solution that is both HDP certified and YARN certified. These certifications review technologies against architectural best practices and validate them with a comprehensive suite of integration test cases, benchmarked for scale under varied workloads. Any HDP-certified solution has been tried and tested by Hortonworks through a series of performance and integration tests and verified to work with the Hortonworks Data Platform.

To arm your enterprise for Datapocalypse 2020, read the Denodo and Hortonworks Solution Brief, or tune into the joint webinar on the Modern Data Architecture on 8 September at 10am PT, featuring customer VHA, the largest member-owned healthcare company in the US, who will share lessons learned and best practices for deploying Hadoop as a data lake with data virtualization.

Annette Cini