Logical Data Warehouses

Can you set your enterprise data free and make it dance to your tunes, tunes that range from self-service BI to predictive analytics? Yes, you can, with the help of a logical data warehouse powered by data virtualization.

What is a logical data warehouse? To understand the logic behind the logical data warehouse, it helps to first examine what exactly a traditional enterprise data warehouse is.

“A data warehouse is simply a single, complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context”. (Source: Barry Devlin’s Data Warehouse: From Architecture to Implementation).

This definition might lead one to conclude that a data warehouse must be a single physical database. Wait a minute! A data warehouse can also be a representation of a heterogeneous set of data sources, each holding a portion of the enterprise data used for transactions or business analytics, that presents itself as a single store of all that data. This is the logical data warehouse: an architectural style for representing data from many different sources as one.

In the traditional enterprise data warehouse (EDW) scenario, data typically arrives from transactional databases, line-of-business applications, CRM systems, ERP systems, or any other data source. This data is cleansed and transformed through an ETL (extract, transform, load) process to ensure its reliability, consistency, and accuracy across the enterprise before it is loaded into the data warehouse. This process provides a stable and secure data platform from which data scientists and information workers can perform complex analytics and generate informative reports.
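To make the ETL flow concrete, here is a minimal Python sketch of the three steps. It is only an illustration under stated assumptions: the CRM and ERP "sources" are hypothetical in-memory lists, the warehouse is an in-memory SQLite database, and the table name `dim_customer` and the cleansing rules are made up for the example. The key point is that data is physically copied and conformed before anyone can query it.

```python
import sqlite3

# Hypothetical source records standing in for a CRM system and an ERP system.
CRM_ROWS = [
    {"customer_id": "C001", "name": "  Acme Corp ", "country": "us"},
    {"customer_id": "C002", "name": "Globex", "country": "DE"},
]
ERP_ROWS = [
    {"cust": "C001", "name": "ACME CORP", "country": "US"},  # same customer as CRM C001
    {"cust": "C003", "name": "Initech", "country": "us"},
]

def extract():
    """Extract: pull raw rows from each source and map them to one common shape."""
    rows = [(r["customer_id"], r["name"], r["country"]) for r in CRM_ROWS]
    rows += [(r["cust"], r["name"], r["country"]) for r in ERP_ROWS]
    return rows

def transform(rows):
    """Transform: cleanse and standardize values, then drop duplicate customer IDs."""
    cleaned = {}
    for cust_id, name, country in rows:
        key = cust_id.strip().upper()
        if key in cleaned:
            continue  # first source wins; real pipelines apply richer survivorship rules
        cleaned[key] = (key, name.strip().title(), country.strip().upper())
    return list(cleaned.values())

def load(conn, rows):
    """Load: write the conformed rows into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id TEXT PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
print(warehouse.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
```

Every new source or new format means another extract mapping and another batch run, which is part of why this pattern struggles as data volume and variety grow.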

Today, however, the EDW is staggering and crumbling under the volume, variety, and velocity of big data that arrives from the cloud, social media, mobile devices, and the IoT, and that lies scattered across global silos in a myriad of formats. Add to this the expectation that all of it will be accessible, meaningful, and ready to be consumed by any self-service BI application in real time or near real time. By the time an EDW project like the one described above is implemented, it has often lost its relevance to current business needs.

More and more organizations looking to tame this onslaught of data-gone-wild are turning to a logical architecture that abstracts the inherent complexities of big data through a combination of data virtualization, metadata management, and distributed processing. The logical data warehouse architecture combines all of these, encompassing and extending the capabilities of the EDW.

Data virtualization provides an integrated, single view of data from distributed sources in real time or near real time, regardless of the type or location of the data, and whether it is structured, semi-structured, or unstructured. It does so using query federation, caching, and selective batch movement, without the wholesale replication of data that happens in the EDW scenario. When a logical data warehouse is powered by a full-fledged data virtualization product such as the Denodo Platform, whose distributed processing pushes work down to the source system where the data sits waiting to be asked, be it a Hadoop cluster, a CRM system, or an EDW, the dance of liberated data begins.
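The ideas of query federation, pushdown, and caching can be illustrated with a small Python sketch. This is not how the Denodo Platform is implemented and does not use its API. It is a toy model under loud assumptions: two in-memory SQLite databases stand in for a CRM system and a clickstream store (a stand-in for something like a Hadoop cluster), and the class `VirtualLayer` and method `page_views_by_region` are hypothetical names. The filter and the aggregation run inside each source, only the small partial results travel to the virtual layer, and nothing is copied into a central warehouse.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create an in-memory database that plays the role of one independent source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Hypothetical sources: a CRM database and a clickstream event store.
crm = make_source("CREATE TABLE customers (id TEXT, name TEXT, region TEXT)",
                  "INSERT INTO customers VALUES (?, ?, ?)",
                  [("C1", "Acme", "EMEA"), ("C2", "Globex", "APAC"), ("C3", "Initech", "EMEA")])
clicks = make_source("CREATE TABLE events (customer_id TEXT, page TEXT)",
                     "INSERT INTO events VALUES (?, ?)",
                     [("C1", "/pricing"), ("C1", "/docs"), ("C2", "/pricing"), ("C3", "/home")])

class VirtualLayer:
    """Toy federation layer (hypothetical): queries sources in place, copies nothing."""

    def __init__(self, crm_conn, events_conn):
        self.crm = crm_conn
        self.events = events_conn
        self._cache = {}  # naive result cache; never invalidated in this sketch

    def page_views_by_region(self, region):
        if region in self._cache:
            return self._cache[region]
        # Pushdown 1: the region filter runs inside the CRM source.
        customers = dict(self.crm.execute(
            "SELECT id, name FROM customers WHERE region = ?", (region,)).fetchall())
        result = []
        if customers:
            placeholders = ",".join("?" * len(customers))
            # Pushdown 2: filtering and COUNT(*) run inside the event store.
            counts = self.events.execute(
                f"SELECT customer_id, COUNT(*) FROM events "
                f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id",
                list(customers)).fetchall()
            # Federation: join the two small partial results in the virtual layer.
            result = sorted((customers[cid], n) for cid, n in counts)
        self._cache[region] = result
        return result

layer = VirtualLayer(crm, clicks)
print(layer.page_views_by_region("EMEA"))  # [('Acme', 2), ('Initech', 1)]
```

A production virtualization layer would add a query optimizer that decides what to push down, cache invalidation, security, and selective batch movement; this sketch only shows the basic shape of answering one question from two sources without replicating either.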

Nalini Mohan