
Data modeling enables organizations to define and design their data stores and adapt them to business requirements. It also allows data structures and policies to be included in internal communication, design and documentation mechanisms with little or no ambiguity.

Dedicated data modeling tools can be used to create the different layers of logical and physical models that represent the diverse data stores in an organization. The same tools can help admins bootstrap the development of data schemas, maintain them, keep control of changes and synchronize metadata to and from the data stores. From a higher-level point of view, data modeling is a first step on the way to integrated, company-wide data governance strategies.

Modeling Data Virtualization

When confronted with data virtualization scenarios, data modeling tools face complexity challenges that limit their applicability. These challenges derive from the fact that virtual databases are not merely stores of data at rest, whether in the form of relational tables or other types of structures. Instead, they define a series of transformations that bring data from its original state in the source data stores to its final shape and format in the interfaces (or data contracts) offered to the clients of the data virtualization platform.

Typical data modeling tools are not a good fit for designing virtual databases, because the data retrieval and combination features that are key components of data virtualization systems are generally unknown to them. That work is left to the more specialized administration tools provided by the data virtualization platforms themselves.

However, the aim of a data virtualization (DV) platform is to offer its combined and reshaped data through a series of data interfaces and contracts, and these are what client software actually sees from the DV system. We should therefore be able to use general data modeling techniques to define those interfaces and contracts as data models, so that they can be properly communicated, documented, and included in higher-level data-related processes at the company (such as Data Governance).

And if data modeling tools can define the data contracts to be offered by data virtualization platforms, those contracts could also be synchronized to the DV platform as virtual interface views, a step that would effectively allow top-down design for data virtualization.

Top-Down Data Virtualization Design in Practice

So how can we model those data contracts in the data modeling tools so that they can later be synchronized into interfaces on the DV side? In the simplest approach, the following mappings could be adopted:

  • Entities/Tables: Logical entities and their attributes (or physical tables and their columns) modeled in the data modeling tools would be synchronized into the DV platform as interface views. Even though no data would actually be stored in the defined structures, because they are virtual, their definitions could still be used to communicate these data structures to other systems that might depend on them. Designing the data combinations and operations required to fill those interface views with data from the real data sources would be a later task on the DV side, but the top-down approach would already have served its aim of allowing the data contracts to be defined beforehand.
  • Associations/Relationships: Associations (or relationships) modeled in the data modeling tools would be synchronized into the DV platform as logical associations and/or, depending on the specifics of the relationships, referential constraints between interface views. These structures on the DV side would allow any client software to easily determine the relationships among the different parts of the data contract by scanning the association metadata offered by the virtual database.
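
As a rough illustration of these two mappings, the following Python sketch turns a small in-memory model into neutral descriptions of interface views and associations. It is only a sketch under stated assumptions: every class, field and function name here (Entity, Relationship, enforced, synchronize and so on) is hypothetical, and the output dictionaries are not the format or API of any particular modeling tool or DV platform.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    data_type: str  # type name as written in the model (hypothetical)


@dataclass
class Entity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


@dataclass
class Relationship:
    name: str
    from_entity: str
    to_entity: str
    from_attribute: str
    to_attribute: str
    # Hypothetical flag: True means "map to a referential constraint",
    # False means "map to a plain logical association".
    enforced: bool = False


def to_interface_view(entity: Entity) -> dict:
    """Describe an interface view: fields only, no implementation yet."""
    return {
        "kind": "interface_view",
        "name": entity.name,
        "fields": [(a.name, a.data_type) for a in entity.attributes],
        # The real data combinations are designed later on the DV side.
        "implementation": None,
    }


def to_association(rel: Relationship, view_names: set[str]) -> dict:
    """Describe an association or referential constraint between two views."""
    for end in (rel.from_entity, rel.to_entity):
        if end not in view_names:
            raise ValueError(f"{rel.name}: unknown entity {end!r}")
    return {
        "kind": "referential_constraint" if rel.enforced else "association",
        "name": rel.name,
        "from": (rel.from_entity, rel.from_attribute),
        "to": (rel.to_entity, rel.to_attribute),
    }


def synchronize(entities: list[Entity], relationships: list[Relationship]) -> list[dict]:
    """Map a whole model to DV-side definitions: views first, then associations."""
    views = [to_interface_view(e) for e in entities]
    names = {v["name"] for v in views}
    return views + [to_association(r, names) for r in relationships]


# Example model: two entities joined by one relationship.
customer = Entity("customer", [Attribute("customer_id", "integer"),
                               Attribute("name", "text")])
order = Entity("customer_order", [Attribute("order_id", "integer"),
                                  Attribute("customer_id", "integer")])
places = Relationship("customer_orders", "customer", "customer_order",
                      "customer_id", "customer_id")

contract = synchronize([customer, order], [places])
```

Each resulting interface view carries an empty implementation, which mirrors the idea that the contract is defined first and filled with data from the real sources later. In a real setup, the final step would translate these descriptions into whatever definition language or import mechanism the chosen DV platform provides.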

In summary, the same typical data modeling tools that have long been used to design relational databases and other common data stores can also be used to design the interfaces and contracts that data virtualization systems offer to other parts of the corporate ecosystem, with the added benefit of easily bootstrapping the virtual database development process.

Denodo Labs