When customers approach Denodo for help with their data integration complexities, we typically see issues such as data silos, legacy applications, digital transformation, mobile enablement, real-time data needs, and cloud and SaaS application integration, to name but a few. Coupled with these is the challenge of managing the sheer volume of data now being created, as well as the broad range of data owners and stakeholders involved. In recent times, certain models have been developed to help simplify and address these challenges. At Denodo, as the leading provider of data virtualisation, we are often asked about the concepts of Data Mesh and Data Fabric and where data virtualisation fits in.
It could be argued that the naming of Data Fabric and Data Mesh has confused some stakeholders in the market, as both terms conjure an image of a malleable layer or blanket laid over the data. In fact, they are very different. Whilst both approaches sound similar, and both do provide architectures for accessing disparate data sources, a Data Fabric is technology-focussed, whilst a Data Mesh is focussed more on organisation and process.
One of the earliest definitions of the data fabric came from Forrester analyst Noel Yuhanna, whose consideration was largely around big data scenarios. Data fabrics have gained significant momentum in recent years and remain an area of keen focus. For example, in 2021 Gartner called out data fabrics as one of its top 10 Data and Analytics technology trends. A data fabric is essentially a unified architecture that enables easy access to, and sharing of, disparate data, and provides a management framework for doing so.
A plethora of different tools is likely to be used to deliver the Data Fabric capability in an organisation, including ETL/data warehousing, MDM, data virtualisation, data catalogues, governance, security and so on. This is exemplified in Forrester’s most recent report on the Enterprise Data Fabric, where Denodo’s leadership position for data virtualisation sits alongside vendors with solutions of other technology types.
First defined by Zhamak Dehghani at ThoughtWorks, a Data Mesh is a type of data platform that federates ownership of the data amongst domain data owners. These owners are able to use their domain-specific knowledge of the data and the business to create data as products. Each domain therefore handles its own domain-specific data, including the modelling and aggregation, aiding data democratisation and self-service for the business. Rather than the monolithic approach of a Data Fabric, this distributed approach allows domains in the business to manage their own data pipelines.
Of course, federating in this way without consideration for other elements of the business would cause fragmentation, duplication and inconsistencies to develop, so a key element of the Data Mesh is interoperability between domains. The universal interoperable layer provides the data standards and the rules for syntax and governance across the whole mesh, and this is where data virtualisation can play a vital part.
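To make the idea of domain ownership under shared rules more concrete, here is a minimal Python sketch. It is purely illustrative and is not Denodo functionality or any formal Data Mesh specification: the `DataProduct` and `MeshCatalogue` classes, the `violations` function and the two rules it checks (a named owner, lower-case column names with no spaces) are all hypothetical, chosen only to show domains publishing their own products while one common standard applies to everyone.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataProduct:
    """A domain-owned data set published for others to consume (hypothetical)."""
    domain: str
    name: str
    owner: str
    schema: Dict[str, str]  # column name -> type name


def violations(product: DataProduct) -> List[str]:
    """Check a product against the shared, illustrative mesh-wide rules."""
    problems: List[str] = []
    if not product.owner.strip():
        problems.append("every product needs a named owner")
    for column in product.schema:
        if column != column.lower() or " " in column:
            problems.append(f"column '{column}' must be lower-case with no spaces")
    return problems


class MeshCatalogue:
    """Shared layer: each domain publishes here and the same rules apply to all."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        problems = violations(product)
        if problems:
            raise ValueError("; ".join(problems))
        self._products[f"{product.domain}.{product.name}"] = product

    def list_products(self) -> List[str]:
        return sorted(self._products)


catalogue = MeshCatalogue()

# The sales domain models and publishes its own product, following the standard.
catalogue.publish(
    DataProduct("sales", "orders", "sales-data-team", {"order_id": "int", "total": "float"})
)

# A product that ignores the shared rules is rejected rather than fragmenting the mesh.
try:
    catalogue.publish(DataProduct("marketing", "campaigns", "", {"Campaign Name": "str"}))
    rejected = False
except ValueError as error:
    rejected = True
    rejection_reason = str(error)

print(catalogue.list_products())  # ['sales.orders']
```

In this sketch the domains keep control of modelling their own data, while the catalogue is the only place where cross-domain consistency is enforced.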
By using a virtual layer between the disparate data sources and the domain-specific data consumers, data virtualisation facilitates a Data Mesh. Unlike traditional ETL/data warehousing models, data virtualisation avoids the need to ‘move and copy’ the data. Instead, the semantic models are defined in the virtual layer between the many disparate data sources and the many different data consumers. This allows users to access the data they need, as and when they need it, so that it is real-time or near real-time rather than the static data they would otherwise get from an ETL/data warehouse model. As data volumes continue to grow, the ‘move and copy’ model becomes ever more expensive, and as the data becomes ever more disparate, data virtualisation becomes the obvious choice for a modern high-performance data architecture. Gartner’s Data Management Hype Cycle positions data virtualisation on the ‘Plateau of Productivity’, indicating the very low risk and high return on investment it offers.
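The difference between a virtual layer and ‘move and copy’ can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how the Denodo Platform is implemented: the `VirtualLayer` class, its methods and the `crm` and `erp` sources are hypothetical, and two in-memory lists stand in for live systems. The point it illustrates is that the layer stores only view definitions, so each query reads the sources at request time, whereas a copied result set stays as it was when it was taken.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class VirtualLayer:
    """Toy virtual layer: holds source connectors and view definitions, never data."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], List[Row]]] = {}
        self._views: Dict[str, Callable[["VirtualLayer"], List[Row]]] = {}

    def register_source(self, name: str, fetch: Callable[[], List[Row]]) -> None:
        self._sources[name] = fetch

    def define_view(self, name: str, definition: Callable[["VirtualLayer"], List[Row]]) -> None:
        self._views[name] = definition

    def read(self, source: str) -> List[Row]:
        # Data is fetched from the source at the moment it is asked for.
        return list(self._sources[source]())

    def query(self, view: str) -> List[Row]:
        return self._views[view](self)


# Two stand-in "systems"; in practice these would be databases, APIs or SaaS apps.
crm_customers: List[Row] = [{"customer_id": 1, "name": "Acme"}]
erp_orders: List[Row] = [{"order_id": 100, "customer_id": 1, "total": 250.0}]

layer = VirtualLayer()
layer.register_source("crm", lambda: crm_customers)
layer.register_source("erp", lambda: erp_orders)


def customer_orders(vl: VirtualLayer) -> List[Row]:
    """Semantic model joining both sources; defined once in the virtual layer."""
    names = {c["customer_id"]: c["name"] for c in vl.read("crm")}
    return [
        {"customer": names[o["customer_id"]], "order_id": o["order_id"], "total": o["total"]}
        for o in vl.read("erp")
        if o["customer_id"] in names
    ]


layer.define_view("customer_orders", customer_orders)

# 'Move and copy': take a one-off extract, as a batch ETL load would.
etl_copy = layer.query("customer_orders")

# A new order arrives in the source system after the extract.
erp_orders.append({"order_id": 101, "customer_id": 1, "total": 75.0})

# The virtual view sees the new order; the copy does not until the next load.
virtual_result = layer.query("customer_orders")
print(len(etl_copy), len(virtual_result))  # 1 2
```

A real virtualisation platform adds much more than this, such as query optimisation, caching and security, but the sketch shows why consumers of a virtual view are not tied to the refresh schedule of a copied data set.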
Having a data virtualisation layer through which the data is abstracted facilitates the interoperability, governance and security needed in a Data Mesh architecture, whilst empowering the federation required for domain-based ownership and agile BI. Data virtualisation is a superset of federated data models and includes the advanced capabilities of performance optimisation as well as self-service search and discovery.