
Leveraging enterprise data for generative AI and large language models presents significant challenges related to data silos, quality inconsistencies, privacy and security concerns, compliance with data regulations, capturing domain-specific knowledge, and mitigating inherent biases. Organizations must navigate the complexities of consolidating fragmented data sources, ensuring data integrity, and addressing ethical considerations.

Techniques like Retrieval Augmented Generation (RAG) help bridge the gap between enterprise gen AI apps and the actual enterprise data. Although RAG is a great tool, and LLMs have enabled natural-language-to-SQL translation, these capabilities fall short when enterprise data is scattered across a complex, heterogeneous data landscape. It’s fairly easy to extend your chatbot to query one database, but how do you deal with a complex system that includes an EDW, a data lake, and several applications, both on-premises and SaaS? How do you ensure that security is consistent across that ecosystem? How do you bring forward governance, lineage, documentation, or data quality?

A combination of Denodo’s data virtualization and Google’s Vertex AI technologies can address these challenges and opportunities. Denodo enables the creation of a unified, virtual view of data from disparate sources, providing a single access point, while Google’s Vertex AI embeddings, foundation models, and vector search capabilities, together with LangChain, help build generative AI applications that can intelligently retrieve, synthesize, and process relevant information from the virtualized data layer.

Fig 1 – Platform Architecture – Denodo with Vertex AI

Fig 2 – AI Agent enabled by RAG on Vertex AI with Vector Search, Gemini Pro and Denodo

In the example above, a mortgage processing company integrated Denodo’s data virtualization with RAG models on Vertex AI so that loan officers can use generative AI and large language models to handle complex queries and tasks efficiently. The data virtualization layer unifies fragmented data sources such as the EDW, CRM systems, loan origination software, and compliance documents, ensuring that the RAG model has access to comprehensive, up-to-date information. When a loan officer submits a natural language query, the retrieval component fetches relevant data from the virtualized sources, such as eligibility criteria or regulatory guidelines, and the language model then processes that data to generate detailed, contextual responses tailored to the customer’s needs. This approach streamlines intricate processes like pre-qualifying customers, preparing loan packages, and addressing underwriter requests, enabling loan officers to provide accurate, compliant, and personalized service.
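The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture shown in Fig 2: the sample rows, source names, and the `generate` callable are hypothetical, and a toy bag-of-words cosine similarity stands in for Vertex AI embeddings and Vector Search.

```python
import math
import re
from collections import Counter
from typing import Callable

# Hypothetical rows, as they might come back from views in the virtual layer.
DOCUMENTS = [
    {"source": "compliance_docs",
     "text": "Conventional loans require a minimum credit score of 620."},
    {"source": "loan_origination",
     "text": "Pre-qualification needs verified income and employment history."},
    {"source": "crm",
     "text": "Customer prefers contact by email on weekday mornings."},
]


def tokenize(text: str) -> Counter:
    """Lowercase bag of words; a toy stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, tokenize(d["text"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list) -> str:
    """Place the retrieved passages, tagged by source, ahead of the question."""
    lines = [f"[{d['source']}] {d['text']}" for d in context]
    return ("Answer using only the context below.\n\nContext:\n"
            + "\n".join(lines)
            + f"\n\nQuestion: {query}")


def answer(query: str, generate: Callable[[str], str],
           docs: list = DOCUMENTS) -> str:
    """Retrieve, build the prompt, then hand it to a caller-supplied model."""
    return generate(build_prompt(query, retrieve(query, docs)))
```

Passing the model call in as `generate` keeps the sketch independent of any particular SDK; in a real deployment that callable would wrap whichever foundation model the application uses.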

The Denodo Platform leverages data virtualization technology, eliminating the need to move or consolidate data before augmenting an AI application. It provides a single, consolidated gateway for AI applications to access integrated data and offers a number of other key benefits, including:

  • A unified, secure access point for LLMs to interact with and query all enterprise data (ERP, Operational Data Mart, EDW, Application APIs)
  • A rich semantic layer that provides LLMs with the needed business context and knowledge (such as table descriptions, business definitions, categories/tags, and sample values)
  • Quick delivery of logical data views that are decoupled and abstracted from the underlying technical data views (which can be difficult for LLMs to use)
  • Delivery of LLM-friendly, wide logical table views without first having to physically replicate and combine multiple datasets
  • Built-in query optimization that relieves LLMs from dealing with specific data source constraints or optimized join strategies
  • Data lineage and other governance tools that can surface additional elements, like data provenance, data quality markers, endorsements, and warnings, alongside the natural language response to a query
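To make the semantic-layer point concrete, here is a hedged sketch of how view metadata (descriptions, tags, and sample values) might be rendered into prompt context for natural-language-to-SQL translation. The `View` and `Column` classes, the `loan_applications` view, and its columns are all hypothetical; this does not use Denodo's catalog or metadata API, and in practice the metadata would be read from the platform rather than hard-coded. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    description: str
    samples: list[str] = field(default_factory=list)


@dataclass
class View:
    name: str
    description: str
    tags: list[str]
    columns: list[Column]


# Hypothetical metadata for one logical view.
EXAMPLE_VIEW = View(
    name="loan_applications",
    description="One row per mortgage application.",
    tags=["lending"],
    columns=[
        Column("fico_score", "Applicant credit score at submission.",
               ["640", "720"]),
        Column("status", "Current application status."),
    ],
)


def describe_view(view: View) -> str:
    """Render one view's business metadata as plain text for the prompt."""
    lines = [f"View {view.name}: {view.description}",
             f"Tags: {', '.join(view.tags)}"]
    for col in view.columns:
        sample = f" (e.g. {', '.join(col.samples)})" if col.samples else ""
        lines.append(f"- {col.name}: {col.description}{sample}")
    return "\n".join(lines)


def build_sql_prompt(question: str, views: list[View]) -> str:
    """Prefix the user's question with descriptions of the available views."""
    context = "\n\n".join(describe_view(v) for v in views)
    return ("You translate questions into SQL over the views described below."
            f"\n\n{context}\n\nQuestion: {question}\nSQL:")
```

Calling `build_sql_prompt("How many applications have a FICO score above 700?", [EXAMPLE_VIEW])` yields a prompt in which the model sees business definitions and sample values rather than bare column names.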

As enterprises embrace generative AI, data management and virtualization will play a pivotal role in unlocking the technology’s potential. By combining data virtualization with Google’s Vertex AI and Gemini APIs, organizations can integrate their fragmented data sources and ensure that generative AI models have access to a unified, comprehensive, and up-to-date view of enterprise data. This approach not only enables more accurate and contextual outputs from language models but also facilitates compliance with data governance and privacy regulations. With data virtualization acting as the backbone, enterprises can use Vertex AI and the Gemini APIs to develop innovative applications, streamline complex processes, and deliver personalized experiences to customers and employees alike, paving the way for a future in which generative AI is deeply embedded in everyday enterprise operations and customer interactions.

Watch a demo of Google Vertex AI and Denodo here.

Ron Yu