Getting the Fundamentals Right for Gen AI
Reading Time: 3 minutes

Last year, I was involved in a proof of concept (POC) at a major financial institution. It was mid-2023, and generative artificial intelligence (Gen AI) was already reaching what Gartner calls the “peak of inflated expectations.” The POC was for a data platform to help data scientists build and govern AI models; the models would help improve customer retention and make next-best offers. The success criteria had been agreed upon. Then, in the penultimate meeting before the POC began, one of the financial institution’s leaders on the data side remarked, “Bonus points for the vendor that displays Gen AI capabilities.” Needless to say, my team obliged. Our data management solution had a pre-release feature that let data stewards manipulate data using Gen AI, with prompts such as: “Change the format of this pricing table to match $xx.xx.”

Now, almost a year on, this financial institution has still not deployed Gen AI. Without revealing specifics, I’d like to draw your attention to this finding: according to Boston Consulting Group, ninety percent of C-suite executives are either waiting for Gen AI to move past its hype cycle or experimenting with it in small initial projects, because they don’t believe their teams can navigate the transformational changes posed by Gen AI.

Less than one-third of C-suite leaders surveyed by Accenture are ready to scale up Gen AI initiatives, and almost half say it could take more than six months to scale and reap the benefits. Only 5% of organizations have actually implemented Gen AI in production.

Salesforce and Forrester research points to a key reason: data maturity and readiness. Only one-third of leaders have implemented the robust data strategy necessary for Gen AI across their business.

Gen AI Is Only as Good as Your Data

McKinsey, in its bold 2024 article, A generative AI reset: Rewiring to turn potential into value, states that it’s critical to get the data foundations right, from clarifying decision rights, to defining clear data processes, to establishing taxonomies so models can access the data they need.

This may seem obvious, but Gen AI is only as intelligent as the data it is trained on. For most organizations today, that means large language models (LLMs) know nothing about the organizations they are meant to help and serve. Ask ChatGPT, “Tell me why my business did not deliver its target in Q1,” and of course you won’t get an answer (or you might get a cleverly hallucinated reason), because ChatGPT doesn’t know your business. You could, of course, train your Gen AI model with knowledge about your company’s products, targets, go-to-market, and customers, but this data is likely to keep changing. And remember, training LLMs is expensive. It requires thousands of graphics processing units (GPUs), which provide the parallel processing power needed to handle the massive datasets these models learn from. The cost of these GPUs alone can amount to millions of dollars. According to a technical overview of OpenAI’s GPT-3 language model, each training run required at least $5 million worth of GPUs.

There’s another factor: the potential to compromise sensitive data, such as customer data. LLMs have been known to memorize their training data. For example, if appropriate governance is not in place, an LLM prompted with “My name is Jane Doe and my social security number is…” may return the number. This is an absolute no-no, and it could cost organizations millions in regulatory fines.
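To make that governance point concrete, here is a minimal Python sketch of one small piece of the protection: redacting obvious PII patterns before a prompt ever reaches an LLM. The patterns and function names are hypothetical illustrations; production-grade protection belongs in a governed data layer with enforceable policies, not in ad hoc regexes.

```python
import re

# Hypothetical illustration: redact obvious PII patterns before any text
# reaches an LLM. Real governance should be policy-driven and enforced at
# the data layer, not bolted onto individual prompts.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def redact_pii(text: str) -> str:
    """Replace social security numbers and email addresses with placeholders."""
    text = SSN_PATTERN.sub("[REDACTED_SSN]", text)
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)

if __name__ == "__main__":
    prompt = "My name is Jane Doe, SSN 123-45-6789, email jane@example.com."
    print(redact_pii(prompt))
    # -> My name is Jane Doe, SSN [REDACTED_SSN], email [REDACTED_EMAIL].
```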

Arming It with the Information It Needs

Think of a Gen AI application as a highly specialized surgeon. It’s not the surgeon’s job to monitor patient vitals such as blood pressure, sodium levels, glucose, and oxygen before deciding on a course of treatment. Nurses and more junior doctors collect and summarize this information for the surgeon before the surgeon decides how to operate.

Similarly, it is not a Gen AI application’s job to retrieve and integrate data from multiple varied data sources, such as CRM systems, APIs, inventory management systems, and marketing automation solutions. It needs a unified access layer that removes the complexity of accessing data from all these systems, so it can deliver recommendations, customer service options, forecasts, summaries, or whatever else you may be using it for. There also needs to be a layer between data consumers, including LLMs, and the underlying data, so that the data can be trusted and sensitive information protected.
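What might such an access layer look like? The Python sketch below is a hypothetical illustration under assumed names (DataSource, CRMSource, InventorySource, UnifiedDataLayer), not any vendor’s implementation: each backing system exposes the same narrow interface, and the Gen AI application only ever talks to the unified layer.

```python
from dataclasses import dataclass
from typing import Protocol

class DataSource(Protocol):
    """The narrow interface every backing system exposes to the access layer."""
    name: str
    def fetch(self, entity: str, key: str) -> dict: ...

@dataclass
class CRMSource:
    name: str = "crm"
    def fetch(self, entity: str, key: str) -> dict:
        # In practice this would call the CRM's API; stubbed for illustration.
        return {"customer_id": key, "segment": "retail", "churn_risk": "high"}

@dataclass
class InventorySource:
    name: str = "inventory"
    def fetch(self, entity: str, key: str) -> dict:
        return {"sku": key, "in_stock": 42}

class UnifiedDataLayer:
    """Single entry point the Gen AI application queries for business data."""
    def __init__(self, sources: list[DataSource]) -> None:
        self._sources = {s.name: s for s in sources}

    def get(self, source: str, entity: str, key: str) -> dict:
        return self._sources[source].fetch(entity, key)

layer = UnifiedDataLayer([CRMSource(), InventorySource()])
context = layer.get("crm", "customer", "C-1042")  # data the LLM will reason over
```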

At Denodo, we call this a logical data management layer, and we believe it is a critical enabler for next-generation AI applications. A logical data management layer abstracts the needed information from all the varied systems mentioned above and presents it to the Gen AI application. It also provides the Gen AI application with much-needed context in the form of business metadata, such as column names, tags, and field descriptions. A table described as “pricing information” is that much easier for a Gen AI application to understand when it retrieves the data. The layer gives the LLM “surgeon” the information it needs to perform the best operation.
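As a rough illustration of how that metadata helps, the Python sketch below (with made-up table and column names) bundles business descriptions together with the retrieved rows before anything is sent to an LLM, so the model sees what “pricing information” actually means.

```python
# Hypothetical example: attach business metadata (table and column
# descriptions) to the data handed to an LLM, so the model has context.
table_metadata = {
    "table": "q1_pricing",
    "description": "Pricing information for Q1 offers, one row per product.",
    "columns": {
        "product": "Product name as shown in the catalog",
        "list_price": "List price in USD, format $xx.xx",
        "discount_pct": "Promotional discount applied in Q1, in percent",
    },
}

rows = [
    {"product": "Premium Card", "list_price": "$99.00", "discount_pct": 10},
    {"product": "Basic Card", "list_price": "$25.00", "discount_pct": 0},
]

def build_prompt(question: str) -> str:
    """Combine the question, the metadata, and the retrieved rows into one prompt."""
    column_docs = "\n".join(
        f"- {col}: {desc}" for col, desc in table_metadata["columns"].items()
    )
    return (
        f"Table: {table_metadata['table']}: {table_metadata['description']}\n"
        f"Columns:\n{column_docs}\n"
        f"Data: {rows}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt("Which product had the largest Q1 discount?")
# `prompt` would then be sent to whichever LLM the application uses.
```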

For a more technical point of view, check out this post, Unlocking the Power of Generative AI: Integrating Large Language Models and Organizational Knowledge, and let us know in the comments section below if your organization needs help implementing Gen AI.

Sunny Panjabi