Fluree Blog | Kevin Doubleday | 03.03.25

Towards an Error-Free Enterprise LLM

Why Enterprise LLMs are failing beyond the proof-of-concept, and how to future-proof your data for success

LLMs promise to help with everything from predicting wildfires to assisting you while you drive. They keep getting faster and, as DeepSeek recently proved, cheaper. Use cases keep expanding. 

The hype conceals an uncomfortable truth: for all the progress enterprises have made in integrating LLMs across large datasets, errors persist. Data quality is still a major headache for organizations, and a major roadblock for AI initiatives. According to VentureBeat, GenAI implementations grew 17% in 2024, yet organizations report that their data quality dropped significantly. 

Only after solving the accuracy problem will truly trusted artificial intelligence emerge. Once GenAI agents are able to act upon an enterprise’s proprietary data and knowledge base, LLMs will be able to do much more than predict the next line of code or write decent marketing copy. They’ll be able to extract, validate and analyze data, automate pattern recognition, and speed up labor-intensive processes. Automated underwriting, intelligent claims processing, and personalized financial advice are a few emerging use cases.  

The caveat? These systems must be virtually error-free. 

LLMs do not naturally get along with structured data and, to a lesser extent, unstructured data. This mismatch is often called the semantic gap. In the race to be data-centric, enterprises have layered on new data management systems to bridge it. While smart and necessary, these systems haven’t solved a basic underlying problem: to have an error-free LLM, you need good data, and to have good data at enterprise scale, you need a good ontology.

Enterprises thus find themselves at a crossroads: tackle the ontology, or buy a collection of bespoke LLMs that each dig into structured data within one siloed platform. Choosing the latter would repeat the software bloat of the SaaS era. At Fluree, we obviously want you to tackle the ontology, a process from which we’ve hopefully removed most of the headache. To understand where we’re coming from, read on. 

Tuning doesn’t scale

Consider how marketing professionals currently use ChatGPT. They leverage its content writing capabilities while supplying the missing pieces—specific product features, differentiators or branding—from their own human memories. In theory, you should be able to connect an LLM directly to an enterprise database and have it populate responses with its own “memory” of accurate data. 

In fact, tuning an LLM to query a database works reasonably well for small datasets and simple use cases. The LLM generates a SQL query, which executes against a database full of structured data, and you receive a human-readable response. You never know, however, whether the LLM will use the correct column name, JOIN condition, table relationship, or other schema details. 
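To make the pattern concrete, here is a minimal sketch of that loop in Python. The `llm_generate_sql` helper is a hypothetical stand-in for whatever model call you would actually make, and the table and schema are invented:

```python
import sqlite3

def llm_generate_sql(question: str, schema: str) -> str:
    """Hypothetical stand-in for the LLM call that turns a natural-language
    question plus a schema description into SQL."""
    # For illustration, return a query the model *might* produce.
    return "SELECT name, total FROM orders WHERE total > 100;"

schema = "orders(id INTEGER, name TEXT, total REAL)"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, name TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "Acme", 250.0), (2, "Globex", 75.0)])

sql = llm_generate_sql("Which customers spent more than $100?", schema)
rows = conn.execute(sql).fetchall()  # nothing verifies the model chose the right columns or JOINs
print(rows)  # [('Acme', 250.0)]
```

The weak link is the generated string itself: the only guardrail is a human noticing when the query hit the wrong column.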

Tuning works for small, simple use cases. A team can catch and correct LLM interpretation problems as they occur. Multiply that by dozens to hundreds of databases, though, and the approach won’t scale. Layer in various types and formats of data, and it’s nearly impossible to replicate what a human might be able to do. 

You could also prompt engineer. Prompt engineering is both art and science, and very much a one-off exercise. For instance, prefacing queries with specific prompts like “you are a semantic data expert” can suddenly improve accuracy, though the reasons aren’t always clear. 
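A rough sketch of what that scaffolding looks like in practice (the prompt wording and schema here are illustrative assumptions, not a recipe):

```python
# Illustrative only: the exact wording that helps is usually found by trial and error.
SYSTEM_PROMPT = (
    "You are a semantic data expert. Answer only from the schema below. "
    "If the schema does not contain the answer, say so."
)
SCHEMA_HINT = "customers(id, name, region), orders(id, customer_id, total, placed_at)"

def build_messages(question: str) -> list[dict]:
    """Assemble the messages sent to a chat model; every phrase here is a tuning knob."""
    return [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nSchema: {SCHEMA_HINT}"},
        {"role": "user", "content": question},
    ]

print(build_messages("Which regions placed the most orders last quarter?"))
```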

That kind of labor doesn’t scale to the complex schemas and sophisticated queries needed for enterprise use. Nor does it work when different departments need different questions answered about different data sets (unless you want to hire a bevy of full-time prompt engineers). The more databases you have, the more tuning and prompt engineering become unviable. 

Enter the vector

The need to get LLMs to work at enterprise scale is driving adoption of new solutions. One such solution is retrieval-augmented generation (RAG). It’s a way to handle several big data sets at once. 

RAG lets LLMs query multiple databases and come up with a unified response. To do this, LLMs need vectors: raw data converted to numbers, stored in a vector database. The retrieval step compares the query’s vector against the stored ones, finding the closest matches to ground the predicted answer. 

This, too, presents a new challenge. Getting an LLM to use big sets of structured- and unstructured data means inserting a vector database between it and enterprise data. While unstructured data works well with a vector database, structured data does not. 

Vector databases are designed to handle unstructured data. LLMs were trained on the public internet, which is full of unstructured data. Text, images, video, audio and other forms of unstructured data come in many different formats. Categorically, such data is not consistent enough to follow a pre-defined database schema. It’s generally stored in raw form in a NoSQL database or data lake, using a schema-on-read approach. Before it can be uploaded into a vector database, the raw data needs to be pre-processed into vector embeddings using a machine-learning model that encodes semantic or numerical relationships. 
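As a minimal sketch of that step, assuming the sentence-transformers library and its all-MiniLM-L6-v2 model (any embedding model works the same way, and the documents here are invented):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

docs = [
    "Johnson contract: payment due net 30 days, 2% early-payment discount.",
    "Quarterly review notes: deliveries scheduled weekly on Tuesdays.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(docs)                       # raw text -> fixed-length vectors
query_vec = model.encode(["What are the payment terms?"])[0]

# Similarity search: retrieval finds the stored vector nearest to the query vector.
scores = embeddings @ query_vec / (
    np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_vec)
)
print(docs[int(np.argmax(scores))])
```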

This pre-processing is easy compared to what structured data requires. Structured data must be sourced from its data warehouse or relational database, where it already lives in a schema. It must then be cleaned and serialized into text-based formats like CSV or HTML before being fed into a model. If the data is sensitive, it should also be tokenized, with real values replaced by non-sensitive surrogate tokens, for security. Large enterprise data sets often exceed LLM token limits, so data engineers need to be precise about which data they use. 
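A small sketch of that workflow, with invented records and a naive surrogate-token scheme standing in for a real tokenization service:

```python
import csv, io, uuid

# Rows as they might come out of a warehouse query (invented data).
rows = [
    {"customer": "Acme Corp", "ssn": "123-45-6789", "balance": 1200.50},
    {"customer": "Globex",    "ssn": "987-65-4321", "balance": 310.00},
]

# Tokenize the sensitive column: replace real values with surrogate tokens
# and keep the mapping in a separate, vault-style store.
vault = {}
for row in rows:
    token = f"TOK-{uuid.uuid4().hex[:8]}"
    vault[token] = row["ssn"]
    row["ssn"] = token

# Serialize to CSV text, the kind of flat format that gets fed to a model.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["customer", "ssn", "balance"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```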

Moreover, structured data, which includes pricing information, sales data, metrics, and so on, changes frequently. This makes it particularly hard for an LLM, which reasons using vectors and probabilities, to give an accurate reply. 

Bespoke LLMs like Salesforce Genie and Workday Illuminate exist to solve the challenge of getting an LLM to give accurate replies from structured data. But they don’t work outside of their branded platforms. Such bespoke solutions also add unnecessary software bloat. 

Knowledge graph to the rescue?

To solve the challenge of getting an LLM to read structured data, enterprise data teams are now turning to knowledge graphs, which unify structured data from multiple sources into a single semantic layer. 

Knowledge graphs let you identify key entities in a structured database, such as customer IDs, and represent them as nodes. You then map relationships between those nodes (such as purchased product) as edges. As long as you define an ontology for your knowledge graph that matches the entities in your structured databases, your graph can pull from many databases at once to display new relationships and insights. 

For example, if you searched your graph for customer IDs, purchased products, and purchase date, you could figure out when the majority of customers purchased a certain kind of product. If an LLM piggybacks upon that effort, you need only enter a natural-language question and the LLM will generate an accurate answer. 
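Here is a toy version of that example in Python using rdflib, with a made-up ex: namespace; a production knowledge graph would be far richer, but the shape is the same:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/")
g = Graph()

# Nodes: customers and a product; edges: purchased and purchaseDate.
g.add((EX.cust42, EX.purchased, EX.widgetPro))
g.add((EX.cust42, EX.purchaseDate, Literal("2024-11-03", datatype=XSD.date)))
g.add((EX.cust77, EX.purchased, EX.widgetPro))
g.add((EX.cust77, EX.purchaseDate, Literal("2024-11-21", datatype=XSD.date)))

# "Which customers bought widgetPro, and when?"
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?customer ?date WHERE {
        ?customer ex:purchased ex:widgetPro ;
                  ex:purchaseDate ?date .
    }
""")
for customer, date in results:
    print(customer, date)
```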

Or will it? If you’re tired of new technologies creating more problems, join the thousands of enterprise data scientists who feel the same way. As it turns out, the way the LLM synthesizes information from the knowledge graph creates … you guessed it … more inaccuracies. 

The trouble with digging in

When a user inputs a natural language query, the LLM interprets it by extracting key entities, relationships, and context. The system then decides whether to pull data from a knowledge graph (structured data), a vector database (unstructured data), or both. The retrieved information is then integrated to enrich the LLM’s context, with additional filtering or refinement to ensure precision. Finally, the LLM synthesizes everything into a clear and coherent response.
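Schematically, the routing step looks something like the sketch below. The keyword check is a naive stand-in for however a real system classifies questions, and the retrieval calls are hypothetical:

```python
# Naive router: decide which stores to consult for a question.
STRUCTURED_HINTS = {"price", "revenue", "count", "total", "date", "how many"}

def route(question: str) -> list[str]:
    sources = []
    if any(hint in question.lower() for hint in STRUCTURED_HINTS):
        sources.append("knowledge_graph")      # structured facts
    sources.append("vector_db")                # unstructured context
    return sources

def answer(question: str) -> str:
    context = []
    for source in route(question):
        # Hypothetical retrieval; each call would hit a real store in practice.
        context.append(f"[{source} results for: {question}]")
    # An LLM would synthesize the combined context into one response.
    return "\n".join(context)

print(answer("How many customers bought Widget Pro last quarter?"))
```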

The LLM uses probabilities to interpret key entities, relationships, and context. If these are not fixed in a universal ontology, the LLM doesn’t have an overarching schema to guide how sources are integrated. So information gets synthesized differently each time, as the LLM probabilistically infers data sources or relationships. 

That’s why digging into a topic yields inconsistent LLM replies. When you query something for the first time, you get a reasonable answer. If you keep digging for more answers, though, the LLM reprocesses the same document in its entirety, thinking “my probabilities say I should try these other vectors instead,” and gives you different answers, even for the same question. For example, John the data analyst asks for payment terms for the Johnson contract. The LLM gives three different answers for the same prompt: 

“Net 30 days, with a 2% discount for early payment.”

“Weekly deliveries on Tuesdays, except holidays.”

“Net 30 days from receipt of products or invoice, whichever is later.”

For all the power of knowledge graphs and similarity search, LLMs are still acting like interns when confronted with enterprise data. That won’t change until LLMs gain access to an authoritative master schema within the knowledge graph–that is, they infer from a universal ontology. 

How to give LLMs direct access to authoritative data

At this point, there’s a temptation to continue to Frankenstein together a system, maybe by adding those branded, bespoke LLMs that only operate on one type of structured database. That’s not necessary. Instead, the problem can be fixed by returning to the roots of the knowledge graph–that is, the ontology. 

A universal ontology (also known as a Universal Data Model) organizes and tags data in a meaningful, consistent way that both humans and machines can understand. It makes data interoperable across systems, so it’s always in a consistent format, no matter where it originates. The universal ontology lets the knowledge graph do the heavy lifting by consolidating similar data into unified entities. Instead of accessing or reprocessing the same document for related queries, the LLM can work with structured, pre-organized information. 
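To illustrate (with invented field names and ontology terms), mapping records from two different systems onto shared ontology terms might look like this:

```python
# Two systems name the same concept differently; a shared ontology term unifies them.
ONTOLOGY_MAP = {
    "salesforce": {"AccountName": "ex:customerName", "CloseDate": "ex:purchaseDate"},
    "sap":        {"KUNNR_NAME":  "ex:customerName", "ORDER_DT":  "ex:purchaseDate"},
}

def to_ontology(source: str, record: dict) -> dict:
    """Re-key a source record onto the shared ontology's terms."""
    mapping = ONTOLOGY_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

sf_row  = {"AccountName": "Acme Corp", "CloseDate": "2024-11-03"}
sap_row = {"KUNNR_NAME": "Acme Corp", "ORDER_DT": "2024-11-03"}

print(to_ontology("salesforce", sf_row))
print(to_ontology("sap", sap_row))
# Both print the same shape: {'ex:customerName': 'Acme Corp', 'ex:purchaseDate': '2024-11-03'}
```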

Our approach at Fluree is that you need a universal ontology. All of your information should be tagged and organized to this model so that it means something to all humans and machines, including LLMs. Only then will everything else fall into place and be findable, accurate, interoperable and reusable. Data will be freed from its siloes, and everyone from HR to those out in the field will be able to view and analyze data through their own lens. LLMs will connect to the knowledge graph, which runs on the universal ontology, and receive close to 100% accurate answers (ChatGPT, by contrast, is about 60% accurate). If the data does not exist, the LLM will say so, instead of making things up. Everyone will be able to do their job better. 

Start small

The problem with creating a universal ontology is that you can’t do it all at once. You have to integrate data from diverse sources with inconsistent formats and standards. It’s hard to scale a universal ontology to handle more and more datasets while maintaining performance across the organization. Employees must adapt to new workflows and standards.

Fluree has out-of-the-box tooling to create your universal ontology quickly. We use machine learning models that tag data to the ontology, including structured data from Salesforce, SAP and other popular systems.

While we might ultimately want a central AI brain for our entire company, current technology limitations require us to break this brain into optimized pieces. The best path forward is to focus on specific domains rather than attempting to create a single, all-encompassing AI system. Whether you use Fluree or hire an ontologist, you’ll have to implement your universal ontology one domain at a time. 

Where to begin

Focus on individual domains first. Let’s say you want the LLM to draw upon information from three different data sets: a structured database sitting in Oracle, a content management system, and an application database. Your LLM should have access to an ontology of concepts to understand what the user is looking for, and then use the knowledge graph to find where the answers would be. Finally, you should put a plan together as to how the LLM will access that data. 

Your plan should include: 

1) The user’s domain/expertise. For example, someone in supply chain logistics might have a different view of terms like “clients” or “partners” than someone in finance would have. Specifically, you want to layer multiple ontologies onto the data and route terms appropriately within the context of the user’s domain (see the sketch after this list). 

2) The question itself. For example, if you ask about a particular shipment, and then ask a follow-up question about the supplier, make sure that you promote the knowledge graph as the primary source of truth for the LLM through prompt engineering or other techniques. That way, the LLM will understand that a line of questioning is contextual to that particular shipment. 

3) Understand user interactions. Over time, you will develop an understanding of how users from particular domains interact with the data, learning from their questions and teaching the system the right context. This tooling exists both at the model level and at the Fluree level (in terms of ontology layering).
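For the first point, here is a rough sketch of ontology layering, where the same everyday term resolves to a different concept depending on the user’s domain (all names invented):

```python
# Domain layers map everyday terms onto domain-specific ontology concepts.
DOMAIN_LAYERS = {
    "logistics": {"client": "ex:ShippingParty", "partner": "ex:Carrier"},
    "finance":   {"client": "ex:AccountHolder", "partner": "ex:Counterparty"},
}

def resolve(term: str, user_domain: str) -> str:
    """Route a user's term to the concept their domain ontology layer defines."""
    layer = DOMAIN_LAYERS.get(user_domain, {})
    return layer.get(term.lower(), f"ex:{term.capitalize()}")  # fall back to the base ontology

print(resolve("client", "logistics"))  # ex:ShippingParty
print(resolve("client", "finance"))    # ex:AccountHolder
```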

As you walk forward, domain by domain, you will eventually build a huge, distributed graph that comes in all shapes, sizes and formats. It will be universally accessible because you’ve laid the groundwork of a universal ontology (with subsets of domain ontologies) and have mapped data sources to that ontology.  

In the broader sphere, once we perfect AI’s ability to access enterprise information, we might witness a dramatic transformation in how businesses operate. Imagine asking AI to show you a table of deals expected to close next month, no Salesforce needed. This could potentially eliminate the need for traditional enterprise applications entirely. All you’d need is a knowledge database and a well-tuned AI system.

Hopefully by now you understand why building a universal ontology is the right choice for reaching that elusive data-centric reality.