Blog | Kevin Doubleday | 07.16.24

How Decentralized GraphRAG Improves GenAI Accuracy

Not all RAG methods are created equal. A decentralized knowledge graphRAG approach enables you to securely tap into a broad range of information sources in real-time, increasing the amount of trusted context for higher accuracy rates in GenAI applications.

Introduction

Businesses are continuously searching for the most effective approach to implement GenAI models without breaking the bank, failing their compliance obligations, or introducing error-prone hallucinations into their workflows. Organizations are realizing that simply fine-tuning models is not a sustainable method for delivering accurate answers from their internal bases of knowledge.

In a previous article, we wrote about the synergistic relationship between knowledge graphs and GenAI. Today’s article will dive into a recent Fluree report that explores that relationship within the context of RAG.


The Threat of Inaccuracy in AI

One of the primary challenges with deploying AI in production environments is the risk of inaccuracy. Generative AI, such as large language models (LLMs), often “hallucinates” answers—producing responses that are plausible but incorrect. This is particularly problematic in enterprise settings where accuracy is paramount. Two significant barriers contribute to this issue:

  • Security and Privacy Concerns: Many valuable data assets remain untapped due to stringent security and privacy controls. These controls prevent the use of critical data in AI applications, leading to reliance on less strategic, generalized knowledge.
  • Lack of Grounded Truth: AI models often generate answers based on statistical probabilities rather than verified truths, making it difficult to distinguish between accurate and fabricated responses.

Retrieval Augmented Generation (RAG): Enhancing AI Accuracy

To address this inaccuracy, organizations are turning to Retrieval Augmented Generation (RAG), a form of information retrieval that integrates live sources of authoritative truth into AI models, reducing the risk of hallucination.

RAG works by retrieving relevant knowledge from external sources at response time and supplying it to the LLM as context, thus grounding the AI’s output in verified information.
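The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `Document` type, the sample documents, and the word-overlap scoring are all assumptions made for the example, and a real system would use a proper search index or graph query and then pass the prompt to an LLM.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical label for where the passage came from
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of punctuation-free words."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question: str, documents: list[Document], top_k: int = 2) -> list[Document]:
    """Rank documents by word overlap with the question (a toy stand-in for real retrieval)."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc.text)), doc) for doc in documents]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:top_k]]


def build_prompt(question: str, context: list[Document]) -> str:
    """Place the retrieved passages ahead of the question so the model answers from them."""
    lines = ["Answer using only the context below.", "Context:"]
    lines += [f"[{doc.source}] {doc.text}" for doc in context]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Made-up sample data for illustration only.
documents = [
    Document("hr-policy", "Employees accrue 20 vacation days per year."),
    Document("it-handbook", "Laptops are refreshed every three years."),
]
question = "How many vacation days do employees accrue?"
context = retrieve(question, documents)
prompt = build_prompt(question, context)
print(prompt)  # this string is what would be sent to the LLM
```

The point of the sketch is the ordering: retrieval happens first, and the model only sees the question together with the passages that were retrieved for it.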

RAG can utilize various data sources, including:

  • Relational Databases, Data Warehouses, and Data Lakes: For structured data, providing a familiar but limited data interaction model.
  • Knowledge Graphs: Representing data in a network of nodes and relationships, offering a more intuitive structure for LLMs to interpret.
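To make the "network of nodes and relationships" concrete, here is a small sketch of a knowledge graph stored as subject-predicate-object triples, with a helper that gathers the facts around an entity so they can be handed to an LLM as context. The entity names, predicates, and the `expand` helper are hypothetical and chosen only for illustration.

```python
# Each fact is a (subject, predicate, object) triple. All names are made up.
Triple = tuple[str, str, str]

graph: list[Triple] = [
    ("Acme Corp", "hasSupplier", "Globex"),
    ("Globex", "locatedIn", "Germany"),
    ("Acme Corp", "hasProduct", "Widget"),
]


def expand(triples: list[Triple], entity: str, hops: int) -> list[Triple]:
    """Collect the triples reachable within `hops` relationships of the entity."""
    seen = {entity}
    frontier = {entity}
    collected: list[Triple] = []
    for _ in range(hops):
        next_frontier: set[str] = set()
        for triple in triples:
            subject, _, obj = triple
            touches_frontier = subject in frontier or obj in frontier
            if touches_frontier and triple not in collected:
                collected.append(triple)
                next_frontier.update({subject, obj} - seen)
        seen |= next_frontier
        frontier = next_frontier
    return collected


def to_context(triples: list[Triple]) -> str:
    """Render triples as short statements an LLM can read as grounding facts."""
    return "\n".join(f"{s} {p} {o}." for s, p, o in triples)


print(to_context(expand(graph, "Globex", hops=2)))
```

Because relationships are explicit, following one more hop from "Globex" pulls in what Acme Corp produces without any join logic, which is the kind of dot-connecting a relational schema makes harder.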

What is GraphRAG?

Not all RAG methods are created equal. Knowledge Graphs are emerging as a key enabler of RAG-based GenAI. In fact, Gartner recently indicated that Knowledge Graphs are now a “Critical Enabler” with immediate impact on GenAI. “GraphRAG” refers to an approach to retrieval augmented generation in which information retrieval is based on a structured, hierarchical knowledge graph. Importantly, knowledge graphs help you connect the dots between entities using ontologies to define semantic concepts and relationships. Learn more about knowledge graphs here.

In our research conducted in April 2024, knowledge graphs significantly outperformed traditional relational databases in RAG tasks. When data and metadata are expressed semantically, LLMs achieve higher accuracy with less training effort. Here’s a breakdown of their performance:

  • Centralized Relational Data: Initial zero-shot accuracy is low (~20%), improving to 80% with extensive data integration and model fine-tuning.
  • Centralized Knowledge Graphs: Zero-shot accuracy improves to 60-65%, reaching up to 95% with fine-tuning and enriched data.
  • Decentralized Knowledge Graphs: These achieve the highest accuracy, consistently hitting 90-99% by connecting data sources across various domains without physically centralizing them.

Wait, What is a Decentralized Knowledge Graph? 

A decentralized Knowledge Graph is a network of independently managed Knowledge Graphs that can be connected at query time, based on rights and permissions. 

While centralized RAG (relational or semantic) can solve most accuracy concerns, decentralized knowledge graphs solve the other major barrier: secure, real-time access to a broad range of enterprise knowledge. 

In centralized scenarios, organizations have a hard time giving their employees or GenAI applications access to all of the information potentially relevant to a query, particularly because:

  • Security is too cumbersome and risky to manage for one-off business questions.
  • The amount of time, energy, and resources it takes to perform a custom ETL from across multiple sources, mask the data as needed, and then hand it over as a RAG Source is not sustainable. 

Because of this, it’s an “all or nothing” culture of data accessibility within the organization. Most of the time, LLMs are used in a primitive, non-proprietary fashion to automate a knowledge task. But imagine giving an LLM access to real-time knowledge from across the organization with granular security built-in? 

In Decentralized Knowledge Graphs, integrated access and usage policies ensure that queries can span across any possible data source and safely and securely access, link, and return decentralized data to the user.

Each Knowledge Graph could have thousands or millions of nodes (classes and instances), and can physically reside right next to the business applications (on prem, or on the Cloud in a particular geography). The data is not physically stored centrally, but is rather part of a decentralized network connected through semantic web standards.

When a question is asked, a node in one Graph may have relationships to nodes in any of the other Graphs. If the requester has permission to access those specific nodes, they can be included in the answer to the question.
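The query-time behavior described above can be sketched as follows. This is an illustrative model only, not Fluree's API: the `ManagedGraph` class, the role names, and the per-predicate read policy are assumptions made for the example. Each graph stays independently managed and decides for itself what a given role may see; the federated query simply merges whatever each graph is willing to return.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]


@dataclass
class ManagedGraph:
    """A hypothetical independently managed graph with its own read policy."""
    name: str
    triples: list[Triple]
    readable_predicates: dict[str, set[str]]  # role -> predicates that role may read

    def query(self, entity: str, role: str) -> list[Triple]:
        """Return triples about the entity, filtered by this graph's own policy."""
        allowed = self.readable_predicates.get(role, set())
        return [
            (s, p, o)
            for s, p, o in self.triples
            if entity in (s, o) and p in allowed
        ]


def federated_query(graphs: list[ManagedGraph], entity: str, role: str) -> dict[str, list[Triple]]:
    """Ask every graph at query time and merge the permitted results by graph name."""
    results: dict[str, list[Triple]] = {}
    for graph in graphs:
        matches = graph.query(entity, role)
        if matches:
            results[graph.name] = matches
    return results


# Made-up sample graphs and policies for illustration only.
hr_graph = ManagedGraph(
    name="hr",
    triples=[("Dana", "worksIn", "Finance"), ("Dana", "hasSalary", "95000")],
    readable_predicates={"hr": {"worksIn", "hasSalary"}, "analyst": {"worksIn"}},
)
finance_graph = ManagedGraph(
    name="finance",
    triples=[("Finance", "ownsBudget", "Q3 Budget")],
    readable_predicates={"analyst": {"ownsBudget"}},
)
graphs = [hr_graph, finance_graph]

print(federated_query(graphs, "Finance", "analyst"))
print(federated_query(graphs, "Dana", "analyst"))
```

In this sketch an analyst asking about "Finance" gets linked facts from both graphs, while the same analyst asking about "Dana" never sees the salary triple, because the filter is applied inside the graph that owns the data rather than in a central copy.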

We concluded that decentralized knowledge graphs consistently outperformed traditional data systems in reducing hallucinations in responses, simply by virtue of providing an enterprise-wide knowledge graph as RAG source data, enabled by semantic standards, ontological context, and secure data policies.

Decentralized Knowledge Graphs are also the sustainable choice for ROI: as data volumes grow in size and complexity across distinct domains, Decentralized Graphs are well equipped to support emerging LLM business cases in stride.

Implementing Decentralized Knowledge Graphs with Fluree

Fluree’s platform is designed to facilitate the creation and management of Decentralized Knowledge Graphs (DKGs).

Importantly, Fluree solves two unique barriers to RAG-GenAI adoption across a broad range of enterprise knowledge: 

  • Privacy and Safety: A data-centric approach to security, in which information is protected by policies directly at the data layer. Fluree’s embedded security policies programmatically enforce data access policies, making it the only knowledge graph that can manage governance, regulation, copyright, and privacy issues dynamically on real-time data.
  • Trust: Fluree uses cryptography to secure all data in a tamper-proof audit trail. This means you can trace back every piece of data to its origin, review its history over time, and build better explainability into your systems. 
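As a rough illustration of what a tamper-evident audit trail means in general, the sketch below chains each log entry to the previous one with a SHA-256 hash, so that altering any earlier entry invalidates everything after it. This is a generic hash-chain technique written for this article, not a description of how Fluree implements its audit trail; the function names and the sample payloads are assumptions.

```python
import copy
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(payload: dict, previous_hash: str) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"payload": payload, "previous": previous_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log: list[dict], payload: dict) -> None:
    """Add an entry that is cryptographically linked to the one before it."""
    previous_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({
        "payload": payload,
        "previous": previous_hash,
        "hash": entry_hash(payload, previous_hash),
    })


def verify(log: list[dict]) -> bool:
    """Recompute every hash in order; any edited entry breaks the chain."""
    previous_hash = GENESIS_HASH
    for entry in log:
        if entry["previous"] != previous_hash:
            return False
        if entry["hash"] != entry_hash(entry["payload"], previous_hash):
            return False
        previous_hash = entry["hash"]
    return True


# Made-up sample history for illustration only.
log: list[dict] = []
append(log, {"subject": "Dana", "predicate": "worksIn", "object": "Finance"})
append(log, {"subject": "Dana", "predicate": "worksIn", "object": "Legal"})

tampered = copy.deepcopy(log)
tampered[0]["payload"]["object"] = "Sales"  # rewrite history after the fact

print(verify(log), verify(tampered))
```

The untouched log verifies, while the copy with an edited first entry does not, which is the property that lets a reader trace a piece of data back through its history and trust that the history was not rewritten.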

Fluree is the only platform that (1) automates the conversion of any/all data silos/lakes/systems into a consistent semantic knowledge graph, ready for RAG, (2) secures data with policy in order to protect information from unwanted exposure or use, and (3) adds digital cryptography into the data tier so that organizations can trust but verify the authenticity of the data sources used in production AI. 

Summary:

  • RAG (Retrieval Augmented Generation) is the best way to “ground” GenAI in your data. This involves introducing trusted sources of authoritative data for GenAI to pull from, thus reducing hallucinations and making responses more accurate and contextual. 
  • Knowledge Graphs are emerging as a key enabler of RAG-based GenAI. In fact, Gartner recently indicated that Knowledge Graphs are now a “Critical Enabler” with immediate impact on GenAI. 
  • Decentralized Knowledge Graphs enable on-demand access to a data plane of decentralized graphs with security built in.
  • Fluree provides a trusted decentralized knowledge graph for enterprises to build more accurate and precise GenAI applications, grounded in enterprise data. 
  • Importantly, Decentralized Knowledge Graphs solve two unique barriers to RAG-GenAI adoption across a broad range of enterprise knowledge:
      • Privacy and Safety: A data-centric approach to security, in which information is protected by policies directly at the data layer. Fluree’s embedded security policies programmatically enforce data access policies, making it the only knowledge graph that can manage governance, regulation, copyright, and privacy issues dynamically on real-time data.
      • Trust: Fluree uses cryptography to secure all data in a tamper-proof audit trail. This means you can trace back every piece of data to its origin, review its history over time, and build better explainability into your systems.

Want to learn more? Click below to access your copy of Fluree’s whitepaper on Decentralized GraphRAG:

Read the Whitepaper