Inside FlureeDB: The Features That Make a Knowledge Graph Verifiable

When we talk about FlureeDB, the one-liner we keep coming back to is: most databases store records — FlureeDB stores knowledge, and the proof of where that knowledge came from.
That's the pitch. But a pitch isn't an architecture, and if you're the person who has to actually run this thing, you want to know what's underneath. So this post is the opposite of a press release. It's a walk through the features that make FlureeDB different — what each one does, why it exists, and what it replaces.
The thread connecting all of them is a single design decision: build the hard guarantees into the engine instead of bolting them on around it. A typical "trustworthy graph" setup today is a triple store plus a search index plus a vector database plus a policy engine plus an audit log plus an agent gateway: six moving parts, each with its own failure mode, glued together with code that someone has to keep in sync. FlureeDB collapses that into one binary built on W3C standards. Here's what lives inside it.

The trust layer
This is the half of FlureeDB that most graph databases simply don't have. Provenance, governance, and history aren't features you turn on — they're properties of how data is stored.
Immutability: "Every transaction signed. Nothing silently altered."
FlureeDB is, at its core, an immutable ledger. Writes are non-destructive: an update doesn't overwrite a value, it appends a new commit. The old state is still there, addressable forever. Each commit is content-addressed and chained to the one before it, so the ledger is the audit log — there's no separate audit table to maintain, and no way to quietly edit history without breaking the chain.
This is the foundation everything else in the trust layer is built on. Time travel, provenance, and branching all fall out of the fact that nothing is ever destroyed.
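To make "content-addressed and chained" concrete, here is a minimal sketch of an append-only commit chain. It is not FlureeDB's actual commit format: the field names (`t`, `previous`, `data`), the use of SHA-256 over canonical JSON, and the helper names are all assumptions chosen for illustration.

```python
import hashlib
import json


def commit_id(body: dict) -> str:
    """Content address: SHA-256 over a canonical JSON encoding of the commit body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_commit(chain: list, data: dict) -> dict:
    """Append a new commit that points at the previous commit's id."""
    body = {
        "t": len(chain) + 1,
        "previous": chain[-1]["id"] if chain else None,
        "data": data,
    }
    commit = {"id": commit_id(body), **body}
    chain.append(commit)
    return commit


def verify_chain(chain: list) -> bool:
    """Recompute every id and check every back-pointer."""
    previous_id = None
    for commit in chain:
        body = {k: v for k, v in commit.items() if k != "id"}
        if commit["previous"] != previous_id or commit_id(body) != commit["id"]:
            return False
        previous_id = commit["id"]
    return True


chain = []
append_commit(chain, {"ex:alice": {"ex:salary": 100}})
append_commit(chain, {"ex:alice": {"ex:salary": 120}})  # an update is a new commit
print(verify_chain(chain))  # True

chain[0]["data"]["ex:alice"]["ex:salary"] = 999  # quietly edit history
print(verify_chain(chain))  # False
```

Because each id is derived from the commit's content and each commit embeds its predecessor's id, changing any earlier commit makes verification fail from that point on.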
Provenance: "Every record knows where it came from."
Immutability tells you what changed. Provenance tells you who said so — and lets anyone verify it without trusting you.
Every transaction in FlureeDB can be cryptographically signed, using industry standards rather than a proprietary scheme: JWS (JSON Web Signature, with Ed25519 keys), W3C Verifiable Credentials, and DIDs. Because commits are content-addressed in a tamper-evident chain, any modification to signed data invalidates the signature. The four properties you get from this are exactly the ones auditors ask for:
- Authenticity — who created the transaction
- Integrity — that the data hasn't been altered
- Non-repudiation — the signer can't deny it
- Provenance — the origin and full history of every fact
The important part: anyone can verify a signature without special access to the database. That's what makes trustless data exchange between organizations possible — and it's why "an AI answer without a trace is a legal liability" stops being a problem you mitigate and becomes one you've designed out. Lineage is queryable, not reconstructed after the fact.
Security: "Policies live with the data."
Most databases enforce access at the row, table, or schema level. That's too coarse for a graph, where a single subject can carry facts at wildly different sensitivity levels. So FlureeDB enforces access control per triple — ?subject ?predicate ?object — inside the query engine itself.
A single person record can expose schema:name to everyone, ex:department to employees, and ex:salary only to managers in that department. The same query returns different results to different identities, automatically:
    {
      "@id": "ex:salary-restriction",
      "@type": "f:AccessPolicy",
      "f:required": true,
      "f:onProperty": [{"@id": "ex:salary"}],
      "f:action": [{"@id": "f:view"}]
    }
FlureeDB supports attribute-based (ABAC), role-based (RBAC), and relationship-based access control, and the consequences are worth stating plainly:
- The application never filters. Security can't be bypassed by a buggy code path, because the engine never returns triples the requester isn't allowed to see. To the database, an AI agent is just another caller: point one at FlureeDB and it cannot read anything its identity isn't permitted to see.
- Policies are data. They're RDF, versioned in the ledger, time-travelable, and queryable: SELECT ?p WHERE { ?p a f:AccessPolicy }. "Who could see what, and when?" is a query.
- "Minimum necessary access" is the default, not a check the app forgot to write, which is most of the way to GDPR and HIPAA compliance by construction.
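The effect of per-triple enforcement can be sketched in a few lines. This is a toy model, not FlureeDB's policy engine: the `VIEW_POLICY` table, the role names, and the `ex:ssn` predicate are hypothetical, and the "managers in that department" condition from the example above is simplified to a plain role check.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: object


# Hypothetical policy table: predicate -> roles allowed to view it.
# Predicates with no entry are denied by default.
VIEW_POLICY = {
    "schema:name": {"public", "employee", "manager"},
    "ex:department": {"employee", "manager"},
    "ex:salary": {"manager"},
}

GRAPH = [
    Triple("ex:alice", "schema:name", "Alice"),
    Triple("ex:alice", "ex:department", "Engineering"),
    Triple("ex:alice", "ex:salary", 120000),
    Triple("ex:alice", "ex:ssn", "000-00-0000"),  # no policy entry: never returned
]


def visible(graph, roles):
    """Return only the triples whose predicate at least one of the caller's roles may view."""
    roles = set(roles)
    return [t for t in graph if VIEW_POLICY.get(t.predicate, set()) & roles]


for role in ("public", "employee", "manager"):
    print(role, [t.predicate for t in visible(GRAPH, [role])])
```

The same call over the same graph yields a different result set per identity, and the caller never receives the hidden triples in the first place, which is the property the bullets above describe.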
Time travel: "Query the graph at any moment in its history."
Because nothing is destroyed, time is a built-in axis of the database. Every transaction gets a monotonically increasing logical timestamp t, and you can query any past state by transaction number, ISO-8601 timestamp, or content-addressed commit hash — with no snapshots, no replay infrastructure, and effectively zero overhead.
    # What did the data look like as of transaction 1?
    fluree query --at 1 'SELECT ?title ?genre WHERE {
      ?m <http://schema.org/name> ?title .
      OPTIONAL { ?m <http://schema.org/genre> ?genre }
    }'
For a regulated decision, "what did we know at the moment we decided?" collapses from a forensic project into a single query.
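The semantics of an "as of t" query can be illustrated with a toy append-only log of assertions and retractions. This sketch replays the log naively, which is only meant to show what the answer should be; it says nothing about how FlureeDB indexes history internally. The `LogEntry` shape and the sample movie data are assumptions.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    subject: str
    predicate: str
    obj: object
    t: int          # logical transaction number
    asserted: bool  # True = assertion, False = retraction


LOG = [
    LogEntry("ex:movie1", "schema:name", "Dune", 1, True),
    LogEntry("ex:movie1", "schema:genre", "Sci-Fi", 2, True),
    # Transaction 3 renames the movie: retract the old value, assert the new one.
    LogEntry("ex:movie1", "schema:name", "Dune", 3, False),
    LogEntry("ex:movie1", "schema:name", "Dune: Part One", 3, True),
]


def as_of(log, t):
    """Apply assertions and retractions up to and including transaction t."""
    state = set()
    for entry in sorted(log, key=lambda e: e.t):
        if entry.t > t:
            break
        fact = (entry.subject, entry.predicate, entry.obj)
        if entry.asserted:
            state.add(fact)
        else:
            state.discard(fact)
    return state


print(as_of(LOG, 1))
print(as_of(LOG, 3))
```

Nothing in the log is ever overwritten, so the state at transaction 1 remains answerable after the rename in transaction 3.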
Git-like branching: "Fork, rebase, merge — for data."
This is the feature people don't expect a database to have. FlureeDB lets you fork a ledger into independent branches, each with its own content-addressed commit history — git branch, but for data.
    fluree branch create try-new-schema   # cheap: a new commit pointer, not a copy
    fluree use mydb:try-new-schema
    # ...make risky changes, test them in isolation...
    fluree branch merge try-new-schema    # fast-forward merge when proven safe
Branches are isolated (changes on one are invisible to others), cheap (a branch is a pointer, not a data copy), and durable (the source branch keeps taking transactions after a merge). Ledgers are addressed as ledger-name:branch, and branch names can be hierarchical (tenant/app:feature-x).
Why it matters in the agentic era specifically: when you let an AI agent propose changes to your data, you don't want those edits hitting production directly. Branch, let the agent work in isolation, validate, then merge. High-stakes changes — schema migrations, bulk corrections, agent-authored writes — never touch the production graph until they're proven safe.
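The "a branch is a pointer, not a copy" idea and the fast-forward merge can be modeled in a small class. This is a conceptual sketch only: the `Ledger` class and its methods are hypothetical, and commit ids are simple integers here rather than content hashes.

```python
class Ledger:
    """Toy ledger: commits are immutable, branches are named pointers to a commit id."""

    def __init__(self):
        self.commits = {}               # commit id -> (parent id, data)
        self.branches = {"main": None}  # branch name -> head commit id
        self._next_id = 1

    def commit(self, branch, data):
        cid = self._next_id
        self._next_id += 1
        self.commits[cid] = (self.branches[branch], data)
        self.branches[branch] = cid
        return cid

    def create_branch(self, name, source="main"):
        # Cheap: copy one pointer, not the data behind it.
        self.branches[name] = self.branches[source]

    def history(self, branch):
        """Commit ids reachable from the branch head, newest first."""
        out, cid = [], self.branches[branch]
        while cid is not None:
            out.append(cid)
            cid = self.commits[cid][0]
        return out

    def fast_forward(self, target, source):
        """Move target to source's head; allowed only if target's head is an ancestor of source."""
        target_head = self.branches[target]
        if target_head is not None and target_head not in self.history(source):
            raise ValueError("branches have diverged; fast-forward is not possible")
        self.branches[target] = self.branches[source]


ledger = Ledger()
ledger.commit("main", {"schema": "v1"})
ledger.create_branch("try-new-schema")
ledger.commit("try-new-schema", {"schema": "v2"})
print(ledger.history("main"))   # [1]  (the branch's change is invisible to main)
ledger.fast_forward("main", "try-new-schema")
print(ledger.history("main"))   # [2, 1]
```

Until the merge, main's history does not include the branch's commit, which is the isolation an agent-authored change relies on.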
The intelligence layer
The trust layer makes FlureeDB defensible. This layer makes it useful — the part that combines scattered data and reasons over it.
Data federation: "One query across every source you already have."
Fluree's most underrated feature is graph sources: anything addressable by a graph name in a query. That includes your ledger's RDF triples, but also external systems — Apache Iceberg and Parquet tables, relational databases mapped via R2RML, and remote SPARQL endpoints.
The point is that you query all of them through the same SPARQL or JSON-LD interface, and the application doesn't need to know whether a given fact came from the native graph or a data lake. FlureeDB brings the query to the data instead of forcing you to copy everything into yet another silo first. One query, every source you already have.
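One way to picture graph sources is as named providers that all yield triples, so a join does not care where a fact physically lives. The sketch below is a conceptual model, not Fluree's API: the source names, the `match` helper, and the sample data are all made up for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, object]


def native_ledger() -> Iterator[Triple]:
    """Stand-in for triples stored natively in a ledger."""
    yield ("ex:order1", "ex:customer", "ex:alice")


def warehouse_rows() -> Iterator[Triple]:
    """Stand-in for external rows mapped to triples, e.g. from a relational table."""
    rows = [{"id": "alice", "region": "EMEA"}]
    for row in rows:
        yield (f"ex:{row['id']}", "ex:region", row["region"])


# Hypothetical registry: graph name -> provider of triples.
SOURCES: Dict[str, Callable[[], Iterable[Triple]]] = {
    "orders:main": native_ledger,
    "warehouse:main": warehouse_rows,
}


def match(source: str, s=None, p=None, o=None) -> Iterator[Triple]:
    """Yield triples from a named source that match the pattern; None is a wildcard."""
    for triple in SOURCES[source]():
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def order_regions():
    """Join across sources: the region of each order's customer."""
    for order, _, customer in match("orders:main", p="ex:customer"):
        for _, _, region in match("warehouse:main", s=customer, p="ex:region"):
            yield order, region


print(list(order_regions()))
```

The join code is identical whether a pattern is answered by the native graph or by an external table, which is the property the paragraph above describes.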
Integrated search: "Keyword and vector search, inside the engine."
Search isn't a separate system you sync to — it's a graph pattern. FlureeDB ships BM25 full-text search (with Block-Max WAND) and HNSW approximate-nearest-neighbor vector search directly in the engine. Annotate a string with @fulltext, or embed a vector, and search results participate in joins, filters, and aggregations like any other pattern:
    {
      "from": "products:main",
      "select": ["?product", "?score"],
      "where": [{
        "f:graphSource": "products-search:main",
        "f:searchText": "laptop",
        "f:searchLimit": 10,
        "f:searchResult": {"f:resultId": "?product", "f:resultScore": "?score"}
      }]
    }
That's one less system to operate (no Elasticsearch, no standalone vector DB to keep in sync) — and, crucially for GraphRAG, your semantic search results are governed by the same per-triple policies as everything else. Retrieval that respects access control out of the box.
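For readers unfamiliar with BM25, the scoring behind a query like the one above can be sketched as follows. This is the textbook Okapi BM25 formula with common default parameters (`k1=1.2`, `b=0.75`) and whitespace tokenization; it is not FlureeDB's implementation, and it omits the Block-Max WAND optimization, which speeds up top-k retrieval without changing the scores.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


# Made-up product descriptions, for illustration only.
docs = [
    "lightweight laptop with long battery life",
    "gaming laptop laptop stand",
    "wireless mouse",
]
for doc, score in zip(docs, bm25_scores("laptop", docs)):
    print(f"{score:.3f}  {doc}")
```

Each score is an ordinary number attached to a document, which is why a search result can be bound to a variable such as ?score and then joined, filtered, or aggregated like any other value.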
Reasoning & inference: "Infer what nobody thought to record."
In a plain triple store, every fact has to be stated explicitly. Assert that Alice is a Student and Student is a subclass of Person, and a query for Person won't return Alice — unless you also stored the redundant fact. FlureeDB's built-in reasoner infers it for you, with four profiles you can mix:
| Mode | What it does | Cost |
|---|---|---|
| RDFS | Subclass / subproperty hierarchies | Query rewriting only |
| OWL 2 QL | Adds inverseOf, domain/range inference | Query rewriting only |
| OWL 2 RL | Transitive, symmetric, functional properties, property chains, sameAs, and more | Forward-chaining materialization, cached |
| Datalog | Your own if/then rules, run to fixpoint | Depends on the rules |
This keeps your stored data clean while giving queries the full power of schema-aware retrieval — and the reasoning runs inside the query engine, not in a separate pipeline you have to orchestrate.
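The Alice example can be worked through with a tiny forward-chaining loop that applies two RDFS rules until nothing new is derived. Note the assumption: this sketch materializes inferred triples, which is closer in spirit to the OWL 2 RL and Datalog rows of the table; per the table, FlureeDB handles RDFS by query rewriting instead. The extra `ex:Agent` class is added only to show transitivity.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Apply two RDFS rules until no new triple is produced (a fixpoint)."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 != SUBCLASS_OF or o != s2:
                    continue
                if p == RDF_TYPE:
                    # x a C, C subClassOf D  =>  x a D
                    new.add((s, RDF_TYPE, o2))
                elif p == SUBCLASS_OF:
                    # C subClassOf D, D subClassOf E  =>  C subClassOf E
                    new.add((s, SUBCLASS_OF, o2))
        if new <= inferred:
            return inferred
        inferred |= new


stated = {
    ("ex:alice", RDF_TYPE, "ex:Student"),
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),
}
closure = infer_types(stated)
print(("ex:alice", RDF_TYPE, "ex:Person") in closure)  # True
```

Only three facts were stated, yet a query for instances of ex:Person over the closure returns Alice.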
Standards compliance: "Built on W3C standards, all the way down."
This one's a feature precisely because it avoids lock-in. FlureeDB supports full SPARQL 1.1, plus JSON-LD, Turtle, TriG, N-Triples, and N-Quads. Your data is a portable W3C RDF graph, your queries are a W3C standard, and the two query languages, SPARQL and JSON-LD Query, compile to the same engine with identical features and performance. Teams that live in JSON never have to touch SPARQL; teams that have a decade of SPARQL never have to leave it.
The agent & developer layer
The last group is about who — and what — consumes the database.
Built for agents: MCP, Memory, and Agent JSON
FlureeDB ships a Model Context Protocol server in the same binary, exposing semantic recall, graph query, and persistent memory as tools that assistants like Claude Code and Cursor can call directly — no adapter to write or maintain. Our companion product Fluree Memory gives coding agents a queryable, versioned long-term memory on the same engine, governed by the same policies, so an agent's memory inherits your access control for free. And Agent JSON is an LLM-optimized output format for feeding query results straight to a model.
Deployment: "Binary, server, or library — same engine."
The same FlureeDB runs three ways, with no feature fork between them:
- a CLI binary for local work and scripting,
- an HTTP server with OIDC auth and OpenTelemetry, and
- an embedded Rust library you can compile straight into your application.
Underneath, storage is pluggable — filesystem, S3, DynamoDB, or IPFS — and the storage-agnostic, content-addressed commit model is what makes the distributed clone / push / pull workflow possible in the first place. Run it on your laptop, in your cluster, fully serverless at flur.ee/solo, or in our hosted cloud — same engine every time.
And it's fast — provably
A reasonable reaction to all of the above is: surely it pays for it in performance. It doesn't, and we ran a public benchmark so you don't have to take our word for it.
In the SPARQLoscope DBLP evaluation — 105 real-world SPARQL queries against ~561 million triples — FlureeDB ranked #1 overall with a 19.4 ms geometric mean, ahead of QLever (10.4×), Virtuoso (15.4×), Jena (3,487×), and Blazegraph (17,158×). It was one of only two engines to complete all 105 queries. The twist: every engine that comes close is read-only — FlureeDB posts these numbers while remaining fully transactional, read and write. At 8.19 billion triples (Wikidata-truthy) it's still 10.5× faster than QLever, and it sustains over 2M triples/second on bulk import. The full methodology is public — run it yourself.
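If the geometric mean is unfamiliar, the sketch below shows how it is computed and how a relative factor between two engines falls out of it. The latencies are made-up numbers chosen for easy arithmetic; they are not benchmark data, and this is not the benchmark's own tooling.

```python
import math


def geometric_mean(values):
    """nth root of the product, computed in log space to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Made-up per-query latencies in milliseconds, for illustration only.
engine_a = [2.0, 8.0, 4.0]
engine_b = [20.0, 80.0, 40.0]

ratio = geometric_mean(engine_b) / geometric_mean(engine_a)
print(round(geometric_mean(engine_a), 3), round(ratio, 3))
```

Unlike an arithmetic mean, the geometric mean is not dominated by a single very slow query, which is why it is a common summary statistic for mixed query workloads.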
Already in production
None of this is theoretical. FlureeDB and the broader Fluree platform run in production at the U.S. Department of Defense, Morgan Stanley, The Associated Press, Dow Jones, Warner Bros. Discovery, Dotdash Meredith, WebMD Ignite, CBC, and Arizona State University. Fluree was also named a Gartner Cool Vendor in Data and AI Management.
Try it
FlureeDB is open source under BSL 1.1 (converting to Apache 2.0), with the full source at github.com/fluree/db. Install in under a minute:
    brew install fluree/tap/fluree
    fluree init && fluree create mydb
Or skip infrastructure entirely with the free serverless edition at flur.ee/solo. Coming from GraphDB, Jena, Stardog, Neptune, Virtuoso, Neo4j, or SQL? There's a migration guide for each — and you can request a guided proof of concept.
We built FlureeDB on a bet that the next decade of enterprise software will be judged on one question: can you prove it? Every feature above is our answer.
