    Enterprise AI · May 26, 2026 · 17 min read

    The Six Levels of the Autonomous Enterprise

    Brian Platz

    Fluree

    A framework for understanding where your organization actually sits on the AI maturity curve — and what it will take to move up.

    Why we need a new map

    Every enterprise leader is now asked some version of the same question: "How far along are we with AI?" Most can't answer it, because the question is poorly posed. "AI adoption" has come to mean everything from an employee pasting a meeting transcript into ChatGPT to a production system that closes support tickets without human review. Those aren't the same thing, and treating them as points on a single scale has produced the central paradox of enterprise AI in 2025 and 2026: record investment, record activity, record disappointment.

    MIT's NANDA initiative put a number on it. In The GenAI Divide: State of AI in Business 2025, researchers reviewed more than 300 publicly disclosed AI initiatives, interviewed representatives from 52 organizations, surveyed 153 senior leaders, and concluded that 95% of enterprise AI pilots delivered no measurable P&L impact. Gartner projects that more than 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. Meanwhile, a "shadow AI economy" has emerged: over 90% of workers report using personal AI tools for job tasks, while only 40% of their employers have sanctioned subscriptions.

    The conventional diagnosis is that the models aren't good enough, or that governance will catch up, or that "agentic" is still too new. The data doesn't support any of those explanations. The models have improved dramatically across every benchmark. Governance frameworks exist. Agentic architectures are deployed in production at companies that aren't struggling.

    The real divide is something simpler and more structural: most organizations are trying to run higher-autonomy AI on lower-autonomy infrastructure. They're asking agents to act on data the agents cannot actually understand, secure, or trust.

    The framework that follows is designed to make that mismatch visible. It describes six levels of autonomous enterprise capability, each defined not by what the AI can do in theory, but by what the organization's data, knowledge, and governance substrate can actually support. The punchline, which will be familiar to anyone who has watched an AI pilot collapse in QA, is that the ceiling on your AI program is almost never your AI. It's the substrate underneath it.


    How to read the levels

    The framework borrows a structural idea from the SAE's self-driving taxonomy — graduated autonomy, clear handoffs — but inverts what gets measured. In the SAE model, the levels describe what the car can do. In this model, the levels describe what the enterprise can support. A Level 5 agent running on Level 2 infrastructure is a liability, not a capability, which is why so many of them get quietly shelved three months after launch.
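
    One way to read that inversion, as a minimal sketch rather than a formal part of the framework: treat the level an organization can actually operate at as the lower of what the agent is built for and what the substrate supports. The `Deployment`, `effective_level`, and `autonomy_gap` names are illustrative assumptions, and the "take the minimum" rule is an interpretation of the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical pairing of an agent with the substrate it runs on."""
    agent_level: int      # autonomy the agent is built for (1-6)
    substrate_level: int  # autonomy the data, knowledge, and governance layer supports (1-6)


def effective_level(d: Deployment) -> int:
    """Level the organization can actually operate at: the weaker of the two."""
    return min(d.agent_level, d.substrate_level)


def autonomy_gap(d: Deployment) -> int:
    """Levels of agent capability that the substrate cannot support."""
    return max(0, d.agent_level - d.substrate_level)


# The example from the text: a Level 5 agent on Level 2 infrastructure.
shelved = Deployment(agent_level=5, substrate_level=2)
print(effective_level(shelved))  # 2
print(autonomy_gap(shelved))     # 3
```

    Under this reading, the gap is the part of the purchase that shows up as risk rather than capability.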

    Each level is characterized across five dimensions that together determine whether autonomy is durable or theatrical:

    • Human role — what humans actually do in the workflow
    • Data & knowledge substrate — what the AI retrieves from and acts on
    • Agent capability — the scope of action the organization can safely extend
    • Governance posture — where policy lives and how it's enforced
    • ROI profile — whether AI is producing measurable business outcomes or masked costs

    Inside each level, you'll also find an estimate of how many organizations currently sit there, the dominant use cases, the blockers to advancing, and the most common failure mode at that stage.


    Level 1 — Shadow

    Where most organizations started, and where roughly 25–30% still effectively sit.

    At Level 1, there is no organizational AI strategy. The technology is present — it's just unmanaged. Employees paste proprietary content into consumer chatbots, engineers use coding assistants their security team hasn't reviewed, and marketing teams draft copy in tools nobody has approved. The CIO either doesn't know it's happening or has decided not to ask.

    Dimension | Level 1
    Human role | Individual employees make ad-hoc tool choices
    Substrate | None; AI sees only what an employee pastes in
    Agent capability | Single-turn completions, personal productivity only
    Governance | Policy-by-email, unenforced
    ROI profile | Mostly negative when accounting for risk exposure

    Typical use cases: Drafting emails, summarizing meetings, writing first-pass code, rephrasing slide content.

    Primary blocker to Level 2: Lack of executive accountability. The organization has no AI owner, no budget line, and no mandate.

    Failure mode: Data leakage. A sales leader pastes a customer list into a consumer chatbot and creates a regulatory incident that nobody can trace because nobody knew it happened.

    A note on the shadow economy: MIT's research suggests the shadow AI economy isn't purely a risk — it's also a tell. Employees using consumer tools are demonstrating latent demand that official programs have failed to meet. The question isn't whether to suppress it but whether to channel it. Organizations that stay at Level 1 aren't the ones whose employees don't want AI. They're the ones whose leadership hasn't yet decided to invest in it.


    Level 2 — Assisted

    Roughly 40–45% of organizations. The most crowded level, and the one most mistaken for progress.

    Level 2 is where enterprise AI adoption becomes visible. The organization has purchased licenses — Copilot, ChatGPT Enterprise, Gemini for Workspace, maybe a vertical copilot like a sales assistant. Employees have sanctioned tools. IT has a rough acceptable-use policy. Leadership cites "AI adoption" in earnings calls and sends a quarterly internal memo celebrating usage metrics.

    What has actually happened is that productivity tools got better. The AI is augmenting individual tasks — drafting, summarizing, searching, coding — but it doesn't know anything about the business. It can rephrase your email; it cannot tell you whether the customer you're emailing is past due. This is the distinction that matters. At Level 2, the AI sees the world through whatever the user puts in the prompt window. The enterprise's data, relationships, and institutional knowledge remain walled off.

    Dimension | Level 2
    Human role | Operator: human does the work, AI assists the step
    Substrate | None at the enterprise level; prompts carry context
    Agent capability | Single-task assistance inside one application
    Governance | Application-layer (tenant isolation, DLP on prompts)
    ROI profile | Productivity lift, rarely P&L-visible at the enterprise scale

    Typical use cases: Email drafting, code completion, document summarization, meeting transcription, template generation.

    Primary blocker to Level 3: The AI has no memory of the business. Every prompt starts from zero. Users compensate by pasting in more context, which hits context limits, creates security exposure, and generates inconsistent answers across the team.

    Failure mode: Productivity theater. Usage is high, satisfaction surveys are positive, and after eighteen months leadership cannot identify a single line item on the P&L that has changed. This is the heart of the MIT GenAI Divide — wide adoption, shallow impact. Budgets get cut, the program gets re-scoped, and the organization spends another year at Level 2 wondering what went wrong.


    Level 3 — Integrated

    Roughly 20% of organizations. The first stage where enterprise-specific AI actually lives in the stack — and where most serious failures occur.

    At Level 3, the organization has built or bought AI applications that connect to internal data. This is usually the first RAG deployment: a chatbot that retrieves from SharePoint, a support agent that reads the knowledge base, a research assistant that pulls from a document store. A vector database has entered the architecture. Someone has given a talk about embeddings at an all-hands.

    For the first time, the AI knows something about the business. It also, for the first time, has the opportunity to be confidently wrong about the business. Because the retrieval layer is built on text chunks that carry no semantic structure, the model is forced to infer relationships it cannot verify. Accuracy in well-scoped use cases reaches the 70–80% range, which is impressive in a demo and intolerable in production for anything involving money, law, health, or regulatory exposure.

    Dimension | Level 3
    Human role | Reviewer: AI proposes, human verifies before action
    Substrate | Indexed documents, vector embeddings, chunked text
    Agent capability | Retrieval-grounded answers within a single domain
    Governance | Prompt-layer filters and output guardrails
    ROI profile | First measurable savings in narrow domains; often offset by review overhead

    Typical use cases: Internal knowledge-base search, customer support assist, contract summarization, first-pass research reports, HR policy Q&A.

    Primary blocker to Level 4: Accuracy ceiling. Vector-only retrieval discards the structure and meaning of enterprise data the moment that data is chunked into a vector store. Complex questions, meaning anything that requires joining facts across domains or reasoning over relationships, produce plausible-sounding answers that are subtly wrong. Fluree's own research showed that semantic knowledge graphs lift accuracy from roughly 80% to 95%+ for exactly this reason: the meaning is encoded in the data, not left for the model to guess.

    Failure mode: The "two good answers, then a hallucination" pattern. Pilots succeed in curated demos and fail when users go off-script. Teams respond by narrowing scope until the use case is too small to matter, or by adding human review until the promised savings disappear. This is the stage at which most AI budgets get cut.


    Level 4 — Contextual

    Roughly 8–10% of organizations. The point at which AI begins to deliver compounding value.

    Level 4 is where an organization stops treating data as something to retrieve chunks from and starts treating it as a unified, queryable knowledge asset. The infrastructure shift is real and observable: a semantic layer appears in the architecture. Ontologies replace folder structures. A knowledge graph unifies customer, product, transaction, and operational data so that the AI doesn't have to infer that "ACME Corp" in Salesforce is the same entity as "Acme Corporation" in ERP. The facts are connected, and the connections are explicit.
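
    To make the "ACME Corp" example concrete, here is a minimal sketch of linking two records to one entity at ingestion time, so the connection is stored once instead of being re-inferred by a model on every prompt. The record fields, the suffix list, and the `canonical_key` and `unify` helpers are all hypothetical. A production ontology would more likely rely on explicit identifiers and curated mappings than on string cleanup.

```python
import re

# Hypothetical records from two systems that name the same customer differently.
crm_accounts = [{"name": "ACME Corp", "owner": "j.doe"}]
erp_customers = [{"name": "Acme Corporation", "past_due": True}]

# Assumed list of legal-form suffixes to ignore when matching names.
SUFFIXES = {"corp", "corporation", "inc", "incorporated", "ltd", "llc", "co"}


def canonical_key(name: str) -> str:
    """Lowercase the name, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def unify(*sources: list) -> dict:
    """Merge records that share a canonical key into one entity per customer."""
    entities: dict = {}
    for records in sources:
        for record in records:
            entity = entities.setdefault(canonical_key(record["name"]), {"aliases": []})
            entity["aliases"].append(record["name"])
            entity.update({k: v for k, v in record.items() if k != "name"})
    return entities


entities = unify(crm_accounts, erp_customers)
print(entities["acme"])
```

    Once the two names resolve to a single entity, a question such as "is this customer past due?" becomes a lookup on connected data rather than a guess across two disconnected chunks.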

    At this level, GraphRAG and semantic retrieval replace vector-only RAG. The AI receives not chunks of text but pre-connected context — the customer, their account, their open cases, their contracts, their policy eligibility — and its answers become auditable because every fact has a source. Cross-silo queries that used to require a BI team and a ticket now resolve in a single pass.
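
    The idea of "pre-connected context" with a source on every fact can be sketched with a toy in-memory graph. This is an illustration under stated assumptions, not a GraphRAG implementation: the `Fact` type, the identifiers, the source names, and the `connected_context` helper are all made up for the example.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # system of record the fact came from (its provenance)


# Hypothetical facts; identifiers and source names are illustrative only.
GRAPH = [
    Fact("customer:acme", "has_account", "account:1001", "crm"),
    Fact("customer:acme", "has_open_case", "case:77", "support_desk"),
    Fact("account:1001", "governed_by", "contract:msa-2024", "contracts_db"),
    Fact("customer:globex", "has_account", "account:2002", "crm"),
]


def connected_context(start: str, max_hops: int = 2) -> list:
    """Collect every fact reachable from `start` within `max_hops` outgoing edges."""
    frontier, seen, context = {start}, {start}, []
    for _ in range(max_hops):
        next_frontier = set()
        for fact in GRAPH:
            if fact.subject in frontier:
                context.append(fact)
                if fact.obj not in seen:
                    seen.add(fact.obj)
                    next_frontier.add(fact.obj)
        frontier = next_frontier
    return context


for fact in connected_context("customer:acme"):
    print(f"{fact.subject} {fact.predicate} {fact.obj}  [source: {fact.source}]")
```

    The model would receive these linked facts instead of loose text chunks, and because each one names its source, an answer built from them can be traced back fact by fact.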

    Dimension | Level 4
    Human role | Supervisor: AI executes within its domain; human handles exceptions
    Substrate | Semantic knowledge graph, ontology, unified entity model
    Agent capability | Grounded cross-silo reasoning; read-heavy actions
    Governance | Data-layer access controls; queries enforce policy automatically
    ROI profile | Process-level savings; first genuine P&L visibility

    Typical use cases: Conversational analytics for non-technical users, complex customer-360 interactions, regulated-industry research copilots, portfolio-level risk queries, partner- or supplier-360 insights.

    Primary blocker to Level 5: Write capability and cross-domain action. At Level 4, the AI is trusted to read widely but only permitted to write narrowly. Extending it requires a governance model where policy travels with the data, not with the application — so that the same rules apply whether a human user or an autonomous agent is making the request. Organizations that try to extend agent authority without this substrate end up rolling back their own rollouts after the first audit finding or compliance incident.

    Failure mode: The "read-only plateau." The AI is trusted, accurate, and widely used — for answering questions. The moment leadership asks it to take action, the legal and security teams freeze the program because the data-layer controls aren't there to make that action safe. Progress stalls, often for a year or more.


    Level 5 — Autonomous

    Roughly 3–5% of organizations. Production agentic AI with durable ROI.

    Level 5 is the first level where the term "autonomous enterprise" is honest. Agents at this stage don't just retrieve and recommend — they act. They close tickets, update records, initiate workflows, reconcile transactions, and coordinate across domains. The reason they can do this safely is that the data itself enforces policy. When an agent issues a query or a write, the substrate checks the agent's role, the data's classification, the policy in effect, and the provenance requirements before any action completes. Security isn't a filter in front of the model; it's a property of the data.
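
    The checks described above (role, classification, policy in effect, provenance) can be sketched as a guard that sits in the data layer rather than in front of the model. This is a simplified illustration, not any product's API: the `POLICY` table, the `Request` fields, and the `authorize` and `guarded_write` functions are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy table: (role, action, classification) -> allowed.
# The same table applies whether the caller is a human or an agent.
POLICY = {
    ("support_agent", "write", "internal"): True,
    ("support_agent", "read", "confidential"): True,
    ("support_agent", "write", "confidential"): False,
}


@dataclass(frozen=True)
class Request:
    role: str                         # role of the caller, human or agent
    action: str                       # "read" or "write"
    classification: str               # classification of the target record
    provenance: Optional[str] = None  # where the proposed change came from


class PolicyViolation(Exception):
    """Raised when a request fails a data-layer check."""


def authorize(req: Request) -> None:
    """Raise PolicyViolation unless every check passes; unknown cases are denied."""
    if not POLICY.get((req.role, req.action, req.classification), False):
        raise PolicyViolation(f"{req.role} may not {req.action} {req.classification} data")
    if req.action == "write" and not req.provenance:
        raise PolicyViolation("writes must carry provenance")


def guarded_write(store: dict, key: str, value: str, req: Request) -> None:
    """Apply the write only after the data-layer checks succeed."""
    authorize(req)
    store[key] = {"value": value, "provenance": req.provenance, "written_by": req.role}


records: dict = {}
ok = Request("support_agent", "write", "internal", provenance="case-note:918")
guarded_write(records, "ticket:42", "closed", ok)

try:
    blocked = Request("support_agent", "write", "confidential", provenance="case-note:919")
    guarded_write(records, "contract:7", "void", blocked)
except PolicyViolation as err:
    print(f"denied: {err}")
```

    The design choice worth noticing is that the write path has no way around `authorize`: an agent that produces a plausible but policy-violating action is stopped at the data, not at a prompt filter.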

    This is also the level at which AI begins to cross domains. A single agent — or, more often, an AI agent mesh — may handle a customer inquiry that touches sales, contracts, shipping, and billing, not because the agent has integrations into four SaaS tools, but because the underlying knowledge is already connected. The enterprise has stopped trying to teach the agent its business one API call at a time and started giving the agent a model of the business to operate on.

    Dimension | Level 5
    Human role | Strategist: humans define goals, policies, and exceptions
    Substrate | Governed knowledge graph with read/write semantics and provenance
    Agent capability | Bounded cross-domain action with embedded policy enforcement
    Governance | Data-layer policy; provenance and audit native to every transaction
    ROI profile | Revenue-influencing; automation of full processes, not tasks

    Typical use cases: Autonomous customer service resolution end-to-end, automated reconciliation and exception handling, supply-chain agents that re-route and re-price, underwriting agents that gather, score, and issue, regulatory reporting agents that draft, verify, and file.

    Primary blocker to Level 6: Organizational readiness, not technology. At Level 6, humans step back from individual decisions entirely and govern by policy — and most enterprises don't yet have the muscle for that kind of oversight. The technology gap closes faster than the organizational gap.

    Failure mode: Runaway cost and trust-loss incidents. Organizations that reach Level 5 without mature governance see agents make plausible but policy-violating decisions: approving the wrong refund, sending the wrong disclosure, escalating the wrong incident. The fix isn't less autonomy; it's tighter policy at the data layer. The organizations that survive this stage harden their substrate. The ones that don't harden it retreat to Level 3 and call it "responsible AI."


    Level 6 — Self-Governing

    A fraction of a percent today. A plausible state for leaders by 2030.

    Level 6 describes an enterprise in which the data, the policies, and the agents form a single operating system. Humans don't manage workflows; they set the rules the system operates by. Agents coordinate with each other across functions, negotiate internally, and escalate only when policy demands it. The knowledge graph becomes, in effect, the corporate memory — evolving with the business, surviving turnover, and exposing a single consistent truth to every system and every agent that queries it.

    This isn't science fiction, but it isn't 2026 either. What's real today is that a small number of organizations are building the architecture that makes Level 6 reachable: unified semantic substrate, policy-at-data governance, provenance-native transactions, and agent ecosystems that share a common model of the business. The companies that will operate at Level 6 in 2030 are the ones building that architecture now. The companies that won't are still arguing about which vector database to standardize on.

    Dimension | Level 6
    Human role | Policy-setter: humans govern the system, not the cases
    Substrate | Living ontology; the data model is the business model
    Agent capability | Multi-agent coordination across the enterprise and beyond it
    Governance | Provenance-native, continuously verifiable, auditable by design
    ROI profile | Business-model-altering; AI becomes a factor of production

    What this framework is really arguing

    Most published maturity models treat AI adoption as a function of model sophistication, use-case breadth, or organizational readiness. Those lenses aren't wrong, but they're not load-bearing. The evidence from the 95% of pilots that stall is consistent and clear: the ceiling on enterprise AI is the data substrate, not the model.

    This has three implications that should shape where the next two years of AI spending go.

    First, stop evaluating agents and start evaluating substrates. The question "how good is this agent?" almost always gets answered in a curated demo. The question "what level of autonomy can our data currently support?" forces an honest answer. An organization at Level 3 can buy the best Level 5 agent on the market and will still be stuck at Level 3, because the agent's answers will only be as reliable as the Level 3 infrastructure they are grounded in.

    Second, the jump from Level 3 to Level 4 is the one that matters most. That's where accuracy breaks through the 80% ceiling, where cross-silo reasoning becomes possible, and where ROI stops being a productivity story and starts being a business story. It's also the hardest jump, because it requires investing in semantic structure that isn't visible in a demo. The organizations that have made it are the ones whose AI programs are compounding. The ones that haven't are the ones whose AI programs are being reviewed by the CFO.

    Third, the question for every AI leader in 2026 is structural, not technical: what level is our substrate at, and what would it take to move it up one? That question has a concrete answer in every organization, it has a budget line attached, and it correlates directly with whether the next round of AI investment produces measurable outcomes or another round of stalled pilots.

    AI isn't going to transform the enterprise on its current trajectory. The enterprise has to meet it halfway — by building the connected, governed, semantically rich substrate that makes autonomy durable. Every level of this framework is reachable. None of them are reachable without the substrate underneath.


    If you'd like to talk through where your organization sits and what it would take to move up, get in touch.

    Published May 26, 2026
