
Multi-agent systems need a different governance model

When agents hand work to other agents, your governance surface doesn't add — it multiplies. A single agentic pipeline can involve five roles touching ten data sources across three teams. Governing each agent individually is not governance. It's sampling.

The first generation of enterprise AI governance was designed for a world where agents operated independently. One agent, one task, one data source. You reviewed the agent's access, approved its policy, and moved on. That model was never perfect, but it was at least tractable.

The second generation — agentic pipelines where orchestrators delegate to sub-agents, which call tools, which trigger other agents — breaks that model completely. Not incrementally. Completely. The governance assumptions that held for a single agent do not survive the first handoff.

I've watched this play out inside large organizations building out their AI programs. The teams that hit the wall first are usually the ones who were most disciplined about governing their early agents. They built careful per-agent controls. Then they connected those agents and discovered that their governance model had a fundamental architectural flaw: it governed nodes, not flows.

What changes at the handoff

Consider a fraud investigation pipeline. An orchestrator agent receives an alert and begins routing work. It calls a transaction analyst agent to examine payment patterns. That agent surfaces a suspicious account and hands a case identifier to an account profiler. The profiler escalates to a KYC specialist, who requests additional verification. A SAR generator then drafts a regulatory filing from the compiled outputs.

Fraud investigation — 5-agent pipeline

- orchestrator: alert routing
- transaction_analyst: payment patterns
- account_profiler: risk signals
- kyc_specialist: identity verification
- sar_generator: regulatory filing

Five agents. Each with a different role, different data access requirements, and different regulatory exposure. Each has been reviewed individually. But none of that individual review addresses the most important governance questions in this pipeline.

What did the transaction analyst retrieve, and did that data flow to the SAR generator? When the account profiler passed a case to the KYC specialist, was the handoff authorized under the policies governing both roles? If a regulator asks you to reconstruct exactly what data entered the SAR, which agent's audit log do you look at?

These are not edge cases. They are the normal operating questions of any agentic pipeline running at scale in a regulated environment. And they are unanswerable if your governance model is per-agent rather than per-flow.

The inherited context problem

The deepest issue in multi-agent governance is one that doesn't get enough attention: inherited context.

When Agent B receives output from Agent A, that output may contain data that Agent A retrieved from a source Agent B would not be permitted to access directly. Agent B is now operating on data it never formally requested, against a source it was never granted access to, in a way that no per-agent policy review would have caught.

The inherited context gap

Per-agent governance controls what each agent can retrieve. It does not control what each agent can receive. In a multi-agent pipeline, those are two completely different things — and the gap between them is where your most significant compliance exposures live.

This is not hypothetical. In financial services, a transaction analyst agent may legitimately access behavioral pattern data. A SAR generator agent should not — SAR drafting is a distinct regulatory function with its own access boundaries. If the transaction analyst passes its full retrieved context to the SAR generator rather than a structured summary, the SAR generator is now operating on data it has no business touching. The per-agent policies were both compliant. The pipeline was not.

The same dynamic appears in healthcare, where clinical data retrieved by a triage agent cannot freely flow to a billing agent. In pharma, where trial data accessed by a research agent has no business reaching a commercial agent. In any regulated environment where data access boundaries exist not just at the source level but at the role level.
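The gap is easy to see in a few lines. The sketch below uses a toy per-agent policy check — the role names, source names, and policy structure are illustrative, not AutoPIL's actual policy model. Both agents pass their own retrieve-time review, yet the handoff carries data the downstream role could never have retrieved itself.

```python
# Toy per-agent policy: which sources each role may RETRIEVE directly.
# (Illustrative names only.)
RETRIEVE_POLICY = {
    "transaction_analyst": {"transaction_history", "behavioral_patterns"},
    "sar_generator": {"case_summaries"},
}

def can_retrieve(role: str, source: str) -> bool:
    """Per-agent check: governs only what an agent asks for itself."""
    return source in RETRIEVE_POLICY.get(role, set())

# The transaction analyst legitimately pulls behavioral data...
assert can_retrieve("transaction_analyst", "behavioral_patterns")

# ...and the SAR generator is correctly denied direct access.
assert not can_retrieve("sar_generator", "behavioral_patterns")

# But a per-agent check never sees the handoff. If the analyst passes its
# full retrieved context downstream instead of a structured summary, the
# SAR generator now RECEIVES data it could never retrieve -- and no
# retrieve-time policy fires.
analyst_context = {"behavioral_patterns": "..."}  # retrieved upstream
handoff = dict(analyst_context)                   # full context, not a summary

received_violations = {
    source for source in handoff
    if not can_retrieve("sar_generator", source)
}
print(received_violations)  # {'behavioral_patterns'}
```

Both per-agent checks pass; the violation only becomes visible when you examine what was received, not what was retrieved.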

Why session-level governance is the right model

The shift that multi-agent systems require is from agent-level governance to session-level governance. Instead of asking "what is this agent allowed to retrieve?" you ask "what is this pipeline, across all its agents, allowed to access — and does the audit trail reflect that entire session coherently?"

This reframing has three concrete implications.

First, the session is the unit of audit. When a regulator asks what data was accessed during a fraud investigation, the answer should come from a single session record, not from correlating five separate agent logs with inconsistent schemas and timestamps. Every agent in the pipeline operates under a shared session ID. Every retrieval decision — allow or deny — is recorded against that session. The audit trail is the pipeline, not a collection of individual agent traces.
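To sketch what "the session is the unit of audit" means in practice, assume a flat record store where every decision carries a session ID — the schema and field names here are illustrative, not AutoPIL's record format. Reconstructing a pipeline execution becomes a single filter, not a five-way log correlation.

```python
# Illustrative audit store: every decision, from every agent, carries the
# shared session_id of the pipeline execution it belongs to.
audit_log = [
    {"session_id": "inv_001", "seq": 1, "agent_role": "fraud_orchestrator",
     "source_id": "alert_queue", "decision": "ALLOW"},
    {"session_id": "inv_002", "seq": 1, "agent_role": "transaction_analyst",
     "source_id": "transaction_history", "decision": "ALLOW"},
    {"session_id": "inv_001", "seq": 2, "agent_role": "transaction_analyst",
     "source_id": "transaction_history", "decision": "ALLOW"},
    {"session_id": "inv_001", "seq": 3, "agent_role": "sar_generator",
     "source_id": "pii_vault", "decision": "DENY"},
]

def reconstruct_session(log, session_id):
    """One query, one session: every decision in pipeline order."""
    return sorted(
        (r for r in log if r["session_id"] == session_id),
        key=lambda r: r["seq"],
    )

for record in reconstruct_session(audit_log, "inv_001"):
    print(record["seq"], record["agent_role"], record["decision"])
```

The point is structural: because the session ID is written at decision time, the full picture exists as one queryable record rather than something assembled after the fact.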

Second, role boundaries are enforced at the retrieval layer regardless of how context arrived. It does not matter whether Agent B's query came from user input, orchestrator instructions, or another agent's output. The enforcement layer evaluates the request against Agent B's policy the same way every time. If Agent B's role does not permit access to a data source, that access is denied — even when the request arrived through an otherwise sanctioned channel.
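That property can be sketched in a few lines — the policy table and function signature below are illustrative, not AutoPIL's API. The decision is a pure function of role and source; the request's origin is recorded for the audit trail but never consulted by the decision itself.

```python
# Illustrative role policy table.
POLICY = {
    "kyc_specialist": {"identity_records", "watchlists"},
    "sar_generator": {"case_summaries"},
}

def evaluate(agent_role: str, source_id: str, origin: str) -> str:
    # `origin` is kept for the audit record, but deliberately ignored by
    # the decision: same role + same source => same outcome, every time.
    allowed = source_id in POLICY.get(agent_role, set())
    return "ALLOW" if allowed else "DENY"

# Identical decisions no matter which channel the request arrived through.
for origin in ("user_input", "orchestrator", "upstream_agent"):
    assert evaluate("sar_generator", "identity_records", origin) == "DENY"
    assert evaluate("kyc_specialist", "identity_records", origin) == "ALLOW"
```

Keeping the origin out of the decision path is what makes the guarantee hold: there is no channel through which a disallowed request can become allowed.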

Third, policy is pipeline-aware, not just agent-aware. The policies governing individual agents should be designed with knowledge of the pipeline they operate in. A KYC specialist operating as the fourth step in a fraud pipeline has different access requirements than a KYC specialist operating as a standalone identity verification service. The policy model needs to express that distinction — and the enforcement layer needs to honor it.
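One way to express that distinction is a policy keyed by role and pipeline, with a standalone default — a sketch with illustrative names and structure, not AutoPIL's policy model.

```python
# Illustrative pipeline-aware policy: the effective grant depends on both
# the role and the pipeline it is operating in.
PIPELINE_POLICY = {
    # The same role gets a narrower grant inside a fraud pipeline...
    ("kyc_specialist", "fraud_investigation"): {"identity_records"},
    # ...than it does as a standalone identity-verification service.
    ("kyc_specialist", None): {"identity_records", "watchlists",
                               "adverse_media"},
}

def allowed_sources(role, pipeline=None):
    """Resolve the role's policy in context, falling back to standalone."""
    key = (role, pipeline)
    if key in PIPELINE_POLICY:
        return PIPELINE_POLICY[key]
    return PIPELINE_POLICY.get((role, None), set())

assert allowed_sources("kyc_specialist", "fraud_investigation") == {"identity_records"}
assert "watchlists" in allowed_sources("kyc_specialist")
```

The design choice worth noticing is the fallback: a role used outside any known pipeline still gets a defined policy, rather than an implicit or empty one.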

What this looks like in practice

A well-governed multi-agent pipeline has a few observable properties that distinguish it from one that has been governed at the agent level.

Every agent in the pipeline shares a session identifier. That session ID travels with every retrieval request, every policy evaluation, and every audit record. If you want to reconstruct the full data access story for a single pipeline execution, you query one session and you get the complete picture: which agent made each request, what policy applied, what was allowed, what was denied, and in what order.

# All agents in a pipeline share a session — the audit trail is unified

session_id = "inv_2026_04_abc123"

# Orchestrator starts the session
result_1 = guard.evaluate(
    agent_role="fraud_orchestrator",
    source_id="alert_queue",
    session_id=session_id,
)

# Sub-agents inherit the same session
result_2 = guard.evaluate(
    agent_role="transaction_analyst",
    source_id="transaction_history",
    session_id=session_id,   # same session
)

result_3 = guard.evaluate(
    agent_role="sar_generator",
    source_id="pii_vault",      # not permitted for this role
    session_id=session_id,
)
# → DENY  (sar_generator cannot access pii_vault)
# → logged against session inv_2026_04_abc123

The deny is not just a runtime block — it is a session-level record. The audit trail shows that at this point in the pipeline, a SAR generator attempted to access PII it was not authorized to retrieve, and the enforcement layer stopped it. That record exists independent of what any individual agent logs.

The audit trail is the pipeline

One thing that consistently surprises organizations when they move from per-agent to session-level governance is how much the audit trail itself changes in value.

Per-agent audit logs are useful for debugging individual agent behavior. They are nearly useless for reconstructing what a multi-agent pipeline actually did. The timestamps don't align. The session boundaries are implicit. The relationship between one agent's output and another's input is not recorded anywhere.

A session-level audit trail is a different artifact entirely. It is a complete record of a pipeline execution — who touched what, in what order, under what policy, with what outcome. For a regulated organization, that record is not just useful for debugging. It is the evidence that your AI program is operating within its governance boundaries. It is what you hand to a regulator. It is what your internal audit team can actually work with.

The governance question that matters

The question a regulator asks is not "what was this agent allowed to access?" It is "what did this investigation touch, and can you prove that every access was authorized?" A session-level audit trail answers that question. A collection of per-agent logs does not.

AutoPIL's audit chain is designed around this reality. Every evaluation — allow or deny — is recorded with a cryptographic hash that links it to the previous record in the session. The chain cannot be selectively modified. If a record exists, the full context of why that decision was made is preserved: the agent role, the data source requested, the sensitivity level, the policy that applied, and the session it belongs to. Across every agent in the pipeline, the chain is continuous.
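The chaining idea itself is simple to sketch. The code below is a minimal illustration of hash-chained records using SHA-256 — it shows the mechanism, not AutoPIL's actual implementation. Each record's hash covers the previous record's hash, so modifying any record breaks verification from that point forward.

```python
import hashlib
import json

def record_hash(record: dict, prev_hash: str) -> str:
    """Hash the record's canonical JSON together with the previous hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def append(chain: list, record: dict) -> None:
    """Append a record, linking it to the tail of the chain."""
    prev = chain[-1]["hash"] if chain else "genesis"
    chain.append({"record": record, "hash": record_hash(record, prev)})

def verify(chain: list) -> bool:
    """Recompute every link; any modified record breaks the chain."""
    prev = "genesis"
    for entry in chain:
        if entry["hash"] != record_hash(entry["record"], prev):
            return False
        prev = entry["hash"]
    return True

chain = []
append(chain, {"agent_role": "transaction_analyst",
               "source_id": "transaction_history", "decision": "ALLOW"})
append(chain, {"agent_role": "sar_generator",
               "source_id": "pii_vault", "decision": "DENY"})

assert verify(chain)

# Tampering with any record invalidates the chain from that point on.
chain[0]["record"]["decision"] = "DENY"
assert not verify(chain)
```

This is why selective modification is detectable: there is no way to rewrite one record without recomputing every hash after it, and the recomputed chain no longer matches the stored one.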

Building for the pipeline, not just the agent

The practical implication for teams building agentic pipelines today is straightforward: design your governance model at the pipeline level from the start, not the agent level.

That means defining a session boundary for each pipeline and ensuring every agent in that pipeline operates under a shared session ID. It means reviewing agent policies not just for what each agent does in isolation, but for what each agent might receive from its predecessors and pass to its successors. And it means choosing an enforcement layer that produces a unified audit trail across the session rather than fragmented logs across individual agents.

The teams I've seen get this right are the ones who treated multi-agent governance as an architecture question from day one — not something to layer on after the pipeline is already running. The teams who didn't are the ones who discovered the problem during an audit, when reconstructing a session from scattered logs under time pressure is exactly the wrong moment to find out your governance model has a structural gap.

The good news is that the right model is not significantly harder to build than the wrong one. The session ID is a string. The policy is external. The audit trail writes itself. The architecture just has to be intentional about where the boundary is and what travels across it.


Anil Solleti is a Managing Director and Head of Data & AI with over 25 years in financial services, leading global data strategy, AI governance, and agentic AI adoption at scale. AutoPIL was built to solve the governance problems that regulated enterprises face when deploying AI agents in production.

Govern your pipelines, not just your agents

AutoPIL's session-level audit trail covers every agent in your pipeline under a single, cryptographically chained record. One enforcement layer. One audit trail. Complete coverage from the first retrieval to the last.

Start Free Trial