MCP Server Supply Chain Integrity: Authorization-Bound Replay and Token-Scope Drift Composition

Introduction
I once watched an agent replay pass every artifact check and still fail the security review for the right reason. The binary had not changed. The registry metadata matched the archived receipt. The provenance bundle verified. The bug was quieter: the replayed tool was now invoked under a broader authorization scope than the one the original admission decision had assumed.
That kind of failure is annoying because every individual subsystem can look healthy. Supply-chain verification says the artifact is still the artifact. Runtime tracing says the tool call happened along the expected route. Authorization middleware says the token was valid. The uncomfortable question sits between those facts: did the original trust decision compose with the authority now being handed to the tool?
Blog 253 built the archive receipt for MCP server supply-chain evidence. Blog 254 added receipt-bound replay, so the platform could review old evidence against current policy without rewriting the old decision. Blog 255 adds the next rule: authorization-bound replay and token-scope drift composition. The core claim is simple. An MCP replay decision is incomplete if it proves artifact integrity but ignores the authority envelope used by the current application layer.
This matters because MCP is not only a package discovery problem. MCP servers are used through clients, transports, tools, resources, and authorization flows. The MCP authorization specification defines authorization at the transport level for HTTP-based transports, so that MCP clients can make requests to restricted MCP servers on behalf of resource owners. That makes authorization scope a first-class part of the replay question, not a footnote after signature verification.
The rule in this post keeps four records separate: archived supply-chain evidence, archived authorization assumptions, current token-scope envelope, and current tool-contract impact. It emits a bounded disposition instead of a generic pass. If the artifact still verifies but the authority envelope widened, the correct answer may be re-admit rather than continue.
The Problem
Most MCP supply-chain reviews start with the artifact because artifacts are concrete. A server package has a digest. A manifest can be signed. A provenance statement can name a builder. A registry entry can be captured in an archive receipt. Those checks are necessary, and the earlier posts in this cluster intentionally spent a lot of space on them.
The problem is that agents do not execute artifacts in a vacuum. They call tools under application contracts, route decisions, user intent, and authorization grants. A read-only documentation helper and a privileged customer-record writer can point at the same server artifact but carry very different risk. If replay only asks whether the artifact remained trustworthy, it can approve the wrong operational use.
Here is the failure pattern I want to prevent:
- A tool server is admitted with a narrow scope, such as read-only access to a documentation resource.
- The server's artifact receipt is archived and later rechecked successfully.
- A new workflow routes the same tool through a broader token scope.
- The replay system says "continue" because the artifact evidence still passes.
- A human reviewer later discovers that the original decision never covered the new authority envelope.
The fifth step is the expensive one. The platform has not been hacked, necessarily. It has drifted into an unsupported trust composition. That is still a security defect because the authorization boundary changed without a fresh admission decision.
The same pattern can happen in the opposite direction. A server may lose scope, become read-only, or move behind a more restrictive policy. In that case replay should not panic just because scope changed. The disposition should depend on the direction of drift, current contract impact, retained evidence, and policy. A scope delta is not automatically good or bad. It is a fact that must be composed with the rest of the replay record.

I would not model this as one giant "agent safety" field. That field becomes impossible to audit. A better record has named inputs:
| Input | Retained field | Replay question |
|---|---|---|
| Artifact receipt | digest, signer, provenance reference | Does the original supply-chain evidence still verify? |
| Authorization assumption | scope class, resource class, delegation mode | What authority did the original decision assume? |
| Current token envelope | granted scopes, audience, expiry class | What authority does the current call carry? |
| Application contract | read/write impact, data sensitivity | What can this tool do now? |
| Replay policy | digest and rule version | Which review rule is binding? |
The table is intentionally boring. Security replay fails when boring fields are missing. If scope is only present in a prose note, it will disappear from the join when the replay worker needs it.
How the Composition Rule Works
The authorization-bound replay rule starts with the receipt-bound replay result from blog 254, then joins it with two additional projections: the archived authorization assumption and the current token-scope envelope. The archived assumption is not the entire token. It should not retain secrets. It should retain a normalized scope class, resource class, delegation mode, audience class, and policy digest. The current envelope is also normalized before comparison.
That normalization matters. Raw authorization systems have provider-specific names, tenant-specific audiences, and token formats that change over time. The replay worker should compare stable semantic classes rather than brittle strings. For example, docs.read, kb.view, and reference:read might all map to resource_read. A privileged customer write scope might map to customer_write_privileged. The mapping must be policy-owned, versioned, and visible in the review record.
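A minimal sketch of that normalization step, assuming the mapping is a plain lookup table with a version label. The version string and the `customers.write.admin` provider scope are illustrative assumptions; the other scope strings and semantic classes are the ones used in the paragraph above.

```python
# Hypothetical mapping bundle; the version label and the
# "customers.write.admin" provider scope are illustrative assumptions.
SCOPE_MAP_VERSION = "auth-map:example-v1"
SCOPE_MAP = {
    "docs.read": "resource_read",
    "kb.view": "resource_read",
    "reference:read": "resource_read",
    "customers.write.admin": "customer_write_privileged",
}

UNMAPPED = "unmapped"


def normalize_scope(raw_scope: str) -> tuple[str, str]:
    """Return (semantic_class, mapping_version) for a provider scope string."""
    # Unknown strings get an explicit sentinel instead of a guessed class,
    # so a renamed provider scope cannot silently alias to a known one.
    return SCOPE_MAP.get(raw_scope, UNMAPPED), SCOPE_MAP_VERSION


if __name__ == "__main__":
    for raw in ("docs.read", "kb.view", "docs.readonly"):
        print(raw, "->", normalize_scope(raw))
```

Returning the mapping version alongside the class keeps the mapping visible in the review record, which is the property the paragraph asks for.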
The first pass evaluates evidence continuity. If the artifact receipt cannot verify, the authorization join should not rescue it. The tool is either re-admit, quarantine, or retire depending on policy and impact. If evidence passes, the rule evaluates scope drift.
The second pass evaluates the direction of token-scope drift:
| Drift direction | Example | Default disposition |
|---|---|---|
| Same scope class | read-only docs then read-only docs | Continue if evidence and policy pass |
| Narrowed scope | write-capable then read-only | Continue or re-admit, depending on policy |
| Lateral scope | docs read then ticket read | Re-admit if resource class changed |
| Widened scope | docs read then customer write | Re-admit or quarantine |
| Unattributed scope | missing archived assumption | Quarantine for privileged contracts |
That disposition table is not meant to replace local policy. It is a starting rubric. The important move is to stop treating scope drift as a note attached to artifact verification. Scope drift changes the trust composition.
The third pass evaluates application-contract impact. A widened scope that only permits a low-risk read may be re-admitted through a lightweight path. A widened scope that permits production writes, customer data access, payment actions, or expensive external calls should receive a stricter disposition. Artifact integrity does not lower that impact class.
The fourth pass writes reason codes. I would use reason codes like:
```text
scope_class_unchanged
scope_class_widened
resource_class_changed
delegation_mode_changed
archived_scope_assumption_missing
privileged_contract_requires_re_admission
artifact_receipt_verified
artifact_receipt_unavailable
```
Reason codes are the difference between a useful replay program and a dashboard-shaped fog machine. They let a team see whether failures are caused by missing retained scope assumptions, product teams adding broader tool authority, or verifier evidence disappearing.
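One way to make that diagnosis mechanical is to route each reason code to the team that can repair it. The sketch below uses the reason codes listed above; the queue names are hypothetical and would come from your own ownership model.

```python
from collections import Counter

# Hypothetical queue names; the reason codes are the ones listed in this post.
REPAIR_QUEUE = {
    "archived_scope_assumption_missing": "archive_writer",
    "scope_class_widened": "workflow_owner",
    "resource_class_changed": "workflow_owner",
    "delegation_mode_changed": "workflow_owner",
    "privileged_contract_requires_re_admission": "workflow_owner",
    "artifact_receipt_unavailable": "verifier_evidence",
}


def queue_counts(results: list[list[str]]) -> Counter:
    """Count how many replay results land in each repair queue."""
    counts: Counter = Counter()
    for reason_codes in results:
        # A result counts once per queue, even if several of its codes map there.
        queues = {REPAIR_QUEUE[code] for code in reason_codes if code in REPAIR_QUEUE}
        counts.update(queues)
    return counts
```

Codes that signal a healthy state, such as `artifact_receipt_verified`, are deliberately left out of the table so they never open a repair ticket.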
Implementation Guide
Here is a compact implementation sketch. It is not a replacement for a full authorization engine. It shows the shape of the join that a replay worker should perform after it has already loaded the archived receipt and current policy.
```python
from dataclasses import dataclass
from enum import Enum


class ScopeDrift(str, Enum):
    SAME = "same"
    NARROWED = "narrowed"
    LATERAL = "lateral"
    WIDENED = "widened"
    UNATTRIBUTED = "unattributed"


@dataclass(frozen=True)
class AuthAssumption:
    # Normalized authority assumed at admission; never the raw token.
    scope_class: str
    resource_class: str
    delegation_mode: str
    policy_digest: str


@dataclass(frozen=True)
class TokenEnvelope:
    # Normalized authority carried by the current call.
    scope_class: str
    resource_class: str
    delegation_mode: str
    audience_class: str


@dataclass(frozen=True)
class ContractImpact:
    impact_class: str
    can_write: bool
    touches_sensitive_data: bool


def classify_scope_drift(old: AuthAssumption | None, new: TokenEnvelope) -> ScopeDrift:
    if old is None:
        return ScopeDrift.UNATTRIBUTED
    if old.scope_class == new.scope_class and old.resource_class == new.resource_class:
        return ScopeDrift.SAME
    if old.resource_class != new.resource_class and old.scope_class == new.scope_class:
        return ScopeDrift.LATERAL
    # Scope classes differ: compare permission rank. Unrecognized classes
    # rank 99, so an unknown new class is treated as widened.
    order = {"read": 1, "read_write": 2, "privileged_write": 3}
    old_rank = order.get(old.scope_class, 99)
    new_rank = order.get(new.scope_class, 99)
    if new_rank < old_rank:
        return ScopeDrift.NARROWED
    if new_rank > old_rank:
        return ScopeDrift.WIDENED
    # Different class names with equal rank (for example, two unknown classes).
    return ScopeDrift.LATERAL


def replay_disposition(
    evidence_verified: bool,
    old_auth: AuthAssumption | None,
    new_token: TokenEnvelope,
    impact: ContractImpact,
) -> tuple[str, tuple[str, ...]]:
    reasons: list[str] = []

    # Pass 1: evidence continuity. A scope match never rescues failed evidence.
    if not evidence_verified:
        reasons.append("artifact_receipt_unavailable_or_failed")
        if impact.impact_class == "privileged":
            return "quarantine", tuple(reasons)
        return "re_admit", tuple(reasons)
    reasons.append("artifact_receipt_verified")

    # Pass 2: direction of token-scope drift.
    drift = classify_scope_drift(old_auth, new_token)
    reasons.append(f"scope_drift_{drift.value}")

    # Pass 3: application-contract impact.
    privileged = impact.impact_class == "privileged" or impact.can_write or impact.touches_sensitive_data
    if drift == ScopeDrift.UNATTRIBUTED and privileged:
        reasons.append("privileged_contract_missing_archived_scope")
        return "quarantine", tuple(reasons)
    if drift in {ScopeDrift.WIDENED, ScopeDrift.LATERAL}:
        if privileged:
            reasons.append("privileged_contract_requires_re_admission")
        return "re_admit", tuple(reasons)
    return "continue", tuple(reasons)
```
The most important line is not the enum. It is the refusal to return continue when the archived authorization assumption is missing for a privileged contract. That is the security posture. Missing old scope context is not a neutral state. It is an attribution gap.
Here is the terminal fixture I use for the failure from the introduction:
```text
case=customer-write-expanded-scope
artifact_receipt=verified
old_scope=read
old_resource=docs
new_scope=privileged_write
new_resource=customer_records
contract_impact=privileged
disposition=re_admit
reasons=artifact_receipt_verified,scope_drift_widened,privileged_contract_requires_re_admission
```
That output is deliberately short. It gives an incident responder enough to know that the artifact was not the problem. The new authority envelope was.
Decision Flow
The decision flow should be strict about ordering. First verify the artifact receipt. Then compare scope. Then evaluate contract impact. Then emit the disposition. If the implementation checks scope first, it may accidentally explain away a missing artifact receipt. If it checks contract impact first, it may overreact to a low-risk tool whose artifact evidence failed in a recoverable way.
There is a subtle gotcha in that flow. The widened-scope branch returns re-admit even when the tool is not sensitive. That may feel conservative, but it keeps the replay system honest. A widened authority envelope means the current use is outside the old trust composition. Low-risk use can have a lightweight re-admission path. It still deserves a fresh decision.
The same principle applies to lateral drift. Reading from a different resource class can change risk without changing the apparent permission rank. A token that moves from documentation read to ticket read may expose customer details, incident notes, or internal operational data. Lateral is not harmless just because it is not wider.
Comparison and Tradeoffs
There are three common ways teams handle this problem.
The first approach is artifact-only replay. It is simple, fast, and easy to explain. It is also incomplete for MCP tools that cross authorization boundaries. Artifact-only replay answers whether the artifact still verifies against retained evidence and current policy. It does not answer whether the current token authority is covered by the old admission decision.
The second approach is runtime-only authorization enforcement. This approach says the tool call is safe if the current token is valid and the runtime policy allows the call. It is better than ignoring authorization, but it misses the historical admission question. The token can be valid while the supply-chain admission decision is stale for that scope.
The third approach is authorization-bound replay. It keeps artifact verification, runtime authorization, and admission replay as separate layers. That separation costs more schema work. It also gives reviewers a better audit story.

| Approach | Strength | Failure mode |
|---|---|---|
| Artifact-only replay | Strong supply-chain evidence discipline | Misses token-scope expansion |
| Runtime-only auth | Enforces current access policy | Ignores historical admission assumptions |
| Authorization-bound replay | Composes evidence, authority, and impact | Requires retained normalized scope fields |
I prefer the third approach for production agents because it keeps each layer narrow. Sigstore's verification tooling focuses on signatures and attestations (see the Sigstore verification docs). SLSA defines supply-chain security levels and recommended attestation formats, including provenance (see SLSA v1.2). OpenTelemetry's GenAI semantic conventions give runtime telemetry a common set of attributes. None of those sources should be forced to impersonate the others. The platform composes them at the replay layer.
Production Considerations
Do not store raw access tokens in the replay archive. Store normalized authority projections and enough metadata to prove which mapping policy produced them. A projection can include scope class, resource class, audience class, delegation mode, tenant boundary, and policy digest. The exact set depends on your environment, but the principle is stable: retain what replay needs without retaining bearer secrets.
Treat the normalization policy as code. If the mapping from provider scopes to semantic scope classes changes, replay should record both the old mapping digest and the new mapping digest. Otherwise a future reviewer cannot tell whether scope drift came from the token, the resource, or the team's interpretation of provider-specific strings.
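A minimal sketch of the mapping digest, assuming the mapping is a flat dictionary from provider scope to semantic class and that SHA-256 over canonical JSON is an acceptable digest. The function and field names are illustrative, not part of any existing tool.

```python
import hashlib
import json


def mapping_digest(mapping: dict[str, str]) -> str:
    """Digest a scope mapping over a canonical JSON encoding."""
    # sort_keys plus fixed separators make the digest independent of
    # dict insertion order and whitespace.
    canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def mapping_change_record(old: dict[str, str], new: dict[str, str]) -> dict:
    """Fields a replay result could carry when the mapping may have changed."""
    old_digest, new_digest = mapping_digest(old), mapping_digest(new)
    # Provider scopes whose semantic class differs between the two mappings.
    remapped = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {
        "old_mapping_digest": old_digest,
        "new_mapping_digest": new_digest,
        "mapping_changed": old_digest != new_digest,
        "remapped_scopes": remapped,
    }
```

Recording `remapped_scopes` next to both digests lets a reviewer separate drift caused by reinterpretation from drift caused by the token or the resource.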
Monitor three counters from day one:
| Counter | Why it matters |
|---|---|
| Re-admits caused by widened scope | Shows product workflows expanding tool authority |
| Quarantines caused by missing archived auth assumptions | Shows archive schema gaps |
| Lateral resource-class drifts | Finds quiet movement into sensitive data classes |
Those counters should be sliced by tool family, contract impact, and owner. A single global "scope drift" percentage will hide the repair path. If most quarantines come from missing archived assumptions, improve the archive writer. If most re-admits come from one workflow owner, review the workflow's tool-contract design.
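The three counters and their slices can be sketched as a keyed tally. The counter names and the `ReplayEvent` shape are assumptions for illustration; the reason codes are the ones the implementation sketch in this post emits.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayEvent:
    tool_family: str
    impact_class: str
    owner: str
    disposition: str
    reason_codes: tuple[str, ...]


def counters_for(event: ReplayEvent) -> list[str]:
    """Names of the counters (hypothetical names) that this event increments."""
    codes = set(event.reason_codes)
    names = []
    if event.disposition == "re_admit" and "scope_drift_widened" in codes:
        names.append("re_admit_widened_scope")
    if event.disposition == "quarantine" and "privileged_contract_missing_archived_scope" in codes:
        names.append("quarantine_missing_archived_auth")
    if "scope_drift_lateral" in codes:
        names.append("lateral_resource_class_drift")
    return names


def sliced_counts(events: list[ReplayEvent]) -> Counter:
    """Tally each counter per (tool family, contract impact, owner) slice."""
    counts: Counter = Counter()
    for event in events:
        for name in counters_for(event):
            counts[(name, event.tool_family, event.impact_class, event.owner)] += 1
    return counts
```

Keeping the slice in the key means the dashboard can group by owner or tool family without re-reading the raw review results.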
Finally, keep enforcement staged. Start with report-only results for low-impact tools. Enforce re-admission for privileged contracts first. Quarantine only when the replay system can point to a clear reason code: missing archived scope for privileged use, failed artifact evidence, or current policy that explicitly disallows the authority composition.
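Staged enforcement can be sketched as a thin layer between the replay disposition and what the gateway actually does. Everything here is an assumption about one possible rollout stage: the action names, the set of enforced impact classes, and the third quarantine reason code are hypothetical.

```python
# Assumption: reason codes that justify an enforced quarantine. The first two
# are emitted by the implementation sketch in this post; the third is hypothetical.
QUARANTINE_REASONS = frozenset({
    "privileged_contract_missing_archived_scope",
    "artifact_receipt_unavailable_or_failed",
    "policy_disallows_authority_composition",
})
# Assumption: only privileged contracts are enforced in this stage.
ENFORCED_IMPACT_CLASSES = frozenset({"privileged"})


def staged_action(disposition: str, impact_class: str, reason_codes: tuple[str, ...]) -> str:
    """Map a replay disposition to the action taken in this rollout stage."""
    if disposition == "continue":
        return "allow"
    if impact_class not in ENFORCED_IMPACT_CLASSES:
        # Low-impact tools: record the finding but do not block traffic yet.
        return "report_only"
    if disposition == "quarantine" and not QUARANTINE_REASONS.intersection(reason_codes):
        # No clear reason code: fall back to the milder enforced disposition.
        return "re_admit"
    return disposition
```

Widening `ENFORCED_IMPACT_CLASSES` over time is then a reviewable configuration change rather than a code change.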
Debugging the Non-Obvious Failure
The bug that tends to survive the first rollout is not a failed verifier. It is a stale scope mapping. A provider renames a scope, a gateway team updates a policy bundle, or a product team splits one resource class into two. The replay worker still receives a token envelope, but the normalization policy no longer maps it to the same semantic class that the archive writer used months earlier.
That failure can look like real drift. In one fixture, resource_read became case_read after a policy cleanup. The application contract had not gained authority. The old mapping was simply coarser than the new mapping. My first implementation emitted lateral and required re-admission for hundreds of low-risk reads. The replay system was technically consistent and operationally noisy.
The repair was to version the mapping and add a migration table for semantic splits. If an old class splits into narrower new classes, replay can emit scope_class_refined instead of scope_class_lateral, as long as the new class is a subset of the old authority. That reason code still records the mapping change, but it does not punish the team for making authorization metadata more precise.
Here is the terminal output I want from that regression test:
```text
case=resource-class-refinement
old_mapping=auth-map:2026-04-01
new_mapping=auth-map:2026-05-22
old_scope=resource_read
new_scope=case_read
subset_proof=present
contract_impact=standard
disposition=continue
reasons=artifact_receipt_verified,scope_class_refined,subset_proof_present
```
The subset_proof field is doing real work. Without it, a renamed scope can sneak past review as if it were narrower. With it, the replay worker has to show why the new class is contained by the old assumption. That proof can be a policy-table row, a signed mapping bundle, or an internal authorization schema version. The exact mechanism matters less than the discipline: refinement is not a synonym for trust.
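A minimal sketch of the refinement check, assuming the subset proof is a row in a policy-owned migration table. The table contents are illustrative; `docs_read` is a hypothetical sibling class that does not appear in the fixture.

```python
# Hypothetical migration table: an old semantic class mapped to the narrower
# classes it was split into. "docs_read" is an illustrative sibling class.
SEMANTIC_SPLITS = {
    "resource_read": frozenset({"case_read", "docs_read"}),
}


def classify_class_change(old_class: str, new_class: str) -> tuple[str, bool]:
    """Return (reason_code, subset_proof_present) for a semantic class change."""
    if old_class == new_class:
        return "scope_class_unchanged", False
    # The split row is the subset proof: the policy-owned table lists the
    # new class as contained in the old one.
    if new_class in SEMANTIC_SPLITS.get(old_class, frozenset()):
        return "scope_class_refined", True
    # A rename with no recorded containment is treated as real drift.
    return "scope_class_lateral", False
```

The check is directional on purpose: moving from `case_read` back to `resource_read` is a coarsening, so it falls through to the lateral branch instead of being waved through as a refinement.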
The second non-obvious failure is clock-bound authority. A token may have been valid for a short-lived delegated action, while the replay archive only retained its scope class. Months later the replay worker sees the same class and misses the fact that the original decision assumed a narrow delegation window. That is why I retain an expiry class, not an expiry timestamp. The archive does not need the old bearer token. It does need to know whether the admission assumed a five-minute user delegation, a service account, or a long-lived automation credential.
I use three expiry classes in fixtures:
| Expiry class | Replay meaning |
|---|---|
| interactive_short | User-mediated action with a short review window |
| service_rotated | Service credential with normal rotation evidence |
| long_lived_exception | Exception path that should force re-admission |
This is boring, but it catches a class of incidents that otherwise become arguments. The artifact still verifies. The scope class may be the same. The delegation duration changed from interactive to long-lived. That is authority drift.
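The expiry-class comparison can be sketched as a small function. The class names come from the table above; the rank ordering and the reason codes are assumptions made for illustration.

```python
from typing import Optional

# Assumed ordering from shortest to longest delegation; the ranks and the
# reason codes below are illustrative, not part of the original rule set.
EXPIRY_RANK = {"interactive_short": 1, "service_rotated": 2, "long_lived_exception": 3}


def expiry_drift(archived: Optional[str], current: str) -> tuple[bool, list[str]]:
    """Return (needs_re_admission, reason_codes) for an expiry-class comparison."""
    if archived is None:
        return True, ["archived_expiry_class_missing"]
    if archived not in EXPIRY_RANK or current not in EXPIRY_RANK:
        return True, ["expiry_class_unknown"]
    reasons = []
    if current == "long_lived_exception":
        # The exception path forces re-admission even if nothing else changed.
        reasons.append("long_lived_exception_requires_re_admission")
    if EXPIRY_RANK[current] > EXPIRY_RANK[archived]:
        reasons.append("expiry_class_lengthened")
    return bool(reasons), reasons or ["expiry_class_unchanged_or_shortened"]
```

A result from this check would be joined with the scope-drift reasons rather than replacing them, since the scope class can be unchanged while the delegation window is not.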
Review Result Schema
The review result should be append-only and separate from the original admission receipt. That separation is the same discipline used in blog 254. The old decision remains the old decision. The replay result records what the current review discovered under current policy, current scope mapping, and current contract impact.
A minimal review result needs these fields:
| Field | Purpose |
|---|---|
| receipt_digest | Links the review to the archived supply-chain evidence |
| archived_auth_digest | Links to the normalized authority assumption retained at admission |
| scope_mapping_digest | Names the policy-owned mapping used during replay |
| current_token_envelope_digest | Identifies the normalized current authority envelope |
| contract_impact_class | Separates low-risk reads from privileged writes |
| disposition | Emits continue, re-admit, quarantine, or retire |
| reason_codes | Explains why the disposition was chosen |
I would also include reviewed_at, review_worker_version, and policy_digest. Those fields are not glamorous, but they make a future dispute answerable. If a team asks why a tool moved from continue to re-admit between two review runs, the platform can compare the mapping digest, policy digest, and worker version before accusing the tool owner.
The review result should avoid copying raw verifier logs or raw token material. It can point to evidence bundles and normalized projections. That keeps the operational dashboard useful without turning it into a sensitive-data lake. When an incident responder needs deeper evidence, they can open the referenced receipt and policy bundles through the normal access path.
One design constraint is worth stating plainly: a replay result should never mutate the old archived assumption. If the old assumption was too thin, append a result that says so. Do not patch history to make the review pass. The whole point of replay is to preserve the difference between what the platform knew then and what it knows now.
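The append-only constraint can be sketched with a frozen record and a log that only grows. The field names follow the schema table in this section; `ReviewLog` is a hypothetical in-memory stand-in for whatever store the platform actually uses.

```python
from dataclasses import dataclass

ALLOWED_DISPOSITIONS = frozenset({"continue", "re_admit", "quarantine", "retire"})


@dataclass(frozen=True)  # frozen: a written result cannot be edited in place
class ReviewResult:
    receipt_digest: str
    archived_auth_digest: str
    scope_mapping_digest: str
    current_token_envelope_digest: str
    contract_impact_class: str
    disposition: str
    reason_codes: tuple[str, ...]

    def __post_init__(self) -> None:
        if self.disposition not in ALLOWED_DISPOSITIONS:
            raise ValueError(f"unknown disposition: {self.disposition}")


class ReviewLog:
    """In-memory stand-in for an append-only store (hypothetical API)."""

    def __init__(self) -> None:
        self._entries: list[ReviewResult] = []

    def append(self, result: ReviewResult) -> int:
        """Add a result and return its position; nothing is overwritten."""
        self._entries.append(result)
        return len(self._entries) - 1

    def history(self, receipt_digest: str) -> tuple[ReviewResult, ...]:
        """All results recorded for one receipt, oldest first."""
        return tuple(r for r in self._entries if r.receipt_digest == receipt_digest)
```

Because the log exposes no update or delete method, a thin old assumption can only be answered by appending a new result that says so.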
Testing Strategy
The test suite should be built around joins, not just individual validators. Unit-test the scope classifier, of course. Also test the full replay disposition because most bugs appear when evidence state, scope drift, and contract impact interact.
I would start with eight fixtures:
| Fixture | Expected disposition |
|---|---|
| Verified artifact, same read scope, low-impact contract | Continue |
| Verified artifact, narrowed scope, standard contract | Continue |
| Verified artifact, widened scope, low-impact contract | Re-admit |
| Verified artifact, widened scope, privileged contract | Re-admit with privileged reason |
| Verified artifact, missing archived scope, privileged contract | Quarantine |
| Failed artifact evidence, low-impact contract | Re-admit |
| Failed artifact evidence, privileged contract | Quarantine |
| Scope-class refinement with subset proof | Continue |
The fixture names should include the reason code being tested. That sounds fussy until an incident review asks why a decision changed between policy versions. A reason-coded fixture lets the team see whether the code changed the disposition rule or the policy mapping changed the input.
I also like snapshot tests for the review record. A review result is part of the audit surface. If a code change removes policy_digest, scope_mapping_digest, or contract_impact_class, the snapshot should fail. It is easier to catch a missing field in CI than in a quarterly review when the person who changed the serializer is working on something else.
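A minimal version of that snapshot check, assuming the review result is serialized as a flat JSON object. The field list combines the schema table with the three extra fields recommended earlier; `serialize_review` is a hypothetical stand-in for the real serializer.

```python
import json

# Field names from the review result schema in this post, plus the three
# extra fields recommended alongside it.
SNAPSHOT_FIELDS = sorted([
    "receipt_digest",
    "archived_auth_digest",
    "scope_mapping_digest",
    "current_token_envelope_digest",
    "contract_impact_class",
    "disposition",
    "reason_codes",
    "reviewed_at",
    "review_worker_version",
    "policy_digest",
])


def serialize_review(result: dict) -> str:
    """Hypothetical stand-in for the platform's real serializer."""
    return json.dumps(result, sort_keys=True)


def missing_snapshot_fields(serialized: str) -> list[str]:
    """Return the audit fields absent from a serialized review result."""
    present = set(json.loads(serialized))
    return [name for name in SNAPSHOT_FIELDS if name not in present]
```

In CI the assertion would simply be that `missing_snapshot_fields(...)` returns an empty list for every fixture's serialized result.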
Rollout Checklist
Before enforcing authorization-bound replay, I would require five operational checks.
First, the archive writer must retain normalized authorization assumptions for new admissions. If the archive only has raw prose, start in report-only mode and mark privileged gaps clearly.
Second, the authorization team must own the mapping table from provider scopes to semantic scope classes. The table should have a digest. Replay should record that digest in every result.
Third, the application platform must classify tool contracts by impact. A replay worker cannot decide whether a scope widening is dangerous if every contract is simply "tool call."
Fourth, dashboards must show reason-code distribution, not only disposition counts. A spike in archived_scope_assumption_missing means a data-retention problem. A spike in scope_drift_widened may mean product workflows are expanding authority. Those are different repair queues.
Fifth, enforcement should begin with privileged contracts. Report-only for low-impact reads gives teams time to improve mapping and archives without blocking harmless traffic. Privileged contracts deserve less patience because the cost of approving unsupported authority is higher.
Conclusion
Artifact integrity is necessary for MCP server trust, but it is not the whole trust decision. A tool can still verify and still be unsafe for the authority envelope now attached to it. Authorization-bound replay closes that gap by joining archived evidence, archived scope assumptions, current token scope, and current application impact.
The payoff is a sharper review result. The platform can say: the artifact still verifies, the old decision assumed read-only documentation access, the current workflow grants privileged customer-record write authority, and the correct disposition is re-admit. That is much better than a green checkmark that only proves the easiest part.
Sources
- Model Context Protocol, "Authorization," https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization
- Model Context Protocol, "The MCP Registry," https://modelcontextprotocol.io/registry/about
- Sigstore, "Verifying Signatures," https://docs.sigstore.dev/cosign/verifying/verify/
- SLSA, "SLSA Specification v1.2," https://slsa.dev/spec/v1.2/
- OpenTelemetry, "Semantic conventions for generative AI," https://opentelemetry.io/docs/specs/semconv/gen-ai/
About the Author
Toc Am
Founder of AmtocSoft. Writing practical deep-dives on AI engineering, cloud architecture, and developer tooling. Previously built backend systems at scale. Reviews every post published under this byline.
Published: 2026-05-22 · Written with AI assistance, reviewed by Toc Am.