
I caught the mistake in a review pass, which is the friendliest place a supply-chain mistake can show up. I had a federation ingestion sketch that treated an MCP Registry entry as if it were already a signed safety certificate for the server code behind it. That reading was too generous. The official MCP Registry documentation is explicit that the registry authenticates namespaces and hosts metadata while the broader ecosystem still owns security scanning of server code. I had let the word "official" do more work than the boundary actually promised.
That mistake matters once an agent platform operates across more than one registry, more than one package type, and more than one retention window. A namespace proves a publisher controlled a naming path at publish time. It does not prove the Docker image, npm package, remote endpoint, tool description, or transitive dependency is safe for the next replay. A retained verification report helps, but only if the platform can reconstruct which metadata, package digest, provenance statement, verifier policy, and archive receipt were bound together when the tool was admitted. Blog 252 ended on that uncomfortable edge: it preserved a verification disposition for the signed-manifest acknowledgement-retention path, then forward-referenced an archival spanning set. This post closes that sub-cluster with the archive shape I wish I had drawn first.
The shape is a per-registry signed-manifest acknowledgement-retention-verification-archival spanning set. The phrase is long because the boundary is long. At the federation grain, admission is not one green check. It is a record set that can answer five separate questions later: which registry metadata did we read, which artifact digest did we verify, which attestation or provenance statement did policy accept, which decision did the verifier emit, and which immutable archive receipt proves those pieces were retained together. The archive receipt is not a decorative audit log. It is what stops a later replay from sewing today's policy result onto yesterday's package digest.
This post composes with blog 249's signed-manifest discipline, blog 250's acknowledgement step, blog 251's retention window, and blog 252's verification projection. It also corrects the practical boundary with the current MCP Registry docs: registry authentication is a necessary identity input, not the final evidence object. Sigstore's Cosign verification flow, in-toto attestations, and SLSA provenance requirements give us useful evidence primitives. They do not choose our platform's retention contract for us. The rest of this post shows how I would turn those primitives into a record an agent federation can replay without inventing trust after the fact.
The Problem: Namespace Authenticity Is Not an Archive
The official MCP Registry has a clear job. Its Registry overview describes a centralized metadata repository for publicly accessible MCP servers. Its authentication guide ties publishing authentication to names such as GitHub-backed or domain-backed namespaces. Its trust notes also say security scanning of server code is left to the broader ecosystem. Those are strong primitives for discovery and publisher identity. They are not a whole admission record for a production agent federation.
That distinction is easy to lose when tool discovery is fast. A host sees server.json, installation metadata, a repository name, and a package location. A platform team then layers package verification on top. On a good day, an admission worker checks a digest, verifies a signature or attestation, stores the policy result, and lets a tool contract reference the server. On a bad day, the archive stores a human-readable server version but drops one of the binding fields that made the verification meaningful. Six weeks later an incident review can tell that a server existed, but it cannot prove which package bytes were admitted when an agent invoked a sensitive tool.
Here is the diagram I use when I want the boundary to stay visible.

flowchart LR
R[MCP registry metadata and namespace auth] --> P[Package or endpoint resolver]
P --> D[Artifact digest binding]
D --> V[Signature and attestation verifier]
V --> Q{Policy decision}
Q -- admit --> A[Archive spanning set]
Q -- reject --> X[Quarantine record]
A --> T[Tool contract admission]
A --> E[Replay and incident evidence]
The federation-grain failure mode begins when the diagram collapses R, V, and A into one field named verified. That field can mean "publisher namespace authenticated," "Cosign verified a signature," "an in-toto statement was present," "our policy admitted the artifact," or "the archive retained all evidence." Those meanings diverge under rotation, replay, and partial failure. A registry can stay healthy while a downstream package changes. A package digest can verify while a provenance predicate is missing an expected builder identity. A policy decision can be correct at ingest time and unreproducible later if the archive omitted its policy hash.
For a federation, an archive has to preserve joins, not just facts. The archive record should join registry metadata digest, artifact digest, attestation digest, verifier policy digest, admission decision, retention deadline, and receipt identifier. That is not because every registry is hostile. It is because every replay is a second reader with less context than the first reader had. The archive either carries context forward or invites the second reader to improvise.
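One way to keep those meanings apart is to refuse a single boolean at the type level. The following is a minimal sketch, not the platform's actual schema: the `Check` enum, the `AdmissionStatus` class, and its field names are hypothetical and exist only to show five separate answers where `verified` offered one.

```python
from dataclasses import dataclass
from enum import Enum


class Check(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_RUN = "not_run"


@dataclass(frozen=True)
class AdmissionStatus:
    """Hypothetical replacement for a single `verified: bool` field."""
    namespace_authenticated: Check
    signature_verified: Check
    attestation_present: Check
    policy_admitted: Check
    archive_retained: Check

    def admissible(self) -> bool:
        # Each meaning has to hold on its own; none of them implies another.
        return all(
            check is Check.PASSED
            for check in (
                self.namespace_authenticated,
                self.signature_verified,
                self.attestation_present,
                self.policy_admitted,
                self.archive_retained,
            )
        )


status = AdmissionStatus(
    namespace_authenticated=Check.PASSED,
    signature_verified=Check.PASSED,
    attestation_present=Check.PASSED,
    policy_admitted=Check.PASSED,
    archive_retained=Check.NOT_RUN,  # the archive write never happened
)
print(status.admissible())  # False
```

A three-valued check also keeps "failed" distinct from "never ran", which a boolean cannot express.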
The Archival Spanning Set
I use five records for the spanning set. They are small enough to keep the admission path legible and separate enough to avoid one giant JSON blob whose fields mutate whenever a verifier changes.
| Record | Load-bearing fields | What later replay needs |
|---|---|---|
| Registry snapshot | registry URL, server name, metadata digest, namespace auth result | Proves what discovery data the admission worker read |
| Artifact binding | package type, resolved locator, artifact digest, retrieval timestamp | Prevents version labels from replacing byte identity |
| Evidence bundle | signature bundle digest, attestation digest, provenance predicate summary | Preserves verifier inputs |
| Policy decision | policy digest, verifier version, decision, reason codes | Explains why evidence became admission or quarantine |
| Archive receipt | spanning-set digest, retention class, receipt timestamp, receipt signature | Binds the first four records for replay |
The registry snapshot matters even when downstream marketplaces enrich the official registry. It tells the federation which metadata path led to the artifact binding. The artifact binding matters because installation syntax is not an immutable artifact. The evidence bundle matters because a signature check and an attestation check answer different questions. Cosign's verification docs show signature and attestation verification flows. In-toto defines statement and attestation structures for supply-chain claims. SLSA describes provenance claims and requirements by level. None of those documents says "store the current registry page and hope." The archive receipt is where the platform takes responsibility for the join.
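The artifact binding row can be sketched as its own record. This is an illustrative shape only: `ArtifactBinding`, `bind_artifact`, the locator string, and the timestamps are assumptions made for the example, and the byte strings stand in for a fetched package.

```python
from dataclasses import dataclass
from hashlib import sha256


@dataclass(frozen=True)
class ArtifactBinding:
    """Hypothetical record shape; field names follow the table above."""
    package_type: str
    resolved_locator: str
    artifact_digest: str
    retrieved_at: str


def bind_artifact(
    package_type: str, locator: str, content: bytes, retrieved_at: str
) -> ArtifactBinding:
    # Identity comes from the bytes that were fetched, not from the locator text.
    digest = "sha256:" + sha256(content).hexdigest()
    return ArtifactBinding(package_type, locator, digest, retrieved_at)


# Illustrative values only: the same locator resolving to different bytes.
locator = "npm:@example/mcp-server@0.4.0"
first = bind_artifact("npm", locator, b"original tarball bytes", "2026-05-01T00:00:00Z")
second = bind_artifact("npm", locator, b"rebuilt tarball bytes", "2026-05-20T00:00:00Z")

print(first.resolved_locator == second.resolved_locator)  # True
print(first.artifact_digest == second.artifact_digest)    # False
```

The locator and version label agree across both bindings while the digests differ, which is the case a version-only archive cannot see.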
flowchart TB
S[Registry snapshot] --> H[Spanning-set hash]
B[Artifact binding] --> H
E[Evidence bundle] --> H
P[Policy decision] --> H
H --> R[Signed archive receipt]
R --> K[Retention class]
R --> I[Incident replay]
R --> C[Change-control review]
The receipt can be a signed object in an append-only evidence store, a transparency-log anchored bundle, or an internal ledger receipt that a platform controls. The implementation choice depends on threat model and budget. The structural requirement is less negotiable: the receipt digest must bind the evidence set that the admission decision used. If a later retention compactor drops raw verifier logs, the receipt and the compacted evidence summary still need enough material to prove that the record set belonged together at admission time.
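As a rough illustration of the internal-ledger option, the sketch below binds a receipt with an HMAC from the Python standard library. That is a deliberate simplification: a keyed MAC is a stand-in for whatever signing or transparency-log mechanism a platform actually chooses, and the `sign_receipt` and `receipt_is_intact` helpers, the key, and the receipt values are all hypothetical.

```python
import hashlib
import hmac
import json


def _payload(receipt: dict) -> bytes:
    # Canonical encoding so the same receipt always produces the same bytes.
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode()


def sign_receipt(receipt: dict, key: bytes) -> dict:
    tag = hmac.new(key, _payload(receipt), hashlib.sha256).hexdigest()
    return {"receipt": receipt, "receipt_mac": "hmac-sha256:" + tag}


def receipt_is_intact(signed: dict, key: bytes) -> bool:
    expected = hmac.new(key, _payload(signed["receipt"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest("hmac-sha256:" + expected, signed["receipt_mac"])


key = b"demo-key-not-for-production"  # assumption: a ledger-held secret
receipt = {
    "spanning_set_digest": "sha256:example",  # placeholder, not a real digest
    "retention_class": "security-evidence-400d",
    "receipt_timestamp": "2026-05-22T00:00:00Z",
}
signed = sign_receipt(receipt, key)
print(receipt_is_intact(signed, key))  # True

# Changing any receipt-bound field after signing breaks the check.
tampered = {
    "receipt": {**signed["receipt"], "retention_class": "discovery-telemetry-30d"},
    "receipt_mac": signed["receipt_mac"],
}
print(receipt_is_intact(tampered, key))  # False
```

An HMAC only proves integrity to a holder of the same key, so a design that needs third-party verifiability would swap in an asymmetric signature or a log-anchored bundle.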
This is where blog 252's verification projection becomes archival. Verification records that evidence passed a policy then. Archival spanning keeps the evidence, policy, result, and retention receipt replayable together later. The words are similar. The failure domains are not.
Threat Model: What the Receipt Does and Does Not Prove
The archive receipt narrows a replay question. It does not bless an MCP server for eternity. That limit keeps the spanning set useful. If a server author loses a signing identity after admission, the old receipt still proves what the federation admitted at the older timestamp. It does not claim the signing identity remains safe now. If a tool endpoint behaves maliciously even though its package provenance looked good, the receipt preserves the admission evidence. It does not turn provenance into runtime behavior proof.
I use three threat-model lines when I review the design with a platform team. A metadata substitution attempt tries to swap discovery fields after admission. The registry snapshot digest and artifact binding make that visible. An artifact substitution attempt tries to point the same name or version at different bytes. The artifact digest and verifier evidence make that visible. A decision substitution attempt tries to apply a later policy result to an older admission. The policy digest and archive receipt make that visible.
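The three lines map onto three digest comparisons. A minimal sketch, assuming the admitted digests can be loaded from the archive and the observed ones are recomputed at replay time; `AdmittedBindings`, `substitution_findings`, and the finding names are hypothetical, and the digest strings are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdmittedBindings:
    metadata_digest: str
    artifact_digest: str
    policy_digest: str


def substitution_findings(
    admitted: AdmittedBindings, observed: AdmittedBindings
) -> list[str]:
    """Compare replay-time digests against what the receipt bound at admission."""
    findings = []
    if observed.metadata_digest != admitted.metadata_digest:
        findings.append("metadata-substitution")
    if observed.artifact_digest != admitted.artifact_digest:
        findings.append("artifact-substitution")
    if observed.policy_digest != admitted.policy_digest:
        findings.append("decision-substitution")
    return findings


admitted = AdmittedBindings("sha256:meta-a", "sha256:artifact-a", "sha256:policy-a")
observed = AdmittedBindings("sha256:meta-a", "sha256:artifact-b", "sha256:policy-a")
print(substitution_findings(admitted, observed))  # ['artifact-substitution']
```

Returning every finding, not only the first, keeps a replay report useful when more than one binding has drifted.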
There are also threats this record shape only hands off. Runtime prompt injection inside a legitimate tool description still needs tool-contract policy, sandboxing, and monitoring. A compromised build pipeline can emit provenance that a weak policy accepts. The spanning set will preserve that weak decision accurately; the policy review must improve the gate. Evidence archival is not absolution. It is the mechanical step that prevents a later review from debating a record the platform never kept.
A Minimal Admission Record in Code
The code below is deliberately boring. It does not implement Cosign or parse an in-toto predicate. Those jobs belong to real verifiers and structured parsers. This function sits after those verifiers and builds the archive material that keeps their result attached to the admission decision.
from dataclasses import asdict, dataclass
from hashlib import sha256
from json import dumps
from typing import Literal

Decision = Literal["admit", "quarantine", "reject"]


@dataclass(frozen=True)
class RegistrySnapshot:
    registry: str
    server_name: str
    metadata_digest: str
    namespace_auth: str


@dataclass(frozen=True)
class EvidenceBundle:
    artifact_digest: str
    signature_bundle_digest: str
    attestation_digest: str
    provenance_summary_digest: str


@dataclass(frozen=True)
class PolicyDecision:
    policy_digest: str
    verifier_version: str
    decision: Decision
    reason_codes: tuple[str, ...]


def canonical_digest(value: object) -> str:
    # Sorted keys and fixed separators give the same bytes for the same data.
    encoded = dumps(value, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + sha256(encoded).hexdigest()


def archive_receipt(
    snapshot: RegistrySnapshot,
    evidence: EvidenceBundle,
    decision: PolicyDecision,
    retention_class: str,
) -> dict[str, object]:
    # An admission without an attestation binding must not reach the archive.
    if decision.decision == "admit" and not evidence.attestation_digest:
        raise ValueError("admitted tool evidence must keep attestation binding")
    # Everything the decision depended on is hashed together as one set.
    spanning_set = {
        "registry_snapshot": asdict(snapshot),
        "evidence_bundle": asdict(evidence),
        "policy_decision": asdict(decision),
        "retention_class": retention_class,
    }
    return {
        "spanning_set_digest": canonical_digest(spanning_set),
        "decision": decision.decision,
        "reason_codes": list(decision.reason_codes),
        "retention_class": retention_class,
    }
I keep the digest construction canonical on purpose. A replay worker should be able to compute the same spanning-set digest from structured records without depending on Python dict insertion accidents or pretty-printed whitespace. In a real pipeline, metadata_digest, artifact_digest, signature_bundle_digest, and attestation_digest come from typed verification steps. The archive builder should reject admission if a required binding is missing rather than filling the hole with a version string. The function above enforces that only for attestation_digest; a production builder would apply the same check to every required binding.
Here is the terminal output from a small fixture that uses the function with a registry snapshot and verifier result. This is the kind of output I want in an ingestion log because it names the decision and receipt, not because a log line alone is the archive.
$ python3 archive_receipt_demo.py
decision=admit
reason_codes=['namespace-authenticated', 'artifact-digest-bound', 'attestation-policy-pass']
retention_class=security-evidence-400d
spanning_set_digest=sha256:3e0c3dbb4ed3303ed8c5b7ca6ffca0202af1f60d6948d9d41aa50b4908796920
The important thing about that output is the absence of a server version string as the primary identity. Versions are useful for humans. Digests keep a replay honest.
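The canonical-encoding claim is easy to check in isolation. The sketch below repeats the `canonical_digest` function from the admission code so it runs on its own; the record contents are placeholder values, not output from a real verifier.

```python
from hashlib import sha256
from json import dumps


def canonical_digest(value: object) -> str:
    encoded = dumps(value, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + sha256(encoded).hexdigest()


# The same record built in two different key orders.
at_admission = {"artifact_digest": "sha256:artifact-a", "policy_digest": "sha256:policy-a"}
at_replay = {"policy_digest": "sha256:policy-a", "artifact_digest": "sha256:artifact-a"}
print(canonical_digest(at_admission) == canonical_digest(at_replay))  # True

# A replay that resolved different bytes no longer matches the stored digest.
drifted = {**at_replay, "artifact_digest": "sha256:artifact-b"}
print(canonical_digest(at_admission) == canonical_digest(drifted))  # False
```

Sorted keys and fixed separators cover key order and whitespace. Number formatting and Unicode normalization are outside what this encoding guarantees, so records that carry floats or free text may need a stricter canonical form.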
The Decision Flow That Keeps Quarantine Useful
A spanning set should not make every incomplete evidence bundle disappear into a generic failure bucket. Quarantine is a first-class decision. A server might have namespace authentication and a digest binding but no provenance statement that meets the policy for a privileged filesystem tool. That record is useful. It tells the platform team which evidence existed, which policy gate failed, and whether a later publisher update can fix the gap without pretending the tool was admitted.
flowchart TD
A[Resolved MCP server candidate] --> N{Namespace authentication captured?}
N -- no --> RJ[Reject discovery record]
N -- yes --> G{Artifact digest bound?}
G -- no --> Q1[Quarantine missing artifact binding]
G -- yes --> S{Signature and attestation policy pass?}
S -- no --> Q2[Quarantine evidence gap]
S -- yes --> R{Archive receipt persisted?}
R -- no --> Q3[Quarantine archive write failure]
R -- yes --> OK[Admit tool contract]
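The flowchart reads naturally as an ordered series of gates. A minimal sketch, assuming each gate has already been reduced to a boolean by the step that owns it; the `Candidate` class, the `decide` function, and the reason-code strings are hypothetical names chosen to mirror the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    namespace_auth_captured: bool
    artifact_digest_bound: bool
    evidence_policy_pass: bool
    archive_receipt_persisted: bool


def decide(candidate: Candidate) -> tuple[str, str]:
    """Return (decision, reason_code), checking gates in flowchart order."""
    if not candidate.namespace_auth_captured:
        return ("reject", "discovery-record-rejected")
    if not candidate.artifact_digest_bound:
        return ("quarantine", "missing-artifact-binding")
    if not candidate.evidence_policy_pass:
        return ("quarantine", "evidence-gap")
    if not candidate.archive_receipt_persisted:
        return ("quarantine", "archive-write-failure")
    return ("admit", "tool-contract-admitted")


# Evidence passed policy, but the receipt was never written.
print(decide(Candidate(True, True, True, False)))
# ('quarantine', 'archive-write-failure')
```

Each quarantine branch carries its own reason code, so an evidence gap and an archive outage never collapse into the same bucket.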
This is the comparison that guides incident reviews.

| Shortcut | Archival spanning set |
|---|---|
| Stores verified: true | Stores verifier input digests, policy digest, decision, and receipt |
| Replays a version label | Replays artifact bytes by digest |
| Treats namespace identity as safety | Treats namespace identity as one admission input |
| Loses useful partial failures | Keeps quarantined evidence with reason codes |
| Makes retention cleanup risky | Allows compaction around receipt-bound fields |
An admission pipeline should not turn a security uncertainty into a silent retry storm. Quarantine gives operations a bounded state. It also gives content moderators, incident responders, and policy authors a path to say why a tool did not cross the boundary. That is much better than a host discovering an attractive server, failing admission, and quietly switching to a second source whose evidence was never compared.
A Debugging Story: The Replayed Version That Was Not the Replayed Artifact
The gotcha that pushed me toward this record shape came from a fixture replay, not a dramatic outage. I changed a local test package behind the same semantic version while rebuilding an MCP admission example. The discovery snapshot still pointed at the same server name and version. My first replay report said the candidate matched. It matched because I had stored registry metadata and a policy result, but not the package digest that policy had evaluated.
The replay looked tidy until I printed the verifier inputs:
expected_artifact_digest = sha256:45b8...e91c
replay_artifact_digest = sha256:98de...7a40
registry_version = 0.4.0
stored_policy_result = admit
The policy result was not wrong. My archive was. It had allowed an old decision to float free of its artifact binding. The fix was not "be careful with versions." The fix was to make the artifact binding a load-bearing record in the spanning set and include its digest in the archive receipt. After that change, the replay failed early with a digest mismatch and preserved the original admission record for inspection. That is the flavor of failure I want: crisp, local, and unambiguous.
The same class of bug appears at bigger scale when evidence retention and package retention follow different clocks. A verifier bundle may be retained for a security window while a package registry garbage-collects old blobs. A metadata aggregator may refresh installation text while an incident report cites an older tool invocation. The spanning set does not magically retain every external artifact forever. It does tell the federation which external bytes and evidence it depended on, which retention class covered them, and which receipt proved the decision existed before replay asked its question.
Production Considerations
There are four production pressures worth handling before this architecture leaves a whiteboard.
First, pick retention classes before storage tiers. Security evidence for a tool that can read secrets should not inherit the same compaction schedule as discovery telemetry. A practical class might keep receipt-bound summaries longer than verbose verifier logs, but the summary must still retain the fields the replay policy needs. Do the field audit before the compactor writes its first tombstone.
Second, version verifier policy. SLSA and in-toto evidence are structured. Policy still changes. A federation might accept one builder identity for a low-risk tool and require a stricter predicate or signature identity for a privileged connector. The archive should hold the policy digest and verifier version so a later report can distinguish "would fail under today's policy" from "failed under the admission policy."
Third, separate archive write failures from evidence failures. They have different operators. Evidence failure belongs to publisher remediation or policy discussion. Archive write failure belongs to platform reliability. Both block admission in this design because a decision without retained evidence is a future blind spot, but they should produce different reason codes and alerts.
Fourth, watch the federation join cardinality. One registry candidate can resolve to multiple package transports. One package can carry multiple attestations. One tool contract can pin one artifact while another contract pins a later artifact. The archive receipt should bind the exact selected path. It should not digest a sprawling set of "all evidence we saw today" and make a later incident report search for the subset that actually admitted the tool.
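The field audit from the first pressure can be run as a check before any compactor ships. This is a sketch under assumptions: the `REPLAY_REQUIRED_FIELDS` set is one plausible list drawn from the records described earlier, not a definitive requirement, and `compaction_gaps` is a hypothetical helper.

```python
# Assumption: the fields a replay policy needs from a compacted summary.
REPLAY_REQUIRED_FIELDS = frozenset({
    "metadata_digest",
    "artifact_digest",
    "attestation_digest",
    "policy_digest",
    "verifier_version",
    "decision",
    "spanning_set_digest",
})


def compaction_gaps(summary: dict[str, object]) -> list[str]:
    """List required fields that a compacted summary dropped or left empty."""
    return sorted(name for name in REPLAY_REQUIRED_FIELDS if not summary.get(name))


# Placeholder summary: verbose logs are gone, and two fields went with them.
compacted = {
    "metadata_digest": "sha256:meta-a",
    "artifact_digest": "sha256:artifact-a",
    "policy_digest": "sha256:policy-a",
    "verifier_version": "",
    "decision": "admit",
    "spanning_set_digest": "sha256:set-a",
}
print(compaction_gaps(compacted))  # ['attestation_digest', 'verifier_version']
```

Running the audit against sample summaries for each retention class shows what a compaction schedule would cost before the first tombstone is written.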
An Operational Walkthrough From Discovery to Review
I split the operational path into discovery, verification, archive, admission, and review. That split sounds pedantic until an on-call engineer needs to decide which retry is safe. Discovery can retry a registry read when transport fails. Verification can retry a transparency-log or signature service query when the verifier dependency times out. Archive should retry its own write and keep the candidate quarantined while it does so. Admission should not retry around an archive failure by letting the tool through with a TODO receipt. Review should never mutate the old receipt when it wants a new policy verdict.
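Those retry rules can be written down as data so an on-call engineer does not have to rederive them. A minimal sketch: the `FailureHandling` shape, the `pending` and `quarantined` state names, and the `may_admit` helper are hypothetical, and the notes paraphrase the rules in the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FailureHandling:
    retry_same_stage: bool
    candidate_state: str  # state the candidate holds while the failure is open
    note: str


STAGE_FAILURE_HANDLING = {
    "discovery": FailureHandling(True, "pending", "retry the registry read on transport failure"),
    "verification": FailureHandling(True, "pending", "retry the verifier dependency on timeout"),
    "archive": FailureHandling(True, "quarantined", "retry the archive write, never admit meanwhile"),
    "admission": FailureHandling(False, "quarantined", "no admission without a persisted receipt"),
}


def may_admit(receipt_id: Optional[str]) -> bool:
    # A missing or placeholder-free receipt identifier blocks admission outright.
    return bool(receipt_id)


print(STAGE_FAILURE_HANDLING["archive"].candidate_state)  # quarantined
print(may_admit(None))  # False
```

Review is left out of the table on purpose: it produces new records and has no path that retries into the original admission.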
At discovery time, I capture metadata before I normalize it for a UI. The raw discovery fields and the normalized fields have different jobs. Raw fields help prove what a registry or marketplace adapter returned. Normalized fields help an agent platform compare candidates across transports. If only normalized fields survive, an incident reviewer can see the platform's interpretation but not the input that drove it. If only raw fields survive, every downstream policy has to reparse external shapes. The snapshot record is the deliberate join between those worlds.
Verification begins after the artifact locator resolves to bytes or to a remote identity the policy can evaluate. A local package transport should produce a digest that the archive can hold. A remote server path may need a different evidence contract, such as a pinned deployment identity, attested release record, or explicit policy statement that the class cannot be byte-pinned at admission. The spanning set is still useful there because it records the policy shape honestly. It should not invent a package digest for a remote server just to make two transport families look alike in a dashboard.
Archive is the point where evidence becomes future-facing. I prefer to compute the receipt from stable record digests and store the individual records separately. That keeps an archive query narrow when an engineer needs one policy result, while the receipt still gives replay a root digest for the whole admission packet. The archive layer should report its receipt identifier back to the admission worker. It should also report why it could not write one. A missing object-store permission, a retention-class policy denial, and an invalid digest encoding all deserve different error handling even though they all block admission.
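Computing the receipt from per-record digests might look like the sketch below. It assumes each record is a JSON-serializable dict stored separately; `record_digest`, `receipt_root`, and the record contents are hypothetical and the digest strings inside the records are placeholders.

```python
from hashlib import sha256
from json import dumps


def record_digest(record: dict) -> str:
    encoded = dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + sha256(encoded).hexdigest()


def receipt_root(records: dict[str, dict]) -> tuple[dict[str, str], str]:
    """Digest each record on its own, then digest the map of those digests."""
    per_record = {name: record_digest(body) for name, body in records.items()}
    return per_record, record_digest(per_record)


records = {
    "registry_snapshot": {"server_name": "example/server", "metadata_digest": "sha256:meta-a"},
    "artifact_binding": {"package_type": "npm", "artifact_digest": "sha256:artifact-a"},
    "evidence_bundle": {"attestation_digest": "sha256:att-a"},
    "policy_decision": {"policy_digest": "sha256:policy-a", "decision": "admit"},
}
per_record, root = receipt_root(records)

# Changing one record moves its own digest and the root, and nothing else.
changed = dict(records, policy_decision={"policy_digest": "sha256:policy-b", "decision": "admit"})
per_changed, changed_root = receipt_root(changed)
print(per_record["artifact_binding"] == per_changed["artifact_binding"])  # True
print(root == changed_root)  # False
```

An engineer can fetch and check one record against its own digest, while replay still has a single root that covers the whole admission packet.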
Admission is intentionally thin once the archive exists. The tool contract references the admitted artifact or remote identity plus the archive receipt that supports the decision. The contract does not copy every attestation predicate into the hot path. That choice keeps execution latency from depending on audit verbosity and stops the execution layer from becoming a second evidence archive with less discipline. If a tool invocation later needs to show why it was allowed, it can point back to the receipt. The archive can open the receipt-bound records on demand.
Review is where a lot of otherwise sound systems damage their own history. A new security policy arrives. The team replays older candidates. A report marks one old admission as failing today's gate. That report is useful, but it should be a new review result linked to the old receipt, not an edit to the old admission decision. The old decision answers what policy admitted then. The new review answers what policy would admit now. Keeping both lets a federation learn from stronger gates without falsifying earlier operational facts.
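One way to make that rule hard to break is to give review its own record type and keep the admission record immutable. The sketch below assumes that shape; `AdmissionRecord`, `ReviewResult`, the `review` function, and the verdict strings are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class AdmissionRecord:
    receipt_id: str
    policy_digest: str
    decision: str


@dataclass(frozen=True)
class ReviewResult:
    reviewed_receipt_id: str  # link back to the old receipt, never a copy of it
    review_policy_digest: str
    verdict: str


def review(
    admission: AdmissionRecord, current_policy_digest: str, passes_current_policy: bool
) -> ReviewResult:
    # The review is a new fact about today's policy; the admission is untouched.
    return ReviewResult(
        reviewed_receipt_id=admission.receipt_id,
        review_policy_digest=current_policy_digest,
        verdict="would-admit" if passes_current_policy else "would-fail",
    )


admission = AdmissionRecord("receipt-0001", "sha256:policy-a", "admit")
result = review(admission, "sha256:policy-b", passes_current_policy=False)
print(admission.decision, result.verdict)  # admit would-fail

try:
    admission.decision = "reject"  # editing history is refused by the type
except FrozenInstanceError:
    print("admission record is immutable")
```

Both answers survive side by side: what policy admitted then, and what policy would admit now.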
This walkthrough also gives platform teams a clean place to add observability. Discovery emits candidate and namespace events. Verification emits policy input and verifier dependency events. Archive emits receipt persistence and retention-class events. Admission emits tool-contract linkage events. Review emits replay verdict events. The spans can share trace context while the evidence records keep stable digests. That combination lets operators debug latency in a modern trace view and still reconstruct the security decision from durable records when the trace sampling window is long gone.
Rollout Without Freezing Tool Adoption
The first rollout step is not to demand perfect provenance from every tool and stop the platform. It is to define risk classes. A local development helper that never crosses a production boundary can use a lighter archive policy than a production connector that can alter customer records. The important habit is that each class has an explicit evidence minimum and explicit quarantine behavior. A light class can say namespace snapshot plus artifact digest plus policy receipt. A privileged class can require attestation evidence and a stricter policy digest. Ambiguity is what turns rollout into exceptions.
The second step is backfill by reference, not by fiction. Existing tool contracts can be scanned for artifact locators and recent verification results. If the old archive never captured an attestation digest, the backfill record should say that the evidence is absent. It can schedule re-verification against current artifacts where that is useful. It should not stamp a new attestation onto a historical admission and present the result as though the field existed then. A backfill that records its gaps is more trustworthy than a complete-looking ledger whose oldest rows were fabricated by migration.
The third step is to put quarantine in the developer experience. A publisher or platform engineer needs reason codes, missing evidence names, and the policy class that required them. Otherwise archival discipline feels like a silent blocker and teams work around it. A quarantine record that says "artifact digest missing for resolved transport" or "archive receipt write denied for retention class" invites a fix. A generic red badge invites bypasses.
Once those three steps are in place, the federation can tighten gradually. It can compare classes, measure which evidence gaps repeat, and decide which registry adapters need better artifact binding. That is a much healthier posture than declaring every discovered server trusted or declaring every incomplete server forbidden forever. The archive gives you memory. The policy gives you judgment. They should grow together without pretending they are the same thing.
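The risk classes from the first step can start as a small table of evidence minimums. A minimal sketch, assuming two classes with the minimums described above; the class names, evidence names, and `missing_evidence` helper are hypothetical.

```python
# Assumption: two risk classes, each with an explicit evidence minimum.
EVIDENCE_MINIMUM = {
    "light": frozenset({"namespace_snapshot", "artifact_digest", "policy_receipt"}),
    "privileged": frozenset({
        "namespace_snapshot",
        "artifact_digest",
        "policy_receipt",
        "attestation",
    }),
}


def missing_evidence(risk_class: str, present: set[str]) -> list[str]:
    """Name the evidence a candidate still needs for its risk class."""
    return sorted(EVIDENCE_MINIMUM[risk_class] - present)


present = {"namespace_snapshot", "artifact_digest", "policy_receipt"}
print(missing_evidence("light", present))       # []
print(missing_evidence("privileged", present))  # ['attestation']
```

The list of missing names doubles as the quarantine message a publisher sees, which is what keeps the gate from feeling like a silent blocker.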
Conclusion
Blog 252 ended with verification. Blog 253 ends with replayable evidence. The federation-grain MCP server supply-chain sub-cluster needs both. MCP Registry namespace authentication helps a platform know who published metadata. Digest binding, signature and attestation verification, policy evaluation, and archive receipts help a platform know what it admitted and what it can prove later. Confusing those surfaces is comfortable during discovery and expensive during incident review.
The archival spanning set I use is simple on purpose: registry snapshot, artifact binding, evidence bundle, policy decision, and archive receipt. It preserves useful partial failures through quarantine. It makes artifact digests primary. It stops a semantic version from impersonating a replayable admission record. Most importantly, it gives the next reader a bounded packet of evidence rather than a trust story reconstructed from memory.
The next federation step is not another adjective on the archive record. It is a replay-rubric run that compares those receipt-bound records against the next policy and next incident question without rewriting history. That is where the federation can learn without laundering old evidence into new certainty.
Sources
- Model Context Protocol, "The MCP Registry," https://modelcontextprotocol.io/registry/about
- Model Context Protocol, "How to Authenticate When Publishing to the Official MCP Registry," https://modelcontextprotocol.io/registry/authentication
- Sigstore, "Verifying Signatures," https://docs.sigstore.dev/cosign/verifying/verify/
- in-toto, "Specifications," https://in-toto.io/docs/specs/
- SLSA, "SLSA Specification v1.2," https://slsa.dev/spec/latest/
- SLSA, "Provenance," https://slsa.dev/provenance/
About the Author
Toc Am
Founder of AmtocSoft. Writing practical deep-dives on AI engineering, cloud architecture, and developer tooling. Previously built backend systems at scale. Reviews every post published under this byline.
Published: 2026-05-22 · Written with AI assistance, reviewed by Toc Am.