Attestation-Aware Retrospectives: Wiring the Manifest Ledger Into the Quarterly Cross-Incident Review

Introduction
The first attestation-aware retrospective I sat through was a Friday afternoon in late April, and the room was visibly confused for the first twenty minutes. The quarterly cross-incident retrospective format we had been running for a year, the one I described in the postmortem retrospective post, had stabilised into a comfortable rhythm: pull the quarter's postmortems off the wall, group them by contributing factor, count the recurring tags, surface the top three for architecture commitments, log the rest into a carry-forward register, close the meeting in ninety minutes. Friday's meeting did not stabilise. The quarter had produced eight postmortems and twenty-one attestation events, and nobody on the team, including me, had decided in advance whether the attestation events were a separate ledger to review, a parallel signal to fold into the same contributing-factor count, or a third class of input we had not yet named.
The attestation events were the output of the invariant-attestation job I described in the contract drift detection post, the one we had shipped six weeks earlier. The job had been firing cleanly: twelve PR-time hash-diff blocks across the quarter, six of them resolved by version bumps inside the same PR, four of them rolled back as accidental scorer edits, two of them flagged for retrospective discussion because the version bump was contested. Eight runtime gate firings, five of them stale-cache catches, three of them surfacing actual hash drift between the manifest and the running container. One quarterly empirical refresh, which had produced a soft-drift signal on three contracts, with metric-distribution shifts that did not correspond to scorer changes. That last batch of three was what the room could not place. The shift was real, the scorer hash was clean, and the contract was still passing the invariant gate at production-PR time. It was the first signal-class I had seen that did not fit either the postmortem ledger or the existing carry-forward register, and the room visibly did not know what to do with it.
The Friday meeting ran ninety-five minutes and produced two decisions and a follow-up question. The first decision was to formalise a manifest ledger that the quarterly retrospective would treat as a peer to the postmortem ledger, not as a subset of it. The second decision was that attestation events resolved inside the same PR, whether by a version bump or a rollback, would be logged but not reviewed in the retrospective, while attestation events involving soft drift, contested version bumps, or genuine runtime hash drift would be reviewed alongside postmortems. The follow-up question was whether the carry-forward register needed a separate column for manifest-derived items, or whether folding them into the existing register would surface them at the right cadence. We left the question open for a quarter, then closed it at the following retrospective by adding the column. This post is what I now wish we had walked into the Friday meeting already knowing.
The work of an attestation-aware retrospective is the work of running the existing postmortem retrospective alongside a second, parallel review of the manifest ledger, with a shared decision-stack at the end and a reconciliation pass that asks, for each candidate architecture commitment, whether the signal came from incidents, from manifest events, or from both. The pattern is mechanical once written. The discipline is in keeping the two ledgers separate enough that the retrospective can tell which kind of failure mode is producing which signal, while keeping the decision-stack at the end unified enough that the team is making one set of architecture commitments rather than two. Most teams that ship contract corpora and attestation jobs without thinking about the retrospective layer end up with a quarterly review that conflates the two ledgers into a single contributing-factor count, which produces a top-three list of items that look like incident contributors but are actually misclassified manifest events.
The Problem: Two Ledgers, One Retrospective, No Reconciliation
The quarterly retrospective format that worked for the first year of postmortems was built on a single ledger of incident artefacts. Each incident produced a postmortem document with a contributing-factor tag list, the retrospective grouped tags across the quarter's postmortems, the top three recurring tags became architecture commitments for the next quarter, and the remaining tags went into a carry-forward register that the next quarter's retrospective would re-evaluate against the new quarter's postmortems. The cadence was the part that worked. The single-ledger structure was the part that quietly broke once we shipped the attestation job.
The attestation job produced events that looked structurally similar to postmortems but answered a different question. A postmortem documents an incident that reached production, the proximate trigger, the contributing factors, and the action items the team agreed to. An attestation event documents a contract whose manifest has changed, the hash diff that triggered the change, the resolution path the PR author took, and the carry-forward signal if any. The two artefacts share a metadata schema by accident more than by design: both have a date, an actor, a description, and a closing decision. They do not share a meaning. A postmortem is an incident retrospective on something that happened in production; an attestation event is an artefact retrospective on something that happened in the corpus before it could reach production. Treating them as instances of the same class produces a retrospective that double-counts the contract corpus's quality discipline against itself.
The first time we tried to fold the attestation events into the postmortem ledger, in the quarter before the Friday meeting I described above, the contributing-factor count for eval-gap-scorer-drift came out higher than any of the actual incident contributing factors. The team treated the count as a signal that scorer drift was the quarter's biggest problem and committed an architecture investment to a more aggressive PR-time gate. The investment was real engineering time, and the gate was probably an improvement, but the underlying signal was misread: the eleven eval-gap-scorer-drift tags in the count had come from eleven attestation events, not eleven incidents, and the attestation job was already catching the drift at PR time before any of those events could become incidents. The architecture commitment had been made against the successful operation of the attestation system. The actual quarter-over-quarter incident pattern, which was a slow rise in retrieval-precision incidents on long-tail queries, did not make the top three because the count was contaminated.
The contamination is the failure mode I now warn every platform team about when they ship the attestation job. The attestation job's outputs are signal that the quality system is working. They are not signal of incident contributors. Folding them into the postmortem contributing-factor count produces a quarterly retrospective that optimises for the quality system's own throughput rather than for the next quarter's incident reduction. The fix is not to ignore the attestation events. The fix is to put them into a parallel ledger that the retrospective reviews alongside the postmortem ledger, with a clear rule for what counts as a carry-forward item from each ledger, and a reconciliation pass at the end that combines the two ledgers' signals into a single architecture commitment list.
The Pattern: Two Ledgers, One Retrospective, Three-Step Reconciliation
The attestation-aware retrospective format I now recommend has three structural changes to the postmortem retrospective format. The first is the addition of the manifest ledger as a parallel artefact corpus reviewed in the same meeting. The second is a categorical split inside the manifest ledger into reviewable events versus log-only events, with a deterministic rule for the split. The third is a reconciliation pass at the end of the meeting that combines the two ledgers' carry-forward candidates into one architecture commitment list, with a column on the carry-forward register that records which ledger surfaced the item.
The manifest ledger is, in practice, a markdown file in the same retrospectives/ directory as the postmortem index. It has the same structure as the postmortem index: a chronological list of events, each with a date, a contract name, an event type, an actor, and a one-paragraph summary. The event types are the four the attestation job produces: PR-time hash-diff block, runtime gate firing, quarterly soft-drift flag, contested version bump. The first two are usually log-only; the last two are usually reviewable. The deterministic rule we settled on was that any attestation event resolved inside the same PR or by an obvious cache invalidation was log-only, and any attestation event that produced a contested PR review, a runtime gate firing that exposed genuine hash drift between the manifest and the running container, or a soft-drift signal that did not correspond to a scorer change was reviewable. The rule is not perfect, and the retrospective itself will sometimes overrule it when the volume in one category is unusually high or low, but it is mechanical enough that the manifest ledger can be partly auto-categorised in the days leading up to the retrospective.
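Because the rule is mechanical, it can be sketched as a small triage function. The sketch below is a minimal Python illustration and not the attestation job's real code. ManifestEvent, EventType, categorise, and the boolean resolution flags are hypothetical names. Following the worked example below, it assumes that a runtime gate firing which exposes genuine hash drift is reviewable. For anything the rule does not cover, it returns the open ? bucket described under Production Considerations.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class EventType(Enum):
    # The four event types named in the post.
    PR_HASH_DIFF_BLOCK = "pr-time hash-diff block"
    RUNTIME_GATE_FIRING = "runtime gate firing"
    SOFT_DRIFT_FLAG = "quarterly soft-drift flag"
    CONTESTED_VERSION_BUMP = "contested version bump"


@dataclass
class ManifestEvent:
    # One manifest-ledger row: date, contract, event type, actor, summary.
    when: date
    contract: str
    event_type: EventType
    actor: str
    summary: str
    # Hypothetical resolution flags, filled in by whoever runs the attestation job.
    resolved_in_same_pr: bool = False
    resolved_by_cache_invalidation: bool = False
    contested_review: bool = False
    scorer_changed: bool = False
    runtime_hash_drift: bool = False


def categorise(event: ManifestEvent) -> str:
    """Return 'reviewable', 'log-only', or '?' for the pre-meeting triage pass."""
    # A contested review is reviewable even if the PR eventually resolved it.
    if event.contested_review or event.event_type is EventType.CONTESTED_VERSION_BUMP:
        return "reviewable"
    # Soft drift that does not correspond to a scorer change is reviewable.
    if event.event_type is EventType.SOFT_DRIFT_FLAG and not event.scorer_changed:
        return "reviewable"
    # Assumption (from the worked example): genuine runtime hash drift is reviewable.
    if event.event_type is EventType.RUNTIME_GATE_FIRING and event.runtime_hash_drift:
        return "reviewable"
    # Resolved in the same PR, or by an obvious cache invalidation: log only.
    if event.resolved_in_same_pr or event.resolved_by_cache_invalidation:
        return "log-only"
    # Anything else is ambiguous and goes to the five-minute resolution pass.
    return "?"
```

Checking contested reviews first matters: a contested version bump usually lands inside the same PR, and it should not be swept into the log-only bucket just because the PR closed.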
The reviewable subset of the manifest ledger is what the retrospective spends actual meeting time on. It is usually three to five events per quarter, which is a manageable number alongside the eight to twelve postmortems. Each reviewable event gets the same treatment as a postmortem during the contributing-factor pass: the room reads the summary, surfaces the underlying contributing factor, tags it, and counts the tag. The tag namespace for manifest-derived contributing factors is intentionally separate from the postmortem tag namespace; manifest tags are prefixed with attest- and postmortem tags retain their existing prefixes. The separation is what prevents the contamination I described above. A quarter with eleven attest-scorer-drift tags and three incident-retrieval-precision tags will not surface scorer drift as the top contributing factor, because the two namespaces are counted independently.
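The namespace separation is small enough to sketch directly. This minimal Python version assumes tags are plain strings. It also assumes that anything without the attest- prefix, including legacy un-prefixed tags, belongs to the postmortem side. The names count_by_namespace and top_n are illustrative and not part of any existing tooling.

```python
from collections import Counter


def count_by_namespace(tags: list[str]) -> dict[str, Counter]:
    """Count contributing-factor tags per namespace; the namespaces never share a count."""
    counts = {"postmortem": Counter(), "manifest": Counter()}
    for tag in tags:
        # attest- tags belong to the manifest ledger; everything else, including
        # legacy un-prefixed tags, is treated as postmortem-side (an assumption).
        namespace = "manifest" if tag.startswith("attest-") else "postmortem"
        counts[namespace][tag] += 1
    return counts


def top_n(counter: Counter, n: int = 3) -> list[tuple[str, int]]:
    """Rank tags inside a single namespace only."""
    return counter.most_common(n)


tags = ["attest-scorer-drift"] * 11 + ["incident-retrieval-precision"] * 3
counts = count_by_namespace(tags)
print(top_n(counts["postmortem"], 1))  # [('incident-retrieval-precision', 3)]
print(top_n(counts["manifest"], 1))    # [('attest-scorer-drift', 11)]
```

Each namespace ranks its own tag first, so the eleven scorer-drift events never push the three retrieval-precision incidents out of the postmortem top three.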
The reconciliation pass is the final fifteen minutes of the retrospective. The room takes the top contributing factors from each namespace, usually two or three per namespace. It then checks the rest of both counts for factors that describe the same underlying problem from the other side, and produces a unified shortlist of candidate architecture commitments. The reconciliation rule is that an item that appears in both namespaces, even at lower rank inside each, is promoted to the top of the shortlist. The intuition behind the rule is that a problem visible from both the incident side and the artefact side has two-channel evidence, which is stronger than evidence from only one channel. The rule has produced the most actionable architecture commitments of any structural change we have made to the retrospective: the cross-ledger items, although few in number, are usually the systemic ones that compound across quarters.
flowchart TB
A["Quarterly window<br/>Q1: 8 PMs + 21 AEs"] --> B["Postmortem ledger<br/>review"]
A --> C["Manifest ledger<br/>review"]
B --> D["Top 3<br/>incident factors"]
C --> E["Categorical split<br/>review vs log-only"]
E --> F["Top 3<br/>attest factors"]
D --> G["Reconciliation<br/>pass"]
F --> G
G --> H["Cross-ledger<br/>promotion"]
G --> I["Single shortlist<br/>commitments"]
H --> I
I --> J["Carry-forward<br/>register, two cols"]
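The reconciliation rule can also be sketched in Python, as a minimal illustration under stated assumptions. The function reconcile and its equivalences mapping are hypothetical. The mapping stands in for the room's judgment that two differently named tags describe the same underlying factor, as incident-baseline-cache-stale and attest-runtime-cache-skew do in the worked example below. The post does not describe any automated tag matching.

```python
from collections import Counter


def reconcile(pm_counts: Counter, attest_counts: Counter,
              equivalences: dict[str, str],
              per_namespace: int = 3) -> list[tuple[str, str]]:
    """Return a unified shortlist of (ledger of origin, factor), cross-ledger items first.

    equivalences is a hypothetical, hand-maintained map from an incident- tag to the
    attest- tag the room judged to describe the same underlying factor.
    """
    promoted: list[tuple[str, str]] = []
    matched: set[str] = set()
    for pm_tag, attest_tag in equivalences.items():
        # Cross-ledger rule: promote whenever both sides fired this quarter,
        # even if neither tag ranks highly inside its own namespace.
        if pm_counts[pm_tag] > 0 and attest_counts[attest_tag] > 0:
            promoted.append(("both", f"{pm_tag} + {attest_tag}"))
            matched.update((pm_tag, attest_tag))

    # Single-ledger candidates: top N per namespace, ranked independently,
    # skipping anything already covered by a cross-ledger promotion.
    singles = [("postmortem", t) for t, _ in pm_counts.most_common(per_namespace)
               if t not in matched]
    singles += [("manifest", t) for t, _ in attest_counts.most_common(per_namespace)
                if t not in matched]
    return promoted + singles
```

Given the worked example's tag counts, the sketch puts the cache-layer pair at the top of the shortlist. The three top postmortem tags and attest-soft-drift-tone-stability follow it. The room still decides which shortlist items become commitments.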

Worked Example: A Single Attestation-Aware Retrospective
The Friday meeting produced an artefact set I have been using as the canonical worked example since. The quarter under review had eight postmortems and twenty-one attestation events. The team was a six-person LLM platform team running eleven contracts with a corpus age of four months. The retrospective ran ninety-five minutes. Retrospectives in this format have since settled at roughly twenty minutes longer than a postmortem-only retrospective, which the team considers a fair price for the additional surface area.
The eight postmortems produced a contributing-factor count with three recurring incident-prefixed tags: incident-retrieval-precision-longtail (four occurrences across three postmortems), incident-tool-call-latency-tail (three occurrences across two postmortems), and incident-routing-rule-stale (three occurrences across two postmortems). The fourth-place tag was incident-baseline-cache-stale (two occurrences) and the rest were single-occurrence tags that went straight to the carry-forward register without commitment-level promotion. The top-three shortlist from the postmortem side was retrieval precision, tool-call latency, and routing-rule freshness.
The twenty-one attestation events split as expected:
- Twelve PR-time hash-diff blocks: six log-only resolved by version bump, four log-only resolved by rollback, and two reviewable due to contested version bumps.
- Eight runtime gate firings: five log-only stale-cache catches and three reviewable due to actual hash drift between manifest and running container.
- One quarterly empirical refresh that produced soft-drift signals on three contracts.

Counting the refresh's three per-contract signals separately, the reviewable subset was eight items: two contested version bumps, three runtime hash drifts, and three soft-drift signals. The contributing-factor pass on these eight produced two recurring attest-prefixed tags. The first was attest-runtime-cache-skew, with three occurrences from the runtime hash drifts. The second was attest-soft-drift-tone-stability, with three occurrences from the empirical refresh signals on three different contracts. The other two events, the contested version bumps, produced single-occurrence tags that went to the carry-forward register.
The reconciliation pass took fifteen minutes and produced one cross-ledger promotion. The postmortem-side tag incident-baseline-cache-stale and the manifest-side tag attest-runtime-cache-skew both pointed to the same underlying contributing factor: the eval pipeline's runtime cache layer was producing stale baseline artefacts in some configurations. Only one of the two tags ranked highly on its own ledger. incident-baseline-cache-stale was fourth-place on the postmortem side, while attest-runtime-cache-skew was tied for first on the manifest side with attest-soft-drift-tone-stability. The cross-ledger rule promoted the conjunction to the top of the unified shortlist. The architecture commitment for the following quarter was a refactor of the runtime cache layer, which the team scoped at three engineering weeks and shipped six weeks later. The postmortem ledger alone would not have surfaced the commitment. The manifest ledger alone would have left it as one of two tied single-channel signals. The reconciliation pass is what made it top of the unified shortlist, and the two-channel evidence is what produced the conviction to commit the engineering time.
The remaining commitments came from each ledger separately. From the postmortem side, the team committed to a retrieval-precision long-tail investigation (three engineering weeks) and a tool-call-latency-tail reduction project (two engineering weeks). From the manifest side, the team committed to a tone-stability soft-drift investigation, scoped as one engineering week of analysis with the deliverable being a recommendation for whether to tighten the tolerance on three contracts or accept the soft drift as a model-baseline shift rather than a corpus regression. The total commitment was nine engineering weeks for the quarter, against a postmortem-only retrospective baseline that historically produced six to eight commitment-weeks. The additional surface area paid for itself in the next-quarter retrospective by visibly reducing the runtime cache skew and by closing the soft-drift question.
flowchart TB
A["8 postmortems<br/>21 attestation events"] --> B["PM tags top 3"]
A --> C["AE reviewable<br/>= 8 of 21"]
B --> D["incident-retrieval-precision-longtail (4)"]
B --> E["incident-tool-call-latency-tail (3)"]
B --> F["incident-routing-rule-stale (3)"]
C --> G["attest-runtime-cache-skew (3)"]
C --> H["attest-soft-drift-tone-stability (3)"]
D --> I["Standalone commit"]
E --> I
F --> N["Not committed<br/>this quarter"]
G --> J["Cross-ledger match"]
H --> K["Standalone commit"]
J --> L["Promoted to top<br/>(matches incident-baseline-cache-stale)"]
L --> M["Architecture commit<br/>quarter Q+1"]
The Carry-Forward Register, Now With Two Columns
The carry-forward register that the attestation-aware retrospective produces is not the same shape as the register the postmortem-only retrospective produced. The most visible change is the addition of a ledger-of-origin column, which records whether the carry-forward item came from the postmortem ledger, the manifest ledger, or both. The column matters because freshness rules differ by ledger. A postmortem carry-forward item ages out after four quarters under the policy I described in the postmortem retrospective post. A manifest carry-forward item ages out after two, because the contract corpus changes faster than the incident corpus and a stale manifest signal becomes noise sooner.
The second change is a column for reconciliation rank, which records whether the item was promoted by the cross-ledger rule, surfaced by a single ledger, or carried over from a prior quarter without resurfacing this quarter. Items promoted by cross-ledger reconciliation get the highest priority in the next quarter's retrospective re-review; items surfaced by a single ledger get the standard priority; items carried over without resurfacing get the lowest priority and are candidates for retirement once they age out by their respective policy. The rank column produces a register that is partly self-prioritising, in the sense that the next quarter's retrospective can read the rank column and decide which carry-forward items to lead with rather than working through the register chronologically.
The third change is more subtle. The register grew a namespace prefix discipline that we did not have before the attestation events arrived. Postmortem-derived items kept their existing prefixes (incident- and the legacy un-prefixed tags from the early postmortems); manifest-derived items adopted the attest- prefix; cross-ledger items got a joint- prefix that made them easy to filter in the register. The discipline took two retrospectives to settle, with the second retrospective renaming a few items that had been miscategorised in the first. The discipline now produces a register that any team member can read and immediately know which kind of artefact the item came from, which has been load-bearing on retrospective re-reviews where the original presenter is not in the room.
The fourth change is the most structurally important, and the one I expected to meet resistance on for longer than it did. The register grew an aging-out asymmetry between postmortem and manifest items. Postmortem items age out after four quarters, manifest items after two; the asymmetry is intentional because the half-life of a manifest signal is shorter. A scorer-drift signal that is two quarters old has usually been resolved by a model swap, a corpus refactor, or a tolerance adjustment that the original signal never explicitly recommended. A retrieval-precision incident contributor that is two quarters old is often still live, because incident patterns take longer to converge. The asymmetry produced the cleanest carry-forward register I have run. It is the change I now recommend that any team adopting this format ship in the same quarter as the attestation events themselves.
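The register's columns and aging rules fit naturally into a small record type. This is a minimal Python sketch with hypothetical names (CarryForwardItem, plan_rereview, and the string values for origin and rank). The post gives no aging horizon for joint items from both ledgers. As a labelled assumption, the sketch gives them the longer postmortem horizon because they carry incident evidence.

```python
from dataclasses import dataclass

# Aging horizons from the post: postmortem items after four quarters, manifest
# items after two. ASSUMPTION: the post gives no horizon for joint items; this
# sketch uses the longer postmortem horizon because they carry incident evidence.
AGE_OUT_QUARTERS = {"postmortem": 4, "manifest": 2, "both": 4}

# Reconciliation rank -> re-review priority (lower number is reviewed first).
RANK_PRIORITY = {"cross-ledger": 0, "single-ledger": 1, "carried-over": 2}


@dataclass
class CarryForwardItem:
    tag: str            # incident-, attest-, or joint- prefixed (legacy tags may be unprefixed)
    origin: str         # ledger-of-origin column: "postmortem", "manifest", or "both"
    rank: str           # reconciliation-rank column: a key of RANK_PRIORITY
    quarters_open: int  # quarters since the item first entered the register

    def aged_out(self) -> bool:
        return self.quarters_open >= AGE_OUT_QUARTERS[self.origin]


def plan_rereview(register: list[CarryForwardItem]
                  ) -> tuple[list[CarryForwardItem], list[CarryForwardItem]]:
    """Split the register into (re-review order, retirement candidates)."""
    keep: list[CarryForwardItem] = []
    retire: list[CarryForwardItem] = []
    for item in register:
        # Only items carried over without resurfacing are retirement candidates.
        if item.rank == "carried-over" and item.aged_out():
            retire.append(item)
        else:
            keep.append(item)
    # Stable sort: items with the same rank keep their register order.
    keep.sort(key=lambda item: RANK_PRIORITY[item.rank])
    return keep, retire
```

Sorting on the rank column is what makes the register partly self-prioritising. The next retrospective reads cross-ledger items first instead of working through the register chronologically.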
Comparison: Postmortem-Only vs Attestation-Aware Retrospective
The contrast worth drawing explicitly is between the original postmortem-only retrospective and the attestation-aware version. Both produce architecture commitments; both maintain a carry-forward register; both run on a quarterly cadence. The difference is the breadth of evidence and the source-of-signal column on the register. A postmortem-only retrospective surfaces architecture commitments based on the incidents that reached production. An attestation-aware retrospective surfaces commitments based on incidents and on the corpus's own quality discipline, with the cross-ledger reconciliation surfacing the items with two-channel evidence at the top of the shortlist.
The volume difference is striking once the corpus is mature. A postmortem-only retrospective at the eight-incident-per-quarter level produces about three architecture commitments per quarter and around twelve carry-forward items. An attestation-aware retrospective at the same incident volume plus eight reviewable attestation events per quarter produces about five architecture commitments per quarter and around eighteen carry-forward items. The volume increase is real engineering load, and it drew the most pushback when the team first considered the format. My response to that pushback is that the attestation-aware retrospective pulls architecture commitments forward rather than adding new ones. A team running the postmortem-only format usually ends up making the same commitments two quarters later, after the underlying signal has accumulated enough postmortem evidence on its own. A team running the attestation-aware format makes them about six months earlier, because the cross-ledger evidence accelerates the conviction. The total commitment-weeks across a year are roughly the same; the timing is different.
The on-call experience differs the same way. A postmortem-only retrospective produces a register where most items are reactive, surfacing problems that have already produced incidents; the on-call engineer reading the register can see what went wrong but cannot easily see what is about to go wrong. An attestation-aware retrospective produces a register where the manifest-derived items, even the log-only ones in the carry-forward column, are partly anticipatory: a quarter with three contested version bumps on the same contract is signal that the contract's invariant set is getting tested, and the next quarter's on-call engineer reading the register can prepare for the contract to surface in incidents before it does. The anticipatory column does not replace the reactive column; it sits alongside it.

flowchart TB
A["Quarter complete"] --> B{"Retrospective<br/>format?"}
B -- "Postmortem-only" --> C["1 ledger<br/>incident artefacts"]
B -- "Attestation-aware" --> D["2 ledgers<br/>incidents + manifest"]
C --> E["Top 3<br/>contributing factors"]
D --> F["Top 3 from each<br/>then reconciliation"]
E --> G["3 commitments,<br/>12 carry-forwards"]
F --> H["5 commitments,<br/>18 carry-forwards"]
G --> I["Reactive<br/>(after-incident)"]
H --> J["Reactive +<br/>anticipatory"]
I --> K["Same commits,<br/>2 quarters later"]
J --> L["Same commits,<br/>6 months earlier"]
Production Considerations
The first production consideration is the discipline of categorising attestation events into reviewable versus log-only before the retrospective rather than during it. Categorising during the meeting is the failure mode I watched the Friday meeting walk into; the room spends fifteen minutes on each ambiguous event arguing about whether it warrants discussion, and the meeting runs long enough that the reconciliation pass at the end is rushed. The discipline that has worked is to have the platform engineer who runs the attestation job categorise each event as it occurs, with the deterministic rule above as the default and an open ? category for events the engineer is unsure about. The retrospective starts with a five-minute pass through the ? items to resolve them, then proceeds with the categorised ledgers as input. The five-minute resolution pass replaces the fifteen-minute mid-meeting argument and keeps the reconciliation pass un-rushed.
The second consideration is the namespace separation between postmortem and manifest contributing-factor tags. The team that adopts this format will be tempted, in the second or third retrospective, to merge the two namespaces because the cross-ledger items keep showing up under different prefixes. The temptation is the failure mode I described in the introduction: merging the namespaces produces a contributing-factor count that double-counts the corpus's quality discipline against itself, and the architecture commitments drift toward optimising for the quality system's throughput rather than for incident reduction. The discipline is to keep the namespaces separate, surface the cross-ledger items via the reconciliation pass rather than via tag merging, and accept that some commitments will be made on single-channel evidence rather than artificially boosting their priority by namespace conflation.
The third consideration is the cadence of the manifest ledger relative to the postmortem ledger. The postmortem ledger is updated when an incident's postmortem document is signed off, usually one to two weeks after the incident. The manifest ledger is updated when an attestation event fires, usually on the same day as the underlying PR or runtime gate event. The asymmetry produces a manifest ledger that is denser in time and a postmortem ledger that is sparser. The retrospective should not try to align the two on a shared timeline, because the alignment will produce a manifest ledger with apparent gaps that are actually just the days between attestation events. The cadence pattern that has worked is to treat the two ledgers as independent corpora with no enforced timeline alignment, and to use the quarterly window as the only shared boundary.
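Using the quarter as the only shared boundary amounts to filtering each ledger by the same window and doing nothing else. This is a minimal sketch with hypothetical helper names, assuming calendar quarters and half-open date ranges. It also assumes each entry is filtered on its own ledger date: the sign-off date for postmortems and the firing date for attestation events. The post does not say which date governs quarter membership, so that choice is an assumption.

```python
from datetime import date


def quarter_bounds(year: int, quarter: int) -> tuple[date, date]:
    """Half-open [start, end) dates for a calendar quarter (1-4)."""
    start_month = 3 * (quarter - 1) + 1
    start = date(year, start_month, 1)
    end = date(year + 1, 1, 1) if quarter == 4 else date(year, start_month + 3, 1)
    return start, end


def filter_to_quarter(entries: list[tuple[date, str]], year: int,
                      quarter: int) -> list[tuple[date, str]]:
    """Keep one ledger's entries that fall inside the quarter.

    Each ledger is filtered on its own dates; no attempt is made to align the
    postmortem and manifest ledgers on a shared timeline.
    """
    start, end = quarter_bounds(year, quarter)
    return [entry for entry in entries if start <= entry[0] < end]


# Hypothetical ledgers: postmortems keyed by sign-off date, events by firing date.
postmortems = [(date(2026, 3, 30), "pm-retrieval-longtail"), (date(2026, 4, 2), "pm-router")]
manifest = [(date(2026, 1, 5), "ae-hash-diff"), (date(2026, 2, 14), "ae-runtime-gate")]
q1_pm = filter_to_quarter(postmortems, 2026, 1)
q1_manifest = filter_to_quarter(manifest, 2026, 1)
```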
The fourth consideration is the interaction between attestation events and contract version bumps in the same retrospective window. A contract that has a version bump during the quarter will produce attestation events on either side of the bump that are categorically different: pre-bump events are against the prior contract identity; post-bump events are against the new identity. The retrospective should treat the bump as a watershed and review the pre-bump and post-bump events separately, because the contributing-factor analysis on the two sides answers different questions. The pattern that has worked is to record the version-bump date in the manifest ledger, draw a horizontal line on the retrospective's review board between pre-bump and post-bump events, and run the contributing-factor pass on each side independently. The two contributing-factor counts are then merged in the reconciliation pass with the contract-version field as a discriminator.
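The watershed can be sketched as a counting pass keyed by contract version. This sketch assumes a hypothetical event shape of (date, contract, tag) and one recorded bump per contract, stored as (bump date, old version, new version). It also assumes that an event on the bump date belongs to the new identity, which the post does not specify. The function name and the current_version map are illustrative.

```python
from collections import Counter
from datetime import date

# Hypothetical shapes: an event is (event_date, contract, tag); a bump is
# recorded per contract as (bump_date, old_version, new_version).
Event = tuple[date, str, str]
Bump = tuple[date, str, str]


def counts_by_contract_version(events: list[Event], bumps: dict[str, Bump],
                               current_version: dict[str, str]) -> Counter:
    """Count tags keyed by (contract, version, tag), splitting bumped contracts at the bump date."""
    counts: Counter = Counter()
    for when, contract, tag in events:
        if contract in bumps:
            bump_date, old_version, new_version = bumps[contract]
            # Watershed: events on or after the bump date count against the new
            # identity (an assumption; the post only says to split pre from post).
            version = new_version if when >= bump_date else old_version
        else:
            version = current_version[contract]
        counts[(contract, version, tag)] += 1
    return counts
```

Pre-bump and post-bump events land under different keys. The contributing-factor pass on each side therefore stays independent until the reconciliation pass reads the version field.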
The fifth consideration is the budget for retrospective duration. A postmortem-only retrospective at eight incidents runs about ninety minutes. An attestation-aware retrospective at the same incident volume plus eight reviewable attestation events runs about one hundred and ten minutes once the team has settled into the format. The additional twenty minutes is real meeting cost: across a six-person team it is two engineering hours per quarter, or eight engineering hours per year. The cost is dominated by the reconciliation pass, which is the most valuable part of the meeting and which I do not recommend cutting. The cost is best framed as part of the engineering investment in the contract corpus itself rather than as overhead on the retrospective.
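The overhead figure is straightforward arithmetic: extra minutes times attendees, converted to hours, then multiplied by four retrospectives a year. A throwaway helper, with an illustrative name, makes the calculation reusable for other team sizes or cadences.

```python
def retro_overhead_hours(extra_minutes: float, attendees: int,
                         retros_per_year: int = 4) -> tuple[float, float]:
    """Return (engineering hours per retrospective, engineering hours per year)."""
    per_retro = extra_minutes * attendees / 60
    return per_retro, per_retro * retros_per_year


# Twenty extra minutes across a six-person team, quarterly cadence.
print(retro_overhead_hours(20, 6))  # (2.0, 8.0)
```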
Conclusion
Attestation-aware retrospectives are the layer that closes the loop on the contract corpus's own integrity. A team that ships the contract pattern, the attestation job, and the postmortem retrospective without thinking about how the three artefacts interact in the quarterly review will produce a retrospective that double-counts the corpus's quality discipline against itself, surfaces architecture commitments that optimise for the quality system rather than for incident reduction, and quietly drifts away from the actual incident pattern over six to eight quarters. A team that ships the attestation-aware retrospective format alongside the attestation job will produce a quarterly review that surfaces the items with two-channel evidence at the top of the shortlist, ages out single-channel manifest signals on a tighter cadence than postmortem signals, and makes architecture commitments roughly six months earlier than the postmortem-only baseline.
The cluster I have been writing across blogs 167 through 191 has been about closing successive loops on LLM platform quality. Postmortems fix individual incidents, retrospectives fix recurring contributing factors, eval contracts fix the regression class, drift detection fixes the contract code itself, and attestation-aware retrospectives close the loop on the corpus's own integrity. Each layer closes a different loop; together they close the system. The next blog in the cluster will work through cross-team retrospective syndication. That is the form the quarterly retrospective takes once a single platform team is shipping commitments that cross into adjacent product teams' on-call rotations, and the syndication pattern has to handle cross-team carry-forward items. The post after that will cover multi-corpus retrospective rollups, which is what happens when the same retrospective format has to handle three or four contract corpora in parallel rather than one.
If you are starting from scratch with this format, the order I now recommend is: ship the postmortem retrospective first and let it stabilise for two quarters, then ship the eval contracts and the attestation job in the order I described in the prior posts, then introduce the attestation-aware retrospective in the third quarter after the attestation job has produced a stable enough event stream to categorise. Introducing the attestation-aware format before the attestation job has produced two quarters of events tends to produce a manifest ledger too sparse to support the reconciliation pass, and the reconciliation pass is the highest-value part of the format. Companion code for the manifest ledger schema, the categorisation rule, and the carry-forward register's two-column shape is in the adlc-eval-contracts directory of the amtocbot-examples repository.
About the Author
Toc Am
Founder of AmtocSoft. Writing practical deep-dives on AI engineering, cloud architecture, and developer tooling. Previously built backend systems at scale. Reviews every post published under this byline.
Published: 2026-05-06 · Written with AI assistance, reviewed by Toc Am.