Tuesday, May 19, 2026

The Rate-Limit Retry-Storm Pattern Catalogue: When the Planner Misreads 429s and the Runtime Spawns Compensating Workflows

The first retry storm I personally signed off on as a runtime-layer reviewer ran for forty-three minutes against a single tier-two LLM provider before the on-call engineer caught it, and the postmortem the platform team wrote afterwards is the spine of this catalogue. The runtime in question had three agents in concurrent execution against a shared API budget, each agent's planner had been handed a tool-use budget of forty tool calls, and each agent's planner had read the first 429 response from the provider as a transient failure rather than as a budget-class failure. The runtime's retry layer dutifully retried each 429 with exponential backoff. The planner, watching its budget burn, then spawned a compensating workflow against each retry, which itself made tool calls, which themselves returned 429, which spawned further compensating workflows. By minute eight, the runtime had three planners running fifty-seven concurrent compensating workflows against a provider that was deliberately refusing every request. By minute forty-three, the platform team's on-call engineer had killed the runtime, the provider's account had been throttled for the rest of the hour, and the platform team had written its first runtime-layer postmortem, with the disposition that the retry storm was the runtime's fault, not the provider's.

The lesson the postmortem landed on is the lesson this catalogue codifies: the agent runtime's retry-storm patterns are not failures of the retry-layer logic in isolation. They are failures of the planner-runtime contract at the budget-aware-planning interface, where the planner reads the provider's 429 response as a transient I/O failure rather than as a structured budget-class signal the planner is meant to compose its plan against. The runtime-layer series I wrote earlier in 2026 named the budget-aware-planning interface as one of the three runtime primitives but stopped short of naming the retry-storm pattern catalogue the budget-aware-planning interface has to defend against. This post is the catalogue. The post pairs each retry-storm pattern with the budget-aware-planning interface fix the planner has to carry to refuse to spawn the pattern, and closes with the postmortem instrumentation a platform team needs to detect each pattern at the runtime grain before the pattern burns the provider's hourly budget.

This post catalogues five retry-storm patterns the runtime spawns when the planner misreads 429s: the transient-misread pattern, the compensating-workflow recursion pattern, the concurrent-fanout pattern, the retry-while-replanning pattern, and the budget-blind tool-batch pattern. For each pattern, the post walks through the structural failure mode at the planner-runtime contract grain, the budget-aware-planning interface fix the planner has to carry, the instrumentation signal the runtime emits when the pattern fires, and the postmortem rubric the platform team applies to the pattern after the fact.

Figure: five-lane catalogue diagram. The planner-runtime contract runs across the top; the five retry-storm patterns (transient-misread, compensating-workflow recursion, concurrent-fanout, retry-while-replanning, budget-blind tool-batch) are stacked as lanes, each with the planner-side fix on the left and the runtime-side instrumentation on the right; the provider's 429 response surface runs across the bottom.

What the Planner-Runtime Contract Is Supposed to Carry

Before the catalogue, a short framing on the planner-runtime contract is useful, because the catalogue's five patterns are all variations on a single contract failure. The planner-runtime contract is the structural surface across which the agent's planner composes its plan against the runtime's tool-use, retry, concurrency, and budget surfaces. The runtime-layer series posts named the budget-aware-planning interface as the planner-side read on the runtime's tool-use budget; the catalogue extends the interface to name what the planner reads from the runtime when the runtime's tool-use returns a non-success response that is budget-class rather than transient.

The provider's 429 response is the canonical budget-class signal. The provider returns 429 when the runtime's tool-use request has exceeded the provider's rate limit, where the limit is typically a token-bucket rate measured over a sixty-second or three-hundred-second window. The 429 response carries a Retry-After header on most provider APIs, and the runtime layer's retry behaviour reads the Retry-After header to schedule the retry. The structural failure at the planner-runtime contract is not in the retry layer's Retry-After handling. The structural failure is that the planner treats the 429 as a transient I/O failure (which the retry layer absorbs and the planner does not see) rather than as a budget-class structural signal the planner is meant to compose its plan against.
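
As a minimal sketch of the Retry-After handling described above: HTTP allows the header to carry either a delay in seconds or an HTTP-date, so a retry layer has to accept both forms. The function name `retry_after_seconds` and the choice to return `None` for a missing or unparseable header are assumptions made for illustration, not part of any provider's API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    header_value: Optional[str],
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Convert a Retry-After header value into a wait in seconds.

    Returns None when the header is missing or unparseable, so the caller
    can fall back to its own backoff schedule (an assumed policy).
    """
    if header_value is None:
        return None
    value = header_value.strip()
    # Form 1: delay-seconds, a non-negative integer such as "30".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP-date such as "Tue, 19 May 2026 10:00:30 GMT".
    try:
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if target.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        target = target.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # A date in the past means the window has already reopened.
    return max(0.0, (target - current).total_seconds())
```

The value this returns is also what a planner-visible refusal window would carry, which is why it is worth normalising both header forms in one place.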

When the planner does not see the 429, the planner's plan continues to assume the runtime has the tool-use budget the plan was composed against. When the plan then fails to produce the expected output, the planner spawns compensating workflows against it, and the compensating workflows themselves make tool calls. The compensating workflows' tool calls return 429, which the retry layer absorbs, which the planner does not see, which spawns further compensating workflows. The five patterns this catalogue covers are five different shapes of the same structural failure: the planner does not see the 429, and the runtime's behaviour against the 429 spawns further runtime behaviour the planner does not see either.

flowchart LR
    Planner[Planner] -->|tool call| Runtime[Runtime tool-use layer]
    Runtime -->|API request| Provider[Provider API]
    Provider -->|429 + Retry-After| Runtime
    Runtime -->|retry with backoff| Provider
    Runtime -.->|absorbed, planner does not see 429| Planner
    Planner -->|plan continues, spawns compensating workflow| Runtime
    Runtime -->|further tool calls| Provider
    Provider -->|further 429s| Runtime
    Runtime -.->|absorbed again| Planner
    style Runtime fill:#0a4d4d,color:#fff
    style Provider fill:#b87333,color:#fff
    style Planner fill:#8b5fbf,color:#fff

The fix the budget-aware-planning interface has to carry is structurally simple to name and structurally difficult to ship. The planner has to read the 429 response as a structured signal of the form budget-class refusal, refusal scope X, refusal window Y, refusal cost Z, where the refusal scope tells the planner which tool, model, or provider tier is refusing, the refusal window tells the planner how long the refusal is expected to last (typically the Retry-After header), and the refusal cost tells the planner how much budget the planner has already consumed against the refused scope. The planner then has to compose its plan against the structured refusal signal, where the plan's composition options are wait, substitute scope (try a different model, provider, or tool), partial-completion-and-checkpoint, or abort with structured failure-mode descriptor. The five catalogue patterns are the five ways the planner can fail to read or compose against the structured refusal signal.
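
The four composition options named above can be sketched as a small decision function. This is an illustration under stated assumptions: the names `PlanAction`, `Refusal`, and `choose_action` are hypothetical, and the preference order (wait, then substitute, then checkpoint, then abort) is one plausible policy rather than the only reasonable one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class PlanAction(Enum):
    WAIT = "wait"
    SUBSTITUTE_SCOPE = "substitute-scope"
    CHECKPOINT = "partial-completion-and-checkpoint"
    ABORT = "abort-with-failure-descriptor"


@dataclass(frozen=True)
class Refusal:
    scope: str             # which tool, model, or provider tier is refusing
    window_seconds: float  # how long the refusal is expected to last
    cost_consumed: int     # budget already spent against the refused scope


def choose_action(
    refusal: Refusal,
    time_left_seconds: float,
    alternative_scopes: Sequence[str],
    completed_steps: int,
) -> PlanAction:
    """Pick a composition option for one structured refusal (assumed policy)."""
    # Waiting is the cheapest option when the refusal clears before the deadline.
    if refusal.window_seconds <= time_left_seconds:
        return PlanAction.WAIT
    # Otherwise try a different model, provider, or tool if one is available.
    if any(scope != refusal.scope for scope in alternative_scopes):
        return PlanAction.SUBSTITUTE_SCOPE
    # Preserve finished work rather than discarding it.
    if completed_steps > 0:
        return PlanAction.CHECKPOINT
    return PlanAction.ABORT
```

A real planner would likely weigh `cost_consumed` as well; the point of the sketch is only that every branch is an explicit plan decision rather than an invisible retry.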

Pattern One: The Transient-Misread Pattern

The transient-misread pattern is the simplest of the five, and the one the catalogue puts first because it underlies all four of the others. The runtime's retry layer is configured to retry 429 responses with exponential backoff, which is the correct retry-layer behaviour for a 429 that the planner does not need to see. The structural failure is that the runtime does not promote any 429 to a planner-visible signal, regardless of how many 429s the retry layer absorbs against the same scope across the same plan window. The planner sees the eventual successful response if the retry succeeds, sees a generic timeout failure if the retry layer exhausts its retry budget, and sees nothing in between. The planner's plan composes against the runtime as if the runtime had unlimited capacity against the scope the 429s were against.

The structural fix at the budget-aware-planning interface is a 429 promotion threshold. The runtime layer should expose a planner-readable signal of the form N 429 responses absorbed against scope X within window W, where the planner reads the signal once N exceeds a threshold the planner has structurally agreed to read against. The threshold has to be planner-configurable, because different plan shapes have different sensitivities. A short-window plan with a four-tool-call horizon needs the threshold at N=2; a long-window plan with a forty-tool-call horizon can carry the threshold at N=8. The runtime's retry layer continues to retry the 429s up to the retry budget, but the planner's plan composes against the promotion signal rather than continuing to compose as if the runtime had unlimited capacity.
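
A minimal sketch of the promotion threshold, assuming a sliding window per scope: the retry layer records each absorbed 429, and the planner asks whether the count has reached its own threshold. The class name `AbsorptionTracker` and the injectable clock are hypothetical conveniences, and the comparison uses "at or above the threshold", which is an assumption about how the boundary case should be read.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class AbsorptionTracker:
    """Counts 429s absorbed per scope within a sliding window."""

    def __init__(
        self,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self._clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_429(self, scope: str) -> None:
        """Called by the retry layer each time it absorbs a 429."""
        self._events[scope].append(self._clock())

    def absorbed(self, scope: str) -> int:
        """Number of 429s absorbed against the scope inside the window."""
        events = self._events[scope]
        cutoff = self._clock() - self.window_seconds
        # Drop events that have aged out of the window.
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)

    def promoted(self, scope: str, threshold: int) -> bool:
        """True once the absorbed count reaches the planner's threshold."""
        return self.absorbed(scope) >= threshold
```

Because the threshold is passed per call, a short-horizon plan can ask `promoted(scope, 2)` while a long-horizon plan asks `promoted(scope, 8)` against the same tracker.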

The instrumentation the runtime emits is a retry-layer 429 absorption counter keyed on (scope, window), with a structured rollup of the form runtime.retry.429.absorbed{scope=provider-A:model-X, window=60s} = 12. The platform team's runtime-layer dashboard reads the counter rolled up to the per-scope-per-minute resolution, and the postmortem rubric asks whether the absorption counter exceeded the planner's promotion threshold without the planner reading the promotion signal. The rubric's disposition is a runtime-layer contract bug if the counter exceeded the threshold and the planner did not read the signal, and a planner-layer plan-composition bug if the planner read the signal and continued composing against unlimited capacity anyway.

Pattern Two: The Compensating-Workflow Recursion Pattern

The compensating-workflow recursion pattern is the one the opening anecdote of this post describes, and the one that produces the most runtime-layer cost per minute of any of the five patterns. The pattern fires when the planner spawns a compensating workflow against a plan step's failure (a tool call that returned an error, a model output that did not match the plan's expected shape, a workflow step that timed out), and the compensating workflow itself makes tool calls against the same scope the original plan step's failure was driven by. The 429s the compensating workflow receives are absorbed by the retry layer, the planner does not see them, the compensating workflow's plan composes against the runtime as if the runtime had capacity, the compensating workflow's plan fails, the planner spawns a compensating workflow against the compensating workflow's failure, and the recursion proceeds.

The structural failure is that the compensating workflow's tool-use is not budgeted against the same budget the original plan's tool-use was budgeted against. The compensating workflow has its own tool-use horizon, which the planner composes against independently of the original plan's tool-use horizon. The runtime's budget-aware-planning interface does not, at most runtime layers in 2026, carry a cumulative tool-use budget across compensating workflow recursion depth, which is the budget the recursion has to be composed against. The runtime treats the compensating workflow as a child plan with a child tool-use budget, while the provider treats the compensating workflow's tool calls as further tool calls against the same scope's rate-limit window.

flowchart TD
    A[Plan step fails] --> B[Planner spawns compensating workflow CW1]
    B --> C[CW1 tool call hits 429]
    C --> D[Retry layer absorbs 429]
    D --> E[CW1 step fails on plan-shape mismatch]
    E --> F[Planner spawns CW2 against CW1 failure]
    F --> G[CW2 tool call hits 429]
    G --> H[Retry layer absorbs 429]
    H --> I[CW2 step fails]
    I --> J[Planner spawns CW3...]
    style A fill:#0a4d4d,color:#fff
    style B fill:#b87333,color:#fff
    style F fill:#b87333,color:#fff
    style J fill:#cc3333,color:#fff

The structural fix at the budget-aware-planning interface is a recursion-depth-budget combined with a cumulative-cost rollup. The recursion-depth-budget caps the number of compensating workflow nestings the planner is allowed to spawn against a single original plan step's failure, with a depth typically in the two-to-four range. The cumulative-cost rollup tracks the tool-use cost across the entire compensating workflow recursion tree, with the rollup carried as a budget the runtime returns to the planner each time the planner reads the runtime's current cost state. The planner then has to compose the compensating workflow against the cumulative-cost rollup rather than against the compensating workflow's local budget, with the planner refusing to spawn the compensating workflow when the cumulative-cost rollup is within a threshold of the original plan's total budget.
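
The two budgets can be sketched together as a small ledger keyed on the original plan step. The name `CompensationLedger`, the `headroom` parameter (how close to the total budget counts as "within a threshold"), and the assumption that compensating workflows nest as a single chain per original step are all illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CompensationLedger:
    depth_budget: int   # maximum nesting depth per original plan step
    total_budget: int   # the original plan's total tool-call budget
    headroom: int       # refuse once cumulative cost is within this of total
    _depth: Dict[str, int] = field(default_factory=dict, init=False)
    _cost: Dict[str, int] = field(default_factory=dict, init=False)

    def charge(self, original_step_id: str, tool_calls: int) -> None:
        """Roll tool calls from any nesting level up to the original step."""
        self._cost[original_step_id] = self._cost.get(original_step_id, 0) + tool_calls

    def may_spawn(self, original_step_id: str) -> Tuple[bool, str]:
        """Check both budgets before a further compensating workflow."""
        if self._depth.get(original_step_id, 0) >= self.depth_budget:
            return False, "cw-recursion-depth-exceeded"
        if self._cost.get(original_step_id, 0) >= self.total_budget - self.headroom:
            return False, "cw-cumulative-cost-exceeded"
        return True, "spawn-allowed"

    def spawn(self, original_step_id: str) -> None:
        """Record one more level of nesting against the original step."""
        self._depth[original_step_id] = self._depth.get(original_step_id, 0) + 1
```

The important property is that `charge` is keyed on the original step rather than on the child workflow, so a child cannot start with a fresh local budget.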

The instrumentation the runtime emits is a compensating workflow recursion-depth gauge and a cumulative-cost-by-original-plan-step counter. The gauge is keyed on (plan-id, original-step-id) and is emitted as runtime.cw.depth{plan=P-42, step=S-3} = 5. The cost counter is keyed on the same tuple and is emitted as runtime.cw.cost.cumulative{plan=P-42, step=S-3} = 73 tool-calls. The postmortem rubric asks whether the recursion-depth gauge or the cumulative-cost counter exceeded the planner's budget thresholds before the planner refused to spawn a further compensating workflow. The rubric's disposition is a runtime-layer budget-interface bug if the runtime did not emit the rollup at the resolution the planner needed, and a planner-layer composition bug if the planner read the rollup and continued spawning compensating workflows anyway.

Pattern Three: The Concurrent-Fanout Pattern

The concurrent-fanout pattern is the pattern that produces the most provider-side throttling per minute, because the pattern fans out a single plan step into many concurrent tool calls against the same scope at the same time. The pattern fires when the planner's plan step is a parallelisable operation (a batch-classify-N-items step, a fan-out-and-summarise step, a parallel-tool-call step), and the runtime's concurrency layer dispatches the parallel tool calls without composing the concurrency against the provider's rate-limit window's structural shape. The provider's rate-limit window absorbs the first few calls, then returns 429 for the rest of the concurrent batch. The retry layer absorbs the 429s, the planner sees the partial successful set, and the planner's plan composes against the partial set as if the partial set were the full intended fanout. The plan's next step then composes against the partial set's structurally smaller surface, which produces a downstream plan-quality regression the planner does not surface to the user.

The structural failure is that the runtime's concurrency layer does not compose the parallel tool calls against the provider's rate-limit window. The runtime treats the concurrency budget as a runtime-local capacity (the runtime can run M concurrent tool calls), while the provider treats the concurrency surface as a rate-limit window's structural shape (the provider accepts K calls per W-second window). The runtime's concurrency budget M and the provider's rate-limit budget K are independent budgets, with the provider's K typically tighter than the runtime's M for tier-two providers. The concurrent-fanout pattern is what fires when M is greater than K and the runtime dispatches M concurrent calls against the provider.

The structural fix at the budget-aware-planning interface is a concurrency-budget composition. The runtime's concurrency layer has to compose the parallel tool calls against the provider's per-scope concurrency budget rather than against the runtime's per-runtime concurrency budget. The composition is a min-rule across the two budgets: the runtime dispatches min(M, K) concurrent tool calls against each scope, with the runtime's per-scope tracking carrying the provider's rate-limit state across the rate-limit window. The planner's plan composes against the composed budget, which surfaces to the planner as a scope-bound concurrency cap the planner reads before spawning the parallelisable plan step.
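
A minimal sketch of the min-rule using asyncio, assuming the provider's K can be approximated as a cap on in-flight calls per scope. A semaphore limits concurrency, not calls per window, so this is a simplification of a true rate-limit window; the class name `ScopedDispatcher` and the fallback to the runtime cap for unknown scopes are assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


class ScopedDispatcher:
    """Caps in-flight tool calls per scope at min(M, K)."""

    def __init__(self, runtime_cap: int, provider_caps: Dict[str, int]) -> None:
        self._runtime_cap = runtime_cap      # M: runtime-local capacity
        self._provider_caps = provider_caps  # K: per-scope provider budget
        self._semaphores: Dict[str, asyncio.Semaphore] = {}

    def cap(self, scope: str) -> int:
        """The composed cap the planner would read before fanning out."""
        provider_cap = self._provider_caps.get(scope, self._runtime_cap)
        return min(self._runtime_cap, provider_cap)

    async def dispatch(
        self,
        scope: str,
        calls: List[Callable[[], Awaitable[Any]]],
    ) -> List[Any]:
        """Run all calls, never exceeding the composed cap for the scope."""
        if scope not in self._semaphores:
            self._semaphores[scope] = asyncio.Semaphore(self.cap(scope))
        semaphore = self._semaphores[scope]

        async def guarded(call: Callable[[], Awaitable[Any]]) -> Any:
            async with semaphore:
                return await call()

        return list(await asyncio.gather(*(guarded(call) for call in calls)))
```

With a runtime cap of 14 and a provider cap of 6 for a scope, `cap` returns 6, and a 14-way fanout runs at most six calls at a time instead of returning a partial set.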

The instrumentation the runtime emits is a per-scope concurrency-cap gauge and a per-scope concurrent-dispatch counter, with the gauge emitted as runtime.concurrency.cap{scope=provider-A:model-X} = 6 and the counter emitted as runtime.concurrency.dispatched{scope=provider-A:model-X, window=60s} = 14. The platform team's runtime-layer dashboard reads the ratio of dispatched-to-cap rolled up to the per-scope-per-minute resolution, and the postmortem rubric asks whether the runtime dispatched concurrent tool calls against a scope at a rate that exceeded the scope's composed concurrency cap without the planner having seen the cap before spawning the parallel plan step. The rubric's disposition is a runtime-layer concurrency-composition bug if the runtime did not compose the budgets correctly, and a planner-layer fanout-composition bug if the planner spawned the parallel step against an inaccurate cap reading.

Pattern Four: The Retry-While-Replanning Pattern

The retry-while-replanning pattern is the subtlest of the five, and the one that the catalogue's first six months of postmortem data shows is the hardest to instrument. The pattern fires when the runtime's retry layer is mid-retry against a 429 (with the retry layer waiting on the Retry-After header's wait window), and the planner concurrently issues a replan against the original plan step the retry is for. The replan composes a new plan branch that itself makes tool calls against the same scope the retry is waiting on. The new plan branch's tool calls hit the scope's still-active rate-limit window, return 429, are absorbed by the retry layer, and the new plan branch's plan continues composing against the runtime as if the runtime had capacity. The original retry, when it eventually fires after the Retry-After window expires, succeeds, but the new plan branch's tool-use has already burned the next rate-limit window's budget against the same scope, producing a fresh wave of 429s the original retry's success does not surface.

The structural failure is that the runtime does not compose the retry-pending state with the replan-issued state at the budget-aware-planning interface. The runtime's retry layer carries the retry-pending state internally and surfaces nothing to the planner. The planner's replan composes against the runtime's current budget reading, which does not include the retry-pending tool calls' implicit budget claim. The retry-pending tool calls then reach the provider in the next rate-limit window, and the planner's replan-issued tool calls reach the provider in the same window. The two waves of tool calls compose into a single wave that exceeds the rate-limit window.

flowchart LR
    P[Plan step] --> T1[Tool call attempt 1]
    T1 -->|429 + Retry-After=30s| R[Retry layer pending]
    P -->|planner issues replan| RP[Replan new plan branch]
    RP --> T2[Tool call attempt 2 against same scope]
    T2 -->|429| R2[Retry layer pending]
    R -->|30s elapsed, retry fires| T3[Tool call retry succeeds]
    R2 -->|continues retrying| T4[Tool call retry hits 429 again]
    T4 --> Storm[Retry storm in new rate-limit window]
    style Storm fill:#cc3333,color:#fff
    style R fill:#b87333,color:#fff
    style R2 fill:#b87333,color:#fff

The structural fix at the budget-aware-planning interface is a retry-pending budget claim. The runtime's retry layer has to surface the retry-pending tool calls to the planner as implicit budget claims against the scope's rate-limit window, with the claim carrying the expected retry-fire time and the expected tool-cost. The planner's replan then has to compose against the runtime's budget state plus the retry-pending implicit claims, with the planner refusing to issue the replan-issued tool calls until either the retry-pending claims resolve or the replan's new tool calls are scheduled against a different scope. The runtime's retry layer carries a structured retry-pending registry that the planner reads against, and the planner's replan composition reads the registry as part of the planner's read on the runtime's current state.
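
A minimal sketch of such a registry, assuming each pending retry is identified by its plan step. The names `PendingRetry`, `RetryPendingRegistry`, and `budget_after_claims` are hypothetical; the last method shows one way a replan could read the remaining budget net of pending claims.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PendingRetry:
    scope: str
    fire_at_seconds: float  # when the retry is expected to fire
    expected_cost: int      # tool calls the retry will claim
    plan_step_id: str


class RetryPendingRegistry:
    """Pending retries, exposed to the planner as implicit budget claims."""

    def __init__(self) -> None:
        self._claims: Dict[str, PendingRetry] = {}

    def register(self, claim: PendingRetry) -> None:
        """Called by the retry layer when it starts waiting on Retry-After."""
        self._claims[claim.plan_step_id] = claim

    def resolve(self, plan_step_id: str) -> None:
        """Called when the retry fires or is cancelled."""
        self._claims.pop(plan_step_id, None)

    def export(self, scope: str) -> List[PendingRetry]:
        """Claims against one scope, soonest to fire first."""
        claims = (c for c in self._claims.values() if c.scope == scope)
        return sorted(claims, key=lambda c: c.fire_at_seconds)

    def budget_after_claims(self, scope: str, remaining_budget: int) -> int:
        """Budget left for a replan once pending claims are subtracted."""
        claimed = sum(c.expected_cost for c in self.export(scope))
        return max(0, remaining_budget - claimed)
```

A stricter planner can simply refuse whenever `export(scope)` is non-empty; subtracting the claimed cost is the more permissive reading of the same signal.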

The instrumentation the runtime emits is a retry-pending registry export and a replan-against-retry-pending counter. The registry is exported as a structured list of the form runtime.retry.pending = [{scope, expected-fire-time, expected-cost, plan-step-id}], and the counter is keyed on (plan-id, scope) as runtime.replan.against-pending{plan=P-42, scope=provider-A:model-X} = 3. The postmortem rubric asks whether the planner issued replans against scopes that had active retry-pending claims without having composed the replan against those claims. The rubric's disposition is a runtime-layer retry-pending-export bug if the registry was not exposed or was exposed at insufficient resolution, and a planner-layer replan-composition bug if the registry was read but ignored.

Pattern Five: The Budget-Blind Tool-Batch Pattern

The budget-blind tool-batch pattern is the pattern that produces the most subtle plan-quality regressions of the five, because the pattern does not produce a visible runtime crash or a visible budget exhaustion. The pattern fires when the planner composes a tool-batch step (a step that bundles multiple tool calls into a single batched runtime dispatch), and the planner does not compose the batch's per-call cost against the runtime's per-batch rate-limit budget. The provider accepts the batch, the batch's first few internal calls succeed against the rate-limit window, the batch's later internal calls hit the rate-limit window and return 429 within the batch's response surface, and the runtime's retry layer absorbs the partial-failure 429s. The planner sees the batch response with a partial success set and composes the next plan step against the partial success set, with the same downstream plan-quality regression the concurrent-fanout pattern produces but at the tool-batch grain rather than at the concurrent-call grain.

The structural failure is that the tool-batch surface and the rate-limit surface compose differently across different providers, and the planner's budget-aware-planning interface does not have a structured read on the composition at the batch grain. Some providers count a tool-batch as a single rate-limit call regardless of the batch's internal call count; other providers count each internal call as a separate rate-limit call; some providers count the batch against a separate batch-grain rate-limit budget that is independent of the per-call rate-limit budget. The planner that has not composed against the provider-specific batch-to-rate-limit composition is the planner that produces the budget-blind tool-batch pattern.

| Pattern | Trigger | Provider-Side Symptom | Runtime-Side Symptom | Planner-Side Fix |
| --- | --- | --- | --- | --- |
| Transient-misread | Retry layer absorbs all 429s; planner never sees the signal | Steady 429 stream against single scope | Retry budget burn without planner visibility | Promote 429 absorption count to planner once threshold crossed |
| Compensating-workflow recursion | Plan step fails; compensating workflows recurse against same scope | Sustained 429s across compensating workflow tree | Exponentially growing concurrent compensating workflow set | Recursion-depth-budget plus cumulative-cost rollup |
| Concurrent-fanout | Parallelisable plan step dispatches more concurrent calls than scope's rate-limit window allows | Burst of 429s in single rate-limit window | Partial-success batch with downstream plan-quality regression | Compose runtime concurrency budget against provider's scope concurrency cap |
| Retry-while-replanning | Planner replans against scope while retry layer has pending retries on same scope | 429 storm in next rate-limit window after retry-fire | Mixed retry-pending and replan-issued tool calls competing for budget | Retry-pending registry export to planner |
| Budget-blind tool-batch | Tool-batch step's internal calls exceed batch-grain rate-limit window | Partial-success response within batch with internal 429s | Batched 429s within batch response; partial set returned | Provider-specific batch-to-rate-limit composition table |

Figure: architecture diagram of the runtime layer's five instrumentation surfaces stacked vertically (retry-absorption counter, compensating-workflow recursion gauge, concurrency-cap gauge, retry-pending registry export, batch-to-rate-limit composition table). Each surface is a horizontal lane with the runtime layer's data store on the left, the planner-readable export surface in the centre, and the postmortem dashboard tile on the right.

The structural fix at the budget-aware-planning interface is a batch-to-rate-limit composition table the planner reads against per provider. The composition table is a structured mapping of the form (provider, model, batch-shape) → rate-limit-cost-formula, where the cost formula carries the rule the provider applies to the batch when composing the batch against the rate-limit window. The runtime layer maintains the composition table per provider and surfaces the table to the planner through the budget-aware-planning interface. The planner's tool-batch step composition reads the table and computes the batch's expected rate-limit cost before dispatching the batch.
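
A minimal sketch of the table lookup, covering the three counting rules described earlier (per internal call, per batch, or against a separate batch budget). The function name `batch_cost`, the two-part return value, and the provider names in the usage note are hypothetical; a real table would be populated from each provider's documented behaviour.

```python
from typing import Dict, Literal, Optional, Tuple

CompositionRule = Literal["per-call", "per-batch", "separate-batch-budget"]
BatchKey = Tuple[str, str, str]  # (provider, model, batch_shape)


def batch_cost(
    table: Dict[BatchKey, CompositionRule],
    key: BatchKey,
    batch_size: int,
) -> Optional[Tuple[int, int]]:
    """Return (per_call_budget_cost, batch_budget_cost) for one batch.

    Returns None when the provider's rule is unknown, so the planner can
    refuse to dispatch rather than guess.
    """
    rule = table.get(key)
    if rule is None:
        return None
    if rule == "per-call":
        # Every internal call counts against the per-call rate limit.
        return batch_size, 0
    if rule == "per-batch":
        # The whole batch counts as a single rate-limit call.
        return 1, 0
    # "separate-batch-budget": counted against an independent batch budget.
    return 0, 1
```

For a ten-call batch, a table entry of "per-call" yields `(10, 0)` while "per-batch" yields `(1, 0)`, which is exactly the difference a budget-blind planner never sees.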

The instrumentation the runtime emits is a batch partial-success rate gauge and a batch-internal-429 counter, with the gauge emitted as runtime.batch.partial-success-rate{scope=provider-A:model-X} = 0.18 and the counter as runtime.batch.internal-429{scope=provider-A:model-X, batch-shape=tool-call-batch-N=10} = 47. The postmortem rubric asks whether the batch's partial-success rate exceeded the planner's plan-quality threshold without the planner having composed against the batch-grain rate-limit cost. The rubric's disposition is a runtime-layer composition-table-export bug if the table was not surfaced, and a planner-layer batch-composition bug if the table was read but the planner did not compose the batch against the cost formula.

The Postmortem Rubric the Catalogue Composes Into

The five patterns share a common postmortem rubric the platform team applies after the fact, which the catalogue's first six months of postmortem data shaped into a five-question structured form. The rubric is what the platform team writes against each retry-storm postmortem, and the rubric's structured form is what the platform team rolls up across postmortems to identify which of the five patterns the team's runtime layer is most prone to.

The rubric's five questions are: which of the five patterns fired, did the runtime's budget-aware-planning interface surface the structural signal the planner needed to refuse to spawn the pattern, did the planner read the signal, did the planner compose its plan against the signal correctly, and what is the dispositional fix at the planner-runtime contract grain. The dispositional fix is one of four options: a runtime-layer fix (the runtime did not surface the signal at the resolution the planner needed), a planner-layer fix (the planner read the signal but composed incorrectly), a contract-grain fix (the planner-runtime contract did not name the signal as a structural exposure), or a provider-layer fix (the provider's 429 response did not carry the structured fields the runtime layer needed to compose the signal).

flowchart TD
    Storm[Retry storm fires] --> Q1{Which of the 5 patterns?}
    Q1 --> Q2{Runtime surfaced the signal?}
    Q2 -->|no| Fix1[Runtime-layer fix]
    Q2 -->|yes| Q3{Planner read the signal?}
    Q3 -->|no| Fix2[Planner-layer read bug]
    Q3 -->|yes| Q4{Planner composed correctly?}
    Q4 -->|no| Fix3[Planner-layer composition bug]
    Q4 -->|yes| Q5{Contract or provider gap?}
    Q5 --> Fix4[Contract-grain or provider-layer fix]
    style Storm fill:#cc3333,color:#fff
    style Fix1 fill:#0a4d4d,color:#fff
    style Fix2 fill:#b87333,color:#fff
    style Fix3 fill:#b87333,color:#fff
    style Fix4 fill:#8b5fbf,color:#fff
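
The decision flow above can be written down as a small function, which also makes the cross-postmortem roll-up mechanical. This is a sketch: `RubricAnswers`, `disposition`, and `roll_up` are hypothetical names, and the disposition strings are taken from the flowchart's node labels.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RubricAnswers:
    pattern: str                      # which of the five patterns fired
    runtime_surfaced_signal: bool
    planner_read_signal: bool
    planner_composed_correctly: bool


def disposition(answers: RubricAnswers) -> str:
    """Walk the rubric questions in order and name the dispositional fix."""
    if not answers.runtime_surfaced_signal:
        return "runtime-layer fix"
    if not answers.planner_read_signal:
        return "planner-layer read bug"
    if not answers.planner_composed_correctly:
        return "planner-layer composition bug"
    # Everything worked as specified, so the gap is in the contract itself
    # or in what the provider's 429 carried.
    return "contract-grain or provider-layer fix"


def roll_up(postmortems: Iterable[RubricAnswers]) -> Counter:
    """Count dispositions per pattern across a set of postmortems."""
    return Counter((a.pattern, disposition(a)) for a in postmortems)
```

Keying the roll-up on the (pattern, disposition) pair lets a team see both which pattern fires most often and which layer most often carries the fix.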

The rubric's roll-up across the platform team's first six months of postmortems produced a structural finding the catalogue's framing now carries: the most common dispositional fix across the team's twenty-three retry-storm postmortems was the contract-grain fix, with fourteen of the twenty-three postmortems dispositioning the storm as a planner-runtime contract gap rather than as a runtime-layer or planner-layer bug. The contract-grain fix shape the team has carried forward is the structured 429 promotion signal the runtime exports to the planner, with the signal carrying the refusal scope, refusal window, refusal cost, and pattern-disposition hint as four structured fields rather than as a single retry-success-or-failure indication. The contract-grain fix is the load-bearing reason the catalogue's five patterns can be detected and refused at the planner-runtime contract grain rather than each pattern requiring a separate planner-side or runtime-side workaround.

Implementation Sketch for the Budget-Aware-Planning Interface

What follows is a concrete sketch of the budget-aware-planning interface the catalogue's five fixes compose against, presented as a structured interface definition. The interface is what the runtime layer exports to the planner through the planner-runtime contract, and the planner reads against the interface before spawning each plan step.

# Runtime-layer budget-aware-planning interface
# Composed against the five retry-storm patterns the catalogue covers.

from dataclasses import dataclass
from typing import Optional, Literal

@dataclass
class RefusalSignal:
    """The 429 promotion signal the runtime exports to the planner."""
    scope: str                        # e.g. "provider-A:model-X"
    absorbed_count: int               # 429s absorbed in current window
    window_seconds: int               # rate-limit window length
    expected_recovery_seconds: int    # from Retry-After header
    cumulative_cost: int              # tool-calls consumed against scope
    pattern_hint: Optional[Literal[
        "transient-misread",
        "compensating-workflow-recursion",
        "concurrent-fanout",
        "retry-while-replanning",
        "budget-blind-tool-batch"
    ]]  # runtime's best-guess pattern attribution

@dataclass
class RetryPendingClaim:
    """Implicit budget claim from a retry-pending tool call."""
    scope: str
    expected_fire_time_seconds: float
    expected_cost: int
    plan_step_id: str

@dataclass
class BatchCostFormula:
    """Provider-specific batch-to-rate-limit composition."""
    provider: str
    model: str
    batch_shape: str
    cost_per_internal_call: int
    batch_grain_cost: int
    composition_rule: Literal["per-call", "per-batch", "separate-batch-budget"]

@dataclass
class BudgetAwarePlanningState:
    """What the runtime exports to the planner each read."""
    refusal_signals: list[RefusalSignal]
    retry_pending: list[RetryPendingClaim]
    concurrency_caps: dict[str, int]  # scope -> composed cap
    batch_cost_table: list[BatchCostFormula]
    cumulative_cost_by_step: dict[str, int]   # original plan-step -> cost
    cw_recursion_depth: dict[str, int]        # original plan-step -> depth

def plan_step_should_dispatch(
    step: "PlanStep",
    state: BudgetAwarePlanningState,
    config: "PlannerBudgetConfig",
) -> tuple[bool, str]:
    """The planner's composition rule against the interface."""
    # Pattern one: a promoted 429 signal on this scope blocks dispatch.
    for signal in state.refusal_signals:
        if signal.scope == step.scope:
            if signal.absorbed_count >= config.promotion_threshold[signal.scope]:
                return False, f"refusal-signal-active:{signal.pattern_hint}"
    # Pattern two: cap recursion depth and cumulative cost per original step.
    if step.is_compensating_workflow:
        depth = state.cw_recursion_depth.get(step.original_step_id, 0)
        if depth >= config.cw_depth_budget:
            return False, "cw-recursion-depth-exceeded"
        cumulative = state.cumulative_cost_by_step.get(step.original_step_id, 0)
        if cumulative >= config.cw_cumulative_budget:
            return False, "cw-cumulative-cost-exceeded"
    # Pattern three: the fanout must fit the composed per-scope concurrency cap.
    if step.is_parallel:
        cap = state.concurrency_caps.get(step.scope, config.default_cap)
        if step.fanout > cap:
            return False, "concurrency-fanout-exceeds-cap"
    # Pattern four: a pending retry already holds a claim on this scope.
    for claim in state.retry_pending:
        if claim.scope == step.scope:
            return False, "retry-pending-claim-active"
    # Pattern five: price the batch with the provider-specific formula.
    if step.is_batch:
        # The table is keyed on (provider, model, batch-shape); step.model is
        # assumed to exist alongside step.provider.
        formula = next(
            (f for f in state.batch_cost_table
             if f.provider == step.provider
             and f.model == step.model
             and f.batch_shape == step.batch_shape),
            None,
        )
        if formula is None:
            return False, "batch-cost-formula-unknown"
        expected_cost = formula.batch_grain_cost + (
            step.batch_size * formula.cost_per_internal_call
        )
        remaining_window_budget = config.scope_window_budget(step.scope)
        if expected_cost > remaining_window_budget:
            return False, "batch-cost-exceeds-window-budget"
    return True, "dispatch-allowed"

The interface definition above is the structural shape the runtime layer has to export to the planner so that the catalogue's five fixes have something to compose against. The interface is not a runtime-layer implementation detail; it is the planner-runtime contract surface, with each field corresponding to one of the five patterns' structural exposures. The planner's plan_step_should_dispatch composition rule reads the interface state and returns either (True, "dispatch-allowed") or (False, "<refusal reason>") per plan step, with the refusal reason naming the structural cause behind the refusal. The planner-side observability layer rolls up the refusal reasons to identify which of the five patterns the planner is refusing against most often, which is the planner-side signal that the runtime's surfacing of the interface is operationally tight.
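
The rollup the observability layer performs can be sketched as a small counter over the (allowed, reason) tuples the composition rule returns. This is a minimal sketch, not the reference implementation: the names REASON_TO_PATTERN, pattern_for_reason, and rollup_refusals are hypothetical, and the mapping from refusal reason to pattern is an assumption inferred from the reason strings above.

```python
from collections import Counter

# Assumed mapping from refusal reason to catalogue pattern. The reason
# strings are the ones plan_step_should_dispatch returns; the attribution
# of each reason to a pattern is this sketch's assumption.
REASON_TO_PATTERN = {
    "cw-recursion-depth-exceeded": "compensating-workflow-recursion",
    "cw-cumulative-cost-exceeded": "compensating-workflow-recursion",
    "concurrency-fanout-exceeds-cap": "concurrent-fanout",
    "retry-pending-claim-active": "retry-while-replanning",
    "batch-cost-formula-unknown": "budget-blind-tool-batch",
    "batch-cost-exceeds-window-budget": "budget-blind-tool-batch",
}

SIGNAL_PREFIX = "refusal-signal-active:"


def pattern_for_reason(reason: str) -> str:
    """Attribute one refusal reason to a pattern name."""
    if reason.startswith(SIGNAL_PREFIX):
        hint = reason[len(SIGNAL_PREFIX):]
        # pattern_hint is Optional, so the f-string can render it as "None".
        return hint if hint != "None" else "unattributed"
    return REASON_TO_PATTERN.get(reason, "unattributed")


def rollup_refusals(decisions: list[tuple[bool, str]]) -> Counter:
    """Count refusals per pattern; allowed dispatches are ignored."""
    counts: Counter = Counter()
    for allowed, reason in decisions:
        if not allowed:
            counts[pattern_for_reason(reason)] += 1
    return counts
```

Calling rollup_refusals(decisions).most_common(1) on a window of decisions then gives the pattern the planner refused against most often in that window.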

The runtime layer's RefusalSignal.pattern_hint field is the runtime's best-guess attribution of which of the five patterns the absorbed 429s match, computed from the runtime's local view of the retry layer, concurrency layer, and tool-batch layer state. The hint is the runtime's contribution to the postmortem rubric's first question, with the postmortem then refining the attribution against the planner-side composition state and the cross-layer rollup. The pattern hint is not authoritative; the postmortem's structured five-question pass is what produces the final disposition.

Production Considerations and Composition Notes

The catalogue's first six months of operational data surfaced a few practical considerations, presented here as composition notes for platform teams that are about to build the budget-aware-planning interface against the five patterns.

The first composition note is on promotion threshold tuning. The promotion threshold is the planner-readable threshold at which the runtime promotes the 429 absorption count to a planner-visible signal. The threshold has to be tuned per scope, because different providers have different rate-limit window shapes. A provider with a token-bucket rate limit that allows bursts of five times steady state for ten seconds will produce short 429 absorption windows that the planner does not need to compose against, and a threshold set too low will promote those windows to the planner anyway. A provider with a hard-cap rate limit will produce sustained 429 absorption windows that the planner has to read against from the second or third 429. The platform team's tuning pass should look at the per-scope 429-absorption-window distribution across the first thirty days of operational data and set the threshold at the per-scope 90th-percentile absorption count.
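
The tuning pass can be sketched as a per-scope percentile over the absorbed-429 count of each absorption window. This is a minimal sketch under stated assumptions: the input shape (one (scope, absorbed_count) pair per window), the function names, and the nearest-rank percentile definition are all choices made for illustration, not something the runtime prescribes.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values: list[int], pct: float) -> int:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tune_promotion_thresholds(
    windows: list[tuple[str, int]],
    pct: float = 90.0,
) -> dict[str, int]:
    """Map each scope to the pct-th percentile of its per-window absorbed counts.

    `windows` holds one (scope, absorbed_count) pair per observed 429
    absorption window, e.g. collected over the first thirty days.
    """
    by_scope: dict[str, list[int]] = defaultdict(list)
    for scope, absorbed in windows:
        by_scope[scope].append(absorbed)
    return {
        scope: percentile_nearest_rank(counts, pct)
        for scope, counts in by_scope.items()
    }
```

The resulting dictionary has the same shape as the promotion_threshold lookup the composition rule reads, so it could be loaded into the planner's budget config as a starting point and then adjusted by hand for hard-cap scopes.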

The second composition note is on retry-pending registry resolution. The registry's resolution is the rate at which the runtime exports the registry to the planner. Exporting the registry only at planner-step boundaries (e.g. before each new tool call) is the resolution most planners can compose against without runtime-side push overhead. Exporting the registry continuously (e.g. with the planner subscribed to the runtime's retry-layer event stream) is the resolution needed for replanners that compose continuously rather than at step boundaries. The platform team's choice of resolution should be driven by the planner's composition pattern, with that pattern documented as part of the planner-runtime contract.
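
The two resolutions can be sketched as two read paths on one registry: a snapshot the planner pulls at step boundaries, and a subscription the runtime pushes to on every change. This is a minimal sketch; the RetryPendingRegistry class and its method names are hypothetical, and RetryPendingClaim is repeated here only so the block is self-contained.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RetryPendingClaim:
    scope: str
    expected_fire_time_seconds: float
    expected_cost: int
    plan_step_id: str


class RetryPendingRegistry:
    """Hypothetical registry offering both pull and push export."""

    def __init__(self) -> None:
        self._claims: dict[str, RetryPendingClaim] = {}
        self._subscribers: list[Callable[[list[RetryPendingClaim]], None]] = []

    def snapshot(self) -> list[RetryPendingClaim]:
        """Pull path: the planner reads this at each step boundary."""
        return list(self._claims.values())

    def subscribe(self, callback: Callable[[list[RetryPendingClaim]], None]) -> None:
        """Push path: the callback receives a fresh snapshot on every change."""
        self._subscribers.append(callback)

    def add(self, claim: RetryPendingClaim) -> None:
        """Called by the retry layer when a tool call becomes retry-pending."""
        self._claims[claim.plan_step_id] = claim
        self._publish()

    def resolve(self, plan_step_id: str) -> None:
        """Called by the retry layer when the retry fires or is abandoned."""
        self._claims.pop(plan_step_id, None)
        self._publish()

    def _publish(self) -> None:
        snap = self.snapshot()
        for callback in self._subscribers:
            callback(snap)
```

A step-boundary planner would only ever call snapshot(); a continuous replanner would register a callback and pay the push overhead on every add and resolve.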

The third composition note is on batch cost table maintenance. The batch-to-rate-limit composition table is provider-specific and changes as providers update their batch APIs. The platform team has to maintain the table as a runtime-layer dependency, with the table version-controlled in the runtime layer's deployment artefact and the table refreshed against provider documentation changes at least monthly. The composition table's drift against the provider's actual behaviour is detectable through the batch partial-success-rate gauge: a sustained partial-success-rate above the planner's plan-quality threshold against a scope where the table predicts full-success is the operational signal that the table has drifted.
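
The drift check can be sketched as a per-scope partial-success rate over batches the table predicted would fully succeed. This is a minimal sketch: the BatchOutcome record and the drifted_scopes function are hypothetical names, and approximating "sustained" with a minimum sample count is an assumption of the sketch rather than the gauge's actual definition.

```python
from dataclasses import dataclass


@dataclass
class BatchOutcome:
    """One dispatched batch (hypothetical record shape)."""
    scope: str
    predicted_full_success: bool   # what the batch cost table predicted
    partially_succeeded: bool      # what the provider actually returned


def drifted_scopes(
    outcomes: list[BatchOutcome],
    plan_quality_threshold: float,
    min_samples: int = 20,
) -> list[str]:
    """Scopes whose partial-success rate exceeds the threshold.

    Only batches the table predicted as full-success are counted, and a
    scope needs at least `min_samples` such batches before it is flagged
    (an assumed stand-in for "sustained").
    """
    totals: dict[str, int] = {}
    partials: dict[str, int] = {}
    for outcome in outcomes:
        if not outcome.predicted_full_success:
            continue
        totals[outcome.scope] = totals.get(outcome.scope, 0) + 1
        if outcome.partially_succeeded:
            partials[outcome.scope] = partials.get(outcome.scope, 0) + 1
    return sorted(
        scope
        for scope, total in totals.items()
        if total >= min_samples
        and partials.get(scope, 0) / total > plan_quality_threshold
    )
```

A flagged scope is a prompt to re-read the provider's batch documentation and refresh the table entry, not proof on its own that the formula is wrong.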

The fourth composition note is on catalogue extensibility. The five patterns this post catalogues are the five patterns the platform team has observed across the first six months of operational data. The team's expectation is that the catalogue will grow to seven or eight patterns over the next operational year as new runtime-layer features (long-running workflow steps, multi-agent orchestration, cross-runtime tool-use composition) surface new pattern shapes. The catalogue's structural shape (pattern name, structural failure, budget-aware-planning interface fix, runtime instrumentation, postmortem rubric question) is what the team will extend the catalogue against, with each new pattern landing as a structured addition rather than as a free-form postmortem narrative.
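
One way to keep new patterns landing as structured additions is to give the catalogue's five-part shape a type and reject entries with a part missing. This is a minimal sketch; CatalogueEntry and missing_fields are hypothetical names, and only the five field roles come from the shape described above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CatalogueEntry:
    """One pattern in the catalogue's five-part structural shape."""
    pattern_name: str
    structural_failure: str
    interface_fix: str
    runtime_instrumentation: str
    postmortem_rubric_question: str


def missing_fields(entry: CatalogueEntry) -> list[str]:
    """Names of parts left empty, in declaration order."""
    return [
        f.name for f in fields(entry)
        if not getattr(entry, f.name).strip()
    ]
```

A review gate could then refuse to merge a new pattern until missing_fields returns an empty list.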

Conclusion

The five retry-storm patterns this catalogue covers are five shapes of the same structural failure at the planner-runtime contract grain: the planner does not see the 429, and the runtime's behaviour against the 429 spawns further runtime behaviour the planner does not see either. The catalogue's contribution is to name the five patterns as structurally distinct, to pair each pattern with the budget-aware-planning interface fix the planner has to carry to refuse to spawn the pattern, and to pair each fix with the runtime instrumentation the platform team needs to detect the pattern at the postmortem grain.

The forty-three-minute retry-storm the opening anecdote describes is the postmortem this catalogue's first version landed against. The platform team's runtime layer six months later carries the five fixes the catalogue names, and the team's retry-storm postmortem rate has dropped from one storm every nine operational days in the catalogue's first month to one storm every forty-one operational days six months in. The remaining storms the team observes now disposition against the contract-grain fix more often than against the runtime-layer or planner-layer fixes, which the team reads as the operational signal that the planner-runtime contract surface itself is the next composition step the runtime-layer series will name. The next post in the cluster will pivot from the rate-limit retry-storm pattern catalogue to the deterministic control layer between the runtime audit reducer and the application task contract, where the contract-grain fix shape this catalogue surfaces composes into a structurally distinct runtime-layer primitive.

The companion repository directory adlc-runtime-layer/retry-storm-catalogue/ in the amtocbot-examples repo carries a reference implementation of the budget-aware-planning interface, the runtime-layer instrumentation emitters for each of the five patterns, the postmortem rubric template, and a synthetic test harness that exercises each of the five patterns against a mock provider that emits 429s with configurable Retry-After windows. Platform teams building against the catalogue should start with the test harness, fire each of the five patterns against their existing runtime, and use the resulting postmortems to identify which of the five fixes their runtime layer most needs to ship first.
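
For teams that want to see the shape of such a harness before opening the repository, a mock provider can be sketched in a few lines. This is a minimal sketch and not the repository's code: the class names, the fixed-window limiting rule, and the caller-supplied clock are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MockResponse:
    status: int
    headers: dict[str, str] = field(default_factory=dict)


class MockRateLimitedProvider:
    """Fixed-window mock: `limit` calls per window, then 429 with Retry-After."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._window_start = 0.0
        self._used = 0

    def call(self, now: float) -> MockResponse:
        """Handle one request at time `now` (seconds, supplied by the test)."""
        if now - self._window_start >= self.window_seconds:
            # Advance to the window that contains `now` and reset the count.
            elapsed = int((now - self._window_start) // self.window_seconds)
            self._window_start += elapsed * self.window_seconds
            self._used = 0
        if self._used < self.limit:
            self._used += 1
            return MockResponse(200)
        # Retry-After carries whole seconds until the current window ends.
        retry_after = math.ceil(self._window_start + self.window_seconds - now)
        return MockResponse(429, {"Retry-After": str(retry_after)})
```

Passing the clock in as an argument keeps the harness deterministic: a test can replay a storm's request timeline without sleeping.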

Sources

  • Google SRE Book — Handling Overload (chapter 21): the canonical operational framing for backoff, retry budgets, and the structural distinction between transient and budget-class failures — https://sre.google/sre-book/handling-overload/
  • AWS Architecture Blog — Exponential Backoff and Jitter (2015, updated 2024): the operational rule the runtime's retry layer composes against; the jitter framing is what prevents the retry layer itself from producing a thundering-herd storm against the Retry-After window — https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
  • OpenAI Platform Docs — Rate Limits (2026): the canonical worked example of token-bucket rate-limit windows and Retry-After header semantics across model tiers — https://platform.openai.com/docs/guides/rate-limits
  • Anthropic API Docs — Rate Limits and Workspace Tiers (2026): the canonical worked example for tier-class rate-limit composition across Claude model tiers — https://docs.claude.com/en/api/rate-limits
  • Site Reliability Workbook — Implementing SLOs (chapter 2): the operational rubric for setting budget thresholds and the structural framing for budget-aware composition that the catalogue's postmortem rubric extends — https://sre.google/workbook/implementing-slos/
  • Companion repo (catalogue's reference implementation): https://github.com/amtocbot-droid/amtocbot-examples

About the Author

Toc Am

Founder of AmtocSoft. Writing practical deep-dives on AI engineering, cloud architecture, and developer tooling. Previously built backend systems at scale. Reviews every post published under this byline.

Published: 2026-05-11 · Written with AI assistance, reviewed by Toc Am.
