Distributed Caching in 2026: Cache Invalidation, CDN Strategy, and Building a Cache That Doesn't Lie


Introduction

Phil Karlton's famous quip — "There are only two hard problems in computer science: cache invalidation and naming things" — gets repeated at conferences, on t-shirts, and in job interviews. What rarely gets discussed is why cache invalidation is hard. The concept is not complex. You write new data, you remove or update the old cached version. That sentence takes three seconds to understand. So why does stale data cause production incidents at companies with hundreds of engineers?

The answer is timing. Cache invalidation is hard because the failures are invisible until the conditions align: a race between a write and a read, a cache miss storm that collapses your database under 40x normal load, a CDN serving a deleted product page to ten thousand users because no one called the purge API. These failure modes do not appear in development. They appear at 2 AM under load, when the cache hit rate is 94% on most paths but a newly deployed schema change breaks the other 6% in a way that corrupts user-visible data.

The cost calculation is asymmetric. A cache miss costs you latency — a round-trip to the database that might add 10-50ms to a response. A stale cache hit costs you correctness — a user sees a price that was updated six minutes ago, a balance that does not reflect their last transaction, a permission state that no longer applies. Latency is measurable and recoverable. Stale data erodes trust in ways that are harder to quantify.

In 2026, distributed caching has gotten more complex, not simpler. Multi-region deployments mean a write in us-east-1 has to invalidate cached data at CDN edge nodes in Frankfurt, Singapore, and São Paulo simultaneously. Serverless and edge compute mean your "in-process" cache has a lifetime of milliseconds. Read replicas, CQRS patterns, and event-sourced architectures introduce propagation delays between the write path and the read path that your cache has to account for.

This post covers the patterns that actually work at production scale: cache invalidation strategies with race-condition proofs, key design that prevents thundering herds, layered architectures that keep hit rates above 90%, CDN configuration that survives a content publish, and monitoring that tells you when your cache starts lying before your users notice.


1. Cache Invalidation Strategies

Six core patterns cover nearly every cache invalidation use case. The right choice depends on your consistency requirements, write volume, and whether your application can tolerate brief windows of stale data.

TTL-based expiry is the simplest approach: every cache entry has a time-to-live, after which it expires. No coordination required. The tradeoff is eventual consistency with a bounded staleness window. If your TTL is 60 seconds, you accept that reads in that window may return data up to 60 seconds old. This is the right default for content that changes slowly and where brief staleness is acceptable — product catalog data, user profile summaries, feature flag configs. It breaks down when writes are frequent or when correctness is load-bearing (financial balances, inventory counts, permissions).
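
The TTL path is small enough to sketch in full. A minimal version, written against any redis-py-compatible client passed in as a `client` parameter (an assumption made here so the helpers stay testable without a live server):

```python
import json

def set_with_ttl(client, key: str, value: dict, ttl: int = 60) -> None:
    # SETEX stores the value and its expiry atomically: after `ttl`
    # seconds the key disappears and the next read misses through to
    # the database. Staleness is therefore bounded by `ttl`.
    client.setex(key, ttl, json.dumps(value))

def get_or_none(client, key: str):
    # Returns the cached value, or None on miss/expiry so the caller
    # falls back to the database.
    raw = client.get(key)
    return json.loads(raw) if raw is not None else None
```

With a 60-second TTL, the worst case a reader can observe is a value written 60 seconds ago; that bound is the entire consistency contract of this pattern.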

Event-driven invalidation publishes an invalidation event on every write. A subscriber receives the event and deletes the cache key. This achieves near-real-time consistency without the write-through penalty, but it introduces a coordination dependency: if the invalidation subscriber is down or lagging, your cache serves stale data indefinitely. Redis keyspace notifications or a dedicated event bus (Kafka, Redis Streams) are common implementations.

Write-through writes to both the cache and the database in a single operation before returning to the caller. No stale reads are possible because the cache is always updated on write. The cost is higher write latency — every write pays the round-trip to both systems. This pattern makes sense when read performance is critical, write volume is moderate, and you cannot tolerate stale reads under any condition.
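
A sketch of that write path, with illustrative `client` and `db_write_fn` parameters (they are assumptions, not names from the post's later code):

```python
import json
from typing import Any, Callable

def write_through(client, key: str, value: Any,
                  db_write_fn: Callable[[Any], Any], ttl: int = 300) -> Any:
    # Database first (source of truth), then cache, then return.
    # Ordering matters: if the cache write fails after the DB write,
    # the worst case is a miss on the next read, never a stale hit.
    result = db_write_fn(value)
    client.setex(key, ttl, json.dumps(value))
    return result
```

Writing the database first is the design choice that keeps failure modes benign: a crash between the two writes degrades to a cache miss rather than a cache entry the database never accepted.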

Write-behind (write-back) writes to the cache first and flushes to the database asynchronously. Write latency drops to a single cache round-trip, but you accept data loss on crash: if the cache node dies before the async flush completes, those writes are gone. This is the right pattern for high-throughput counters, rate limiters, and analytics events where some loss is acceptable. Never use it for financial transactions or any data where durability is required.
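
A minimal write-behind sketch using an in-process queue and a background flush thread; the class name and queue-based design are assumptions for illustration, and a production version would batch, retry, and bound the queue:

```python
import json
import queue
import threading
from typing import Any, Callable

class WriteBehindCache:
    """Cache write is synchronous; the DB flush happens on a background
    thread. A crash before the flush loses those writes, which is why
    this suits counters and analytics events, never balances."""

    def __init__(self, client, db_write_fn: Callable[[str, Any], None],
                 ttl: int = 300):
        self._client = client
        self._db_write_fn = db_write_fn
        self._ttl = ttl
        self._queue: queue.Queue = queue.Queue()
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def write(self, key: str, value: Any) -> None:
        # Fast path: one cache round-trip, then the caller returns.
        self._client.setex(key, self._ttl, json.dumps(value))
        self._queue.put((key, value))  # durability deferred

    def _flush_loop(self) -> None:
        while True:
            key, value = self._queue.get()
            try:
                self._db_write_fn(key, value)  # async flush to the DB
            finally:
                self._queue.task_done()
```

The gap between `write()` returning and `_flush_loop` completing is exactly the data-loss window the paragraph above describes.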

Cache-aside (lazy loading) is the most common pattern: on a cache miss, load from the database and populate the cache. Simple to implement, but it does not invalidate on write — you rely on TTL or explicit deletion to clear stale entries.

The double-delete pattern is what you reach for when cache-aside reads race with writes and stale repopulation becomes a real risk. Without double-delete, a concurrent reader that loaded the pre-write value from the database can repopulate the cache with stale data after your invalidation fires. The sequence:

  1. Delete the cache key (first delete, before the write)
  2. Write to the database
  3. Delete the cache key again (second delete, after the write)

The first delete removes the old value, so reads during the write window miss and go to the database instead of serving the pre-write entry. The second delete clears any entry that a concurrent reader populated with stale data between the first delete and the database write completing. There is still a narrow window of staleness, but it is bounded to the time between the second delete and the next cache population, not indefinite.

Tag-based invalidation assigns cache entries to logical groups. When a product is updated, a single invalidation event clears all cache keys tagged with product:{id} — the product detail page, the search result snippets, the recommendation widget, the recently-viewed list. Tag-based invalidation is supported natively by Fastly (surrogate keys), Cloudflare (cache tags), and CloudFront (with origin-side logic). For Redis, you can implement it with a reverse index: a set keyed by tag containing all cache keys belonging to that tag.
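
The Redis reverse index the paragraph describes can be sketched like this (the `tag:` key prefix and helper names are assumptions; values are assumed to be serialized strings already):

```python
from typing import Iterable

def cache_set_tagged(client, key: str, value: str,
                     tags: Iterable[str], ttl: int = 300) -> None:
    # Store the value, then record the key in each tag's reverse
    # index (one Redis set per tag) so the tag can be purged later.
    client.setex(key, ttl, value)
    for tag in tags:
        client.sadd(f"tag:{tag}", key)

def invalidate_tag(client, tag: str) -> None:
    # Delete every key registered under the tag, then the index.
    # Index members may have already expired; DEL on a missing key
    # is a harmless no-op.
    tag_key = f"tag:{tag}"
    for key in client.smembers(tag_key):
        client.delete(key)
    client.delete(tag_key)
```

One design note: the tag sets themselves have no TTL here, so a long-lived tag slowly accumulates expired members; a production version would trim them periodically or give the sets their own expiry.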

Here is a working implementation of double-delete write protection, cache-aside reads, and event-driven invalidation via Redis Streams with a consumer group:

import redis
import json
import time
from typing import Any, Callable

# Prevents stale repopulation after a write by using
# double-delete: delete before write, delete after write.
# Without this, a concurrent reader can populate the cache
# with the pre-write value between your delete and your DB write.

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def double_delete_write(
    key: str,
    db_write_fn: Callable,
    *args,
    delay_ms: int = 50,
    **kwargs
) -> Any:
    """
    Write-through with double-delete race condition protection.

    Failure mode prevented: without the pre-delete, a reader that
    loaded stale data before your write will repopulate the cache
    with the old value after you delete and re-write.
    """
    # First delete: prevents stale repopulation by in-flight readers
    r.delete(key)

    # Write to the database (source of truth)
    result = db_write_fn(*args, **kwargs)

    # Small delay: lets any concurrent readers that were between
    # the first delete and the DB write complete their round-trip
    # before we delete again. In practice 50ms is sufficient.
    time.sleep(delay_ms / 1000)

    # Second delete: clears anything a concurrent reader populated
    # in the window between first delete and DB write completing
    r.delete(key)

    return result


def cache_aside_read(
    key: str,
    db_read_fn: Callable,
    *args,
    ttl: int = 300,  # keyword-only: positional args all go to db_read_fn
    **kwargs
) -> Any:
    """
    Cache-aside (lazy loading) with automatic cache population on miss.

    Failure mode: if cache is empty (cold start or after invalidation),
    all concurrent readers hit the DB simultaneously (thundering herd).
    See Section 2 for stampede protection.
    """
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: load from DB
    value = db_read_fn(*args, **kwargs)

    if value is not None:
        r.setex(key, ttl, json.dumps(value))

    return value


# Event-driven invalidation via Redis Streams
# Subscriber deletes cache keys when write events are published.
# Failure mode: if subscriber is lagging, cache serves stale data.
# Mitigate with maxlen on stream and consumer group with ACK.

def publish_invalidation_event(entity_type: str, entity_id: str):
    """Publish cache invalidation event to Redis Stream."""
    r.xadd(
        "cache:invalidation",
        {
            "entity_type": entity_type,
            "entity_id": entity_id,
            "timestamp": str(time.time()),
        },
        maxlen=10000,  # Prevent unbounded stream growth
    )


def invalidation_subscriber_loop():
    """
    Consumer group subscriber that deletes cache keys on write events.
    Run as a background worker process.
    """
    group = "cache-invalidators"
    stream = "cache:invalidation"

    # Create consumer group (idempotent)
    try:
        r.xgroup_create(stream, group, id="0", mkstream=True)
    except redis.exceptions.ResponseError:
        pass  # Group already exists

    consumer_name = f"worker-{int(time.time())}"

    while True:
        # Read up to 10 events, block 1s if stream is empty
        messages = r.xreadgroup(
            group, consumer_name, {stream: ">"}, count=10, block=1000
        )

        if not messages:
            continue

        for stream_name, entries in messages:
            for entry_id, fields in entries:
                entity_type = fields["entity_type"]
                entity_id = fields["entity_id"]

                # Delete all cache keys for this entity. Keys follow the
                # {service}:{entity}:{id}:{version} convention from
                # Section 2, so match on the entity and id segments.
                pattern = f"*:{entity_type}:{entity_id}:*"
                keys = r.scan_iter(pattern)
                pipe = r.pipeline()
                for key in keys:
                    pipe.delete(key)
                pipe.execute()

                # ACK the message — prevents reprocessing on restart
                r.xack(stream, group, entry_id)

sequenceDiagram
    participant W as Writer
    participant C as Cache
    participant DB as Database
    participant R as Reader
    rect rgb(255, 220, 220)
        Note over W,R: WITHOUT double-delete (race condition)
        R->>C: GET product:42 (miss)
        W->>C: DELETE product:42
        W->>DB: UPDATE product SET price=99
        R->>DB: SELECT * FROM product WHERE id=42 (gets OLD value)
        R->>C: SET product:42 = {price: 79} (stale!)
        Note over C: Cache now has pre-write value
    end
    rect rgb(220, 255, 220)
        Note over W,R: WITH double-delete (safe)
        W->>C: DELETE product:42 (1st delete)
        W->>DB: UPDATE product SET price=99
        Note over W: wait 50ms for in-flight readers
        W->>C: DELETE product:42 (2nd delete)
        R->>C: GET product:42 (miss)
        R->>DB: SELECT * FROM product WHERE id=42 (gets NEW value)
        R->>C: SET product:42 = {price: 99} (fresh)
    end


2. Cache Key Design

A cache key is a contract. Change the underlying data schema without changing the key and your cache silently serves incorrect data until TTL expires. Design keys with explicit versioning and namespacing from the start — retrofitting key design in production requires a full cache flush.

Key naming convention: {service}:{entity}:{id}:{version}

  • service prevents collisions between microservices sharing a Redis cluster
  • entity is the data type: user, product, session, feed
  • id is the primary identifier for the specific record
  • version is the schema version of the serialized value

Example: catalog:product:8842:v3. When you deploy a schema change that adds a required field, bump to v4. All v3 keys become unreachable immediately on deploy — no stale deserialization errors, no migration script.

Per-user cache keys include the user ID for personalized data: feed:user:7731:v2. This prevents cross-user data leaks and allows per-user invalidation on account update.

The thundering herd problem is what happens when a popular cache key expires. Every instance in your fleet misses simultaneously, issues a database query simultaneously, and you absorb 50x normal database load in a two-second window while all those queries execute and all those instances race to repopulate the same key. At p99, one of those queries is slow. The others pile up. Your database connection pool exhausts. Your application starts returning 500s.

TTL jitter is the cheap fix: instead of a fixed TTL of 300 seconds, use 300 + random.randint(-30, 30). Keys that would have expired together now expire across a 60-second window, spreading the same refill load over a minute instead of concentrating it in one synchronized burst. This is standard practice in any system with more than a handful of cache clients.

XFetch (probabilistic early recomputation) is the correct fix for high-traffic keys where even jittered expiry produces unacceptable spikes. The algorithm probabilistically recomputes a cache entry before it expires, based on how expensive the recomputation is and how close the entry is to expiry. Keys with expensive recomputation (slow DB queries, aggregation jobs) are refreshed earlier; cheap keys are refreshed close to their natural expiry. Because each instance makes the early-refresh decision independently and probabilistically, in expectation only one recomputes while the rest keep serving the cached value until the fresh value is available.

import math
import random
import time
import json
from typing import Any, Callable, Optional

import redis

def build_cache_key(
    service: str,
    entity: str,
    entity_id: str | int,
    version: str = "v1",
    user_id: Optional[str | int] = None,
) -> str:
    """
    Namespaced, versioned cache key builder.

    Versioned keys prevent stale deserialization: bump version on
    schema change and old cached values auto-invalidate on next read.
    """
    parts = [service, entity, str(entity_id), version]
    if user_id is not None:
        parts.insert(3, f"u{user_id}")
    return ":".join(parts)


def ttl_with_jitter(base_ttl: int, jitter_pct: float = 0.1) -> int:
    """
    Add ±jitter_pct random variation to TTL.

    Failure mode prevented: without jitter, all instances that
    cached a popular key at the same time will miss simultaneously,
    creating a synchronized DB load spike (thundering herd).
    """
    jitter = int(base_ttl * jitter_pct)
    return base_ttl + random.randint(-jitter, jitter)


def xfetch_get(
    r: redis.Redis,
    key: str,
    recompute_fn: Callable,
    base_ttl: int,
    beta: float = 1.0,
) -> Any:
    """
    XFetch algorithm: probabilistic early recomputation.

    Prevents thundering herd on high-traffic keys by recomputing
    a single instance's cache early, while all other instances
    continue serving the cached value.

    beta > 1.0: recompute earlier (use for expensive recomputations)
    beta < 1.0: recompute later (use for cheap recomputations)

    Reference: Vattani, Chierichetti, and Lowenstein,
    "Optimal Probabilistic Cache Stampede Prevention" (VLDB 2015)
    """
    raw = r.get(key)

    if raw is not None:
        data = json.loads(raw)
        ttl_remaining = r.ttl(key)  # -2 if key vanished between GET and TTL
        delta = data.get("_delta", 1.0)  # Seconds taken to compute last time

        # XFetch decision: recompute early with probability that rises
        # as expiry approaches. Refresh when
        #     now - delta * beta * log(rand()) >= expiry_time
        # which simplifies to -delta * beta * log(rand()) >= ttl_remaining.
        # log(rand()) < 0, so the left side is a random positive offset,
        # larger for expensive (high-delta) computations.
        should_recompute = ttl_remaining < 0 or (
            -delta * beta * math.log(random.random()) >= ttl_remaining
        )

        if not should_recompute:
            return data["value"]

    # Recompute (either cache miss or early recomputation)
    start = time.time()
    value = recompute_fn()
    delta = time.time() - start

    ttl = ttl_with_jitter(base_ttl)
    r.setex(
        key,
        ttl,
        json.dumps({"value": value, "_delta": delta}),
    )

    return value

flowchart TD
    A[Popular cache key expires\nTTL = 300s, no jitter] --> B{All 50 app instances\nmiss simultaneously}
    B --> C[50 concurrent DB queries\nfor same row]
    C --> D[DB connection pool exhausted\nQuery queue backs up]
    D --> E[p99 query: 800ms\nInstances timeout waiting]
    E --> F[500 errors\nRetries compound load]
    style A fill:#ff9999
    style F fill:#ff6666
    G[Same key with TTL jitter\n300s ± 30s] --> H{Instances expire\nstaggered over 60s window}
    H --> I[~1 DB query per second\nInstead of 50 simultaneous]
    I --> J[DB load stays flat\nNo queue buildup]
    J --> K[p99 stays at 12ms\nNo incidents]
    style G fill:#99ff99
    style K fill:#66cc66


3. Layered Caching Architecture

A single Redis cluster is not a caching architecture. At scale, you need multiple cache layers operating at different latencies and scopes, with explicit rules for consistency and promotion between layers.

L1: In-process cache lives inside your application process. Zero network round-trips — sub-microsecond lookups against an in-memory LRU or bounded hash map. The tradeoff is that L1 is per-instance and not shared: if you have 40 application instances, you have 40 independent L1 caches that can diverge from each other. L1 is appropriate for hot reference data that changes infrequently: config objects, feature flags, permission sets, lookup tables. Bounded size is mandatory — an unbounded L1 cache is a memory leak. Typical size: 1,000-10,000 entries.

L2: Redis cluster is the shared cache tier. Millisecond latency, consistent across all application instances, supports atomic operations. Redis Cluster provides horizontal scaling and fault tolerance. This is where the majority of your application's cache reads should be served from. Hit rate target: 85-95% for well-designed applications.

L3: CDN edge cache eliminates origin hits entirely for cacheable HTTP responses. Requests served from CDN edge nodes never reach your application servers — Cloudflare, Fastly, and CloudFront operate hundreds of Points of Presence globally, serving from the edge node closest to the user. Latency target: under 5ms for cache hits. A well-configured CDN can absorb 90%+ of read traffic for public content.

L4: Origin / database is the source of truth. Every request that reaches L4 represents a failure of the layers above it. Hit rates at L4 should be minimized — target under 5% of all read requests hitting the database for any high-traffic path.

Cache promotion ensures a miss at L1 but a hit at L2 repopulates L1 for subsequent requests. A miss at L2 but a hit at CDN is harder to leverage in server-side caching, but CDN hit data can inform cache warming strategies.

Consistency across layers is where complexity lives. When you invalidate a key in L2, your L1 caches across 40 instances still hold the old value. Options: L1 TTL short enough to self-heal quickly (10-30 seconds), explicit L1 invalidation via a pub/sub channel, or accepting brief L1 divergence for data where eventual consistency is acceptable. For permission and authentication data, accept no divergence: skip L1 entirely or use zero-TTL L1 entries that expire immediately.
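
The pub/sub option can be sketched as follows. The channel name and helper signatures are assumptions; the listener expects an L1 cache object exposing a delete(key) method:

```python
import threading

L1_INVALIDATION_CHANNEL = "l1:invalidate"  # assumed channel name

def broadcast_l1_invalidation(client, key: str) -> None:
    # Called right after the L2 delete: every subscribed instance
    # drops the key from its local in-process cache.
    client.publish(L1_INVALIDATION_CHANNEL, key)

def start_l1_invalidation_listener(client, l1_cache) -> threading.Thread:
    # One background thread per instance: deletes published keys
    # from this instance's L1.
    def _listen():
        pubsub = client.pubsub()
        pubsub.subscribe(L1_INVALIDATION_CHANNEL)
        for message in pubsub.listen():
            if message["type"] == "message":
                l1_cache.delete(message["data"])
    t = threading.Thread(target=_listen, daemon=True)
    t.start()
    return t
```

Redis pub/sub is fire-and-forget: an instance that is disconnected when the message is published never sees it, so the short L1 TTL remains the backstop even with explicit invalidation in place.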

import time
import json
from collections import OrderedDict
from typing import Optional, Any, Callable
import redis
import threading

class LRUCache:
    """Thread-safe LRU in-process cache with bounded size."""

    def __init__(self, max_size: int = 1000, default_ttl: int = 30):
        self._cache: OrderedDict = OrderedDict()
        self._max_size = max_size
        self._default_ttl = default_ttl
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            if key not in self._cache:
                return None
            value, expires_at = self._cache[key]
            if time.time() > expires_at:
                del self._cache[key]
                return None
            # Move to end (most recently used)
            self._cache.move_to_end(key)
            return value

    def set(self, key: str, value: Any, ttl: Optional[int] = None):
        with self._lock:
            ttl = ttl or self._default_ttl
            expires_at = time.time() + ttl
            if key in self._cache:
                self._cache.move_to_end(key)
            self._cache[key] = (value, expires_at)
            # Evict least recently used if over capacity
            if len(self._cache) > self._max_size:
                self._cache.popitem(last=False)

    def delete(self, key: str):
        with self._lock:
            self._cache.pop(key, None)


class MultiLayerCache:
    """
    L1 (in-process LRU) → L2 (Redis) → L4 (origin/DB) with
    automatic cache promotion on miss.

    Cache promotion: miss at L1 but hit at L2 repopulates L1
    so subsequent requests from this instance avoid the network.

    Coordinated invalidation: invalidation deletes from both
    L1 and L2 simultaneously. L1 divergence window = L1 TTL max.
    """

    def __init__(
        self,
        redis_client: redis.Redis,
        l1_max_size: int = 1000,
        l1_ttl: int = 30,      # Short: L1 divergence window
        l2_ttl: int = 300,     # Longer: shared cache lifetime
    ):
        self._l1 = LRUCache(max_size=l1_max_size, default_ttl=l1_ttl)
        self._l2 = redis_client
        self._l1_ttl = l1_ttl
        self._l2_ttl = l2_ttl

    def get(
        self,
        key: str,
        origin_fn: Optional[Callable] = None,
    ) -> Optional[Any]:
        # L1 check: zero-latency in-process lookup
        value = self._l1.get(key)
        if value is not None:
            return value

        # L2 check: Redis round-trip (~1ms)
        raw = self._l2.get(key)
        if raw is not None:
            value = json.loads(raw)
            # Cache promotion: populate L1 for subsequent requests
            self._l1.set(key, value, ttl=self._l1_ttl)
            return value

        # L4 miss: load from origin/database
        if origin_fn is None:
            return None

        value = origin_fn()
        if value is not None:
            self._populate(key, value)

        return value

    def _populate(self, key: str, value: Any):
        """Populate both L1 and L2."""
        ttl = ttl_with_jitter(self._l2_ttl)
        self._l2.setex(key, ttl, json.dumps(value))
        self._l1.set(key, value, ttl=self._l1_ttl)

    def invalidate(self, key: str):
        """
        Coordinated invalidation: delete from L1 and L2 simultaneously.

        Failure mode: if you only delete from L2, this instance's L1
        continues serving stale data for up to l1_ttl seconds.
        """
        self._l1.delete(key)
        self._l2.delete(key)

    def invalidate_pattern(self, pattern: str):
        """Invalidate all keys matching a Redis glob pattern."""
        # L2 pattern delete
        keys = list(self._l2.scan_iter(pattern))
        if keys:
            self._l2.delete(*keys)
        # L1: cannot pattern-match, rely on TTL expiry
        # For strict L1 invalidation, maintain a reverse index

flowchart LR
    U([User Request]) --> L1
    subgraph "Application Instance"
        L1[L1: In-Process LRU\nSub-microsecond\n1K-10K entries\nTTL: 30s]
    end
    L1 -->|miss| L2
    L1 -->|hit| R1([Response])
    subgraph "Shared Cache"
        L2[L2: Redis Cluster\n~1ms latency\nShared across instances\nTTL: 300s]
    end
    L2 -->|promote to L1| L1
    L2 -->|hit| R2([Response])
    L2 -->|miss| L3
    subgraph "CDN Edge"
        L3[L3: CDN PoP\nCloudflare / Fastly / CloudFront\nunder 5ms global\nHTTP Cache-Control]
    end
    L3 -->|hit| R3([Response])
    L3 -->|miss| L4
    subgraph "Origin"
        L4[(L4: Database\nSource of Truth\nTarget under 5% of reads)]
    end
    L4 -->|populate L2, L1| L2
    L4 --> R4([Response])


4. CDN Caching Strategy

CDN caching is controlled entirely by HTTP response headers. If you do not explicitly set Cache-Control, your CDN will either cache nothing or cache everything with its default TTL — neither is what you want.

Cache-Control directives that matter in production:

  • max-age=N: client-side TTL in seconds
  • s-maxage=N: CDN-side TTL (overrides max-age for shared caches); use this to set different cache lifetimes for browsers vs CDN
  • stale-while-revalidate=N: serve stale content for N seconds while fetching a fresh version in the background. This is the single most impactful directive for perceived latency — a user never waits for a cache refresh
  • stale-if-error=N: serve stale content for N seconds if the origin returns a 5xx error. This is your CDN-level circuit breaker for origin outages
  • no-store: do not cache under any conditions (authentication pages, payment flows)
  • private: cacheable by browsers but not CDNs

Surrogate keys (cache tags) let you purge all CDN-cached responses related to a piece of content with a single API call. When you update a product, you purge product-8842 and every CDN edge node globally drops all responses tagged with that key — product detail pages, search result snippets, recommendation widgets — regardless of their remaining TTL. Cloudflare calls these Cache Tags. Fastly calls them Surrogate-Keys. CloudFront requires implementing the equivalent with a custom header and Lambda@Edge.

The Vary header instructs the CDN to cache separate response copies for different request header values. Vary: Accept-Encoding is standard (gzip vs brotli). Vary: Accept-Language creates per-language cache buckets. Avoid Vary: Cookie or Vary: Authorization — these make the vast majority of responses uncacheable at the CDN since nearly every authenticated user sends a unique cookie.

CDN origin shield (called Origin Shield in CloudFront, Shielding in Fastly, Tiered Cache in Cloudflare) collapses all edge cache misses through a single intermediate node before reaching your origin. Without origin shield, a cold cache across 250 edge nodes means 250 simultaneous origin requests for the same content. With origin shield, those 250 edge nodes coalesce into one origin request. Required for any CDN purge event — the moment you invalidate a popular content tag, origin shield prevents the invalidation from becoming a traffic spike.

from fastapi import FastAPI, Response
import httpx
import json
from typing import Optional
import redis

app = FastAPI()
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

CLOUDFLARE_ZONE_ID = "your-zone-id"
CLOUDFLARE_API_TOKEN = "your-api-token"


def set_cache_headers(
    response: Response,
    max_age: int = 60,
    s_maxage: int = 300,
    stale_while_revalidate: int = 60,
    stale_if_error: int = 86400,
    cache_tags: Optional[list[str]] = None,
):
    """
    Set Cache-Control and CDN cache tag headers.

    s_maxage > max_age: CDN holds content longer than browsers,
    preventing origin requests while allowing browser refresh.

    stale-while-revalidate: users never wait for cache refresh,
    background revalidation happens asynchronously.

    stale-if-error: CDN serves stale content during origin outages
    — your circuit breaker at the edge.
    """
    directives = [
        "public",
        f"max-age={max_age}",
        f"s-maxage={s_maxage}",
        f"stale-while-revalidate={stale_while_revalidate}",
        f"stale-if-error={stale_if_error}",
    ]
    response.headers["Cache-Control"] = ", ".join(directives)

    if cache_tags:
        # Cloudflare: Cache-Tag header (comma-separated)
        response.headers["Cache-Tag"] = ",".join(cache_tags)
        # Fastly: Surrogate-Key header (space-separated)
        response.headers["Surrogate-Key"] = " ".join(cache_tags)


@app.get("/api/products/{product_id}")
async def get_product(product_id: int, response: Response):
    """
    Product endpoint with multi-layer caching and CDN cache tags.
    Cache tags enable targeted purge on product update without
    flushing the entire CDN cache.
    """
    cache_key = build_cache_key("catalog", "product", product_id, "v2")

    # Try Redis first
    cached = r.get(cache_key)
    if cached:
        product = json.loads(cached)
    else:
        # Load from DB (simulated)
        product = {"id": product_id, "name": "Widget", "price": 99.99}
        r.setex(cache_key, ttl_with_jitter(300), json.dumps(product))

    set_cache_headers(
        response,
        max_age=60,
        s_maxage=300,
        stale_while_revalidate=60,
        stale_if_error=86400,
        cache_tags=[f"product-{product_id}", "products"],
    )

    return product


async def purge_cloudflare_cache_tags(tags: list[str]):
    """
    Programmatic CDN purge via Cloudflare API on content update.

    Call this after every product write so CDN-cached pages
    immediately reflect the new state. Without purge, users see
    stale CDN responses for up to s_maxage seconds.
    """
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"https://api.cloudflare.com/client/v4/zones/{CLOUDFLARE_ZONE_ID}/purge_cache",
            headers={
                "Authorization": f"Bearer {CLOUDFLARE_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={"tags": tags},
        )
        resp.raise_for_status()
        return resp.json()


@app.put("/api/products/{product_id}")
async def update_product(product_id: int, data: dict):
    """
    Write path: update DB, invalidate Redis, purge CDN.
    Three-layer invalidation: Redis (immediate) + CDN (programmatic purge).
    """
    # Redis invalidation with double-delete. The DB write must happen
    # between the two deletes, so it is passed as the write function.
    cache_key = build_cache_key("catalog", "product", product_id, "v2")
    double_delete_write(
        cache_key,
        lambda: None,  # stand-in for the real DB UPDATE
    )

    # CDN purge: clears all edge-cached responses tagged with this product
    await purge_cloudflare_cache_tags([
        f"product-{product_id}",
    ])

    return {"status": "updated", "id": product_id}

5. Distributed Cache Patterns for Correctness

Performance is the headline, but correctness is the real requirement. A cache that serves wrong data is worse than no cache.

Read-your-writes consistency is the failure mode that frustrates users most visibly. A user posts a comment. The write succeeds. They reload their feed. The comment is not there — it is in the database, but the cached feed snapshot has not been refreshed yet. From the user's perspective, their action had no effect.

The solution is a short-circuit bypass: after a write, mark the user's session as "recently wrote" with a very short TTL (5-10 seconds). On subsequent reads within that window, bypass the cache and read directly from the database primary. After the window closes, resume normal cache-served reads. This adds negligible overhead — the bypass window is short, and most users are not writing continuously.
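
A sketch of the bypass, with a hypothetical `ryw:` key prefix and a 10-second window (both are assumptions, as is the generic `client` parameter):

```python
from typing import Any, Callable

RYW_WINDOW_SECONDS = 10  # bypass window after a write

def mark_recent_write(client, user_id) -> None:
    # Flag the user after a successful write; the flag expires on
    # its own, so no cleanup is needed.
    client.setex(f"ryw:{user_id}", RYW_WINDOW_SECONDS, "1")

def read_feed(client, user_id, cache_key: str,
              cache_read_fn: Callable[[str], Any],
              db_read_fn: Callable[[], Any]) -> Any:
    if client.get(f"ryw:{user_id}") is not None:
        # Inside the window: the user must see their own write,
        # so skip the cache and read the primary directly.
        return db_read_fn()
    return cache_read_fn(cache_key)
```

The flag lives in the shared cache tier, so the guarantee holds even when the user's next request lands on a different application instance.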

Negative caching prevents database hammering on missing keys. Without it, every request for a non-existent user ID (common in scraping, enumeration attempts, and cache stampedes after a delete) hits the database. Cache a sentinel value for not-found results with a short TTL (30-60 seconds). The cache returns the sentinel, your application interprets it as a miss, and the database is protected.

import hashlib
import time
from typing import Optional, Any, Callable
import redis
import json

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

NEGATIVE_SENTINEL = "__NOT_FOUND__"
NEGATIVE_TTL = 60  # Cache not-found for 60s; prevents DB hammering


def cache_with_negative(
    key: str,
    db_read_fn: Callable,
    positive_ttl: int = 300,
) -> Optional[Any]:
    """
    Cache-aside with negative caching.

    Failure mode prevented: without negative caching, every request
    for a deleted or non-existent record hits the database.
    Common in scraping attacks and after delete operations.
    """
    cached = r.get(key)

    if cached == NEGATIVE_SENTINEL:
        return None  # Known not-found, skip DB entirely

    if cached is not None:
        return json.loads(cached)

    value = db_read_fn()

    if value is None:
        # Cache the not-found result with short TTL
        r.setex(key, NEGATIVE_TTL, NEGATIVE_SENTINEL)
        return None

    r.setex(key, ttl_with_jitter(positive_ttl), json.dumps(value))
    return value


def generate_etag(content: Any) -> str:
    """Generate ETag from content hash for conditional GET."""
    content_bytes = json.dumps(content, sort_keys=True).encode()
    return hashlib.sha256(content_bytes).hexdigest()[:16]


from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse

app = FastAPI()


@app.get("/api/articles/{article_id}")
async def get_article(article_id: int, request: Request, response: Response):
    """
    Conditional GET with ETag and 304 Not Modified.

    Reduces bandwidth: client caches response + ETag, sends
    If-None-Match on subsequent requests. Server returns 304
    if content unchanged — no body transmitted.

    Combine with Redis: ETag stored alongside content,
    check ETag before serializing full response body.
    """
    cache_key = build_cache_key("content", "article", article_id, "v1")
    etag_key = f"{cache_key}:etag"

    # Load content (from cache or DB)
    content = cache_aside_read(
        cache_key,
        lambda: {"id": article_id, "title": "Article", "body": "..."},
    )

    if content is None:
        return JSONResponse({"error": "Not found"}, status_code=404)

    current_etag = r.get(etag_key)
    if current_etag is None:
        current_etag = generate_etag(content)
        r.setex(etag_key, 300, current_etag)

    # Check If-None-Match: return 304 if client has current version
    client_etag = request.headers.get("if-none-match")
    if client_etag and client_etag == f'"{current_etag}"':
        return Response(status_code=304)

    response.headers["ETag"] = f'"{current_etag}"'
    response.headers["Cache-Control"] = "public, max-age=60, s-maxage=300"

    return content


def sticky_read_bypass(
    user_id: str,
    cache_key: str,
    db_read_fn: Callable,
    bypass_ttl: int = 10,
    cache_ttl: int = 300,
) -> Any:
    """
    Read-your-writes: bypass cache for users who recently wrote.

    Failure mode prevented: user writes data, immediately reads
    their feed/profile, cache returns pre-write state — user
    thinks their write was lost.
    """
    bypass_key = f"bypass:{user_id}"

    if r.exists(bypass_key):
        # User recently wrote — read from DB primary directly
        return db_read_fn()

    return cache_aside_read(cache_key, db_read_fn, ttl=cache_ttl)


def mark_user_wrote(user_id: str, bypass_ttl: int = 10):
    """
    Call after any write by user_id to activate bypass window.
    Expires automatically after bypass_ttl seconds.
    """
    r.setex(f"bypass:{user_id}", bypass_ttl, "1")

6. Monitoring and Debugging Cache Behavior

A cache you cannot observe is a cache you cannot trust. Hit rate drops before incidents — instrument early.

Redis INFO stats provide cluster-wide metrics: keyspace_hits, keyspace_misses, evicted_keys, expired_keys, used_memory, connected_clients. Calculate hit rate as hits / (hits + misses). Target: above 85% for general application caches, above 95% for high-traffic public APIs. A hit rate drop from 92% to 78% on a Tuesday afternoon is a signal, not noise — investigate what changed.

Eviction monitoring tells you when your cache is under memory pressure. When Redis reaches maxmemory, it applies its eviction policy (allkeys-lru, volatile-lru, allkeys-random, etc.). Evictions are visible in evicted_keys from INFO. If eviction rate is non-zero, your cache is too small for your working set — either increase memory, reduce key sizes, or lower TTLs on lower-priority keys. allkeys-lru is the right policy for most application caches: evict the least recently used key regardless of TTL.
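One wrinkle: evicted_keys from INFO is a cumulative counter since server start, so the "keys/sec" rate your alerting needs requires two samples and a delta. A minimal sketch, with the counter reader passed in as a callable (e.g. `lambda: r.info("stats")["evicted_keys"]` against the Redis client defined below) — the helper names here are illustrative, not from Redis itself:

```python
import time
from typing import Callable


def eviction_rate(prev_evicted: int, curr_evicted: int, interval_s: float) -> float:
    """Evictions per second between two samples of the cumulative counter."""
    if interval_s <= 0:
        return 0.0
    # max() guards against a counter reset after a server restart
    return max(curr_evicted - prev_evicted, 0) / interval_s


def sample_eviction_rate(
    read_evicted: Callable[[], int],
    interval_s: float = 10.0,
) -> float:
    """
    Sample the evicted_keys counter twice, interval_s apart.

    Pass e.g. `lambda: r.info("stats")["evicted_keys"]` as the reader.
    INFO counters are cumulative since server start, so only the
    delta between two samples reflects the current eviction rate.
    """
    before = read_evicted()
    time.sleep(interval_s)
    return eviction_rate(before, read_evicted(), interval_s)
```

A monitoring agent would call `sample_eviction_rate` on a schedule and alert when the result crosses the threshold.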

Key-level inspection:
- TTL key returns remaining TTL in seconds (-1 = no expiry, -2 = key does not exist)
- DEBUG OBJECT key returns serialized length, encoding, and LRU idle time (note: DEBUG is often disabled on managed Redis services)
- OBJECT ENCODING key shows memory encoding: ziplist/listpack for small hashes (compact), hashtable for large ones (more memory)
- OBJECT FREQ key (requires an LFU eviction policy such as allkeys-lfu) shows access frequency

Latency percentiles are the most actionable metric for cache health. Measure p50, p95, and p99 for cache reads (Redis round-trip) vs database reads. Target: Redis p99 under 5ms, database p99 under 50ms for indexed reads. A Redis p99 spike to 50ms is usually a network issue or a large key serialization bottleneck.
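One way to collect those percentiles is to time the operation yourself against a probe key. A stdlib-only sketch — `percentile` uses the nearest-rank method, and both helper names are illustrative, not from the earlier code in this post:

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100) of a list of samples."""
    ordered = sorted(samples)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]


def probe_latency(op: Callable[[], object], n: int = 200) -> Dict[str, float]:
    """
    Time an operation n times; report p50/p95/p99 in milliseconds.

    Pass e.g. `lambda: r.get("probe:key")` to measure the Redis
    round-trip, or an indexed DB query for the comparison baseline.
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000)
    return {f"p{p}": round(percentile(samples, p), 3) for p in (50, 95, 99)}
```

Comparing `probe_latency(lambda: r.get("probe:key"))` against the same probe for a database read makes the "Redis p99 under 5ms, database p99 under 50ms" targets directly checkable.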

Cache poisoning detection: if your application deserializes cached values without validation, a compromised Redis node or a serialization bug can inject malformed data. Store a checksum alongside the cached value and verify on read. Discard and reload from DB on checksum mismatch — this converts a poisoning event into a cache miss rather than a corrupted read.
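The checksum envelope described above can be sketched as a pair of helpers — `seal` and `unseal` are hypothetical names, and the scheme assumes JSON-serializable values; a mismatch degrades to a cache miss rather than a corrupted read:

```python
import hashlib
import json
from typing import Any, Optional


def seal(value: Any) -> str:
    """Serialize a value with an embedded SHA-256 checksum."""
    body = json.dumps(value, sort_keys=True)
    digest = hashlib.sha256(body.encode()).hexdigest()
    return json.dumps({"sum": digest, "body": body})


def unseal(raw: Optional[str]) -> Optional[Any]:
    """
    Verify the checksum before trusting a cached value.

    Returns None (treated as a cache miss) on any mismatch or
    malformed envelope, so poisoned or truncated entries fall
    through to the database reload path instead of being served.
    """
    if raw is None:
        return None
    try:
        envelope = json.loads(raw)
        body = envelope["body"]
        if hashlib.sha256(body.encode()).hexdigest() != envelope["sum"]:
            return None
        return json.loads(body)
    except (KeyError, TypeError, ValueError):
        return None
```

Wire it into the cache-aside path by storing `r.setex(key, ttl, seal(value))` and reading with `unseal(r.get(key))`; the None return triggers the normal DB reload.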

Alerting thresholds to configure:
- Hit rate < 80%: alert immediately, investigate DB load
- Eviction rate > 100 keys/sec: investigate memory pressure
- Redis p99 latency > 10ms: investigate network or large key sizes
- keyspace_misses spike: correlate with deployment events (schema version change causes full miss)
- connected_clients near maxclients (default 10,000): connection leak or pool misconfiguration

import redis
import time
from typing import Dict

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def get_cache_stats() -> Dict:
    """
    Pull key cache health metrics from Redis INFO.
    Returns hit_rate, eviction_rate, memory_usage_pct.
    """
    # A single INFO call returns every section (stats, memory, clients)
    info = r.info()

    hits = info.get("keyspace_hits", 0)
    misses = info.get("keyspace_misses", 0)
    total = hits + misses

    hit_rate = (hits / total * 100) if total > 0 else 0

    used_memory = info.get("used_memory", 0)
    max_memory = info.get("maxmemory", 0)
    memory_pct = (used_memory / max_memory * 100) if max_memory > 0 else 0

    return {
        "hit_rate_pct": round(hit_rate, 2),
        "keyspace_hits": hits,
        "keyspace_misses": misses,
        "evicted_keys": info.get("evicted_keys", 0),
        "expired_keys": info.get("expired_keys", 0),
        "used_memory_mb": round(used_memory / 1024 / 1024, 1),
        "memory_usage_pct": round(memory_pct, 2),
        "connected_clients": info.get("connected_clients", 0),
    }


def inspect_key(key: str) -> Dict:
    """
    Inspect a specific cache key for TTL, encoding, and memory usage.
    Use DEBUG OBJECT to identify large keys that inflate memory or
    increase serialization latency.
    """
    ttl = r.ttl(key)
    encoding = r.object("encoding", key)  # redis-py exposes OBJECT ENCODING via .object()

    try:
        debug_obj = r.debug_object(key)
    except Exception:
        debug_obj = {}

    return {
        "key": key,
        "ttl_seconds": ttl,
        "encoding": encoding,
        "serialized_length_bytes": debug_obj.get("serializedlength"),
        "lru_idle_seconds": debug_obj.get("lru_seconds_idle"),
    }


def check_cache_health(hit_rate_threshold: float = 80.0) -> Dict:
    """
    Health check function for monitoring integration (Datadog, Prometheus).
    Returns status=WARN or CRITICAL with actionable diagnostics.
    """
    stats = get_cache_stats()
    warnings = []

    if stats["hit_rate_pct"] < hit_rate_threshold:
        warnings.append(
            f"Hit rate {stats['hit_rate_pct']}% below threshold "
            f"{hit_rate_threshold}% — check DB load"
        )

    if stats["memory_usage_pct"] > 85:
        warnings.append(
            f"Memory at {stats['memory_usage_pct']}% — "
            f"increase maxmemory or audit key sizes"
        )

    status = "OK" if not warnings else "WARN"

    return {"status": status, "stats": stats, "warnings": warnings}

Conclusion

Cache invalidation is a consistency problem that presents as a performance problem. When your cache is lying — serving stale prices, outdated permissions, deleted content — the debugging path is long because the symptoms (wrong data, user complaints) look nothing like the cause (a race condition between a write and a read that occurs in a 50-millisecond window at peak traffic).

The patterns in this post map to specific failure modes. Double-delete prevents stale repopulation from concurrent readers. TTL jitter and XFetch prevent thundering herds on key expiry. Layered caching with cache promotion keeps hit rates above 90% while containing the footprint of any single layer's inconsistency. CDN cache tags with programmatic purge prevent content updates from being invisible at the edge for minutes or hours. Negative caching stops database hammering from non-existent key lookups. Read-your-writes bypass prevents users from losing confidence in your application's responsiveness.

The production numbers that matter: hit rate above 85% for Redis (above 95% for high-traffic public APIs), Redis p99 under 5ms, CDN serving 80%+ of public read traffic, database receiving under 5% of total reads. If your numbers are below these targets, the gap is almost always one of the patterns above — not hardware or infrastructure.

Start with correct key design, add TTL jitter from day one, implement double-delete on any write path that has concurrent readers, and instrument hit rate before you need to debug it. The cache that does not lie is not one that never has misses — it is one where every miss is intentional and every hit is fresh.


Sources

  • Vattani et al., "Exact Analysis of TTL Cache Networks" (2015) — XFetch probabilistic early recomputation
  • Cloudflare Cache documentation — Cache-Tag header and surrogate key purge API
  • Fastly Surrogate-Keys documentation — Tag-based cache purge at CDN layer
  • Redis Documentation — keyspace notifications, INFO stats, DEBUG OBJECT, maxmemory policies
  • AWS CloudFront Origin Shield documentation — collapsing edge misses to single origin request

