Redis and Valkey in Production 2026: Caching, Pub/Sub, Streams, and Vector Search

Hero: Redis data structure diagram with latency counters

Redis has been the default in-memory data store for a decade. Then in March 2024, Redis moved to dual RSALv2/SSPLv1 licensing — source-available terms that block cloud providers from offering it as a managed service without a commercial agreement. The community forked it. Valkey, now under the Linux Foundation, is that fork. In 2026, both exist in production at scale, and understanding when to use each — and how to use either effectively — is a core backend skill.

This guide covers the five primary Redis/Valkey use cases: caching with proper eviction strategies, session storage, pub/sub messaging, Streams for event processing, and vector search for AI applications. Each section includes production-ready patterns and the failure modes that bite teams most often.

The Licensing Split and What It Means in 2026

Before the patterns: understanding the ecosystem split matters for architectural decisions.

Redis Ltd. (the company) continues developing Redis under source-available licensing. Redis Stack — which bundles vector search, JSON, and time series modules — is also available. AWS, Google Cloud, and Azure can no longer ship new Redis versions as vanilla managed services; they either moved to the fork (ElastiCache now offers a Valkey engine) or partner with Redis Ltd.

Valkey is a drop-in compatible fork from Redis 7.2. The API is identical. Client libraries work without modification. Valkey 8.0 introduced improvements in multi-threaded I/O that Redis hadn't shipped yet. For teams without Redis Enterprise licenses, Valkey is now the default recommendation for new self-hosted deployments.

Practical decision: If you're on managed cloud (ElastiCache, Memorystore, Azure Cache), check which engine you're running — new deployments increasingly default to Valkey. If you're self-hosted and don't need Redis Stack's commercial modules, Valkey is the better choice. If you need RediSearch or RedisJSON with commercial support, Redis Enterprise makes sense.

The code in this guide works identically on both.

The Problem: Why In-Memory Data Stores Matter

A typical database-backed API response takes 20-100ms. With Redis/Valkey caching, the same read takes 0.1-1ms — a 50-100× speedup. But the impact goes beyond speed:

Without caching: 1,000 req/s × 50 ms per DB query = 50 concurrent DB connections sustained (Little's law)
With caching: 1,000 req/s × 95% hit rate = only 50 DB queries/s, with a sub-millisecond median response served from memory

The math is stark. But caching is where most production incidents live. The failure modes — thundering herd, cache stampede, stale reads, cold start — aren't theoretical. They've taken down production systems at every scale.
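As a rough sketch — the hit rate and latency numbers above are illustrative, and `cache_impact` is a hypothetical helper, not a library API — the same arithmetic in code:

```python
def cache_impact(req_per_s: float, hit_rate: float,
                 cache_ms: float, db_ms: float) -> tuple[float, float]:
    """Back-of-envelope: sustained DB queries/s and mean latency with a cache."""
    db_qps = req_per_s * (1 - hit_rate)                       # only misses hit the DB
    mean_ms = hit_rate * cache_ms + (1 - hit_rate) * db_ms    # weighted average
    return db_qps, mean_ms

db_qps, mean_ms = cache_impact(1000, 0.95, 0.5, 50)
# roughly 50 DB queries/s and a ~3 ms mean — the database sees 5% of the traffic
```

Note how sensitive the model is to the hit rate: dropping from 95% to 90% doubles the sustained DB load.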

graph LR
    A[Client Request] --> B{Cache Hit?}
    B -- Yes 0.1-1ms --> C[Return Cached Data]
    B -- No 20-100ms --> D[Query Database]
    D --> E[Write to Cache]
    E --> F[Return Data]
    
    style C fill:#22c55e,color:#fff
    style D fill:#ef4444,color:#fff
    style E fill:#3b82f6,color:#fff

Understanding Redis/Valkey means understanding both the happy path and the failure modes.

How It Works: Data Structures as the Core Abstraction

Unlike a traditional cache that stores only strings, Redis/Valkey is a data structure server. The structure you choose determines what operations are available and their complexity:

| Structure | Use Case | Key Operations | Complexity |
|-----------|----------|----------------|------------|
| String | Cache, counters, flags | GET/SET, INCR, GETSET | O(1) |
| Hash | Object storage, sessions | HGET/HSET, HGETALL | O(1)/O(N) |
| List | Queues, timelines | LPUSH/RPOP, LRANGE | O(1)/O(N) |
| Set | Unique collections, tags | SADD, SISMEMBER, SINTER | O(1)/O(N) |
| Sorted Set | Leaderboards, rate limits | ZADD, ZRANGE, ZRANGEBYSCORE | O(log N) |
| Stream | Event log, message queue | XADD, XREAD, XGROUP | O(1)/O(N) |
| HyperLogLog | Unique visitor counts | PFADD, PFCOUNT | O(1), ~0.81% error |

This matters: if you're storing a user session as a JSON blob in a String, you're deserializing the full object every time you need one field. A Hash stores it as named fields — you can HGET user:123 email without touching the rest.
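To make the Hash layout concrete, here's a small helper — `to_hash_fields` is a hypothetical name, not a redis-py API — that flattens a nested profile into individually addressable Hash fields:

```python
import json

def to_hash_fields(obj: dict, prefix: str = "") -> dict:
    """Flatten a nested dict into Hash fields with colon-delimited names."""
    fields = {}
    for k, v in obj.items():
        name = f"{prefix}{k}"
        if isinstance(v, dict):
            fields.update(to_hash_fields(v, prefix=f"{name}:"))
        else:
            fields[name] = json.dumps(v)   # scalars stay JSON-encoded
    return fields

# r.hset("user:123", mapping=to_hash_fields(profile))
# r.hget("user:123", "address:city")   # one field — no full-blob deserialize
```

With this layout, reading one field costs one HGET; the rest of the object never leaves the server.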

Implementation: Caching Patterns That Don't Bite You

Pattern 1: Cache-Aside with Stampede Prevention

The naive cache-aside pattern has a race condition under load: when the cache expires, hundreds of requests can simultaneously hit the database.

import redis
import json
import hashlib
import time
from functools import wraps

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def cached(key_prefix: str, ttl: int = 300, lock_ttl: int = 5):
    """
    Cache-aside decorator with probabilistic early expiration
    to prevent cache stampede.
    
    Uses 'XFetch' algorithm: probabilistically refresh before TTL expires
    based on recompute time and time-to-live remaining.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Build cache key from prefix + hashed args
            args_hash = hashlib.md5(
                json.dumps([args, kwargs], sort_keys=True).encode()
            ).hexdigest()[:8]
            cache_key = f"{key_prefix}:{args_hash}"
            delta_key = f"{cache_key}:delta"
            
            # Check cache
            cached_value = r.get(cache_key)
            if cached_value:
                delta = float(r.get(delta_key) or 1.0)
                ttl_remaining = r.ttl(cache_key)
                
                # XFetch: probabilistically treat the entry as expired shortly
                # before its real TTL, so a single early request recomputes the
                # value instead of a stampede hitting the database at expiry.
                import math
                import random
                if delta * 1.5 * -math.log(random.random()) < ttl_remaining:
                    return json.loads(cached_value)
                # Otherwise fall through and refresh the cache early
            
            # Cache miss — compute with timing
            start = time.time()
            result = func(*args, **kwargs)
            elapsed = time.time() - start
            
            # Store result + recompute time
            pipe = r.pipeline()
            pipe.setex(cache_key, ttl, json.dumps(result))
            pipe.setex(delta_key, ttl, elapsed)
            pipe.execute()
            
            return result
        return wrapper
    return decorator


@cached(key_prefix="user_profile", ttl=300)
def get_user_profile(user_id: int) -> dict:
    # Simulates expensive DB query
    return db.query("SELECT * FROM users WHERE id = %s", user_id)

Pattern 2: Write-Through for Sessions

Session data is written often but must always be fresh. Write-through keeps cache and store in sync:

class SessionManager:
    """
    Write-through session cache using Redis Hashes.
    Sessions stored as flat key-value pairs — no full serialization on partial updates.
    """
    SESSION_TTL = 3600 * 24 * 7  # 7 days
    
    def __init__(self, redis_client: redis.Redis):
        self.r = redis_client
    
    def _key(self, session_id: str) -> str:
        return f"session:{session_id}"
    
    def create(self, session_id: str, user_id: int, metadata: dict) -> None:
        key = self._key(session_id)
        pipe = self.r.pipeline()
        pipe.hset(key, mapping={
            "user_id": user_id,
            "created_at": int(time.time()),
            "last_seen": int(time.time()),
            **{f"meta:{k}": v for k, v in metadata.items()}
        })
        pipe.expire(key, self.SESSION_TTL)
        pipe.execute()
    
    def touch(self, session_id: str) -> bool:
        """Refresh TTL and update last_seen on each request."""
        key = self._key(session_id)
        pipe = self.r.pipeline()
        pipe.hset(key, "last_seen", int(time.time()))
        pipe.expire(key, self.SESSION_TTL)
        results = pipe.execute()
        # HSET returns the count of *new* fields (0 on a routine update), so
        # check the EXPIRE result instead: it is 0 iff the key doesn't exist.
        return bool(results[1])
    
    def get(self, session_id: str) -> dict | None:
        data = self.r.hgetall(self._key(session_id))
        if not data:
            return None
        return {
            "user_id": int(data["user_id"]),
            "created_at": int(data["created_at"]),
            "last_seen": int(data["last_seen"]),
            "meta": {
                k[5:]: v for k, v in data.items() 
                if k.startswith("meta:")
            }
        }
    
    def invalidate(self, session_id: str) -> None:
        self.r.delete(self._key(session_id))

Pattern 3: Rate Limiting with Sliding Window

Token bucket and fixed-window rate limiting have known edge cases. Sliding window using a Sorted Set is more accurate:

def sliding_window_rate_limit(
    user_id: str, 
    max_requests: int, 
    window_seconds: int
) -> tuple[bool, int]:
    """
    Sliding window rate limiter using Sorted Set.
    Returns (allowed: bool, requests_remaining: int)
    """
    key = f"rate_limit:{user_id}"
    now = time.time()
    window_start = now - window_seconds
    
    pipe = r.pipeline()
    # Remove requests outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests in window
    pipe.zcard(key)
    # Add this request
    pipe.zadd(key, {str(now): now})
    # Set key expiration
    pipe.expire(key, window_seconds)
    
    results = pipe.execute()
    current_count = results[1]
    
    if current_count >= max_requests:
        # Remove the request we just added — it's denied
        r.zrem(key, str(now))
        return False, 0
    
    return True, max_requests - current_count - 1

Pub/Sub and Streams: When to Use Which

Redis/Valkey offers two messaging primitives. They serve different purposes and teams often choose the wrong one:

flowchart TD
    Q{Need message history?}
    Q -- No --> P[Pub/Sub]
    Q -- Yes --> S[Streams]
    P --> P1[Fire-and-forget notifications]
    P --> P2[Real-time broadcast]
    P --> P3[No persistence, no replay]
    S --> S1[Event log with consumer groups]
    S --> S2[At-least-once delivery]
    S --> S3[Consumer lag tracking]
    S --> S4[Persistent history]
    
    style P fill:#f59e0b,color:#fff
    style S fill:#3b82f6,color:#fff

Pub/Sub is fire-and-forget. If no subscriber is connected when a message arrives, the message is lost. Use it for real-time notifications (presence updates, live dashboard refresh) where missing a message is acceptable.

Streams are a persistent append-only log with consumer groups. Messages survive disconnections. Consumer groups track which messages each consumer has processed. Use Streams when you need reliable delivery: order processing, audit logs, event-driven workflows.

# Pub/Sub — real-time presence notifications
import asyncio
import redis.asyncio as aioredis

async def publish_presence(user_id: int, status: str):
    async with aioredis.Redis() as r:
        await r.publish(
            channel="presence:updates",
            message=json.dumps({"user_id": user_id, "status": status, "ts": time.time()})
        )

async def subscribe_presence(callback):
    async with aioredis.Redis() as r:
        async with r.pubsub() as ps:
            await ps.subscribe("presence:updates")
            async for message in ps.listen():
                if message["type"] == "message":
                    await callback(json.loads(message["data"]))


# Streams — reliable order event processing
import logging
import os

log = logging.getLogger(__name__)

def process_order_stream():
    """
    Consumer group processing for order events.
    Handles failures: unacknowledged messages are retried.
    """
    GROUP = "order-processor"
    CONSUMER = f"worker-{os.getpid()}"
    STREAM = "orders:events"
    
    # Create consumer group (idempotent)
    try:
        r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except redis.exceptions.ResponseError as e:
        if "BUSYGROUP" not in str(e):
            raise  # group already existing is fine; any other error is not
    
    while True:
        # Read new messages (> means from last delivered)
        messages = r.xreadgroup(
            groupname=GROUP,
            consumername=CONSUMER,
            streams={STREAM: ">"},
            count=10,
            block=2000  # Block 2s waiting for messages
        )
        
        for stream_name, stream_messages in (messages or []):
            for msg_id, fields in stream_messages:
                try:
                    process_order(fields)
                    r.xack(STREAM, GROUP, msg_id)
                except Exception as e:
                    # Leave unacked — will be reclaimed after PEL timeout
                    log.error(f"Failed to process {msg_id}: {e}")
        
        # Reclaim stuck messages from dead consumers (pending > 30s)
        pending = r.xautoclaim(STREAM, GROUP, CONSUMER, min_idle_time=30000, start_id="0-0")
        for msg_id, fields in pending[1]:
            try:
                process_order(fields)
                r.xack(STREAM, GROUP, msg_id)
            except Exception:
                log.exception(f"Reclaimed message {msg_id} failed again; left pending")

Vector Search: Redis/Valkey as an AI Memory Store

Redis Stack ships vector search through its query engine (formerly the RediSearch module), and Valkey offers it through the valkey-search module. For AI applications that need low-latency similarity search — agent memory, recommendation systems, semantic caching — storing vectors alongside your cache avoids running a separate vector database:

from redis.commands.search.field import VectorField, TextField, NumericField
from redis.commands.search.query import Query
import numpy as np

def setup_vector_index(r: redis.Redis):
    """Create HNSW vector index for document embeddings."""
    try:
        r.ft("doc_idx").dropindex(delete_documents=False)
    except redis.exceptions.ResponseError:
        pass  # Index doesn't exist yet
    
    schema = [
        TextField("$.content", as_name="content"),
        NumericField("$.user_id", as_name="user_id"),
        VectorField(
            "$.embedding",
            "HNSW",  # or "FLAT" for brute-force (better for < 50k vectors)
            {
                "TYPE": "FLOAT32",
                "DIM": 1536,           # OpenAI ada-002 dimension
                "DISTANCE_METRIC": "COSINE",
                "M": 16,               # HNSW connections per node
                "EF_CONSTRUCTION": 200 # Build-time search width
            },
            as_name="embedding"
        )
    ]
    
    # IndexDefinition/IndexType live in a submodule, not on the package root
    from redis.commands.search.indexDefinition import IndexDefinition, IndexType

    r.ft("doc_idx").create_index(
        schema,
        definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.JSON)
    )


def semantic_cache_get(query_embedding: list[float], threshold: float = 0.95) -> str | None:
    """
    Check if a semantically similar query was already answered.
    Avoids LLM API calls for near-duplicate questions.
    """
    query_vec = np.array(query_embedding, dtype=np.float32).tobytes()
    
    q = (
        Query("*=>[KNN 1 @embedding $vec AS score]")
        .sort_by("score")
        .return_fields("content", "score")
        .dialect(2)
    )
    
    results = r.ft("semantic_cache_idx").search(q, query_params={"vec": query_vec})
    
    # With a COSINE index the KNN score is a *distance* (1 - similarity),
    # so lower is better: 95% similarity means a distance of at most 0.05.
    if results.docs and float(results.docs[0].score) <= (1 - threshold):
        return results.docs[0].content  # Cache hit — return cached LLM response
    
    return None  # Cache miss — call LLM

This pattern — semantic caching — is particularly valuable for LLM applications. Instead of checking for exact string matches, you find questions with the same semantic meaning. A 95% cosine similarity threshold catches questions like "What's the capital of France?" and "What city is the capital of France?" as the same query.
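One detail worth internalizing: a COSINE vector index reports distance, not similarity — lower scores mean closer matches. A minimal pure-Python model of the metric:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """What a COSINE vector index reports: 1 - cosine similarity (0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

assert cosine_distance([1.0, 0.0], [1.0, 0.0]) < 1e-9             # identical vectors
assert abs(cosine_distance([1.0, 0.0], [0.0, 1.0]) - 1.0) < 1e-9  # orthogonal
```

A 95% similarity threshold therefore corresponds to accepting distances of at most 0.05.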

Distributed Locks with Redlock

When multiple processes or servers need exclusive access to a resource — running a cron job exactly once, preventing double-processing of a payment — Redis/Valkey distributed locks are the standard solution.

The naïve approach — SET lock_key <token> NX EX 30 — has a subtle failure mode. If the process crashes after acquiring the lock, the key expires after 30 seconds and another process takes over; that part is by design. But if the process merely stalls (GC pause, slow network) and the key expires while it still believes it holds the lock, two processes hold the lock simultaneously.

Redlock is the multi-node algorithm designed to prevent this:

import redis
import uuid
import time
from contextlib import contextmanager

class RedLock:
    """
    Distributed lock using Redlock algorithm across multiple Redis nodes.
    Requires majority of N nodes to acquire the lock.
    
    For single-node deployments, use simple SET NX EX instead.
    """
    QUORUM = 2  # Majority of the 3-node deployment below; in general, len(nodes) // 2 + 1
    CLOCK_DRIFT_FACTOR = 0.01
    
    def __init__(self, redis_nodes: list[redis.Redis], ttl_ms: int = 10000):
        self.nodes = redis_nodes
        self.ttl_ms = ttl_ms
    
    def _acquire_on_node(self, node: redis.Redis, resource: str, token: str) -> bool:
        """Try to acquire lock on a single node using SET NX PX."""
        return bool(node.set(resource, token, nx=True, px=self.ttl_ms))
    
    def _release_on_node(self, node: redis.Redis, resource: str, token: str) -> None:
        """Release lock only if we own it (Lua for atomicity)."""
        release_script = """
        if redis.call('get', KEYS[1]) == ARGV[1] then
            return redis.call('del', KEYS[1])
        else
            return 0
        end
        """
        node.eval(release_script, 1, resource, token)
    
    @contextmanager
    def lock(self, resource: str):
        token = str(uuid.uuid4())
        start_time = time.time()
        
        # Try to acquire on all nodes
        acquired = sum(1 for node in self.nodes if self._acquire_on_node(node, resource, token))
        
        # Calculate elapsed time and validity
        elapsed_ms = (time.time() - start_time) * 1000
        validity_ms = self.ttl_ms - elapsed_ms - (self.ttl_ms * self.CLOCK_DRIFT_FACTOR)
        
        if acquired >= self.QUORUM and validity_ms > 0:
            try:
                yield validity_ms  # Lock acquired — validity time remaining
            finally:
                for node in self.nodes:
                    self._release_on_node(node, resource, token)
        else:
            # Failed — release any partial acquisitions
            for node in self.nodes:
                self._release_on_node(node, resource, token)
            raise RuntimeError(f"Could not acquire lock for {resource}")


# Usage
redis_nodes = [
    redis.Redis(host="redis-1", port=6379),
    redis.Redis(host="redis-2", port=6379),
    redis.Redis(host="redis-3", port=6379),
]

lock = RedLock(redis_nodes, ttl_ms=5000)

with lock.lock("payment:process:order-12345"):
    # Only one process across all servers executes this
    process_payment(order_id=12345)

When to use Redlock vs simple lock: Redlock is necessary only when you're running Redis in a multi-primary cluster and can't tolerate a primary failure causing two processes to hold the same lock. For most applications — with a single primary and one or more replicas — a simple SET key value NX PX ttl is sufficient. The fencing token pattern (increment a counter on each lock acquisition; include it in all database writes) is more robust than Redlock for preventing stale lock issues.
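A minimal sketch of the fencing-token idea (all names here are hypothetical): the storage layer remembers the highest token it has seen and rejects writes from older lock holders, so even a GC-paused process whose lock silently expired cannot corrupt state:

```python
class FencedStore:
    """Toy resource that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token = 0   # in Redis terms: INCR lock:{resource}:fence per acquisition
        self.data: dict = {}

    def write(self, token: int, key: str, value) -> bool:
        if token < self.highest_token:
            return False         # a newer lock holder already wrote — stale holder rejected
        self.highest_token = token
        self.data[key] = value
        return True

store = FencedStore()
assert store.write(1, "order:12345", "processing")       # holder with token 1
assert store.write(2, "order:12345", "paid")             # newer holder takes over
assert store.write(1, "order:12345", "refund") is False  # paused old holder rejected
```

The safety check lives in the storage layer rather than in the lock, which is exactly why it survives clock drift and process pauses that defeat TTL-based locks alone.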

Key Naming and Data Organization at Scale

With hundreds of developers writing to Redis/Valkey, key naming becomes critical for operations — debugging, memory analysis, and access control all depend on predictable key structure.

# Convention: {namespace}:{entity_type}:{identifier}:{field}

# Good examples:
user:session:abc123              # User session by session ID
rate_limit:api:user:456:v1       # Rate limit for user 456 on API v1
cache:product:789:price          # Cached price for product 789
lock:payment:order:12345         # Distributed lock for order processing
queue:email:notifications        # Email notification queue

# Bad examples (no structure → impossible to debug or scan selectively):
user123session
ratelimit_user_456
product789

The namespace prefix enables selective operations. When you need to flush all cache keys without touching session keys:

# Scan + delete by prefix (safe for production — SCAN doesn't block like KEYS,
# and UNLINK frees memory asynchronously, unlike DEL)
redis-cli --scan --pattern "cache:*" | xargs -L 100 redis-cli unlink
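The same sweep from application code, sketched with redis-py's `scan_iter` and batched UNLINK calls (`delete_by_prefix` and `chunks` are hypothetical helpers, not library APIs):

```python
from itertools import islice

def chunks(iterable, size: int):
    """Yield lists of up to `size` items — keeps each UNLINK call bounded."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def delete_by_prefix(r, pattern: str, batch_size: int = 500) -> int:
    """Delete matching keys with SCAN + UNLINK; never blocks like KEYS + DEL."""
    deleted = 0
    for batch in chunks(r.scan_iter(match=pattern, count=batch_size), batch_size):
        deleted += r.unlink(*batch)
    return deleted

# delete_by_prefix(r, "cache:*")   # flush cache keys, leave sessions untouched
```

Batching matters: passing thousands of keys to a single UNLINK creates one huge command, while one key per call wastes round trips.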

Use redis-cli --bigkeys to find memory hogs, and redis-cli --memkeys for per-key memory analysis. These are the first tools to reach for when a Redis instance is approaching its memory limit and you don't know why.

Production Considerations

Memory Management and Eviction

Redis/Valkey is bounded by available memory. Without a proper eviction policy, memory fills up and writes fail. Configure this before any other production setting:

# redis.conf / valkey.conf
maxmemory 4gb
maxmemory-policy allkeys-lru    # Evict least-recently-used keys across all keyspaces

# Alternatives:
# noeviction       — reject writes when full (the default; good for session stores, bad for caches)
# volatile-lru     — only evict keys with a TTL set
# allkeys-lru      — evict any key, LRU order (recommended for pure cache)
# allkeys-lfu      — evict least-frequently-used (better for hot-key patterns)
# volatile-ttl     — evict keys closest to expiration

For mixed workloads (sessions + cache), use separate Redis/Valkey instances with different eviction policies: noeviction for session data (losing sessions is worse than slowness), allkeys-lfu for cache.

graph TD
    subgraph "Production Redis/Valkey Topology"
        A[Application Servers] --> B[Redis Cluster / Sentinel]
        B --> C[Primary Node]
        B --> D[Replica 1]
        B --> E[Replica 2]
        C -- Async replication --> D
        C -- Async replication --> E
        F["Cache Instance<br/>allkeys-lfu<br/>4GB"]
        G["Session Instance<br/>noeviction<br/>8GB"]
        A --> F
        A --> G
    end

Connection Pooling

Each Redis connection is a TCP socket + server-side memory (~20KB). At scale, connection exhaustion is a real failure mode. Use pooling everywhere:

# Production connection pool setup
pool = redis.ConnectionPool(
    host="redis.internal",
    port=6379,
    max_connections=50,         # Per process
    socket_timeout=0.5,         # 500ms — fail fast
    socket_connect_timeout=1.0,
    retry_on_timeout=True,
    health_check_interval=30,
)

r = redis.Redis(connection_pool=pool)

# For async (aioredis)
async_pool = redis.asyncio.ConnectionPool(
    host="redis.internal",
    max_connections=100,         # Higher per process for async
    socket_timeout=0.5,
)

Cluster vs Sentinel vs Standalone

| Mode | Use Case | Failover | Sharding |
|------|----------|----------|----------|
| Standalone | Dev, low-traffic | Manual | No |
| Sentinel | Primary-replica with auto-failover | ~30s | No |
| Cluster | Horizontal scaling, high availability | ~5s | Yes (16384 slots) |

Cluster mode shards data across nodes automatically. The cost: multi-key commands (MSET, pipeline across different keys) only work if all keys hash to the same slot. Use {key} hash tags to force related keys to the same slot: session:{user:123}:token and session:{user:123}:data will always be on the same node.
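The routing rule is easy to model: a key's slot is CRC16 (the XMODEM variant) of the key — or of the first non-empty {…} hash tag, if one exists — mod 16384. A minimal sketch:

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0) — the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Hash slot for a key, honoring the first non-empty {...} hash tag."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:            # only a non-empty tag is used
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# Both keys hash only the "user:123" tag, so they land in the same slot:
assert key_slot("session:{user:123}:token") == key_slot("session:{user:123}:data")
```

This is why hash-tagged keys can participate in the same MSET or pipeline: the cluster routes by the tag, not the full key.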

Persistence Configuration

Redis/Valkey is in-memory but supports persistence. Get this wrong and you lose data on restart:

# RDB: point-in-time snapshots (fast recovery, up to minutes of data loss)
save 900 1        # After 900s if at least 1 key changed
save 300 10       # After 300s if at least 10 keys changed  
save 60 10000     # After 60s if at least 10000 keys changed

# AOF: append-only file (up to 1s of data loss with fsync=everysec)
appendonly yes
appendfsync everysec   # fsync every second — balance of safety/performance
# appendfsync always   # fsync on every write — safest, slowest
# appendfsync no       # OS decides — fastest, most data loss risk

# AOF rewrite keeps AOF file from growing forever
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

For a cache-only deployment: disable persistence entirely (save "", no AOF). Data loss on restart is acceptable and persistence overhead hurts performance. For session stores: AOF with everysec gives up to 1 second of loss risk with minimal overhead.

Conclusion

Redis/Valkey is one of those tools where the gap between "works in dev" and "works in production" is large. The data structures are simple to learn; the failure modes — stampedes, connection exhaustion, wrong eviction policies, replication lag during failover — take time and incidents to learn.

The patterns in this guide are production-hardened:

  • XFetch algorithm prevents stampede without distributed locks
  • Hash-based sessions avoid unnecessary serialization
  • Sorted Set rate limiting gives accurate sliding windows
  • Streams with consumer groups give at-least-once delivery guarantees
  • Semantic caching can sharply cut LLM API spend on workloads with many near-duplicate queries

The Valkey fork means the ecosystem continues regardless of Redis Ltd.'s licensing direction. The API is stable. The patterns transfer. Pick the deployment that matches your operational model and apply the patterns above.

Sources

  • Valkey Linux Foundation announcement and 8.0 release notes
  • Redis SSPL licensing change (March 2024)
  • Antirez: "Redis persistence demystified"
  • XFetch algorithm: "Optimal Probabilistic Cache Stampede Prevention"
  • Redis documentation: Streams, Cluster, Pub/Sub
