Redis and Valkey in Production 2026: Caching, Pub/Sub, Streams, and Vector Search

Redis has been the default in-memory data store for over a decade. Then in 2024, Redis changed its license from BSD to dual RSALv2/SSPLv1 — source-available terms that block cloud providers from offering it as a managed service. The community forked it. Valkey, now under the Linux Foundation, is that fork. In 2026, both run in production at scale, and understanding when to use each — and how to use either effectively — is a core backend skill.
This guide covers the five primary Redis/Valkey use cases: caching with proper eviction strategies, session storage, pub/sub messaging, Streams for event processing, and vector search for AI applications. Each section includes production-ready patterns and the failure modes that bite teams most often.
The Licensing Split and What It Means in 2026
Before the patterns: understanding the ecosystem split matters for architectural decisions.
Redis Ltd. (the company) continues developing Redis under its source-available licenses. Redis Stack — which bundles vector search, JSON, time series, and graph modules — is also available. AWS, Google Cloud, and Azure can no longer offer vanilla Redis; they either adopted the fork (ElastiCache now offers Valkey as an engine) or partner with Redis Ltd.
Valkey is a drop-in compatible fork from Redis 7.2. The API is identical. Client libraries work without modification. Valkey 8.0 introduced improvements in multi-threaded I/O that Redis hadn't shipped yet. For teams without Redis Enterprise licenses, Valkey is now the default recommendation for new self-hosted deployments.
Practical decision: If you're on managed cloud (ElastiCache, Cloud Memorystore, Azure Cache), you're likely already on Valkey without knowing it. If you're self-hosted and don't need Redis Stack's commercial modules, Valkey is the better choice. If you need RedisSearch or RedisJSON with commercial support, Redis Enterprise makes sense.
The code in this guide works identically on both.
The Problem: Why In-Memory Data Stores Matter
A typical database-backed API response takes 20-100ms. With Redis/Valkey caching, the same read takes 0.1-1ms — a 50-100× speedup. But the impact goes beyond speed:
Without caching: 1,000 req/s × 50ms per DB query = 50 DB connections held concurrently.
With caching: 1,000 req/s × 95% cache hit rate = 50 DB queries/s, 0.95ms median latency.
The math is stark. But caching is where most production incidents live. The failure modes — thundering herd, cache stampede, stale reads, cold start — aren't theoretical. They've taken down production systems at every scale.
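The arithmetic behind those numbers is Little's law (requests in flight = arrival rate × service time); a quick sanity check with the guide's figures:

```python
req_per_s = 1_000
db_query_ms = 50
cache_hit_rate = 0.95

# Without caching: every request holds a DB connection for the query's duration
concurrent_db_queries = req_per_s * db_query_ms // 1_000   # 50 held at once

# With caching: only the misses reach the database
db_queries_per_s = round(req_per_s * (1 - cache_hit_rate))  # 50 queries/s

assert concurrent_db_queries == 50
assert db_queries_per_s == 50
```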
```mermaid
graph LR
    A[Client Request] --> B{Cache Hit?}
    B -- Yes 0.1-1ms --> C[Return Cached Data]
    B -- No 20-100ms --> D[Query Database]
    D --> E[Write to Cache]
    E --> F[Return Data]
    style C fill:#22c55e,color:#fff
    style D fill:#ef4444,color:#fff
    style E fill:#3b82f6,color:#fff
```
Understanding Redis/Valkey means understanding both the happy path and the failure modes.
How It Works: Data Structures as the Core Abstraction
Unlike a traditional cache that stores only strings, Redis/Valkey is a data structure server. The structure you choose determines what operations are available and their complexity:
| Structure | Use Case | Key Operations | Complexity |
|-----------|----------|----------------|------------|
| String | Cache, counters, flags | GET/SET, INCR, GETSET | O(1) |
| Hash | Object storage, sessions | HGET/HSET, HGETALL | O(1)/O(N) |
| List | Queues, timelines | LPUSH/RPOP, LRANGE | O(1)/O(N) |
| Set | Unique collections, tags | SADD, SISMEMBER, SINTER | O(1)/O(N) |
| Sorted Set | Leaderboards, rate limits | ZADD, ZRANGE, ZRANGEBYSCORE | O(log N) |
| Stream | Event log, message queue | XADD, XREAD, XGROUP | O(1)/O(N) |
| HyperLogLog | Unique visitor counts | PFADD, PFCOUNT | O(1), ~0.81% error |
This matters: if you store a user session as a JSON blob in a String, you deserialize the full object every time you need one field. A Hash stores it as named fields — `HGET user:123 email` reads one field without touching the rest.
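A server-free sketch of why that matters, with plain Python dictionaries standing in for the two storage patterns (the session contents here are illustrative):

```python
import json

# A session with one small field we read constantly and a large blob we don't
session = {
    "email": "user@example.com",
    "history": ["clicked:item"] * 10_000,  # dead weight on every read
}

# String pattern: the whole JSON blob must be parsed to read one field
blob = json.dumps(session)
email_via_string = json.loads(blob)["email"]  # deserializes everything

# Hash pattern: HGET returns just the requested field — modeled here as
# per-field storage, which is exactly what a Redis Hash gives you
fields = {k: json.dumps(v) for k, v in session.items()}
email_via_hash = json.loads(fields["email"])  # deserializes ~20 bytes

assert email_via_string == email_via_hash == "user@example.com"
assert len(fields["email"]) < len(blob) // 1000  # the blob is >1000x larger
```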
Implementation: Caching Patterns That Don't Bite You
Pattern 1: Cache-Aside with Stampede Prevention
The naive cache-aside pattern has a race condition under load: when the cache expires, hundreds of requests can simultaneously hit the database.
```python
import hashlib
import json
import math
import random
import time
from functools import wraps

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def cached(key_prefix: str, ttl: int = 300, beta: float = 1.5):
    """
    Cache-aside decorator with probabilistic early expiration
    to prevent cache stampede.

    Uses the 'XFetch' algorithm: occasionally recompute *before* the TTL
    expires, with probability that rises as expiry nears and as the
    recompute cost (delta) grows.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Build cache key from prefix + hashed args
            args_hash = hashlib.md5(
                json.dumps([args, kwargs], sort_keys=True, default=str).encode()
            ).hexdigest()[:8]
            cache_key = f"{key_prefix}:{args_hash}"
            delta_key = f"{cache_key}:delta"

            cached_value = r.get(cache_key)
            if cached_value:
                delta = float(r.get(delta_key) or 1.0)
                ttl_remaining = r.ttl(cache_key)
                # XFetch: -log(rand) is exponentially distributed, so most
                # requests take this branch and return the cached value; a
                # few refresh early, preventing a stampede at expiry.
                if ttl_remaining > 0 and -delta * beta * math.log(random.random()) < ttl_remaining:
                    return json.loads(cached_value)
                # else: fall through and refresh the cache early

            # Cache miss (or early refresh) — compute with timing
            start = time.time()
            result = func(*args, **kwargs)
            elapsed = time.time() - start

            # Store result + recompute time together
            pipe = r.pipeline()
            pipe.setex(cache_key, ttl, json.dumps(result))
            pipe.setex(delta_key, ttl, elapsed)
            pipe.execute()
            return result
        return wrapper
    return decorator

@cached(key_prefix="user_profile", ttl=300)
def get_user_profile(user_id: int) -> dict:
    # Stands in for an expensive DB query
    return db.query("SELECT * FROM users WHERE id = %s", user_id)
```
Pattern 2: Write-Through for Sessions
Session data is written often but must always be fresh. Write-through keeps cache and store in sync:
```python
class SessionManager:
    """
    Write-through session cache using Redis Hashes.
    Sessions stored as flat key-value pairs — no full serialization on partial updates.
    """
    SESSION_TTL = 3600 * 24 * 7  # 7 days

    def __init__(self, redis_client: redis.Redis):
        self.r = redis_client

    def _key(self, session_id: str) -> str:
        return f"session:{session_id}"

    def create(self, session_id: str, user_id: int, metadata: dict) -> None:
        key = self._key(session_id)
        pipe = self.r.pipeline()
        pipe.hset(key, mapping={
            "user_id": user_id,
            "created_at": int(time.time()),
            "last_seen": int(time.time()),
            **{f"meta:{k}": v for k, v in metadata.items()}
        })
        pipe.expire(key, self.SESSION_TTL)
        pipe.execute()

    def touch(self, session_id: str) -> bool:
        """Refresh TTL and update last_seen on each request."""
        key = self._key(session_id)
        pipe = self.r.pipeline()
        pipe.hset(key, "last_seen", int(time.time()))
        pipe.expire(key, self.SESSION_TTL)
        results = pipe.execute()
        # Check the EXPIRE result: it returns 0 for a missing key, whereas
        # HSET returns 0 whenever it *updates* an existing field.
        return bool(results[1])

    def get(self, session_id: str) -> dict | None:
        data = self.r.hgetall(self._key(session_id))
        if not data:
            return None
        return {
            "user_id": int(data["user_id"]),
            "created_at": int(data["created_at"]),
            "last_seen": int(data["last_seen"]),
            "meta": {
                k[5:]: v for k, v in data.items()
                if k.startswith("meta:")
            }
        }

    def invalidate(self, session_id: str) -> None:
        self.r.delete(self._key(session_id))
```
Pattern 3: Rate Limiting with Sliding Window
Token bucket and fixed-window rate limiting have known edge cases. Sliding window using a Sorted Set is more accurate:
```python
def sliding_window_rate_limit(
    user_id: str,
    max_requests: int,
    window_seconds: int
) -> tuple[bool, int]:
    """
    Sliding window rate limiter using a Sorted Set.
    Returns (allowed: bool, requests_remaining: int)
    """
    key = f"rate_limit:{user_id}"
    now = time.time()
    window_start = now - window_seconds

    pipe = r.pipeline()
    # Remove requests outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests still in the window
    pipe.zcard(key)
    # Add this request (two requests with an identical float timestamp
    # collide on the same member — rare, and it only undercounts slightly)
    pipe.zadd(key, {str(now): now})
    # Keep the key from lingering after the user goes idle
    pipe.expire(key, window_seconds)
    results = pipe.execute()

    current_count = results[1]
    if current_count >= max_requests:
        # Remove the request we just added — it's denied
        r.zrem(key, str(now))
        return False, 0
    return True, max_requests - current_count - 1
```
Pub/Sub and Streams: When to Use Which
Redis/Valkey offers two messaging primitives. They serve different purposes and teams often choose the wrong one:
```mermaid
flowchart TD
    Q{Need message history?}
    Q -- No --> P[Pub/Sub]
    Q -- Yes --> S[Streams]
    P --> P1[Fire-and-forget notifications]
    P --> P2[Real-time broadcast]
    P --> P3[No persistence, no replay]
    S --> S1[Event log with consumer groups]
    S --> S2[At-least-once delivery]
    S --> S3[Consumer lag tracking]
    S --> S4[Persistent history]
    style P fill:#f59e0b,color:#fff
    style S fill:#3b82f6,color:#fff
```
Pub/Sub is fire-and-forget. If no subscriber is connected when a message arrives, the message is lost. Use it for real-time notifications (presence updates, live dashboard refresh) where missing a message is acceptable.
Streams are a persistent append-only log with consumer groups. Messages survive disconnections. Consumer groups track which messages each consumer has processed. Use Streams when you need reliable delivery: order processing, audit logs, event-driven workflows.
```python
# Pub/Sub — real-time presence notifications
import redis.asyncio as aioredis

async def publish_presence(user_id: int, status: str):
    async with aioredis.Redis() as r:
        await r.publish(
            "presence:updates",
            json.dumps({"user_id": user_id, "status": status, "ts": time.time()})
        )

async def subscribe_presence(callback):
    async with aioredis.Redis() as r:
        async with r.pubsub() as ps:
            await ps.subscribe("presence:updates")
            async for message in ps.listen():
                if message["type"] == "message":
                    await callback(json.loads(message["data"]))
```
```python
# Streams — reliable order event processing
import logging
import os

log = logging.getLogger("orders")

def process_order_stream():
    """
    Consumer-group processing for order events.
    Handles failures: unacknowledged messages are retried.
    """
    GROUP = "order-processor"
    CONSUMER = f"worker-{os.getpid()}"
    STREAM = "orders:events"

    # Create consumer group (idempotent)
    try:
        r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except redis.exceptions.ResponseError:
        pass  # Group already exists

    while True:
        # Read new messages ('>' means entries never delivered to this group)
        messages = r.xreadgroup(
            groupname=GROUP,
            consumername=CONSUMER,
            streams={STREAM: ">"},
            count=10,
            block=2000  # Block up to 2s waiting for messages
        )
        for stream_name, stream_messages in (messages or []):
            for msg_id, fields in stream_messages:
                try:
                    process_order(fields)
                    r.xack(STREAM, GROUP, msg_id)
                except Exception as e:
                    # Leave unacked — stays in the PEL, reclaimed below
                    log.error(f"Failed to process {msg_id}: {e}")

        # Reclaim stuck messages from dead consumers (pending > 30s)
        pending = r.xautoclaim(STREAM, GROUP, CONSUMER, min_idle_time=30000, start_id="0-0")
        for msg_id, fields in pending[1]:  # [next_id, claimed_messages, ...]
            try:
                process_order(fields)
                r.xack(STREAM, GROUP, msg_id)
            except Exception as e:
                log.error(f"Retry failed for {msg_id}: {e}")
```
Vector Search: Redis/Valkey as an AI Memory Store
Redis (via the Redis Stack query engine) and Valkey (via its search module) both offer vector search. For AI applications that need low-latency similarity search — agent memory, recommendation systems, semantic caching — storing vectors in the same store as your cache avoids operating a separate vector database:
```python
import numpy as np
import redis
from redis.commands.search.field import NumericField, TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

def setup_vector_index(r: redis.Redis):
    """Create an HNSW vector index over JSON documents with embeddings."""
    try:
        r.ft("doc_idx").dropindex(delete_documents=False)
    except redis.exceptions.ResponseError:
        pass  # Index doesn't exist yet

    schema = [
        TextField("$.content", as_name="content"),
        NumericField("$.user_id", as_name="user_id"),
        VectorField(
            "$.embedding",
            "HNSW",  # or "FLAT" for brute-force (better for < 50k vectors)
            {
                "TYPE": "FLOAT32",
                "DIM": 1536,  # OpenAI ada-002 dimension
                "DISTANCE_METRIC": "COSINE",
                "M": 16,  # HNSW connections per node
                "EF_CONSTRUCTION": 200  # Build-time search width
            },
            as_name="embedding"
        )
    ]
    r.ft("doc_idx").create_index(
        schema,
        definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.JSON)
    )

def semantic_cache_get(query_embedding: list[float], threshold: float = 0.95) -> str | None:
    """
    Check whether a semantically similar query was already answered.
    Avoids LLM API calls for near-duplicate questions.
    """
    query_vec = np.array(query_embedding, dtype=np.float32).tobytes()
    q = (
        Query("*=>[KNN 1 @embedding $vec AS score]")
        .sort_by("score")
        .return_fields("content", "score")
        .dialect(2)
    )
    results = r.ft("semantic_cache_idx").search(q, query_params={"vec": query_vec})
    # With COSINE, 'score' is a *distance* (1 - cosine similarity): lower is closer
    if results.docs and float(results.docs[0].score) <= 1 - threshold:
        return results.docs[0].content  # Cache hit — return cached LLM response
    return None  # Cache miss — call the LLM
```
This pattern — semantic caching — is particularly valuable for LLM applications. Instead of checking for exact string matches, you find questions with the same semantic meaning. A 95% cosine similarity threshold catches questions like "What's the capital of France?" and "What city is the capital of France?" as the same query.
Distributed Locks with Redlock
When multiple processes or servers need exclusive access to a resource — running a cron job exactly once, preventing double-processing of a payment — Redis/Valkey distributed locks are the standard solution.
The naïve approach — `SET lock_key 1 EX 30` — has a known failure mode. If the process crashes after acquiring the lock but before releasing it, the key expires after 30 seconds and another process takes over; that part is fine. But if the process is merely slow (GC pause, slow network) and the key expires while it's still working, two processes both believe they hold the lock simultaneously.
Redlock is the multi-node algorithm designed to prevent this:
```python
import time
import uuid
from contextlib import contextmanager

import redis

class RedLock:
    """
    Distributed lock using the Redlock algorithm across multiple Redis nodes.
    Requires a majority of N nodes to acquire the lock.
    For single-node deployments, use a simple SET NX EX instead.
    """
    CLOCK_DRIFT_FACTOR = 0.01

    def __init__(self, redis_nodes: list[redis.Redis], ttl_ms: int = 10000):
        self.nodes = redis_nodes
        self.ttl_ms = ttl_ms
        self.quorum = len(redis_nodes) // 2 + 1  # e.g. 2 of 3 nodes

    def _acquire_on_node(self, node: redis.Redis, resource: str, token: str) -> bool:
        """Try to acquire the lock on a single node using SET NX PX."""
        try:
            return bool(node.set(resource, token, nx=True, px=self.ttl_ms))
        except redis.exceptions.ConnectionError:
            return False  # A down node simply doesn't count toward quorum

    def _release_on_node(self, node: redis.Redis, resource: str, token: str) -> None:
        """Release the lock only if we own it (Lua for atomicity)."""
        release_script = """
        if redis.call('get', KEYS[1]) == ARGV[1] then
            return redis.call('del', KEYS[1])
        else
            return 0
        end
        """
        try:
            node.eval(release_script, 1, resource, token)
        except redis.exceptions.ConnectionError:
            pass

    @contextmanager
    def lock(self, resource: str):
        token = str(uuid.uuid4())
        start_time = time.time()

        # Try to acquire on all nodes
        acquired = sum(1 for node in self.nodes if self._acquire_on_node(node, resource, token))

        # Remaining validity: TTL minus acquisition time minus a clock-drift margin
        elapsed_ms = (time.time() - start_time) * 1000
        validity_ms = self.ttl_ms - elapsed_ms - (self.ttl_ms * self.CLOCK_DRIFT_FACTOR)

        if acquired >= self.quorum and validity_ms > 0:
            try:
                yield validity_ms  # Lock acquired — validity time remaining
            finally:
                for node in self.nodes:
                    self._release_on_node(node, resource, token)
        else:
            # Failed — release any partial acquisitions
            for node in self.nodes:
                self._release_on_node(node, resource, token)
            raise RuntimeError(f"Could not acquire lock for {resource}")

# Usage
redis_nodes = [
    redis.Redis(host="redis-1", port=6379),
    redis.Redis(host="redis-2", port=6379),
    redis.Redis(host="redis-3", port=6379),
]
lock = RedLock(redis_nodes, ttl_ms=5000)

with lock.lock("payment:process:order-12345"):
    # Only one process across all servers executes this block
    process_payment(order_id=12345)
```
When to use Redlock vs a simple lock: Redlock is necessary only when you're running multiple independent Redis primaries and can't tolerate a primary failure causing two processes to hold the same lock. For most applications — a single primary with one or more replicas — a simple `SET key value NX PX ttl` is sufficient. The fencing token pattern (increment a counter on each lock acquisition; include the token in all database writes) is more robust than Redlock for preventing stale-lock issues.
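A minimal in-process sketch of the fencing-token idea (`FencedStore` and the names below are illustrative; in production the token would come from a Redis `INCR` at lock-acquisition time, and the check would live in the downstream resource):

```python
import itertools

class FencedStore:
    """A resource that rejects writes carrying a token older than the newest seen."""
    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, key: str, value: str, token: int) -> None:
        if token < self.highest_token:
            raise RuntimeError(f"stale fencing token {token} (newest: {self.highest_token})")
        self.highest_token = token
        self.data[key] = value

token_source = itertools.count(1)  # stands in for INCR on a Redis counter

store = FencedStore()
token_a = next(token_source)  # process A acquires the lock -> token 1
token_b = next(token_source)  # A stalls, its lock expires, B acquires -> token 2

store.write("order:12345", "charged-by-B", token_b)  # B's write lands
try:
    store.write("order:12345", "charged-by-A", token_a)  # A wakes up — rejected
    rejected = False
except RuntimeError:
    rejected = True  # the stale holder cannot clobber B's write
```

The key property: even if two processes briefly believe they hold the lock, the one with the older token can no longer corrupt state — which is exactly the guarantee a TTL alone cannot give.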
Key Naming and Data Organization at Scale
With hundreds of developers writing to Redis/Valkey, key naming becomes critical for operations — debugging, memory analysis, and access control all depend on predictable key structure.
```
# Convention: {namespace}:{entity_type}:{identifier}:{field}

# Good examples:
user:session:abc123            # User session by session ID
rate_limit:api:user:456:v1     # Rate limit for user 456 on API v1
cache:product:789:price       # Cached price for product 789
lock:payment:order:12345      # Distributed lock for order processing
queue:email:notifications     # Email notification queue

# Bad examples (no structure → impossible to debug or scan selectively):
user123session
ratelimit_user_456
product789
```
The namespace prefix enables selective operations. When you need to flush all cache keys without touching session keys:
```shell
# Scan + delete by prefix (safe for production — uses SCAN, not KEYS)
redis-cli --scan --pattern "cache:*" | xargs redis-cli del
```
Use `redis-cli --bigkeys` to find memory hogs, and `redis-cli --memkeys` for per-key memory analysis. These are the first tools to reach for when an instance is approaching its memory limit and you don't know why.
Production Considerations
Memory Management and Eviction
Redis/Valkey is bounded by available memory. Without a proper eviction policy, memory fills up and writes fail. Configure this before any other production setting:
```conf
# redis.conf / valkey.conf
maxmemory 4gb
maxmemory-policy allkeys-lru   # Evict least-recently-used keys across the whole keyspace

# Alternatives:
# noeviction   — reject writes when full (the default; good for session stores, bad for caches)
# volatile-lru — only evict keys that have a TTL set
# allkeys-lru  — evict any key, LRU order (recommended for a pure cache)
# allkeys-lfu  — evict least-frequently-used (better for hot-key patterns)
# volatile-ttl — evict keys closest to expiration
```
For mixed workloads (sessions + cache), use separate Redis/Valkey instances with different eviction policies: noeviction for session data (losing sessions is worse than slowness), allkeys-lfu for cache.
```mermaid
graph TD
    subgraph "Production Redis/Valkey Topology"
        A[Application Servers] --> B[Redis Cluster / Sentinel]
        B --> C[Primary Node]
        B --> D[Replica 1]
        B --> E[Replica 2]
        C -- Async replication --> D
        C -- Async replication --> E
        F[Cache Instance<br/>allkeys-lfu<br/>4GB]
        G[Session Instance<br/>noeviction<br/>8GB]
        A --> F
        A --> G
    end
```
Connection Pooling
Each Redis connection is a TCP socket + server-side memory (~20KB). At scale, connection exhaustion is a real failure mode. Use pooling everywhere:
```python
# Production connection pool setup
pool = redis.ConnectionPool(
    host="redis.internal",
    port=6379,
    max_connections=50,          # Per process
    socket_timeout=0.5,          # 500ms — fail fast
    socket_connect_timeout=1.0,
    retry_on_timeout=True,
    health_check_interval=30,
)
r = redis.Redis(connection_pool=pool)

# For async (redis.asyncio)
async_pool = redis.asyncio.ConnectionPool(
    host="redis.internal",
    max_connections=100,         # Higher per process for async
    socket_timeout=0.5,
)
```
Cluster vs Sentinel vs Standalone
| Mode | Use Case | Failover | Sharding |
|------|----------|----------|---------|
| Standalone | Dev, low-traffic | Manual | No |
| Sentinel | Primary-replica with auto-failover | ~30s | No |
| Cluster | Horizontal scaling, high availability | ~5s | Yes (16384 slots) |
Cluster mode shards data across nodes automatically. The cost: multi-key commands (MSET, pipelines across different keys) only work if all keys hash to the same slot. Use `{...}` hash tags to force related keys to the same slot: `session:{user:123}:token` and `session:{user:123}:data` will always land on the same node.
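Slot placement can be checked offline. The sketch below reimplements the cluster key-slot function — CRC16 (XMODEM variant) mod 16384, with the hash-tag rule of hashing only the substring inside the first non-empty `{...}` — which is handy for verifying that related keys co-locate:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021, init 0 — the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Hash only the substring inside the first {...} if it is non-empty."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # ignore empty tags like foo{}bar
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# The documented CRC16 check value, then the co-location guarantee itself
assert crc16_xmodem(b"123456789") == 0x31C3
assert key_slot("session:{user:123}:token") == key_slot("session:{user:123}:data")
```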
Persistence Configuration
Redis/Valkey is in-memory but supports persistence. Get this wrong and you lose data on restart:
```conf
# RDB: point-in-time snapshots (fast recovery, up to minutes of data loss)
save 900 1       # After 900s if at least 1 key changed
save 300 10      # After 300s if at least 10 keys changed
save 60 10000    # After 60s if at least 10000 keys changed

# AOF: append-only file (up to 1s of data loss with fsync=everysec)
appendonly yes
appendfsync everysec   # fsync every second — balance of safety/performance
# appendfsync always   # fsync on every write — safest, slowest
# appendfsync no       # OS decides — fastest, most data-loss risk

# AOF rewrite keeps the AOF file from growing forever
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```
For a cache-only deployment, disable persistence entirely (`save ""`, `appendonly no`). Data loss on restart is acceptable, and persistence overhead hurts performance. For session stores, AOF with `everysec` gives at most about 1 second of loss risk with minimal overhead.
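Combining this with the eviction split above, the two-instance setup this guide recommends looks roughly like the following pair of config files (filenames and sizes are illustrative):

```conf
# valkey-cache.conf — pure cache: evict freely, never persist
maxmemory 4gb
maxmemory-policy allkeys-lfu
save ""
appendonly no

# valkey-sessions.conf — session store: never evict, persist with AOF
maxmemory 8gb
maxmemory-policy noeviction
appendonly yes
appendfsync everysec
```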
Conclusion
Redis/Valkey is one of those tools where the gap between "works in dev" and "works in production" is large. The data structures are simple to learn; the failure modes — stampedes, connection exhaustion, wrong eviction policies, replication lag during failover — take time and incidents to learn.
The patterns in this guide are production-hardened:
- XFetch algorithm prevents stampede without distributed locks
- Hash-based sessions avoid unnecessary serialization
- Sorted Set rate limiting gives accurate sliding windows
- Streams with consumer groups give at-least-once delivery guarantees
- Semantic caching cuts LLM API costs by 20-40% for similar queries
The Valkey fork means the ecosystem continues regardless of Redis Ltd.'s licensing direction. The API is stable. The patterns transfer. Pick the deployment that matches your operational model and apply the patterns above.
Sources
- Valkey Linux Foundation announcement and 8.0 release notes
- Redis SSPL licensing change (March 2024)
- Antirez: "Redis persistence demystified"
- XFetch algorithm: "Optimal Probabilistic Cache Stampede Prevention"
- Redis documentation: Streams, Cluster, Pub/Sub