Redis Advanced Patterns in 2026: Streams, Pub/Sub, Lua Scripts, and Vector Search

Introduction
Redis ships with a reputation it only partially deserves. Most teams use it as a dumb key-value store — SET key value EX 300, GET key, done. That pattern is useful, but it barely scratches what Redis can do. The same process that holds your session tokens also gives you a persistent, consumer-group-aware message log, a pub/sub broadcast bus, an atomic scripting engine, and — with Redis Stack — a vector database capable of sub-millisecond approximate nearest neighbor search across millions of embeddings.
The version most production systems are running today — Redis 7.x and Redis Stack 2.x — is a fundamentally different beast than the Redis from five years ago. Streams, introduced in Redis 5.0, are now mature enough to replace Kafka for the majority of workloads that don't need multi-day retention or petabyte-scale throughput. RediSearch's vector search, available in Redis Stack, has gone from an experiment to a serious alternative to Pinecone and Weaviate for teams that want to avoid managing a separate vector store. Lua scripting has always been there, but most developers still reach for MULTI/EXEC pipelines when a well-written Lua script would eliminate race conditions entirely.
This post covers the patterns that graduate Redis from "fast cache" to "production workhorse": advanced caching with stampede prevention, Streams for durable event processing with consumer groups and dead-letter queues, Pub/Sub for real-time fan-out, Lua scripts for atomic compound operations, vector search for semantic similarity workloads, and Cluster mode for high availability at scale. Every code example is complete and production-ready. Where Redis competes with specialized tools — Kafka, RabbitMQ, Pinecone — the tradeoffs are explicit.
1. Advanced Caching Patterns
The SET key value EX ttl pattern handles maybe 60 percent of caching use cases. The remaining 40 percent — write-through, write-behind, stampede prevention, layered caches, per-datatype TTL strategies — is where caching actually gets interesting and where naive implementations silently degrade under load.
Cache Strategies
Cache-aside (lazy loading) is the most common pattern: the application checks the cache on read, populates it on miss. Simple to reason about, easy to implement, but it means the first request after a cache miss hits the database. Under traffic spikes, dozens or hundreds of requests can all miss simultaneously on the same key and all hit the database in parallel — this is a cache stampede.
Write-through updates the cache synchronously on every write. Cache and database are always in sync. The cost: every write pays the penalty of two writes (cache + DB), and you cache data that may never be read.
Write-behind (write-back) writes to the cache first and flushes to the database asynchronously. Dramatically reduces write latency, but risks data loss if Redis restarts before the flush. Appropriate for metrics accumulation or click counters; inappropriate for financial records.
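The write-behind pattern can be sketched with a small in-process buffer. This is a minimal sketch, assuming a `cache_set` callable that wraps the Redis `SET` and a `db_flush` callable that performs one batched database write — both names are illustrative, not part of any library API:

```python
from typing import Any, Callable

class WriteBehindBuffer:
    """Write-behind sketch: the cache is updated synchronously, the
    database write is deferred and batched. Repeated writes to the same
    key coalesce into one DB write."""

    def __init__(self, cache_set: Callable[[str, Any], None],
                 db_flush: Callable[[dict], None], flush_every: int = 100):
        self.cache_set = cache_set      # e.g. lambda k, v: r.set(k, v)
        self.db_flush = db_flush        # e.g. one bulk UPSERT
        self.flush_every = flush_every
        self.pending: dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.cache_set(key, value)      # cache sees the write immediately
        self.pending[key] = value       # DB write deferred; repeats coalesce
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        # Swap before flushing so a slow db_flush doesn't block new writes
        if self.pending:
            batch, self.pending = self.pending, {}
            self.db_flush(batch)
```

The data-loss caveat above applies directly: anything still in `pending` when the process dies never reaches the database, which is why this fits click counters and not ledgers.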
XFetch: Probabilistic Early Expiration
Cache stampede is a deceptively hard problem. The naive fix — a distributed lock that forces only one caller to recompute while others wait — adds latency and creates its own contention. The XFetch algorithm from Vattani, Chierichetti, and Lowenstein (2015) solves this without locks: it probabilistically recomputes the cache early, before expiration, based on how expensive the recomputation is and how close the TTL is.
The probability of early recomputation grows as the key approaches expiration. Expensive recomputations (high beta) trigger early refresh sooner. The result: the cache is refreshed in the background before it expires, and stampedes never happen.
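The refresh decision has a closed form worth sketching before the full implementation. XFetch refreshes when `now - beta * delta * ln(U) >= expiry` for uniform `U`, which rearranges to a single exponential:

```python
import math

def xfetch_refresh_probability(remaining_ttl: float, delta: float,
                               beta: float = 1.0) -> float:
    """P(early refresh) on a single read.
    The condition  -beta * delta * ln(U) >= remaining_ttl  (U uniform on (0,1))
    rearranges to  U <= exp(-remaining_ttl / (beta * delta))."""
    return math.exp(-remaining_ttl / (beta * delta))

# Probability ramps up sharply as the key nears expiry:
for remaining in (60, 10, 5, 1):
    p = xfetch_refresh_probability(remaining, delta=2.0)
    print(f"{remaining:>3}s left -> refresh probability {p:.4f}")
```

Raising `beta` shifts the curve left, so expensive recomputations start refreshing further from expiry — exactly the knob the implementation below exposes.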
# cache_xfetch.py
import redis
import time
import math
import random
import json
from typing import Callable, Any, Optional
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
def xfetch(
key: str,
ttl: int,
recompute: Callable[[], Any],
beta: float = 1.0,
) -> Any:
"""
XFetch: probabilistic early expiration to prevent cache stampedes.
Args:
key: Redis key to cache under
ttl: Desired TTL in seconds
recompute: Callable that fetches the canonical value (DB query, API call, etc.)
beta: Recomputation cost factor. Higher = recompute sooner.
1.0 is a sensible default. Set higher for expensive recomputations.
Returns:
Cached or freshly computed value.
"""
# Fetch current cached value and its remaining TTL atomically
pipe = r.pipeline(transaction=False)
pipe.get(key)
pipe.ttl(key)
cached_raw, remaining_ttl = pipe.execute()
if cached_raw is not None:
cached = json.loads(cached_raw)
expiry = time.time() + remaining_ttl # absolute expiry time
# XFetch early-expiration check:
# Recompute early with probability proportional to how close we are to expiry
# and how expensive the recomputation is (delta * beta).
        delta = ttl - remaining_ttl  # seconds since last refresh
        if delta <= 0:
            delta = 1  # guard against zero/negative delta on fresh keys
        # Note: the original XFetch paper stores the measured recomputation
        # duration alongside the value and uses that as delta. Elapsed time
        # since the last refresh is a simpler stand-in: the closer the key
        # is to expiry, the larger the jitter term tends to be, and the more
        # likely we refresh now.
        jitter = -beta * delta * math.log(random.random())
if time.time() + jitter >= expiry:
# Probabilistically decided to refresh early — recompute and cache
value = recompute()
r.set(key, json.dumps(value), ex=ttl)
return value
return cached
# Cache miss: recompute and cache
value = recompute()
r.set(key, json.dumps(value), ex=ttl)
return value
# --- Layered cache: L1 (local dict) → L2 (Redis) → L3 (database) ---
import functools
from collections import OrderedDict
class LRUCache:
"""Minimal in-process LRU cache for L1 layer."""
def __init__(self, maxsize: int = 512):
self.cache: OrderedDict = OrderedDict()
self.maxsize = maxsize
def get(self, key: str) -> Optional[Any]:
if key in self.cache:
self.cache.move_to_end(key)
return self.cache[key]
return None
def set(self, key: str, value: Any) -> None:
if key in self.cache:
self.cache.move_to_end(key)
self.cache[key] = value
if len(self.cache) > self.maxsize:
self.cache.popitem(last=False)
l1 = LRUCache(maxsize=512)
def get_with_layered_cache(
key: str,
db_fetch: Callable[[], Any],
l2_ttl: int = 300,
) -> Any:
"""
Three-layer cache lookup:
L1 → in-process LRU dict (microseconds)
L2 → Redis (sub-millisecond)
L3 → database (milliseconds to seconds)
"""
# L1 check
value = l1.get(key)
if value is not None:
return value
# L2 check (Redis), with stampede protection
value = xfetch(key, ttl=l2_ttl, recompute=db_fetch)
# Populate L1
l1.set(key, value)
return value
TTL Strategies by Data Volatility
TTL is not one-size-fits-all. A sensible tiering:
| Data type | TTL | Rationale |
|---|---|---|
| User session | 30 minutes (sliding) | Active sessions stay warm; idle sessions expire |
| Product catalog | 10 minutes | Changes rarely; stale for a few minutes is acceptable |
| User profile | 5 minutes | Changes infrequently; short TTL keeps data fresh |
| Real-time prices | 5 seconds | Data staleness is a business risk |
| Feature flags | 60 seconds | Need to propagate quickly after changes |
| Computed aggregates | 30 minutes | Expensive to recompute; tolerate some staleness |

2. Redis Streams for Event Processing
Redis Streams, stable since Redis 5.0 and hardened through 7.x, is a persistent, append-only log with consumer group semantics. It is not a replacement for Kafka at petabyte scale or with multi-day retention requirements. It is a serious replacement for Kafka in the 95 percent of systems where throughput stays under 1 million events per day, retention is hours-to-days rather than weeks-to-months, and the operational cost of running a Kafka cluster (Zookeeper or KRaft, broker replication, topic management, consumer group lag monitoring) is not justified.
A single Redis node benchmarks at over 1 million XADD operations per second. With consumer groups, you get competing consumers, at-least-once delivery semantics, and a built-in pending entries list (PEL) that tracks unacknowledged messages. This is Kafka-lite without the JVM.
Core Commands
- `XADD stream * field value [field value ...]` — append an entry, auto-generate its ID
- `XREAD COUNT n STREAMS stream 0` — read from the beginning
- `XREADGROUP GROUP grp consumer COUNT n STREAMS stream >` — read messages not yet delivered to this group
- `XACK stream grp id` — acknowledge processing complete
- `XPENDING stream grp - + n` — list unacknowledged messages
- `XCLAIM stream grp consumer min-idle-ms id` — reassign a stale pending message
Full Producer + Consumer Group with DLQ
// streams-consumer.js — Node.js (ioredis)
import Redis from "ioredis";
const redis = new Redis({ host: "localhost", port: 6379 });
const STREAM = "events:orders";
const GROUP = "order-processor";
const DLQ_STREAM = "events:orders:dlq";
const MAX_RETRIES = 3;
const CLAIM_IDLE_MS = 30_000; // reclaim messages idle > 30s
// --- Setup ---
async function ensureConsumerGroup() {
try {
// MKSTREAM creates the stream if it doesn't exist
await redis.xgroup("CREATE", STREAM, GROUP, "$", "MKSTREAM");
console.log(`Consumer group '${GROUP}' created`);
} catch (err) {
if (!err.message.includes("BUSYGROUP")) throw err;
// Group already exists — fine
}
}
// --- Producer ---
async function produce(order) {
const id = await redis.xadd(
STREAM,
"*", // auto-generate ID (timestamp-based)
"order_id", order.id,
"customer", order.customer,
"amount", String(order.amount),
"payload", JSON.stringify(order),
);
console.log(`Produced message ${id}`);
return id;
}
// --- Process one message ---
async function processMessage(id, fields) {
// fields comes back as flat array: [key, val, key, val, ...]
const data = {};
for (let i = 0; i < fields.length; i += 2) {
data[fields[i]] = fields[i + 1];
}
console.log(`Processing order ${data.order_id} for ${data.customer}`);
// Simulate processing (replace with real business logic)
if (Math.random() < 0.1) {
throw new Error(`Simulated failure for order ${data.order_id}`);
}
console.log(`Order ${data.order_id} processed successfully`);
}
// --- Move to DLQ ---
async function sendToDLQ(id, fields, reason) {
await redis.xadd(
DLQ_STREAM,
"*",
"original_id", id,
"reason", reason,
"failed_at", String(Date.now()),
...fields,
);
console.warn(`Message ${id} sent to DLQ: ${reason}`);
}
// --- Consumer loop ---
async function runConsumer(consumerName) {
await ensureConsumerGroup();
console.log(`Consumer '${consumerName}' starting`);
while (true) {
// 1. Check for stale pending messages (unacked for > CLAIM_IDLE_MS)
const pending = await redis.xpending(
STREAM, GROUP, "-", "+", 10
);
for (const entry of pending) {
const [msgId, owner, idleMs, deliveryCount] = entry;
if (idleMs > CLAIM_IDLE_MS) {
if (deliveryCount >= MAX_RETRIES) {
// Exceeded retry limit → DLQ
const claimed = await redis.xclaim(
STREAM, GROUP, consumerName, CLAIM_IDLE_MS, msgId
);
if (claimed.length > 0) {
const [claimedId, claimedFields] = claimed[0];
await sendToDLQ(claimedId, claimedFields, `Max retries (${MAX_RETRIES}) exceeded`);
await redis.xack(STREAM, GROUP, claimedId);
}
          } else {
            // Reclaim and retry now — XREADGROUP with ">" never redelivers
            // PEL entries, so we must process the claimed message here
            const claimed = await redis.xclaim(STREAM, GROUP, consumerName, CLAIM_IDLE_MS, msgId);
            if (claimed.length > 0) {
              const [claimedId, claimedFields] = claimed[0];
              console.log(`Reclaimed stale message ${claimedId} (attempt ${deliveryCount + 1})`);
              try {
                await processMessage(claimedId, claimedFields);
                await redis.xack(STREAM, GROUP, claimedId);
              } catch (err) {
                console.error(`Retry failed for ${claimedId}: ${err.message}`);
              }
            }
          }
}
}
// 2. Read new messages (> means "only undelivered to this group")
const results = await redis.xreadgroup(
"GROUP", GROUP,
consumerName,
"COUNT", "10",
"BLOCK", "2000", // block up to 2 seconds waiting for new messages
"STREAMS", STREAM,
">",
);
if (!results) continue; // timeout with no messages — loop back
for (const [_stream, messages] of results) {
for (const [id, fields] of messages) {
try {
await processMessage(id, fields);
await redis.xack(STREAM, GROUP, id); // ack only on success
} catch (err) {
// Don't ack — leave in PEL for retry via XCLAIM loop above
console.error(`Failed to process ${id}: ${err.message}`);
}
}
}
}
}
// --- Entry point ---
const consumerName = process.argv[2] || "consumer-1";
runConsumer(consumerName).catch(console.error);
Redis Streams vs Kafka: When to Choose Which
| Factor | Redis Streams | Kafka |
|---|---|---|
| Throughput | 1M+ msg/s single node | 10M+ msg/s multi-broker |
| Retention | Hours to days (memory-backed) | Weeks to months (disk) |
| Operational cost | Zero — already running Redis | Significant (brokers, ZK/KRaft) |
| Consumer groups | Yes | Yes |
| Replay | Yes (XRANGE from any ID) | Yes |
| Schema registry | No | Confluent Schema Registry |
| Exactly-once | No (at-least-once) | Yes (with transactions) |
Choose Redis Streams when: throughput is under 1M events/day, retention under 48 hours, you already run Redis, and exactly-once semantics are not required. Choose Kafka when: throughput exceeds what a single Redis node handles, you need multi-week retention, or exactly-once delivery is a hard requirement.
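Because Streams are memory-backed, "retention" in practice means capping the stream's length at write time. A sketch, assuming a redis-py client `r`; the `maxlen_for_retention` helper and its uniform-rate assumption are illustrative:

```python
def maxlen_for_retention(events_per_sec: float, retention_hours: float) -> int:
    """Translate a time-based retention target into an approximate entry cap,
    assuming a roughly uniform event rate."""
    return int(events_per_sec * retention_hours * 3600)

def produce_capped(r, stream: str, fields: dict, maxlen: int):
    """r: a redis-py client. approximate=True emits MAXLEN ~ n, letting Redis
    trim lazily at macro-node boundaries — far cheaper than exact trimming."""
    return r.xadd(stream, fields, maxlen=maxlen, approximate=True)

# e.g. 10 events/s with a 24-hour retention target:
# produce_capped(r, "events:orders", {"order_id": "42"},
#                maxlen_for_retention(10, 24))
```

The same cap can be applied after the fact with `XTRIM stream MAXLEN ~ n`, but doing it on every `XADD` keeps memory bounded with no separate janitor process.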
3. Pub/Sub and Real-Time Patterns
Redis Pub/Sub and Redis Streams are complements, not alternatives. The key distinction: Pub/Sub is fire-and-forget — messages published to a channel are delivered only to subscribers active at that moment and are not persisted. Streams are persistent logs. A subscriber that goes offline for five seconds during a Streams workload misses nothing; a subscriber that goes offline for five seconds during a Pub/Sub workload misses everything published in that window.
Use Pub/Sub for: live notifications, presence indicators, real-time dashboards, and any pattern where a momentary gap is acceptable. Use Streams for: anything requiring guaranteed delivery or replay.
SUBSCRIBE, PUBLISH, PSUBSCRIBE
# pubsub_demo.py
import redis
import threading
import time
r_pub = redis.Redis(host="localhost", port=6379, decode_responses=True)
r_sub = redis.Redis(host="localhost", port=6379, decode_responses=True)
def subscriber():
pubsub = r_sub.pubsub()
# Subscribe to exact channel
pubsub.subscribe("notifications:global")
# Pattern subscription — catches notifications:user:*, notifications:team:*, etc.
pubsub.psubscribe("notifications:*")
for message in pubsub.listen():
if message["type"] in ("message", "pmessage"):
print(f"[{message['channel']}] {message['data']}")
def publisher():
time.sleep(0.5) # Let subscriber connect
r_pub.publish("notifications:global", "System maintenance at 22:00 UTC")
r_pub.publish("notifications:user:42", "Your export is ready")
r_pub.publish("notifications:team:engineering", "Deploy window open")
t = threading.Thread(target=subscriber, daemon=True)
t.start()
publisher()
time.sleep(1)
Keyspace Notifications
Keyspace notifications let you subscribe to Redis key lifecycle events — expiry, deletion, set operations — without polling. Enable them in redis.conf or at runtime:
# Enable expired + generic key events
redis-cli CONFIG SET notify-keyspace-events "Ex"
# Watch for key expiry events
pubsub = r_sub.pubsub()
pubsub.psubscribe("__keyevent@0__:expired")
for message in pubsub.listen():
if message["type"] == "pmessage":
expired_key = message["data"]
print(f"Key expired: {expired_key}")
# Trigger: session cleanup, cache invalidation, reminder dispatch
Practical applications: session invalidation (trigger logout cleanup when session key expires), job timeout detection (set a key with the job TTL; expiry fires if the job never deletes it), and distributed lock monitoring.
Fan-Out Architecture
Redis Pub/Sub is the broadcast backbone for real-time fan-out. A single publisher can reach thousands of subscribers in under a millisecond. The pattern for a presence indicator in a chat application:
- On connect: `SET presence:{user_id} online EX 30`, then `PUBLISH presence:channel "{user_id}:online"`
- Heartbeat: `EXPIRE presence:{user_id} 30` every 15 seconds
- On disconnect: the key expires → the keyspace notification fires → `PUBLISH presence:channel "{user_id}:offline"`
- All connected clients receive the publish and update the UI
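The steps above can be sketched as functions (redis-py client `r` assumed; the `presence:*` key and channel names follow the list above and are otherwise arbitrary):

```python
def on_connect(r, user_id: str) -> None:
    r.set(f"presence:{user_id}", "online", ex=30)
    r.publish("presence:channel", f"{user_id}:online")

def heartbeat(r, user_id: str) -> None:
    # Call every ~15s; the 30s TTL gives one missed beat of slack
    r.expire(f"presence:{user_id}", 30)

def parse_presence_event(message: str) -> tuple[str, str]:
    """Split a 'user_id:status' payload; rsplit tolerates ids with ':'."""
    user_id, status = message.rsplit(":", 1)
    return user_id, status
```

The offline publish is driven by the expired-key notification from the previous section, so no client has to detect its own disconnect.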
In Redis Cluster mode, PUBLISH messages are broadcast to all nodes — the cluster handles routing, so your application code is identical in standalone and cluster deployments. Redis 7.0 also adds sharded Pub/Sub (SPUBLISH/SSUBSCRIBE), which confines a channel to a single shard and avoids the cluster-wide broadcast when global fan-out isn't needed.
4. Lua Scripts for Atomic Operations
Every Redis MULTI/EXEC transaction has a fundamental limitation: the commands inside it are queued and sent as a batch, but the application must still make multiple round trips (WATCH, MULTI, commands, EXEC) and cannot branch based on intermediate values. If GET counter returns 5, you cannot conditionally SET counter 10 inside the same transaction without an optimistic lock retry loop.
Lua scripts run inside Redis's single-threaded execution model. From the moment EVAL fires, no other Redis command executes until the script completes. You get true atomicity, you can branch on intermediate values, and you eliminate round trips. The entire script is a single command from the client's perspective.
Rate Limiter: Sliding Window in Lua
# rate_limiter.py
import redis
import time
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
# Sliding window rate limiter
# Uses a sorted set where each member is a unique request ID
# and the score is the request timestamp in milliseconds.
RATE_LIMIT_SCRIPT = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window_ms = tonumber(ARGV[2])
local max_requests = tonumber(ARGV[3])
local request_id = ARGV[4]
-- Remove entries outside the sliding window
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window_ms)
-- Count current requests in window
local count = redis.call('ZCARD', key)
if count < max_requests then
-- Allow: add this request to the window
redis.call('ZADD', key, now, request_id)
redis.call('PEXPIRE', key, window_ms)
return {1, max_requests - count - 1} -- {allowed, remaining}
else
-- Deny
return {0, 0}
end
"""
# Load the script once, call it by SHA thereafter (saves bandwidth).
# Note: EVALSHA raises NoScriptError after a Redis restart flushes the
# script cache; redis-py's register_script() handles reloading automatically.
RATE_LIMIT_SHA = r.script_load(RATE_LIMIT_SCRIPT)
def check_rate_limit(
user_id: str,
max_requests: int = 100,
window_seconds: int = 60,
) -> tuple[bool, int]:
"""
Check if user_id is within rate limit.
Returns (allowed: bool, remaining: int).
"""
key = f"ratelimit:{user_id}"
now_ms = int(time.time() * 1000)
    request_id = f"{now_ms}-{time.monotonic_ns()}"  # unique member per request
    # (id(object()) is a tempting shortcut but ids of freed objects get reused,
    # so members could collide within the same millisecond)
result = r.evalsha(
RATE_LIMIT_SHA,
1, # number of KEYS arguments
key, # KEYS[1]
now_ms, # ARGV[1]
window_seconds * 1000, # ARGV[2]: window in ms
max_requests, # ARGV[3]
request_id, # ARGV[4]
)
allowed = bool(result[0])
remaining = int(result[1])
return allowed, remaining
Distributed Lock with Expiry
The canonical Redis distributed lock pattern uses SET key token NX EX ttl. The token (a UUID) ensures only the lock holder can release it — another process cannot accidentally release a lock it doesn't hold. The Lua script makes the check-and-delete atomic:
import uuid
RELEASE_LOCK_SCRIPT = """
-- Only release if we hold the lock (token matches)
if redis.call('GET', KEYS[1]) == ARGV[1] then
return redis.call('DEL', KEYS[1])
else
return 0
end
"""
RELEASE_LOCK_SHA = r.script_load(RELEASE_LOCK_SCRIPT)
def acquire_lock(resource: str, ttl_seconds: int = 10) -> str | None:
"""Acquire lock. Returns token if acquired, None if not."""
token = str(uuid.uuid4())
acquired = r.set(f"lock:{resource}", token, nx=True, ex=ttl_seconds)
return token if acquired else None
def release_lock(resource: str, token: str) -> bool:
"""Release lock only if we hold it."""
result = r.evalsha(RELEASE_LOCK_SHA, 1, f"lock:{resource}", token)
return bool(result)
# Usage
token = acquire_lock("job:export:user:42", ttl_seconds=30)
if token:
try:
pass # do work
finally:
release_lock("job:export:user:42", token)
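A common companion to acquire/release is extending the lock while long-running work is still in progress — the same token-check idea, applied to PEXPIRE. A sketch (the `backoff_delays` helper for acquisition retries is an illustrative extra, not part of any standard lock protocol):

```python
EXTEND_LOCK_SCRIPT = """
-- Refresh the TTL only if we still hold the lock (token matches)
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('PEXPIRE', KEYS[1], ARGV[2])
else
    return 0
end
"""

def extend_lock(r, resource: str, token: str, ttl_seconds: int = 10) -> bool:
    """r: a redis-py client. Returns True if the lock TTL was refreshed,
    False if we no longer hold the lock."""
    sha = r.script_load(EXTEND_LOCK_SCRIPT)
    return bool(r.evalsha(sha, 1, f"lock:{resource}", token, ttl_seconds * 1000))

def backoff_delays(attempts: int, base: float = 0.05,
                   cap: float = 1.0) -> list[float]:
    """Exponential backoff schedule (seconds) for retrying acquire_lock."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]
```

A worker holding the lock calls `extend_lock` on a timer well inside the TTL; a False return means the lock was lost (TTL expired or the key was taken) and the work should be abandoned.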

Conditional Leaderboard Update in Lua
# Only update a user's leaderboard score if the new score beats their current best
UPDATE_BEST_SCORE_SCRIPT = """
local key = KEYS[1]
local member = ARGV[1]
local new_score = tonumber(ARGV[2])
local current = redis.call('ZSCORE', key, member)
if current == false or new_score > tonumber(current) then
redis.call('ZADD', key, new_score, member)
return 1 -- updated
else
return 0 -- not updated (new score wasn't better)
end
"""
UPDATE_BEST_SCORE_SHA = r.script_load(UPDATE_BEST_SCORE_SCRIPT)
def update_best_score(leaderboard: str, user_id: str, score: float) -> bool:
result = r.evalsha(
UPDATE_BEST_SCORE_SHA, 1,
leaderboard,
user_id,
score,
)
return bool(result)
5. Redis as a Vector Database
Redis Stack ships with RediSearch, which since version 2.4 includes a production-grade vector search engine. It supports HNSW (Hierarchical Navigable Small World) and flat (brute-force) indexes, hybrid search combining vector similarity with metadata filters, and ANN (approximate nearest neighbor) search returning results in under a millisecond for millions of vectors on a single node.
This matters because teams running semantic search, recommendation engines, or RAG (retrieval-augmented generation) pipelines increasingly face the question: run a dedicated vector database (Pinecone, Weaviate, Qdrant), or use Redis Stack and keep the stack simple? For datasets under 10 million vectors where Redis is already in the stack, the answer is often Redis.
Setup: Creating a Vector Index
# vector_search.py
import redis
import numpy as np
from redis.commands.search.field import VectorField, TagField, TextField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
r = redis.Redis(host="localhost", port=6379, decode_responses=False)
VECTOR_DIM = 1536 # OpenAI text-embedding-3-small dimension
INDEX_NAME = "idx:docs"
DOC_PREFIX = "doc:"
def create_index():
try:
r.ft(INDEX_NAME).dropindex(delete_documents=False)
except Exception:
pass # Index didn't exist
schema = (
TextField("$.text", as_name="text"),
TagField("$.category", as_name="category"),
VectorField(
"$.embedding",
"HNSW", # HNSW for ANN (fast); FLAT for exact (small datasets)
{
"TYPE": "FLOAT32",
"DIM": VECTOR_DIM,
"DISTANCE_METRIC": "COSINE", # or L2, IP (inner product)
"INITIAL_CAP": 100_000, # preallocate for 100k vectors
"M": 16, # HNSW connectivity parameter
"EF_CONSTRUCTION": 200, # HNSW build-time quality
},
as_name="embedding",
),
)
r.ft(INDEX_NAME).create_index(
schema,
definition=IndexDefinition(
prefix=[DOC_PREFIX],
index_type=IndexType.JSON,
),
)
print(f"Index '{INDEX_NAME}' created")
def store_document(doc_id: str, text: str, category: str, embedding: np.ndarray):
"""Store a document with its embedding."""
key = f"{DOC_PREFIX}{doc_id}"
r.json().set(key, "$", {
"text": text,
"category": category,
"embedding": embedding.astype(np.float32).tolist(),
})
def vector_search(
query_embedding: np.ndarray,
top_k: int = 5,
    category_filter: str | None = None,
) -> list[dict]:
"""
KNN vector search with optional metadata filter.
Hybrid search: combine vector similarity with tag filter.
This is where Redis outperforms many dedicated vector DBs —
metadata filtering happens at the index level, not post-hoc.
"""
query_bytes = query_embedding.astype(np.float32).tobytes()
# Build filter expression
if category_filter:
# Hybrid: vector similarity AND metadata filter
filter_expr = f"(@category:{{{category_filter}}})"
else:
filter_expr = "*"
# KNN query syntax: @field_name:[VECTOR_RANGE radius $param]
# or KNN top_k: @field_name:[KNN k $param]
q = (
Query(f"{filter_expr}=>[KNN {top_k} @embedding $vec AS score]")
.sort_by("score")
.return_fields("text", "category", "score")
.paging(0, top_k)
.dialect(2)
)
results = r.ft(INDEX_NAME).search(q, query_params={"vec": query_bytes})
return [
{
"id": doc.id,
"text": doc.text,
"category": doc.category,
"score": float(doc.score), # cosine distance (lower = more similar)
}
for doc in results.docs
]
Performance Characteristics
On a single Redis node with 16 GB of RAM, HNSW handles 1 million 1536-dimension vectors in approximately 6 GB of memory and returns KNN results in under 2 milliseconds at the 99th percentile. Flat (brute-force) indexes are exact but O(n) — use flat for datasets under 50,000 vectors where perfect recall matters, HNSW for everything larger.
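That ~6 GB figure is mostly the raw vectors: 1M × 1536 dims × 4 bytes (FLOAT32) ≈ 6.1 GB, plus HNSW graph overhead. A back-of-envelope estimator — the per-link byte count and the 2·M links-per-vector figure are assumptions; real overhead varies with M and the level distribution:

```python
def hnsw_memory_bytes(n_vectors: int, dim: int, m: int = 16,
                      bytes_per_component: int = 4,
                      bytes_per_link: int = 8) -> int:
    """Rough HNSW memory estimate: raw FLOAT32 vectors plus an assumed
    ~2*M graph links per vector at ~8 bytes each."""
    raw = n_vectors * dim * bytes_per_component
    links = n_vectors * 2 * m * bytes_per_link
    return raw + links

print(hnsw_memory_bytes(1_000_000, 1536) / 1e9)  # ~6.4 GB for the scenario above
```

The takeaway is that vector payload dominates: halving `dim` (e.g. a smaller embedding model) saves far more memory than tuning M.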
Redis Vector Search vs Dedicated Vector DBs
| Factor | Redis Stack | Pinecone | Weaviate | Qdrant |
|---|---|---|---|---|
| Setup complexity | Low (already in stack) | Zero (managed) | Medium | Medium |
| Max scale (practical) | ~10M vectors/node | Unlimited (managed) | Unlimited | Unlimited |
| Hybrid search | Yes | Yes | Yes | Yes |
| Persistence | RDB/AOF | Managed | Yes | Yes |
| Cost at 1M vectors | $0 (existing Redis) | ~$70/mo (s1.x1) | Self-host costs | Self-host costs |
| Operational overhead | Minimal | Zero | Moderate | Moderate |
Choose Redis for vectors when: you already run Redis Stack, dataset is under 10M vectors, and you want to avoid a separate service. Choose Pinecone when: you need fully managed, unlimited scale with zero ops. Choose Weaviate or Qdrant when: you need advanced filtering, multi-modal search, or open-source self-hosted control beyond what Redis offers.
6. Cluster Mode and High Availability
A standalone Redis node is a single point of failure. For production systems where Redis is on the critical path — and if you are using it as a message bus, session store, or real-time cache, it is — you need either Sentinel (automatic failover for standalone) or Redis Cluster (sharding + HA combined).
Hash Slots
Redis Cluster distributes keys across 16,384 hash slots. Each key maps to a slot via CRC16(key) % 16384. Slots are distributed across primary nodes — a three-node cluster gives approximately 5,461 slots per node. Reads from replicas are allowed with READONLY mode but are eventually consistent.
# Check which slot a key maps to
redis-cli CLUSTER KEYSLOT "user:123:session"
# → 8490 (example)
# Check which node owns that slot — CLUSTER SHARDS (Redis 7.0+) lists each
# node's slot ranges; find the range containing 8490
redis-cli CLUSTER SHARDS
Hash Tags for Co-location
Multi-key commands (MGET, MSET, Lua scripts referencing multiple keys) only work in Cluster mode if all keys hash to the same slot. Hash tags force co-location: only the portion of the key inside {} is used for slot calculation.
# Without hash tags — these keys may land on different nodes
# MGET user:123:profile user:123:session ← may fail in cluster mode
# With hash tags — both keys hash on "user:123"
# MGET {user:123}:profile {user:123}:session ← always same slot
user_id = 123
profile_key = f"{{user:{user_id}}}:profile"
session_key = f"{{user:{user_id}}}:session"
cart_key = f"{{user:{user_id}}}:cart"
# Now safe to use in pipelines and Lua scripts in cluster mode
pipe = r.pipeline(transaction=True)
pipe.get(profile_key)
pipe.get(session_key)
pipe.get(cart_key)
results = pipe.execute()
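The slot mapping is simple enough to reimplement client-side, for example to verify hash-tag co-location offline before a cluster migration. Redis uses CRC16-CCITT (the XMODEM variant) and hashes only the first non-empty {...} tag when one is present:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): poly 0x1021, init 0x0000 — the variant
    specified by the Redis Cluster spec."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    """CLUSTER KEYSLOT, client-side: hash only the first non-empty {...} tag."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # at least one character between the braces
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Hash tags force co-location:
print(key_hash_slot("{user:123}:profile") == key_hash_slot("{user:123}:session"))  # True
```

An empty tag (`{}`) hashes the whole key, per the spec — so `foo{}bar` and `baz{}bar` do not co-locate.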
Sentinel vs Cluster
Sentinel provides automatic failover for a single primary + N replicas. It does not shard data. Use Sentinel when your dataset fits on one node and you want automatic failover without the complexity of Cluster. Three Sentinel processes (an odd number, for quorum) monitor the primary; if the primary is unreachable from a quorum of Sentinels, a failover is triggered and a replica is promoted. With the default down-after-milliseconds of 30000, detection alone takes 30 seconds, so expect total failover on the order of 30–60 seconds; lowering down-after-milliseconds shrinks the window at the cost of more false positives.
Cluster provides sharding across multiple primaries, each with optional replicas. Use Cluster when your dataset exceeds single-node memory, when you need horizontal write throughput, or when you want HA and sharding in a single deployment model.
Connection Pooling
Every application connecting to Redis should use a connection pool. Creating a new TCP connection per command adds 1–3 ms of overhead — significant when Redis commands themselves take under 0.1 ms.
# redis_pool.py
import redis
# Connection pool — create once at application startup
pool = redis.ConnectionPool(
host="localhost",
port=6379,
db=0,
max_connections=50, # tune based on worker count × commands-per-request
decode_responses=True,
socket_timeout=1.0, # command timeout
socket_connect_timeout=2.0,
)
# All clients share the pool
def get_redis() -> redis.Redis:
return redis.Redis(connection_pool=pool)
For Redis Cluster with ioredis in Node.js:
// cluster-client.js
import Redis from "ioredis";
const cluster = new Redis.Cluster(
[
{ host: "redis-node-1", port: 6379 },
{ host: "redis-node-2", port: 6379 },
{ host: "redis-node-3", port: 6379 },
],
{
redisOptions: {
password: process.env.REDIS_PASSWORD,
connectTimeout: 2000,
},
clusterRetryStrategy: (times) => Math.min(times * 100, 3000),
// Read from replicas for read-heavy workloads
scaleReads: "slave",
}
);
export default cluster;
Sentinel Failover Configuration
# redis-sentinel.conf (minimal production config)
sentinel monitor mymaster 10.0.1.10 6379 2 # quorum = 2
sentinel down-after-milliseconds mymaster 5000 # 5s to declare primary down
sentinel failover-timeout mymaster 60000 # 60s max for failover
sentinel parallel-syncs mymaster 1 # replicas to sync in parallel
With down-after-milliseconds 5000 and a typical failover completing in 15–20 seconds, expect a 20–30 second window of write unavailability during an unplanned primary failure. For applications that cannot tolerate silently lost writes, set min-replicas-to-write 1 so a partitioned primary rejects writes it cannot replicate, rather than accepting writes that a failover would discard.
Conclusion
Redis earns its place on the critical path of production systems not because it is fast (it is), but because it provides the right primitives at each layer of the application stack. The patterns in this post cover the full range: advanced caching with XFetch eliminates stampedes without coordination overhead; Streams give you Kafka-level durability for the majority of real-world event volumes with none of the operational weight; Pub/Sub is the right tool for fire-and-forget fan-out where persistence would add latency with no benefit; Lua scripts make compound operations truly atomic without multi-round-trip transaction protocols; and Redis Stack's vector search removes the need for a separate vector store for datasets up to tens of millions of embeddings.
The pattern-selection heuristic is straightforward: if you need persistence and replay, use Streams. If you need broadcast with no durability requirement, use Pub/Sub. If you need atomic compound operations on multiple keys, use Lua. If you need sub-millisecond semantic search and you already run Redis Stack, use the vector index before reaching for Pinecone.
Operational considerations that matter more than any individual pattern: connection pooling (don't create connections per request), hash tags for co-location in Cluster mode (or multi-key commands will fail), and TTL hygiene (keys without TTLs will grow Redis memory indefinitely). Monitor redis-cli INFO memory for used_memory_rss versus maxmemory, and set maxmemory-policy allkeys-lru in cache-only deployments so Redis degrades gracefully under memory pressure rather than refusing writes.
The full code in this post is production-ready. Drop the XFetch implementation into any cache layer, the consumer group worker into any event-driven service, and the Lua rate limiter into any API gateway. The primitives are stable across Redis 7.x.
Sources
- Redis Streams documentation
- RediSearch vector search
- XFetch algorithm — Vattani, Chierichetti, Lowenstein (2015)
- Redis Cluster specification
- ioredis — Node.js Redis client
- redis-py — Python Redis client