Redis Advanced Patterns in 2026: Streams, Pub/Sub, Lua Scripts, and Vector Search


Introduction

Redis ships with a reputation it only partially deserves. Most teams use it as a dumb key-value store — SET key value EX 300, GET key, done. That pattern is useful, but it barely scratches what Redis can do. The same process that holds your session tokens also gives you a persistent, consumer-group-aware message log, a pub/sub broadcast bus, an atomic scripting engine, and — with Redis Stack — a vector database capable of sub-millisecond approximate nearest neighbor search across millions of embeddings.

The version most production systems are running today — Redis 7.x and Redis Stack 2.x — is a fundamentally different beast than the Redis from five years ago. Streams, introduced in Redis 5.0, are now mature enough to replace Kafka for the majority of workloads that don't need multi-day retention or petabyte-scale throughput. RediSearch's vector search, available in Redis Stack, has gone from an experiment to a serious alternative to Pinecone and Weaviate for teams that want to avoid managing a separate vector store. Lua scripting has always been there, but most developers still reach for MULTI/EXEC pipelines when a well-written Lua script would eliminate race conditions entirely.

This post covers the patterns that graduate Redis from "fast cache" to "production workhorse": advanced caching with stampede prevention, Streams for durable event processing with consumer groups and dead-letter queues, Pub/Sub for real-time fan-out, Lua scripts for atomic compound operations, vector search for semantic similarity workloads, and Cluster mode for high availability at scale. Every code example is complete and production-ready. Where Redis competes with specialized tools — Kafka, RabbitMQ, Pinecone — the tradeoffs are explicit.


1. Advanced Caching Patterns

The SET key value EX ttl pattern handles maybe 60 percent of caching use cases. The remaining 40 percent — write-through, write-behind, stampede prevention, layered caches, per-datatype TTL strategies — is where caching actually gets interesting and where naive implementations silently degrade under load.

Cache Strategies

Cache-aside (lazy loading) is the most common pattern: the application checks the cache on read, populates it on miss. Simple to reason about, easy to implement, but it means the first request after a cache miss hits the database. Under traffic spikes, dozens or hundreds of requests can all miss simultaneously on the same key and all hit the database in parallel — this is a cache stampede.

Write-through updates the cache synchronously on every write. Cache and database are always in sync. The cost: every write pays the penalty of two writes (cache + DB), and you cache data that may never be read.

Write-behind (write-back) writes to the cache first and flushes to the database asynchronously. Dramatically reduces write latency, but risks data loss if Redis restarts before the flush. Appropriate for metrics accumulation or click counters; inappropriate for financial records.
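
A minimal sketch of both write paths, assuming a caller-supplied db_write callable and a Redis list as the write-behind buffer (both illustrative, not a specific library's API):

# cache_write_strategies.py (sketch; db_write is a placeholder for your DB layer)
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def write_through(key: str, value: dict, db_write, ttl: int = 300) -> None:
    """Write-through: DB first, then cache, so a cache failure never loses data."""
    db_write(value)                           # synchronous DB write
    r.set(key, json.dumps(value), ex=ttl)     # cache is now in sync

def write_behind(key: str, value: dict, ttl: int = 300) -> None:
    """Write-behind: cache immediately, queue the DB write for a background flusher."""
    r.set(key, json.dumps(value), ex=ttl)
    r.rpush("writebehind:queue", json.dumps({"key": key, "value": value}))

def flush_write_behind(db_write, batch: int = 100) -> int:
    """Background flusher: drain queued writes to the database."""
    flushed = 0
    for _ in range(batch):
        raw = r.lpop("writebehind:queue")
        if raw is None:
            break
        db_write(json.loads(raw)["value"])
        flushed += 1
    return flushed

Note the ordering choice in write_through: writing the database before the cache means a crash between the two leaves a stale cache entry (bounded by the TTL) rather than a cached value the database never saw.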

XFetch: Probabilistic Early Expiration

Cache stampede is a deceptively hard problem. The naive fix — a distributed lock that forces only one caller to recompute while others wait — adds latency and creates its own contention. The XFetch algorithm from Vattani, Chierichetti, and Lowenstein (2015) solves this without locks: it probabilistically recomputes the cache early, before expiration, based on how expensive the recomputation is and how close the TTL is.

The probability of early recomputation grows as the key approaches expiration. Expensive recomputations (high beta) trigger early refresh sooner. The result: the cache is refreshed in the background before it expires, and stampedes never happen.

# cache_xfetch.py
import redis
import time
import math
import random
import json
from typing import Callable, Any, Optional

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def xfetch(
    key: str,
    ttl: int,
    recompute: Callable[[], Any],
    beta: float = 1.0,
) -> Any:
    """
    XFetch: probabilistic early expiration to prevent cache stampedes.

    Args:
        key:        Redis key to cache under
        ttl:        Desired TTL in seconds
        recompute:  Callable that fetches the canonical value (DB query, API call, etc.)
        beta:       Recomputation cost factor. Higher = recompute sooner.
                    1.0 is a sensible default. Set higher for expensive recomputations.

    Returns:
        Cached or freshly computed value.
    """
    # Fetch current cached value and its remaining TTL in one round trip
    # (transaction=False pipelines for latency; strict atomicity isn't needed here)
    pipe = r.pipeline(transaction=False)
    pipe.get(key)
    pipe.ttl(key)
    cached_raw, remaining_ttl = pipe.execute()

    if cached_raw is not None:
        cached = json.loads(cached_raw)
        expiry = time.time() + remaining_ttl  # absolute expiry time

        # XFetch early-expiration check.
        # The paper's delta is the measured recomputation time; here elapsed
        # time since the last refresh serves as a cheap proxy, so an early
        # refresh becomes increasingly likely as the key ages toward expiry.
        delta = ttl - remaining_ttl  # time elapsed since last refresh
        if delta <= 0:
            delta = 1  # guard against zero/negative delta on fresh keys

        # -log(rand()) is exponentially distributed positive jitter; scaling
        # it by beta * delta pulls the refresh earlier for high-beta keys.
        jitter = -beta * delta * math.log(random.random())
        if time.time() + jitter >= expiry:
            # Probabilistically decided to refresh early — recompute and cache
            value = recompute()
            r.set(key, json.dumps(value), ex=ttl)
            return value

        return cached

    # Cache miss: recompute and cache
    value = recompute()
    r.set(key, json.dumps(value), ex=ttl)
    return value


# --- Layered cache: L1 (local dict) → L2 (Redis) → L3 (database) ---

from collections import OrderedDict

class LRUCache:
    """Minimal in-process LRU cache for L1 layer."""
    def __init__(self, maxsize: int = 512):
        self.cache: OrderedDict = OrderedDict()
        self.maxsize = maxsize

    def get(self, key: str) -> Optional[Any]:
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        return None

    def set(self, key: str, value: Any) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.maxsize:
            self.cache.popitem(last=False)

l1 = LRUCache(maxsize=512)

def get_with_layered_cache(
    key: str,
    db_fetch: Callable[[], Any],
    l2_ttl: int = 300,
) -> Any:
    """
    Three-layer cache lookup:
      L1 → in-process LRU dict (microseconds)
      L2 → Redis (sub-millisecond)
      L3 → database (milliseconds to seconds)
    """
    # L1 check
    value = l1.get(key)
    if value is not None:
        return value

    # L2 check (Redis), with stampede protection
    value = xfetch(key, ttl=l2_ttl, recompute=db_fetch)

    # Populate L1
    l1.set(key, value)
    return value

TTL Strategies by Data Volatility

TTL is not one-size-fits-all. A sensible tiering:

| Data type | TTL | Rationale |
|---|---|---|
| User session | 30 minutes (sliding) | Active sessions stay warm; idle sessions expire |
| Product catalog | 10 minutes | Changes rarely; stale for a few minutes is acceptable |
| User profile | 5 minutes | Changes infrequently; short TTL keeps data fresh |
| Real-time prices | 5 seconds | Data staleness is a business risk |
| Feature flags | 60 seconds | Need to propagate quickly after changes |
| Computed aggregates | 30 minutes | Expensive to recompute; tolerate some staleness |
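
The sliding session TTL in the first row is the only entry that needs more than SET ... EX: refresh the TTL on every read so active sessions stay alive. A minimal sketch using GETEX (Redis 6.2+):

# sliding_session.py (sketch of a sliding 30-minute session TTL)
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SESSION_TTL = 1800  # 30 minutes

def touch_session(session_id: str) -> dict | None:
    """Read a session and slide its TTL forward; returns None if expired."""
    key = f"session:{session_id}"
    # GETEX reads the value and resets the TTL in a single command
    data = r.getex(key, ex=SESSION_TTL)
    return None if data is None else json.loads(data)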
Architecture diagram

flowchart LR
    A[Request] --> B{L1 Cache\nlocal dict}
    B -- Hit --> Z[Return value]
    B -- Miss --> C{L2 Cache\nRedis}
    C -- Hit --> D[Populate L1]
    D --> Z
    C -- Miss --> E[L3: Database]
    E --> F[Populate L2\nXFetch TTL]
    F --> D


2. Redis Streams for Event Processing

Redis Streams, stable since Redis 5.0 and hardened through 7.x, is a persistent, append-only log with consumer group semantics. It is not a replacement for Kafka at petabyte scale or with multi-day retention requirements. It is a serious replacement for Kafka in the 95 percent of systems where throughput stays under 1 million events per day, retention is hours-to-days rather than weeks-to-months, and the operational cost of running a Kafka cluster (Zookeeper or KRaft, broker replication, topic management, consumer group lag monitoring) is not justified.

A single Redis node benchmarks at over 1 million XADD operations per second. With consumer groups, you get competing consumers, at-least-once delivery semantics, and a built-in pending entries list (PEL) that tracks unacknowledged messages. This is Kafka-lite without the JVM.

Core Commands

  • XADD stream * field value [field value ...] — append entry, auto-generate ID
  • XREAD COUNT n STREAMS stream 0 — read from beginning
  • XREADGROUP GROUP grp consumer COUNT n STREAMS stream > — read undelivered messages to this group
  • XACK stream grp id — acknowledge processing complete
  • XPENDING stream grp - + n — list unacknowledged messages
  • XCLAIM stream grp consumer min-idle-ms id — reassign a stale pending message
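
A minimal pass over this lifecycle in Python (redis-py), with throwaway stream and group names, before the full Node.js worker below:

# streams_quickstart.py (one pass through the Streams lifecycle)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
stream, group = "demo:stream", "demo-group"

# Create the group (and the stream, via mkstream) if it doesn't exist yet
try:
    r.xgroup_create(stream, group, id="$", mkstream=True)
except redis.ResponseError as e:
    if "BUSYGROUP" not in str(e):
        raise

msg_id = r.xadd(stream, {"event": "order.created", "order_id": "42"})

# '>' asks for entries never delivered to this group
entries = r.xreadgroup(group, "worker-1", {stream: ">"}, count=10, block=2000)
for _stream, messages in entries:
    for entry_id, fields in messages:
        print(entry_id, fields)
        r.xack(stream, group, entry_id)  # remove from the pending entries list

# Anything unacked stays visible here until XACKed or XCLAIMed
print(r.xpending(stream, group))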

Full Producer + Consumer Group with DLQ

// streams-consumer.js — Node.js (ioredis)
import Redis from "ioredis";

const redis = new Redis({ host: "localhost", port: 6379 });
const STREAM = "events:orders";
const GROUP = "order-processor";
const DLQ_STREAM = "events:orders:dlq";
const MAX_RETRIES = 3;
const CLAIM_IDLE_MS = 30_000; // reclaim messages idle > 30s

// --- Setup ---
async function ensureConsumerGroup() {
  try {
    // MKSTREAM creates the stream if it doesn't exist
    await redis.xgroup("CREATE", STREAM, GROUP, "$", "MKSTREAM");
    console.log(`Consumer group '${GROUP}' created`);
  } catch (err) {
    if (!err.message.includes("BUSYGROUP")) throw err;
    // Group already exists — fine
  }
}

// --- Producer ---
async function produce(order) {
  const id = await redis.xadd(
    STREAM,
    "*",                      // auto-generate ID (timestamp-based)
    "order_id", order.id,
    "customer", order.customer,
    "amount", String(order.amount),
    "payload", JSON.stringify(order),
  );
  console.log(`Produced message ${id}`);
  return id;
}

// --- Process one message ---
async function processMessage(id, fields) {
  // fields comes back as flat array: [key, val, key, val, ...]
  const data = {};
  for (let i = 0; i < fields.length; i += 2) {
    data[fields[i]] = fields[i + 1];
  }

  console.log(`Processing order ${data.order_id} for ${data.customer}`);

  // Simulate processing (replace with real business logic)
  if (Math.random() < 0.1) {
    throw new Error(`Simulated failure for order ${data.order_id}`);
  }

  console.log(`Order ${data.order_id} processed successfully`);
}

// --- Move to DLQ ---
async function sendToDLQ(id, fields, reason) {
  await redis.xadd(
    DLQ_STREAM,
    "*",
    "original_id", id,
    "reason", reason,
    "failed_at", String(Date.now()),
    ...fields,
  );
  console.warn(`Message ${id} sent to DLQ: ${reason}`);
}

// --- Consumer loop ---
async function runConsumer(consumerName) {
  await ensureConsumerGroup();

  console.log(`Consumer '${consumerName}' starting`);

  while (true) {
    // 1. Check for stale pending messages (unacked for > CLAIM_IDLE_MS)
    const pending = await redis.xpending(
      STREAM, GROUP, "-", "+", 10
    );

    for (const entry of pending) {
      const [msgId, owner, idleMs, deliveryCount] = entry;

      if (idleMs > CLAIM_IDLE_MS) {
        if (deliveryCount >= MAX_RETRIES) {
          // Exceeded retry limit → DLQ
          const claimed = await redis.xclaim(
            STREAM, GROUP, consumerName, CLAIM_IDLE_MS, msgId
          );
          if (claimed.length > 0) {
            const [claimedId, claimedFields] = claimed[0];
            await sendToDLQ(claimedId, claimedFields, `Max retries (${MAX_RETRIES}) exceeded`);
            await redis.xack(STREAM, GROUP, claimedId);
          }
        } else {
          // Reclaim and retry. Claimed entries are NOT redelivered by
          // XREADGROUP ">" (it only returns never-delivered messages),
          // so process them right here.
          const claimed = await redis.xclaim(
            STREAM, GROUP, consumerName, CLAIM_IDLE_MS, msgId
          );
          if (claimed.length > 0) {
            const [claimedId, claimedFields] = claimed[0];
            console.log(`Reclaimed stale message ${claimedId} (attempt ${deliveryCount + 1})`);
            try {
              await processMessage(claimedId, claimedFields);
              await redis.xack(STREAM, GROUP, claimedId);
            } catch (err) {
              // Leave unacked; it will be reclaimed again or DLQ'd at MAX_RETRIES
              console.error(`Retry failed for ${claimedId}: ${err.message}`);
            }
          }
        }
      }
    }

    // 2. Read new messages (> means "only undelivered to this group")
    const results = await redis.xreadgroup(
      "GROUP", GROUP,
      consumerName,
      "COUNT", "10",
      "BLOCK", "2000",   // block up to 2 seconds waiting for new messages
      "STREAMS", STREAM,
      ">",
    );

    if (!results) continue; // timeout with no messages — loop back

    for (const [_stream, messages] of results) {
      for (const [id, fields] of messages) {
        try {
          await processMessage(id, fields);
          await redis.xack(STREAM, GROUP, id); // ack only on success
        } catch (err) {
          // Don't ack — leave in PEL for retry via XCLAIM loop above
          console.error(`Failed to process ${id}: ${err.message}`);
        }
      }
    }
  }
}

// --- Entry point ---
const consumerName = process.argv[2] || "consumer-1";
runConsumer(consumerName).catch(console.error);

flowchart TD
    P[Producer] -->|XADD| S[(Redis Stream)]
    S -->|XREADGROUP| CG[Consumer Group]
    CG --> W1[Worker 1]
    CG --> W2[Worker 2]
    CG --> W3[Worker 3]
    W1 -->|Success: XACK| S
    W2 -->|Failure: stays in PEL| PE[Pending Entries List]
    PE -->|idle > 30s: XCLAIM| W3
    W3 -->|retries > 3: XADD| DLQ[(Dead Letter Queue)]
    DLQ --> MON[DLQ Monitor / Alerting]
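
The DLQ monitor box in the diagram can start as a periodic depth check. A sketch in Python; the alerting hook and threshold are placeholders:

# dlq_monitor.py (sketch: alert when the dead letter queue grows)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
DLQ_STREAM = "events:orders:dlq"
ALERT_THRESHOLD = 10

def check_dlq() -> None:
    depth = r.xlen(DLQ_STREAM)
    if depth >= ALERT_THRESHOLD:
        # Pull the newest failures into the alert payload
        recent = r.xrevrange(DLQ_STREAM, count=5)
        send_alert(f"DLQ depth {depth}; newest failures: {recent}")

def send_alert(message: str) -> None:
    print(f"[ALERT] {message}")  # replace with PagerDuty/Slack/etc.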

Redis Streams vs Kafka: When to Choose Which

| Factor | Redis Streams | Kafka |
|---|---|---|
| Throughput | 1M+ msg/s single node | 10M+ msg/s multi-broker |
| Retention | Hours to days (memory-backed) | Weeks to months (disk) |
| Operational cost | Zero — already running Redis | Significant (brokers, ZK/KRaft) |
| Consumer groups | Yes | Yes |
| Replay | Yes (XRANGE from any ID) | Yes |
| Schema registry | No | Confluent Schema Registry |
| Exactly-once | No (at-least-once) | Yes (with transactions) |

Choose Redis Streams when: throughput is under 1M events/day, retention under 48 hours, you already run Redis, and exactly-once semantics are not required. Choose Kafka when: throughput exceeds what a single Redis node handles, you need multi-week retention, or exactly-once delivery is a hard requirement.


3. Pub/Sub and Real-Time Patterns

Redis Pub/Sub and Redis Streams are complements, not alternatives. The key distinction: Pub/Sub is fire-and-forget — messages published to a channel are delivered only to subscribers active at that moment and are not persisted. Streams are persistent logs. A subscriber that goes offline for five seconds during a Streams workload misses nothing; a subscriber that goes offline for five seconds during a Pub/Sub workload misses everything published in that window.

Use Pub/Sub for: live notifications, presence indicators, real-time dashboards, and any pattern where a momentary gap is acceptable. Use Streams for: anything requiring guaranteed delivery or replay.

SUBSCRIBE, PUBLISH, PSUBSCRIBE

# pubsub_demo.py
import redis
import threading
import time

r_pub = redis.Redis(host="localhost", port=6379, decode_responses=True)
r_sub = redis.Redis(host="localhost", port=6379, decode_responses=True)

def subscriber():
    pubsub = r_sub.pubsub()

    # Subscribe to exact channel
    pubsub.subscribe("notifications:global")

    # Pattern subscription — catches notifications:user:*, notifications:team:*, etc.
    pubsub.psubscribe("notifications:*")

    for message in pubsub.listen():
        if message["type"] in ("message", "pmessage"):
            print(f"[{message['channel']}] {message['data']}")

def publisher():
    time.sleep(0.5)  # Let subscriber connect
    r_pub.publish("notifications:global", "System maintenance at 22:00 UTC")
    r_pub.publish("notifications:user:42", "Your export is ready")
    r_pub.publish("notifications:team:engineering", "Deploy window open")

t = threading.Thread(target=subscriber, daemon=True)
t.start()
publisher()
time.sleep(1)

Keyspace Notifications

Keyspace notifications let you subscribe to Redis key lifecycle events — expiry, deletion, set operations — without polling. Enable them in redis.conf or at runtime:

# Enable keyevent notifications (E) for expired-key events (x)
redis-cli CONFIG SET notify-keyspace-events "Ex"

# Watch for key expiry events (Python)
pubsub = r_sub.pubsub()
pubsub.psubscribe("__keyevent@0__:expired")

for message in pubsub.listen():
    if message["type"] == "pmessage":
        expired_key = message["data"]
        print(f"Key expired: {expired_key}")
        # Trigger: session cleanup, cache invalidation, reminder dispatch

Practical applications: session invalidation (trigger logout cleanup when session key expires), job timeout detection (set a key with the job TTL; expiry fires if the job never deletes it), and distributed lock monitoring.
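
The job timeout detection case, sketched with hypothetical job:watchdog:* key names:

# job_timeout.py (sketch: detect jobs that never complete)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def start_job(job_id: str, timeout_seconds: int = 300) -> None:
    # Sentinel key: a finished job deletes it; a stalled job lets it
    # expire, which the watcher below turns into a timeout event.
    r.set(f"job:watchdog:{job_id}", "running", ex=timeout_seconds)

def complete_job(job_id: str) -> None:
    r.delete(f"job:watchdog:{job_id}")  # expiry never fires for finished jobs

def watch_for_timeouts() -> None:
    pubsub = r.pubsub()
    pubsub.psubscribe("__keyevent@0__:expired")
    for message in pubsub.listen():
        if message["type"] == "pmessage" and message["data"].startswith("job:watchdog:"):
            job_id = message["data"].removeprefix("job:watchdog:")
            print(f"Job {job_id} timed out")  # requeue, alert, or mark failed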

Fan-Out Architecture

Redis Pub/Sub is the broadcast backbone for real-time fan-out. A single publisher can reach thousands of subscribers in under a millisecond. The pattern for a presence indicator in a chat application (sketched in code after the steps):

  1. On connect: SET presence:{user_id} online EX 30 + PUBLISH presence:channel "{user_id}:online"
  2. Heartbeat: EXPIRE presence:{user_id} 30 every 15 seconds
  3. On disconnect: key expires → keyspace notification fires → PUBLISH presence:channel "{user_id}:offline"
  4. All connected clients receive the publish and update the UI
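
A minimal sketch of the steps, using the key and channel names from the list (assumes keyspace notifications are enabled as shown earlier):

# presence.py (sketch of the presence pattern above)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
PRESENCE_TTL = 30

def on_connect(user_id: str) -> None:
    # Step 1: mark online and broadcast
    r.set(f"presence:{user_id}", "online", ex=PRESENCE_TTL)
    r.publish("presence:channel", f"{user_id}:online")

def heartbeat(user_id: str) -> None:
    # Step 2: called every ~15s; a missed heartbeat lets the key expire
    r.expire(f"presence:{user_id}", PRESENCE_TTL)

def broadcast_offline_events() -> None:
    # Step 3: turn key expiry into an offline broadcast
    pubsub = r.pubsub()
    pubsub.psubscribe("__keyevent@0__:expired")
    for message in pubsub.listen():
        if message["type"] == "pmessage" and message["data"].startswith("presence:"):
            user_id = message["data"].split(":", 1)[1]
            r.publish("presence:channel", f"{user_id}:offline")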

In Redis Cluster mode, Pub/Sub messages are broadcast to all shards — the cluster itself handles routing, so your application code is identical in standalone and cluster deployments. Redis 7.0 added sharded Pub/Sub (SSUBSCRIBE/SPUBLISH) for workloads where that cluster-wide broadcast becomes a bottleneck.


4. Lua Scripts for Atomic Operations

Every Redis MULTI/EXEC transaction has a fundamental limitation: the commands inside it are queued and sent as a batch, but the application must still make multiple round trips (WATCH, MULTI, commands, EXEC) and cannot branch based on intermediate values. If GET counter returns 5, you cannot conditionally SET counter 10 inside the same transaction without an optimistic lock retry loop.

Lua scripts run inside Redis's single-threaded execution model. From the moment EVAL fires, no other Redis command executes until the script completes. You get true atomicity, you can branch on intermediate values, and you eliminate round trips. The entire script is a single command from the client's perspective.
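
For contrast, here is the optimistic-lock retry loop that a Lua script eliminates, sketched with redis-py's transaction pipeline and an illustrative counter-and-limit rule:

# watch_retry.py (the WATCH/MULTI retry loop that Lua replaces)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def conditional_set_with_watch(key: str, limit: int = 10) -> bool:
    """Set key to 10 only if its current value is below limit; retry on conflict."""
    while True:
        with r.pipeline() as pipe:
            try:
                pipe.watch(key)                    # EXEC fails if key changes after this
                current = int(pipe.get(key) or 0)  # immediate-mode read, pre-MULTI
                if current >= limit:
                    pipe.unwatch()
                    return False
                pipe.multi()                       # start queuing commands
                pipe.set(key, 10)
                pipe.execute()                     # raises WatchError on conflict
                return True
            except redis.WatchError:
                continue  # another client touched the key: retry from the top

Every retry is an extra round trip under contention; the Lua scripts below collapse the whole read-check-write into one atomic command.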

Rate Limiter: Sliding Window in Lua

# rate_limiter.py
import redis
import time
import uuid

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Sliding window rate limiter
# Uses a sorted set where each member is a unique request ID
# and the score is the request timestamp in milliseconds.
RATE_LIMIT_SCRIPT = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window_ms = tonumber(ARGV[2])
local max_requests = tonumber(ARGV[3])
local request_id = ARGV[4]

-- Remove entries outside the sliding window
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window_ms)

-- Count current requests in window
local count = redis.call('ZCARD', key)

if count < max_requests then
    -- Allow: add this request to the window
    redis.call('ZADD', key, now, request_id)
    redis.call('PEXPIRE', key, window_ms)
    return {1, max_requests - count - 1}  -- {allowed, remaining}
else
    -- Deny
    return {0, 0}
end
"""

# Load script once, use SHA thereafter (saves bandwidth)
RATE_LIMIT_SHA = r.script_load(RATE_LIMIT_SCRIPT)

def check_rate_limit(
    user_id: str,
    max_requests: int = 100,
    window_seconds: int = 60,
) -> tuple[bool, int]:
    """
    Check if user_id is within rate limit.
    Returns (allowed: bool, remaining: int).
    """
    key = f"ratelimit:{user_id}"
    now_ms = int(time.time() * 1000)
    request_id = f"{now_ms}-{uuid.uuid4().hex}"  # unique even within the same ms

    result = r.evalsha(
        RATE_LIMIT_SHA,
        1,             # number of KEYS arguments
        key,           # KEYS[1]
        now_ms,        # ARGV[1]
        window_seconds * 1000,  # ARGV[2]: window in ms
        max_requests,  # ARGV[3]
        request_id,    # ARGV[4]
    )

    allowed = bool(result[0])
    remaining = int(result[1])
    return allowed, remaining

Distributed Lock with Expiry

The canonical Redis distributed lock pattern uses SET key token NX EX ttl. The token (a UUID) ensures only the lock holder can release it — another process cannot accidentally release a lock it doesn't hold. The Lua script makes the check-and-delete atomic:

import uuid

RELEASE_LOCK_SCRIPT = """
-- Only release if we hold the lock (token matches)
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
else
    return 0
end
"""
RELEASE_LOCK_SHA = r.script_load(RELEASE_LOCK_SCRIPT)

def acquire_lock(resource: str, ttl_seconds: int = 10) -> str | None:
    """Acquire lock. Returns token if acquired, None if not."""
    token = str(uuid.uuid4())
    acquired = r.set(f"lock:{resource}", token, nx=True, ex=ttl_seconds)
    return token if acquired else None

def release_lock(resource: str, token: str) -> bool:
    """Release lock only if we hold it."""
    result = r.evalsha(RELEASE_LOCK_SHA, 1, f"lock:{resource}", token)
    return bool(result)

# Usage
token = acquire_lock("job:export:user:42", ttl_seconds=30)
if token:
    try:
        pass  # do work
    finally:
        release_lock("job:export:user:42", token)

Comparison visual

sequenceDiagram
    participant C1 as Client 1
    participant C2 as Client 2
    participant R as Redis
    rect rgb(255, 235, 235)
        Note over C1,R: Race condition — MULTI/EXEC with WATCH
        C1->>R: WATCH counter
        C2->>R: WATCH counter
        C1->>R: GET counter → 5
        C2->>R: GET counter → 5
        C1->>R: MULTI / INCR counter / EXEC → OK (counter=6)
        C2->>R: MULTI / INCR counter / EXEC → nil (conflict, retry needed)
    end
    rect rgb(235, 255, 235)
        Note over C1,R: Lua atomic — no conflict possible
        C1->>R: EVAL "if GET counter >= limit then return 0 end INCR counter return 1"
        Note over R: Executes atomically; C2 blocked until complete
        R-->>C1: 1 (allowed)
        C2->>R: EVAL same script
        R-->>C2: 0 (limit reached) or 1 (incremented)
    end

Conditional Leaderboard Update in Lua

# Only update a user's leaderboard score if the new score beats their current best
UPDATE_BEST_SCORE_SCRIPT = """
local key = KEYS[1]
local member = ARGV[1]
local new_score = tonumber(ARGV[2])

local current = redis.call('ZSCORE', key, member)

if current == false or new_score > tonumber(current) then
    redis.call('ZADD', key, new_score, member)
    return 1  -- updated
else
    return 0  -- not updated (new score wasn't better)
end
"""
UPDATE_BEST_SCORE_SHA = r.script_load(UPDATE_BEST_SCORE_SCRIPT)

def update_best_score(leaderboard: str, user_id: str, score: float) -> bool:
    result = r.evalsha(
        UPDATE_BEST_SCORE_SHA, 1,
        leaderboard,
        user_id,
        score,
    )
    return bool(result)


5. Redis as a Vector Database

Redis Stack ships with RediSearch, which since version 2.4 includes a production-grade vector search engine. It supports HNSW (Hierarchical Navigable Small World) and flat (brute-force) indexes, hybrid search combining vector similarity with metadata filters, and ANN (approximate nearest neighbor) search returning results in under a millisecond for millions of vectors on a single node.

This matters because teams running semantic search, recommendation engines, or RAG (retrieval-augmented generation) pipelines increasingly face the question: run a dedicated vector database (Pinecone, Weaviate, Qdrant), or use Redis Stack and keep the stack simple? For datasets under 10 million vectors where Redis is already in the stack, the answer is often Redis.

Setup: Creating a Vector Index

# vector_search.py
import redis
import numpy as np
from redis.commands.search.field import VectorField, TagField, TextField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379, decode_responses=False)

VECTOR_DIM = 1536       # OpenAI text-embedding-3-small dimension
INDEX_NAME = "idx:docs"
DOC_PREFIX = "doc:"

def create_index():
    try:
        r.ft(INDEX_NAME).dropindex(delete_documents=False)
    except Exception:
        pass  # Index didn't exist

    schema = (
        TextField("$.text", as_name="text"),
        TagField("$.category", as_name="category"),
        VectorField(
            "$.embedding",
            "HNSW",           # HNSW for ANN (fast); FLAT for exact (small datasets)
            {
                "TYPE": "FLOAT32",
                "DIM": VECTOR_DIM,
                "DISTANCE_METRIC": "COSINE",   # or L2, IP (inner product)
                "INITIAL_CAP": 100_000,        # preallocate for 100k vectors
                "M": 16,                       # HNSW connectivity parameter
                "EF_CONSTRUCTION": 200,        # HNSW build-time quality
            },
            as_name="embedding",
        ),
    )

    r.ft(INDEX_NAME).create_index(
        schema,
        definition=IndexDefinition(
            prefix=[DOC_PREFIX],
            index_type=IndexType.JSON,
        ),
    )
    print(f"Index '{INDEX_NAME}' created")


def store_document(doc_id: str, text: str, category: str, embedding: np.ndarray):
    """Store a document with its embedding."""
    import json
    key = f"{DOC_PREFIX}{doc_id}"
    r.json().set(key, "$", {
        "text": text,
        "category": category,
        "embedding": embedding.astype(np.float32).tolist(),
    })


def vector_search(
    query_embedding: np.ndarray,
    top_k: int = 5,
    category_filter: str | None = None,
) -> list[dict]:
    """
    KNN vector search with optional metadata filter.

    Hybrid search: combine vector similarity with tag filter.
    This is where Redis outperforms many dedicated vector DBs —
    metadata filtering happens at the index level, not post-hoc.
    """
    query_bytes = query_embedding.astype(np.float32).tobytes()

    # Build filter expression
    if category_filter:
        # Hybrid: vector similarity AND metadata filter
        filter_expr = f"(@category:{{{category_filter}}})"
    else:
        filter_expr = "*"

    # KNN query syntax: @field_name:[VECTOR_RANGE radius $param]
    # or KNN top_k: @field_name:[KNN k $param]
    q = (
        Query(f"{filter_expr}=>[KNN {top_k} @embedding $vec AS score]")
        .sort_by("score")
        .return_fields("text", "category", "score")
        .paging(0, top_k)
        .dialect(2)
    )

    results = r.ft(INDEX_NAME).search(q, query_params={"vec": query_bytes})

    return [
        {
            "id": doc.id,
            "text": doc.text,
            "category": doc.category,
            "score": float(doc.score),   # cosine distance (lower = more similar)
        }
        for doc in results.docs
    ]
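
A quick end-to-end run, as a sketch: random vectors stand in for real embeddings, which in production would come from an embedding model.

# Usage sketch: wire the three functions together with synthetic vectors
if __name__ == "__main__":
    create_index()

    rng = np.random.default_rng(seed=42)
    corpus = [
        ("1", "How to reset your password", "support"),
        ("2", "Quarterly revenue summary", "finance"),
        ("3", "Troubleshooting login failures", "support"),
    ]
    for doc_id, text, category in corpus:
        store_document(doc_id, text, category, rng.random(VECTOR_DIM))

    query_vec = rng.random(VECTOR_DIM)
    for hit in vector_search(query_vec, top_k=2, category_filter="support"):
        print(hit["id"], round(hit["score"], 4), hit["text"])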

Performance Characteristics

On a single Redis node with 16 GB of RAM, HNSW handles 1 million 1536-dimension vectors in approximately 6 GB of memory and returns KNN results in under 2 milliseconds at the 99th percentile. Flat (brute-force) indexes are exact but O(n) — use flat for datasets under 50,000 vectors where perfect recall matters, HNSW for everything larger.
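
Switching to exact search is a parameter change, not an API change. A sketch of the FLAT variant of the VectorField from create_index above:

# FLAT (exact, brute-force) variant of the VectorField above:
# O(n) per query with perfect recall; suitable below ~50k vectors
flat_field = VectorField(
    "$.embedding",
    "FLAT",
    {
        "TYPE": "FLOAT32",
        "DIM": VECTOR_DIM,
        "DISTANCE_METRIC": "COSINE",
    },
    as_name="embedding",
)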

Redis Vector Search vs Dedicated Vector DBs

| Factor | Redis Stack | Pinecone | Weaviate | Qdrant |
|---|---|---|---|---|
| Setup complexity | Low (already in stack) | Zero (managed) | Medium | Medium |
| Max scale (practical) | ~10M vectors/node | Unlimited (managed) | Unlimited | Unlimited |
| Hybrid search | Yes | Yes | Yes | Yes |
| Persistence | RDB/AOF | Managed | Yes | Yes |
| Cost at 1M vectors | $0 (existing Redis) | ~$70/mo (s1.x1) | Self-host costs | Self-host costs |
| Operational overhead | Minimal | Zero | Moderate | Moderate |

Choose Redis for vectors when: you already run Redis Stack, dataset is under 10M vectors, and you want to avoid a separate service. Choose Pinecone when: you need fully managed, unlimited scale with zero ops. Choose Weaviate or Qdrant when: you need advanced filtering, multi-modal search, or open-source self-hosted control beyond what Redis offers.


6. Cluster Mode and High Availability

A standalone Redis node is a single point of failure. For production systems where Redis is on the critical path — and if you are using it as a message bus, session store, or real-time cache, it is — you need either Sentinel (automatic failover for standalone) or Redis Cluster (sharding + HA combined).

Hash Slots

Redis Cluster distributes keys across 16,384 hash slots. Each key maps to a slot via CRC16(key) % 16384. Slots are distributed across primary nodes — a three-node cluster gives approximately 5,461 slots per node. Reads from replicas are allowed with READONLY mode but are eventually consistent.

# Check which slot a key maps to
redis-cli CLUSTER KEYSLOT "user:123:session"
# → 8490 (example)

# CLUSTER NODES lists each node's slot ranges (e.g. 0-5460);
# find the range that contains 8490
redis-cli CLUSTER NODES

Hash Tags for Co-location

Multi-key commands (MGET, MSET, Lua scripts referencing multiple keys) only work in Cluster mode if all keys hash to the same slot. Hash tags force co-location: only the portion of the key inside {} is used for slot calculation.

# Without hash tags — these keys may land on different nodes
# MGET user:123:profile user:123:session  ← may fail in cluster mode

# With hash tags — both keys hash on "user:123"
# MGET {user:123}:profile {user:123}:session  ← always same slot

user_id = 123
profile_key = f"{{user:{user_id}}}:profile"
session_key = f"{{user:{user_id}}}:session"
cart_key    = f"{{user:{user_id}}}:cart"

# Now safe to use in pipelines and Lua scripts in cluster mode
pipe = r.pipeline(transaction=True)
pipe.get(profile_key)
pipe.get(session_key)
pipe.get(cart_key)
results = pipe.execute()

Sentinel vs Cluster

Sentinel provides automatic failover for a single primary + N replicas. It does not shard data. Use Sentinel when your dataset fits on one node and you want automatic failover without the complexity of Cluster. Three Sentinel processes (odd number for quorum) monitor the primary; if the primary is unreachable from quorum Sentinels, a failover is triggered and a replica is promoted. Failover takes 30–60 seconds by default (down-after-milliseconds + failover-timeout).
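
Clients should discover the current primary through Sentinel rather than hardcoding its address. A minimal redis-py sketch; the Sentinel host IPs are illustrative:

# sentinel_client.py (redis-py): discover the primary via Sentinel
from redis.sentinel import Sentinel

sentinel = Sentinel(
    [("10.0.1.20", 26379), ("10.0.1.21", 26379), ("10.0.1.22", 26379)],
    socket_timeout=0.5,
)

# master_for re-resolves the current primary after a failover
primary = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)  # read-only traffic

primary.set("health:check", "ok", ex=10)
print(replica.get("health:check"))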

Cluster provides sharding across multiple primaries, each with optional replicas. Use Cluster when your dataset exceeds single-node memory, when you need horizontal write throughput, or when you want HA and sharding in a single deployment model.

Connection Pooling

Every application connecting to Redis should use a connection pool. Creating a new TCP connection per command adds 1–3 ms of overhead — significant when Redis commands themselves take under 0.1 ms.

# redis_pool.py
import redis

# Connection pool — create once at application startup
pool = redis.ConnectionPool(
    host="localhost",
    port=6379,
    db=0,
    max_connections=50,        # tune based on worker count × commands-per-request
    decode_responses=True,
    socket_timeout=1.0,        # command timeout
    socket_connect_timeout=2.0,
)

# All clients share the pool
def get_redis() -> redis.Redis:
    return redis.Redis(connection_pool=pool)

For Redis Cluster with ioredis in Node.js:

// cluster-client.js
import Redis from "ioredis";

const cluster = new Redis.Cluster(
  [
    { host: "redis-node-1", port: 6379 },
    { host: "redis-node-2", port: 6379 },
    { host: "redis-node-3", port: 6379 },
  ],
  {
    redisOptions: {
      password: process.env.REDIS_PASSWORD,
      connectTimeout: 2000,
    },
    clusterRetryStrategy: (times) => Math.min(times * 100, 3000),
    // Read from replicas for read-heavy workloads
    scaleReads: "slave",
  }
);

export default cluster;

Sentinel Failover Configuration

# redis-sentinel.conf (minimal production config)
# Redis config files reject trailing comments, so each setting is annotated above it.

# Primary at 10.0.1.10:6379; quorum of 2 Sentinels to declare it down
sentinel monitor mymaster 10.0.1.10 6379 2

# 5s of unreachability before marking the primary down
sentinel down-after-milliseconds mymaster 5000

# 60s maximum for a failover attempt
sentinel failover-timeout mymaster 60000

# Replicas to resync against the new primary in parallel
sentinel parallel-syncs mymaster 1

With down-after-milliseconds 5000 and a typical failover completing in 15–20 seconds, expect a 20–30 second window of write unavailability during an unplanned primary failure. For applications that cannot tolerate this, use Cluster with min-replicas-to-write 1 to fail writes fast.


Conclusion

Redis earns its place on the critical path of production systems not because it is fast (it is), but because it provides the right primitives at each layer of the application stack. The patterns in this post cover the full range: advanced caching with XFetch eliminates stampedes without coordination overhead; Streams give you Kafka-level durability for the majority of real-world event volumes with none of the operational weight; Pub/Sub is the right tool for fire-and-forget fan-out where persistence would add latency with no benefit; Lua scripts make compound operations truly atomic without multi-round-trip transaction protocols; and Redis Stack's vector search removes the need for a separate vector store for datasets up to tens of millions of embeddings.

The pattern-selection heuristic is straightforward: if you need persistence and replay, use Streams. If you need broadcast with no durability requirement, use Pub/Sub. If you need atomic compound operations on multiple keys, use Lua. If you need sub-millisecond semantic search and you already run Redis Stack, use the vector index before reaching for Pinecone.

Operational considerations that matter more than any individual pattern: connection pooling (don't create connections per request), hash tags for co-location in Cluster mode (or multi-key commands will fail), and TTL hygiene (keys without TTLs will grow Redis memory indefinitely). Monitor redis-cli INFO memory for used_memory_rss versus maxmemory, and set maxmemory-policy allkeys-lru in cache-only deployments so Redis degrades gracefully under memory pressure rather than refusing writes.

The full code in this post is production-ready. Drop the XFetch implementation into any cache layer, the consumer group worker into any event-driven service, and the Lua rate limiter into any API gateway. The primitives are stable across Redis 7.x.

