WebSockets and Real-Time Architecture in 2026: SSE, WebRTC, and Scaling Stateful Connections

Introduction
The web was designed for request-response. A client asks, a server answers, the connection closes. That model works for loading pages, submitting forms, and fetching data on demand. It falls apart the moment you need the server to push something to the client without being asked — a new chat message arriving, a collaborator's cursor moving, a stock price updating, a live game state changing.
In 2026, real-time is no longer a niche feature. Chat is table stakes. Collaborative editing is expected — Google Docs set that bar a decade ago and every modern SaaS has internalized it. Live dashboards are standard in observability tools, trading platforms, and operational software. Multiplayer experiences, from document editors to CAD tools to coding environments, have moved from differentiators to requirements. Presence indicators — knowing who else is in the document, who is typing, who is online — are woven into every serious collaborative product.
The technical challenge is that none of this fits HTTP's request-response model natively. Three transport protocols have emerged to solve it, each with different tradeoffs: WebSockets, Server-Sent Events (SSE), and WebRTC. Choosing the wrong one creates architectural debt that is painful to unwind. Using WebSockets everywhere is as much a mistake as never using them.
WebSockets give you a full-duplex persistent connection — both sides can send at any time. SSE gives you a one-way stream from server to client over plain HTTP, with built-in reconnection and event replay. WebRTC gives you peer-to-peer connections for media and data, bypassing your servers entirely for the data path. Each occupies a different position in the design space.
The practical decision comes down to directionality, frequency, latency requirements, and infrastructure complexity. A live notification feed doesn't need WebSockets — SSE is simpler, more reliable, and scales better. A video call doesn't belong on WebSockets — WebRTC is the right tool. A multiplayer game or collaborative editor genuinely needs WebSockets or a higher-level abstraction like CRDTs on top of them.
This post covers each transport in depth, with complete working code, and then addresses the hardest production problem: scaling stateful connections horizontally across multiple server instances.
1. WebSockets: Full-Duplex Persistent Connections
WebSockets are the most versatile of the three transport options, and consequently the most overused. Understanding the protocol mechanics first makes it easier to know when to reach for it and when to leave it on the shelf.
The Handshake: HTTP Upgrade
A WebSocket connection starts as a plain HTTP request. The client sends an Upgrade header signaling that it wants to switch protocols:
GET /chat HTTP/1.1
Host: api.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
If the server agrees, it responds with 101 Switching Protocols. From that point on, the TCP connection is handed over to the WebSocket protocol — both sides can send frames independently at any time. There is no polling, no request left hanging as in long polling, and no re-establishing of the connection for each message.
The Sec-WebSocket-Key mechanism is a base64-encoded 16-byte nonce. The server concatenates it with a fixed GUID, SHA-1 hashes the result, and base64-encodes it back. This prevents a misconfigured HTTP cache from treating WebSocket frames as HTTP responses. It is not security — it is a protocol handshake validity check.
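For illustration, the accept value the server must send back can be computed in a few lines of Node.js; the GUID is the fixed constant defined in RFC 6455:

```typescript
import { createHash } from "crypto";

// Fixed GUID from RFC 6455 — every WebSocket server concatenates this exact string
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Derive the Sec-WebSocket-Accept header value from the client's key
function computeAcceptKey(secWebSocketKey: string): string {
  return createHash("sha1")
    .update(secWebSocketKey + WS_GUID)
    .digest("base64");
}

// The RFC's own sample nonce yields "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
console.log(computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ=="));
```

Any conforming server library (including ws) performs exactly this computation during the upgrade.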
Frame Types
The WebSocket frame format is lean. Each frame has a 2-byte minimum header containing:
- FIN bit: whether this is the final frame in a message (messages can be fragmented)
- Opcode: what kind of frame this is
- Masking bit: client→server frames must be masked (server→client must not be)
- Payload length
The opcodes you care about in practice:
- 0x1 — text frame (UTF-8 payload)
- 0x2 — binary frame (arbitrary bytes)
- 0x8 — close frame (with optional status code and reason)
- 0x9 — ping frame (keepalive, expects pong)
- 0xA — pong frame (response to ping)
For most application-level messaging you'll use text frames with JSON payloads. For high-throughput binary protocols — game state sync, sensor streams, audio chunks — binary frames with MessagePack or Protocol Buffers reduce payload size substantially compared to JSON.
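As a rough illustration of the size difference, here is a sketch comparing a hand-rolled 16-byte binary layout against the equivalent JSON. The message shape and field layout are invented for this example; real projects would use MessagePack or Protocol Buffers rather than a custom layout:

```typescript
// Hypothetical position update: entity id, x/y coordinates, timestamp
interface PositionUpdate { id: number; x: number; y: number; ts: number }

// Fixed 16-byte binary layout: uint32 id, float32 x, float32 y, uint32 ts
function encodeBinary(u: PositionUpdate): ArrayBuffer {
  const buf = new ArrayBuffer(16);
  const view = new DataView(buf);
  view.setUint32(0, u.id);
  view.setFloat32(4, u.x);
  view.setFloat32(8, u.y);
  view.setUint32(12, u.ts);
  return buf;
}

const update: PositionUpdate = { id: 7, x: 120.5, y: 64.25, ts: 1700000000 };
const jsonBytes = new TextEncoder().encode(JSON.stringify(update)).length;
// The JSON encoding is roughly 3x larger than the 16-byte binary frame here
console.log(jsonBytes, encodeBinary(update).byteLength);
```

At thousands of messages per second per client, that multiplier translates directly into bandwidth and parse-time savings.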
Node.js WebSocket Server: Rooms and Broadcast
The ws library is the standard low-level WebSocket implementation for Node.js. Here is a complete server with room-based broadcasting — the pattern you need for chat, presence, and any multi-tenant real-time feature:
import { WebSocketServer, WebSocket } from "ws";
import { createServer } from "http";
import { parse } from "url";
interface Client {
ws: WebSocket;
userId: string;
room: string;
}
// Map from roomId → Set of connected clients in that room
const rooms = new Map<string, Set<Client>>();
const server = createServer();
const wss = new WebSocketServer({ server });
wss.on("connection", (ws: WebSocket, req) => {
const { query } = parse(req.url ?? "", true);
const userId = String(query.userId ?? "anonymous");
const room = String(query.room ?? "default");
// Validate JWT or session token here before proceeding
// If auth fails: ws.close(4001, "Unauthorized"); return;
const client: Client = { ws, userId, room };
// Add client to room
if (!rooms.has(room)) rooms.set(room, new Set());
rooms.get(room)!.add(client);
console.log(`[${room}] ${userId} connected. Room size: ${rooms.get(room)!.size}`);
// Notify others in the room of the new presence
broadcastToRoom(room, { type: "presence", userId, event: "joined" }, client);
// Heartbeat: detect dead connections that didn't send a close frame
// (common with mobile networks, NAT timeouts, browser tab crashes)
let isAlive = true;
ws.on("pong", () => { isAlive = true; });
const heartbeatInterval = setInterval(() => {
if (!isAlive) {
// No pong received — connection is dead, terminate it
console.warn(`[${room}] ${userId} heartbeat timeout, terminating`);
ws.terminate();
return;
}
isAlive = false;
ws.ping(); // Send ping, expect pong back within next interval
}, 30_000); // 30-second heartbeat interval
ws.on("message", (data: Buffer) => {
let message: Record<string, unknown>;
try {
message = JSON.parse(data.toString());
} catch {
ws.send(JSON.stringify({ error: "invalid JSON" }));
return;
}
// Route by message type
switch (message.type) {
case "chat":
broadcastToRoom(room, {
type: "chat",
userId,
text: message.text,
ts: Date.now(),
});
break;
case "ping":
// Application-level ping (distinct from WebSocket protocol ping)
ws.send(JSON.stringify({ type: "pong", ts: Date.now() }));
break;
default:
ws.send(JSON.stringify({ error: "unknown message type" }));
}
});
ws.on("close", () => {
clearInterval(heartbeatInterval);
rooms.get(room)?.delete(client);
if (rooms.get(room)?.size === 0) rooms.delete(room);
broadcastToRoom(room, { type: "presence", userId, event: "left" });
console.log(`[${room}] ${userId} disconnected`);
});
ws.on("error", (err) => {
console.error(`[${room}] ${userId} error:`, err.message);
clearInterval(heartbeatInterval);
rooms.get(room)?.delete(client);
});
// Send initial room state to the newly connected client
ws.send(JSON.stringify({
type: "init",
room,
members: [...(rooms.get(room) ?? [])].map(c => c.userId),
}));
});
function broadcastToRoom(
room: string,
message: Record<string, unknown>,
exclude?: Client
): void {
const clients = rooms.get(room);
if (!clients) return;
const payload = JSON.stringify(message);
for (const client of clients) {
// Skip the sender if excluded, and skip any connection not in OPEN state
if (client === exclude) continue;
if (client.ws.readyState === WebSocket.OPEN) {
client.ws.send(payload);
}
}
}
server.listen(8080, () => console.log("WebSocket server on :8080"));
Client Reconnection with Exponential Backoff
Connections drop. Mobile networks switch, laptops sleep, browsers navigate. A production client must reconnect automatically:
class ReconnectingWebSocket {
private ws: WebSocket | null = null;
private attempt = 0;
private readonly maxDelay = 30_000; // cap at 30 seconds
private readonly baseDelay = 500; // start at 500ms
constructor(
private readonly url: string,
private readonly onMessage: (data: unknown) => void
) {
this.connect();
}
private connect(): void {
this.ws = new WebSocket(this.url);
this.ws.onopen = () => {
console.log("Connected");
this.attempt = 0; // reset backoff on successful connect
};
this.ws.onmessage = (event) => {
try {
this.onMessage(JSON.parse(event.data));
} catch {
console.warn("Non-JSON message received:", event.data);
}
};
this.ws.onclose = () => {
const delay = Math.min(
this.baseDelay * Math.pow(2, this.attempt) + Math.random() * 500,
this.maxDelay
);
this.attempt++;
console.log(`Reconnecting in ${Math.round(delay)}ms (attempt ${this.attempt})`);
setTimeout(() => this.connect(), delay);
};
this.ws.onerror = () => {
// onclose fires after onerror — let it handle reconnection
this.ws?.close();
};
}
send(data: unknown): void {
if (this.ws?.readyState === WebSocket.OPEN) {
this.ws.send(JSON.stringify(data));
}
}
}
The jitter (Math.random() * 500) is critical when you have many clients reconnecting simultaneously after a server restart — without it, the thundering herd hits your server in synchronized waves.
When WebSockets Are Overkill
WebSockets maintain a persistent TCP connection for their entire lifetime. Each connection consumes a file descriptor on the server. With the default ulimit -n on Linux (1024), an untuned server runs out of file descriptors at 1024 simultaneous connections — a completely avoidable problem, but one that illustrates the statefulness cost.
Do not use WebSockets for:
- Low-frequency updates — polling every 30 seconds (stock closing prices, batch job status) costs less than a persistent connection
- One-way data flow — if the client never sends messages back, SSE is simpler and more reliable
- HTTP/2 push scenarios — SSE over HTTP/2 multiplexes multiple streams over one connection at no extra cost
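For the low-frequency case, a polling loop is all you need. A minimal sketch, with the fetch function injected so the poller stays transport-agnostic (the endpoint in the usage comment is hypothetical):

```typescript
type Fetcher = () => Promise<{ status: string }>;

// Poll until the job reaches a terminal state; returns the final status
async function pollUntilDone(
  fetchStatus: Fetcher,
  intervalMs: number
): Promise<string> {
  while (true) {
    const job = await fetchStatus();
    if (job.status === "done" || job.status === "failed") return job.status;
    // Wait out the interval before the next ordinary, cacheable GET
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage sketch (endpoint is hypothetical):
// const status = await pollUntilDone(
//   () => fetch("/api/jobs/42").then((r) => r.json()),
//   30_000
// );
```

No connection state, no reconnect logic, no proxy configuration: for batch-job status, this is the right amount of machinery.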

2. Server-Sent Events: One-Way Streams
Server-Sent Events are the underused tool in most engineers' real-time toolkit. They solve a specific problem extremely well: the server needs to push a stream of events to the client, but the client does not need to send data back over the same connection.
How SSE Works
SSE uses plain HTTP. The client makes an ordinary GET request, and the server responds with Content-Type: text/event-stream and keeps the connection open, writing newline-delimited events as they occur. There is no new protocol, no handshake, no custom framing — it runs over HTTP/1.1 or HTTP/2 without modification.
The event format is simple text:
id: 42
event: price-update
data: {"symbol":"AAPL","price":213.40,"change":1.2}
id: 43
event: price-update
data: {"symbol":"GOOG","price":177.85,"change":-0.8}
Each event ends with a blank line. The id field is what makes SSE powerful: when the connection drops and the EventSource reconnects, it sends the Last-Event-ID header with the last event ID it received. Your server can use this to replay missed events from a queue or database. Zero message loss with zero application code — the protocol handles it.
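The server side of that replay can be sketched with a bounded in-memory log; in production the log would live in Redis Streams or an event table, but the shape of the handler is the same (the buffer size and handler structure here are illustrative assumptions):

```typescript
import { createServer } from "http";

// In-memory event log — a stand-in for Redis Streams or a database table
const log: { id: number; data: string }[] = [];
let nextId = 1;

function appendEvent(data: string): void {
  log.push({ id: nextId++, data });
  if (log.length > 1000) log.shift(); // bounded replay window
}

// Format one event in SSE wire format (the blank line terminates it)
function formatEvent(e: { id: number; data: string }): string {
  return `id: ${e.id}\ndata: ${e.data}\n\n`;
}

const server = createServer((req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  // Replay anything the client missed since its Last-Event-ID
  const lastId = Number(req.headers["last-event-id"] ?? 0);
  for (const e of log) {
    if (e.id > lastId) res.write(formatEvent(e));
  }
  // ...then keep the connection open and write new events as they occur
});
```

The browser sends Last-Event-ID automatically on reconnect; the server only has to honor it.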
Named events (event: price-update) let a single stream carry multiple event types. The client subscribes selectively:
const source = new EventSource("/api/stream/market");
// Listen to specific named events
source.addEventListener("price-update", (event) => {
const data = JSON.parse(event.data);
updatePriceDisplay(data.symbol, data.price);
});
source.addEventListener("trade-executed", (event) => {
const trade = JSON.parse(event.data);
appendTradeToLog(trade);
});
// Generic message handler for unnamed events
source.onmessage = (event) => {
console.log("Generic event:", event.data);
};
source.onerror = (err) => {
// EventSource reconnects automatically — this fires on each retry
// source.readyState === EventSource.CONNECTING means it's retrying
console.warn("SSE error, reconnecting...", source.readyState);
};
No library required. EventSource is built into every browser and has been since 2012.
Python FastAPI SSE Endpoint
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
import asyncio
import json
import time
from typing import AsyncGenerator
app = FastAPI()
# In production, replace with Redis pub/sub or a real event queue
async def market_event_generator(
request: Request,
last_event_id: str | None
) -> AsyncGenerator[str, None]:
"""
Generate SSE-formatted events for market data stream.
last_event_id allows replay from a specific point.
"""
    event_id = int(last_event_id) + 1 if last_event_id and last_event_id.isdigit() else 1
# If the client reconnected mid-stream, replay missed events here
# e.g., fetch events with id > last_event_id from your event store
while True:
# Check if the client has disconnected
if await request.is_disconnected():
print(f"Client disconnected at event {event_id}")
break
# Fetch the next event from your data source
# Here: simulated market tick
event_data = {
"symbol": "AAPL",
"price": 213.40 + (event_id % 5) * 0.1,
"ts": time.time(),
}
# SSE format: each field on its own line, blank line terminates event
yield f"id: {event_id}\n"
yield f"event: price-update\n"
yield f"data: {json.dumps(event_data)}\n"
yield "\n" # blank line = end of event
event_id += 1
await asyncio.sleep(1) # 1-second tick interval
@app.get("/api/stream/market")
async def market_stream(request: Request):
last_event_id = request.headers.get("Last-Event-ID")
return StreamingResponse(
market_event_generator(request, last_event_id),
media_type="text/event-stream",
headers={
# Prevent buffering — critical for SSE to work through proxies
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no", # Disable nginx buffering
"Connection": "keep-alive",
},
)
@app.get("/api/stream/notifications/{user_id}")
async def notification_stream(user_id: str, request: Request):
"""
Per-user notification stream. In production, subscribe to a Redis
pub/sub channel keyed by user_id here.
"""
async def event_generator() -> AsyncGenerator[str, None]:
# Send a heartbeat comment every 20 seconds to prevent proxy timeout
# SSE comments start with ':' — browsers ignore them
heartbeat_id = 0
while True:
if await request.is_disconnected():
break
# Heartbeat keeps the connection alive through aggressive proxies
yield f": heartbeat {heartbeat_id}\n\n"
heartbeat_id += 1
await asyncio.sleep(20)
return StreamingResponse(
event_generator(),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)
SSE over HTTP/2
Under HTTP/1.1, browsers limit connections per host to six. An SSE connection consumes one of those six slots, which can starve other requests on the same domain. Under HTTP/2, all requests share a single multiplexed connection — SSE becomes just another stream on that connection. If your server supports HTTP/2 (nginx with http2 directive, Caddy by default, Cloudflare always), SSE scales substantially better. You can open dozens of SSE streams per tab without connection pressure.
When SSE Beats WebSocket
Use SSE when:
- The client only consumes data (dashboards, notification feeds, live logs, activity streams)
- You want built-in reconnection and event replay without writing reconnect logic
- You are streaming LLM token output to a browser — every major AI product in 2026 uses SSE for this
- You want to run behind a standard HTTP reverse proxy without WebSocket upgrade configuration
- You need to fan out server events to many read-only consumers
3. WebRTC: Peer-to-Peer Media
WebRTC is a different category entirely. It is not a transport you use for application data under normal circumstances. It exists for one primary reason: moving audio, video, and arbitrary data between browsers with the lowest possible latency, without routing that data through your servers.
The Use Case
When you make a video call on Google Meet, Zoom, or Discord, the video frames are not going from your browser to a server and back to the other person. They travel directly between the two browsers — or through a media relay if direct connection isn't possible. That direct path eliminates a server hop, cuts latency roughly in half, and means your servers don't pay for the bandwidth of transmitting video frames. For a platform like Discord handling 8M+ concurrent voice connections, the bandwidth savings are enormous.
The Signaling Dance
WebRTC connections require a signaling channel to negotiate the connection. The signaling mechanism is intentionally not specified by the WebRTC standard — you can use WebSockets, SSE, HTTP long-polling, or carrier pigeon. In practice, everyone uses WebSockets.
The negotiation has two parts:
Session Description Protocol (SDP) offer/answer: Peer A creates an offer describing its media capabilities (codecs it supports, bandwidth parameters, data channel intent). It sends this to Peer B via the signaling channel. Peer B responds with an answer. Both sides now know what the connection will carry.
ICE candidates: WebRTC uses the Interactive Connectivity Establishment framework to find the best network path between peers. Each browser generates a list of candidate addresses — local IP, reflexive IP from a STUN server, relayed IP from a TURN server — and exchanges them via the signaling channel. The ICE agent tries each pair to find the one with the lowest latency.
// Simplified WebRTC connection setup (both sides follow this pattern).
// `signalingChannel` is a stand-in for your signaling connection, typically a WebSocket.
const pc = new RTCPeerConnection({
iceServers: [
{ urls: "stun:stun.l.google.com:19302" }, // Free STUN server for NAT traversal
{
// TURN relay — required when STUN fails (symmetric NAT, firewalls)
// You must run your own or use a paid service (Twilio, Cloudflare Calls)
urls: "turn:turn.example.com:3478",
username: "user",
credential: "pass",
},
],
});
// For data channels (arbitrary P2P data, no media required)
const dataChannel = pc.createDataChannel("game-state", {
ordered: false, // UDP-like: drop stale packets rather than wait for retransmit
maxRetransmits: 0, // For game state: latest frame wins, don't retransmit old ones
});
dataChannel.onmessage = (event) => {
const state = JSON.parse(event.data);
applyGameState(state);
};
// Trickle ICE: send candidates as they're discovered, don't wait for all of them
pc.onicecandidate = (event) => {
if (event.candidate) {
signalingChannel.send({ type: "ice-candidate", candidate: event.candidate });
}
};
// Caller side: create and send offer
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
signalingChannel.send({ type: "offer", sdp: offer.sdp });
// Callee side: receive offer, create and send answer
signalingChannel.onmessage = async (msg) => {
if (msg.type === "offer") {
await pc.setRemoteDescription(new RTCSessionDescription(msg));
const answer = await pc.createAnswer();
await pc.setLocalDescription(answer);
signalingChannel.send({ type: "answer", sdp: answer.sdp });
}
if (msg.type === "ice-candidate") {
await pc.addIceCandidate(new RTCIceCandidate(msg.candidate));
}
};
STUN vs TURN
STUN (Session Traversal Utilities for NAT) is a lightweight server that tells a browser its own public IP address. It's how the browser discovers its reflexive candidate. STUN servers are cheap to run and have free public instances. About 80% of WebRTC connections succeed with STUN alone.
TURN (Traversal Using Relays around NAT) is a relay server. For the other 20% — symmetric NATs, enterprise firewalls — direct P2P is impossible. TURN relays all media between the peers through the server. This is bandwidth-intensive: you pay for every byte of video. Cloudflare Calls and Twilio provide TURN as a service. If you run your own, budget for the bandwidth.
Latency Comparison
| Transport | Typical Latency | Notes |
|---|---|---|
| WebRTC data channel | 30–80 ms | P2P, no server hop in media path |
| WebSocket | 80–200 ms | Server round-trip included |
| SSE | 100–300 ms | One-way, server push |
| HTTP polling (1s) | 0–1000 ms | Depends entirely on poll interval |
WebRTC's latency advantage only matters when it matters a lot: video calls, real-time gaming, collaborative cursors. For chat, notifications, and dashboards, WebSocket or SSE latency is imperceptible to humans.

Choosing a Transport
The transport decision reduces to four questions, asked in order:
- Does the client need to send data back to the server? If not, use SSE: simpler, HTTP-native, with auto-reconnect and event replay.
- Is media involved (video, audio, or low-latency P2P data)? If so, use WebRTC: peer-to-peer, lowest latency, handles NAT traversal.
- Is the update frequency low (less than once a minute)? If so, use HTTP polling: the simplest option, with low overhead.
- Do client and server both initiate messages? If so, use WebSockets: full-duplex, persistent, with rooms and broadcast. If only the server pushes, SSE is enough.
4. Scaling Stateful Connections
A single Node.js process can handle approximately 10,000–20,000 concurrent WebSocket connections, depending on message throughput and per-connection memory usage. At that ceiling — or before it for reliability — you need multiple server instances. This is where real-time architecture gets hard.
The Stickiness Problem
HTTP is stateless. A load balancer can route any request to any backend instance because there is no per-instance state that makes one instance the "right" one for a given client. WebSocket connections are the opposite: once connected, a client is bound to a specific server instance for the duration of that connection. Room membership, subscription lists, and in-flight message buffers all live in that instance's memory.
If a client connects to Instance A, joins room "project-42", and a message arrives for "project-42", it must be delivered through Instance A. Instance B and Instance C don't know the client exists.
Sticky Sessions: Works Until It Doesn't
The simplest approach is IP-hash or cookie-based session affinity at the load balancer. nginx:
upstream websocket_backend {
ip_hash; # Route the same client IP to the same upstream
server ws1.internal:8080;
server ws2.internal:8080;
server ws3.internal:8080;
}
This works for small deployments. It breaks down when:
- A server instance restarts — all its connections drop and clients reconnect, potentially to different instances
- IPv6 or CGNAT means many users share one IP (corporate networks)
- You need zero-downtime deploys — draining one instance means redistributing thousands of connections
Redis Pub/Sub: The Correct Solution
The production pattern is to move room state out of process memory and into Redis. Every server instance subscribes to channels in Redis. When a message needs to reach all members of room "project-42", it's published to a Redis channel. Every instance picks it up and delivers it to any local clients subscribed to that room.
Socket.io, the higher-level WebSocket abstraction library, has a first-class Redis adapter for exactly this:
import { createServer } from "http";
import { Server } from "socket.io";
import { createAdapter } from "@socket.io/redis-adapter";
import { createClient } from "redis";
const httpServer = createServer();
const io = new Server(httpServer, {
cors: { origin: "https://app.example.com" },
transports: ["websocket", "polling"], // Polling fallback for restrictive networks
});
// Two Redis clients: one for publishing, one for subscribing
// Redis requires separate clients because SUBSCRIBE puts a connection
// into subscriber mode and it cannot be used for other commands
const pubClient = createClient({ url: "redis://redis.internal:6379" });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);
// Wire Socket.io to Redis — now all emit/broadcast calls fan out across instances
io.adapter(createAdapter(pubClient, subClient));
io.on("connection", (socket) => {
const userId = socket.handshake.auth.userId;
const room = socket.handshake.query.room as string;
if (!userId || !room) {
socket.disconnect(true);
return;
}
// Socket.io rooms are virtual groups. With the Redis adapter,
// join/leave/emit are automatically synchronized across all instances.
socket.join(room);
console.log(`${userId} joined room ${room} on instance ${process.pid}`);
// This emit reaches ALL clients in the room on ALL server instances
io.to(room).emit("presence", { userId, event: "joined" });
socket.on("chat", (text: string) => {
// Validate and sanitize before broadcasting
if (typeof text !== "string" || text.length > 1000) return;
io.to(room).emit("chat", {
userId,
text: text.trim(),
ts: Date.now(),
});
});
socket.on("disconnecting", () => {
io.to(room).emit("presence", { userId, event: "left" });
});
});
httpServer.listen(8080, () => {
console.log(`Instance ${process.pid} listening on :8080`);
});
With this setup, a client connected to Instance A can send a message that is delivered to a client on Instance C — via Redis pub/sub in approximately one additional millisecond of latency.
Kubernetes Considerations
In Kubernetes, WebSocket-backed deployments require deliberate configuration:
Ingress sticky sessions: The nginx ingress controller supports cookie-based affinity:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
nginx.ingress.kubernetes.io/affinity: "cookie"
nginx.ingress.kubernetes.io/session-cookie-name: "ws-route"
nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
nginx.ingress.kubernetes.io/proxy-read-timeout: "3600" # Keep WS alive
nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
rules:
- host: api.example.com
http:
paths:
- path: /ws
pathType: Prefix
backend:
service:
name: websocket-service
port:
number: 8080
Graceful shutdown: When Kubernetes sends SIGTERM to a pod, you have terminationGracePeriodSeconds to drain connections. Signal clients to reconnect before killing the process:
process.on("SIGTERM", async () => {
console.log("SIGTERM received, draining connections...");
// Tell all connected clients to reconnect (they'll go to a healthy instance)
io.emit("server-restart", { reconnectIn: 3000 });
// Stop accepting new connections
httpServer.close();
// Wait for clients to reconnect elsewhere, then exit
setTimeout(() => {
console.log("Graceful shutdown complete");
process.exit(0);
}, 5000);
});
Connection Limits and OS Tuning
Each WebSocket connection uses one file descriptor. Linux defaults are restrictive:
# Check current limits
ulimit -n # Soft limit (often 1024 or 4096)
cat /proc/sys/fs/file-max # System-wide maximum
# Raise for production WebSocket servers
# In /etc/security/limits.conf:
* soft nofile 65535
* hard nofile 65535
# Or per-process in systemd unit:
[Service]
LimitNOFILE=65535
At 65,535 file descriptors per process and one connection per file descriptor, a single process handles ~60,000 concurrent connections (leaving headroom for OS handles). For higher concurrency, use multiple processes via Node.js cluster module — each process gets its own file descriptor table. With the Redis adapter, all cluster workers share room state transparently.
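A minimal sketch of that cluster setup follows; the `startCluster` helper and the worker bootstrap callback are assumptions for illustration, not part of the earlier server code:

```typescript
import cluster from "cluster";
import { cpus } from "os";

// Leave one core free for the OS and Redis client; never go below one worker
function workerCount(reserve = 1): number {
  return Math.max(1, cpus().length - reserve);
}

function startCluster(bootWorker: () => void): void {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount(); i++) cluster.fork();
    // Replace crashed workers so total capacity stays constant
    cluster.on("exit", () => cluster.fork());
  } else {
    // Each worker runs its own server with its own file descriptor table
    bootWorker();
  }
}

// Usage sketch: startCluster(() => httpServer.listen(8080));
```

Because each worker connects to Redis independently, the adapter-based fan-out works across workers exactly as it does across machines.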
Managed Services: When to Outsource
Running your own WebSocket infrastructure at scale is meaningful engineering work. Three managed options are worth knowing:
| Service | Pricing | Right For |
|---|---|---|
| Ably | $29/mo base, ~$3/mo per 1M messages | Apps needing reliable delivery, presence, history |
| Pusher | $49/mo, 500 concurrent connections | Smaller apps, rapid prototyping |
| AWS API Gateway WebSocket | $1/million messages + $0.25/million connection-minutes | AWS-native apps, serverless |
The break-even point where self-hosting beats Ably on cost is roughly 50 million messages/month — well beyond most startups. Until then, the engineering time saved is worth more than the subscription cost.
5. Collaborative Editing Patterns
Collaborative editing is the hardest real-time problem in common web development. When two users edit the same document simultaneously, you need a conflict resolution strategy that makes the result feel seamless — no overwriting, no lost changes.
Operational Transformation: The Historical Approach
Google Docs uses Operational Transformation (OT). The core idea: every operation (insert, delete) is represented as a data structure. When two operations conflict, a transform function adjusts them so they can be applied in either order and produce the same result.
OT works but it is algorithmically complex. Getting the transformation functions right for rich text is notoriously difficult, and the server must serialize all operations through a central authority to assign ordering. It doesn't work well offline.
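The core transform idea can be shown for the simplest case, two concurrent inserts into a string. This is a toy sketch: real OT also needs a deterministic tie-breaker (such as a site ID) when positions are equal, plus handling for deletes and rich-text attributes:

```typescript
// Minimal OT sketch for concurrent inserts into a shared string
interface Insert { pos: number; text: string }

function apply(doc: string, op: Insert): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Shift `a` right if the concurrent insert `b` landed at or before a's position
function transform(a: Insert, b: Insert): Insert {
  return b.pos <= a.pos ? { pos: a.pos + b.text.length, text: a.text } : a;
}

// Two users edit "abc" concurrently; both application orders converge
const a: Insert = { pos: 1, text: "X" };
const b: Insert = { pos: 0, text: "YY" };
const viaB = apply(apply("abc", b), transform(a, b)); // "YYaXbc"
const viaA = apply(apply("abc", a), transform(b, a)); // "YYaXbc"
```

Scaling this up to every operation pair in a rich-text model is where the notorious complexity lives.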
CRDTs: The Modern Approach
Conflict-Free Replicated Data Types (CRDTs) take a different approach: design the data structure so that concurrent operations can always be merged without conflicts, regardless of order or network partitions. No server coordination required for merging. Works offline. Converges deterministically when peers sync.
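The classic introductory example is a grow-only counter (G-Counter), a toy sketch that shows why CRDT merges are safe to apply in any order:

```typescript
// G-Counter: a grow-only counter CRDT — each node increments only its own slot
type GCounter = Record<string, number>;

function increment(c: GCounter, nodeId: string): GCounter {
  return { ...c, [nodeId]: (c[nodeId] ?? 0) + 1 };
}

// Merge is element-wise max: commutative, associative, and idempotent,
// so replicas converge regardless of delivery order or duplication
function merge(a: GCounter, b: GCounter): GCounter {
  const out: GCounter = { ...a };
  for (const [k, v] of Object.entries(b)) out[k] = Math.max(out[k] ?? 0, v);
  return out;
}

function value(c: GCounter): number {
  return Object.values(c).reduce((sum, n) => sum + n, 0);
}

// Two replicas diverge offline, then sync; both merge orders agree
let a: GCounter = {};
let b: GCounter = {};
a = increment(increment(a, "nodeA"), "nodeA");
b = increment(b, "nodeB");
// value(merge(a, b)) and value(merge(b, a)) are both 3
```

Text CRDTs like the one in Yjs apply the same principle to sequences, which is a far harder data-structure problem but the same convergence guarantee.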
Yjs is the dominant CRDT library in 2026. It implements a high-performance CRDT for text, arrays, and maps, and has providers for every transport layer.
Yjs with y-websocket: Full Implementation
// Server: y-websocket provider
// Install: npm install y-websocket yjs y-leveldb ws
import { WebSocketServer } from "ws";
import { setupWSConnection, setPersistence } from "y-websocket/bin/utils.js";
import * as Y from "yjs";
import { LeveldbPersistence } from "y-leveldb";
// Persist document state to disk so edits survive server restarts
const persistence = new LeveldbPersistence("./doc-storage");
setPersistence({
  bindState: async (docName, ydoc) => {
    // Load stored updates into the in-memory doc on first access,
    // then write every subsequent update back to LevelDB
    const persistedDoc = await persistence.getYDoc(docName);
    Y.applyUpdate(ydoc, Y.encodeStateAsUpdate(persistedDoc));
    ydoc.on("update", (update) => persistence.storeUpdate(docName, update));
  },
  writeState: async () => {
    // Updates are stored incrementally above; nothing extra to flush here
  },
});
const wss = new WebSocketServer({ port: 1234 });
wss.on("connection", (ws, req) => {
  // Extract document name from URL path, e.g. /doc/my-project-readme
  const docName = req.url?.slice(1) ?? "default";
  // setupWSConnection handles Yjs awareness and document sync protocol
  setupWSConnection(ws, req, {
    docName,
    gc: true, // Garbage collect deleted content to prevent unbounded growth
  });
});
wss.on("listening", () => {
  console.log("y-websocket server running on :1234");
});
// Client: collaborative text editor with Yjs + y-websocket
// Install: npm install yjs y-websocket y-codemirror.next @codemirror/view @codemirror/state
import * as Y from "yjs";
import { WebsocketProvider } from "y-websocket";
import { yCollab } from "y-codemirror.next";
import { EditorView, basicSetup } from "codemirror";
import { EditorState } from "@codemirror/state";
// Each document has a Y.Doc — the CRDT root
const ydoc = new Y.Doc();
// The shared text type — changes here sync to all connected peers
const ytext = ydoc.getText("content");
// Connect to the y-websocket server
const provider = new WebsocketProvider(
"ws://localhost:1234", // y-websocket server URL
"my-document", // Document name (room)
ydoc,
{
connect: true,
params: { auth: getAuthToken() }, // Pass auth token in query params
}
);
// Awareness: broadcast cursor position and user identity to all peers
provider.awareness.setLocalStateField("user", {
name: currentUser.name,
color: currentUser.color, // e.g. "#f97316"
colorLight: currentUser.colorLight,
});
provider.awareness.on("change", () => {
// Render remote cursors / presence indicators
const states = provider.awareness.getStates();
renderPresenceIndicators([...states.entries()]);
});
// Mount the editor — yCollab extension connects CodeMirror to Yjs
const state = EditorState.create({
doc: ytext.toString(),
extensions: [
basicSetup,
yCollab(ytext, provider.awareness), // Handles sync + cursor decorations
],
});
const view = new EditorView({
state,
parent: document.getElementById("editor")!,
});
provider.on("status", ({ status }: { status: string }) => {
// "connected" | "disconnected"
document.getElementById("sync-status")!.textContent = status;
});
Offline Support and Persistence
Yjs supports offline editing natively. If a user edits a document while offline, the changes are buffered in the Y.Doc. When the provider reconnects, it performs a sync operation — exchanging state vectors with the server to determine what each side is missing. All offline changes are merged without conflicts.
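To make the state-vector exchange concrete, here is a toy model in plain TypeScript (Maps and arrays standing in for Yjs's internal structures; this is an illustration of the idea, not the real Yjs API):

```typescript
// Toy model of state-vector sync: each peer stores updates tagged
// (clientId, clock), and a state vector maps clientId -> highest clock seen.
type Update = { client: string; clock: number; payload: string };

class Peer {
  updates: Update[] = [];

  // State vector: highest clock applied per client
  stateVector(): Map<string, number> {
    const sv = new Map<string, number>();
    for (const u of this.updates) {
      sv.set(u.client, Math.max(sv.get(u.client) ?? 0, u.clock));
    }
    return sv;
  }

  // Local edit: append an update under this peer's own client id
  edit(client: string, payload: string): void {
    const clock = (this.stateVector().get(client) ?? 0) + 1;
    this.updates.push({ client, clock, payload });
  }

  // Everything the remote peer is missing, according to its state vector
  diff(remoteSV: Map<string, number>): Update[] {
    return this.updates.filter((u) => u.clock > (remoteSV.get(u.client) ?? 0));
  }

  // Apply only updates we have not seen yet
  apply(missing: Update[]): void {
    for (const u of missing) {
      if (u.clock > (this.stateVector().get(u.client) ?? 0)) this.updates.push(u);
    }
  }
}

// Two peers edit offline, then sync by exchanging state vectors
const a = new Peer(), b = new Peer();
a.edit("alice", "hello");
b.edit("bob", "world");
b.apply(a.diff(b.stateVector())); // b pulls alice's update
a.apply(b.diff(a.stateVector())); // a pulls bob's update
```

Real Yjs performs the same dance with Y.encodeStateVector, Y.encodeStateAsUpdate(doc, remoteStateVector), and Y.applyUpdate, and additionally guarantees that concurrent edits to the same text converge.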
For server-side persistence beyond LevelDB, community providers such as y-mongodb-provider store document state in MongoDB. The state is stored as a binary update log, not the full text — Yjs encodes incremental updates compactly, so storage cost grows with the number of operations, not the document size.
6. Production Considerations
Getting WebSocket architecture working in development is the easy part. Making it reliable in production requires attention to several operational concerns that don't surface until traffic scales or infrastructure changes.
Monitoring
The metrics that matter for real-time infrastructure:
- Connection count over time — steady growth is healthy; sudden drops indicate a server crash or network partition
- Message throughput — messages/second per instance; this tells you when you're approaching per-process limits
- Connection duration distribution — median, p95, p99; long-tail connections indicate clients that haven't received a close frame
- Reconnection rate — high reconnection rate indicates connection instability, flaky network paths, or aggressive proxy timeouts
- Redis pub/sub latency — the p99 of the time between publishing a message to Redis and delivering it to a connected client; should be under 10ms in a well-configured setup
With Prometheus and Grafana, instrument your WebSocket server:
import { Counter, Gauge, Histogram, register } from "prom-client";
const wsConnections = new Gauge({
name: "ws_connections_active",
help: "Number of active WebSocket connections",
labelNames: ["room"],
});
const wsMessages = new Counter({
name: "ws_messages_total",
help: "Total WebSocket messages processed",
labelNames: ["type", "direction"],
});
const wsMessageDuration = new Histogram({
name: "ws_message_processing_seconds",
help: "WebSocket message processing latency",
buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
});
// Call wsConnections.inc({ room }) on connect, wsConnections.dec({ room }) on close
// Call wsMessages.inc({ type: "chat", direction: "inbound" }) on message receipt
// Expose the default registry for Prometheus to scrape
import { createServer } from "node:http";
createServer(async (_req, res) => {
  res.setHeader("Content-Type", register.contentType);
  res.end(await register.metrics());
}).listen(9100);
Rate Limiting WebSocket Messages
Without rate limiting, a single misbehaving client can flood your server with messages. Implement a per-connection token bucket:
class TokenBucket {
private tokens: number;
private lastRefill: number;
constructor(
private readonly capacity: number, // Max burst size
private readonly refillRate: number // Tokens added per second
) {
this.tokens = capacity;
this.lastRefill = Date.now();
}
consume(count = 1): boolean {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000;
this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
this.lastRefill = now;
if (this.tokens >= count) {
this.tokens -= count;
return true; // Allow
}
return false; // Reject
}
}
// In connection handler:
const bucket = new TokenBucket(20, 5); // 20 burst, 5 messages/sec sustained
ws.on("message", (data) => {
if (!bucket.consume()) {
ws.send(JSON.stringify({ error: "rate_limited", retryAfter: 1 }));
return; // Drop the message, do not process
}
// ... process message
});
Authentication
Never pass authentication credentials in WebSocket message payloads — by the time you parse the first message, the connection is already established. Validate before the upgrade completes.
Browsers cannot attach custom headers to the WebSocket handshake, so the practical pattern is to pass a short-lived JWT as a query parameter in the WebSocket URL:
wss://api.example.com/ws?token=eyJhbGciOiJIUzI1NiJ9...
In the server's upgrade event handler (before the WebSocket connection is established):
server.on("upgrade", (req, socket, head) => {
const { query } = parse(req.url ?? "", true);
const token = String(query.token ?? "");
verifyJWT(token)
.then((payload) => {
// Attach user info to request for use in connection handler
(req as any).user = payload;
wss.handleUpgrade(req, socket, head, (ws) => {
wss.emit("connection", ws, req);
});
})
.catch(() => {
// Reject before WebSocket connection is established
socket.write("HTTP/1.1 401 Unauthorized\r\n\r\n");
socket.destroy();
});
});
Query parameters are logged by proxies and visible in browser history. Use short-lived tokens (60-second TTL) generated specifically for this connection. Never reuse long-lived API keys in WebSocket URLs.
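A minimal sketch of minting and verifying such a connection-scoped token with Node's crypto module (the claim names, secret, and 60-second TTL are illustrative; in production use a maintained JWT library):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const SECRET = "replace-with-a-real-secret"; // Illustrative; load from config
const b64url = (buf: Buffer) => buf.toString("base64url");

// Mint a short-lived HS256 JWT scoped to one WebSocket connection
function mintWsToken(userId: string, ttlSeconds = 60): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const now = Math.floor(Date.now() / 1000);
  const payload = b64url(
    Buffer.from(JSON.stringify({ sub: userId, iat: now, exp: now + ttlSeconds, aud: "ws" }))
  );
  const sig = b64url(createHmac("sha256", SECRET).update(`${header}.${payload}`).digest());
  return `${header}.${payload}.${sig}`;
}

// Returns the claims if the signature checks out and the token is unexpired
function verifyWsToken(token: string): { sub: string } | null {
  const [header, payload, sig] = token.split(".");
  if (!header || !payload || !sig) return null;
  const expected = b64url(createHmac("sha256", SECRET).update(`${header}.${payload}`).digest());
  const given = Buffer.from(sig);
  const want = Buffer.from(expected);
  if (given.length !== want.length || !timingSafeEqual(given, want)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString());
  if (claims.exp <= Math.floor(Date.now() / 1000)) return null; // Expired
  return { sub: claims.sub };
}
```

The token endpoint is a normal authenticated HTTP route: the client fetches a fresh token, opens the WebSocket with it, and the server's upgrade handler runs the verify step before completing the handshake.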
Binary Protocols
For high-throughput message streams — sensor data, game state, financial ticks — JSON is wasteful. A JSON-encoded object {"type":"tick","symbol":"AAPL","price":213.4} is 45 bytes; the MessagePack encoding of the same map is roughly 38, and the gap widens as payloads gain numeric fields. Schema-based formats like Protocol Buffers save more still by replacing key strings with numeric field tags. At 100,000 messages/second, shaving even a few bytes per message reclaims tens of gigabytes of bandwidth per day.
import { encode, decode } from "@msgpack/msgpack";
// Sender
ws.send(encode({ type: "tick", symbol: "AAPL", price: 213.40 }));
// Receiver
ws.on("message", (data: Buffer) => {
const message = decode(data) as Record<string, unknown>;
// ... process message
});
Graceful Shutdown
When deploying a new version, your load balancer will route new connections to the updated instances. Existing connections on old instances need to be drained gracefully — not killed abruptly, which would cause a poor user experience and a sudden reconnect spike:
let isShuttingDown = false;
process.on("SIGTERM", () => {
isShuttingDown = true;
console.log("Shutting down — draining connections");
// Stop accepting new connections
wss.close();
// Notify all connected clients to reconnect elsewhere
wss.clients.forEach((client) => {
if (client.readyState === WebSocket.OPEN) {
// Send application-level signal, then close cleanly
client.send(JSON.stringify({ type: "server-restart", reconnectIn: 2000 }));
setTimeout(() => client.close(1001, "Server going away"), 2000);
}
});
// Force exit after 30 seconds (safety net)
setTimeout(() => process.exit(0), 30_000);
});
// Reject new connections during shutdown
wss.on("connection", (ws) => {
if (isShuttingDown) {
ws.close(1013, "Server shutting down, please reconnect");
return;
}
// ... normal connection handling
});
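On the client side, the counterpart to a drain is reconnecting with jittered exponential backoff, so thousands of dropped clients do not stampede the remaining instances at once. A sketch of the delay schedule (the base, cap, and ±25% jitter are arbitrary choices):

```typescript
// Jittered exponential backoff: base * 2^attempt, capped, with ±25% jitter
function reconnectDelay(attempt: number, baseMs = 1000, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  const jitter = exp * 0.25 * (Math.random() * 2 - 1); // Uniform in ±25%
  return Math.max(0, Math.round(exp + jitter));
}

// Schedule for attempts 0..5: ~1s, 2s, 4s, 8s, 16s, 30s (capped), each ±25%
for (let attempt = 0; attempt < 6; attempt++) {
  console.log(`attempt ${attempt}: wait ~${reconnectDelay(attempt)}ms`);
}
```

In the socket's close handler, schedule the next attempt after reconnectDelay(attempt++) and reset attempt to zero on a successful open; when the server-restart message arrives, prefer its reconnectIn hint over the backoff schedule.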
Conclusion
Three transports, three different design points:
WebSockets when you need full-duplex communication — both the client and server initiate messages independently. Chat, multiplayer, collaborative editing, live gaming. Accept the operational complexity: stateful connections, sticky sessions or Redis pub/sub, OS-level file descriptor tuning.
SSE when the server pushes and the client listens. Notification feeds, live dashboards, LLM token streaming, activity streams. It is simpler to operate than WebSockets, runs over plain HTTP with no special load balancer configuration, and the EventSource built-in handles reconnection and event replay for you.
WebRTC when you need peer-to-peer media or require the absolute minimum latency for binary data. Video calls, screen sharing, real-time audio, P2P games. The signaling infrastructure is still your problem (typically WebSockets), but the data path bypasses your servers entirely.
For scaling, the threshold is roughly 10,000 concurrent connections per instance. Below it, a single well-tuned Node.js process is sufficient; above it, Redis pub/sub with Socket.io's adapter is the standard horizontal scaling pattern. Managed services like Ably are worth their cost until you approach ~50 million messages/month.
For collaborative editing specifically, Yjs is the right library for 2026. Its CRDT approach handles offline edits, conflict resolution, and presence awareness without requiring you to implement any of that logic yourself.
The default mistake is reaching for WebSockets everywhere. Most features that feel like they need WebSockets actually only need one-way server push — and SSE gets there with less infrastructure, less client code, and better behavior on reconnect. Pick the simplest transport that satisfies your actual requirements.
Sources
- WebSocket RFC 6455
- Server-Sent Events Specification (WHATWG)
- Yjs Documentation
- Socket.io Redis Adapter
- WebRTC for the Curious
- y-websocket GitHub