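---
id: backend-websocket-scaling
title: WebSocket Scaling — Pub/Sub / Sticky / Heartbeat
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [backend, websocket, scaling, pubsub, vibe-coding]
tech_stack: { language: "TS / Node / Redis", applicable_to: ["Backend"] }
applied_in: []
aliases: [WS scaling, Redis pub/sub, sticky session, socket.io adapter]
---

# WebSocket Scaling

> One server can comfortably hold tens of thousands of connections. Once you **spread across N servers**, relaying messages between them becomes the core problem: use Redis pub/sub, NATS, or Kafka. Combine it with **sticky sessions + heartbeat + reconnect with backoff**.

## 📖 Key concepts

- Each connection lives in the memory of a single process. A single server typically handles up to about 50K connections.
- With multiple nodes, client A may be connected to server 1 and client B to server 2. Each node only knows its own sockets, so how does a broadcast from A reach B?
- Heartbeat: cleans up dead idle connections and keeps NAT timeouts from silently dropping them.

## 💻 Code patterns

### Single server (ws)

```ts
import { WebSocketServer, WebSocket } from 'ws';

const wss = new WebSocketServer({ port: 8080 });
const rooms = new Map<string, Set<WebSocket>>(); // room id -> sockets in this process

wss.on('connection', (ws, req) => {
  // req.url is path-only; 'http://x' is a dummy base so URL can parse it
  const room = new URL(req.url!, 'http://x').searchParams.get('room') ?? 'default';
  if (!rooms.has(room)) rooms.set(room, new Set());
  rooms.get(room)!.add(ws);

  ws.on('message', (data, isBinary) => {
    // relay to every other open socket in the room, preserving text vs binary frames
    for (const peer of rooms.get(room)!) {
      if (peer !== ws && peer.readyState === WebSocket.OPEN) peer.send(data, { binary: isBinary });
    }
  });
  ws.on('close', () => rooms.get(room)?.delete(ws));
});
```

### Multi-node (Redis pub/sub)

```ts
import Redis from 'ioredis';
import { WebSocket } from 'ws';

const url = process.env.REDIS_URL ?? 'redis://localhost:6379';
const pub = new Redis(url);
const sub = new Redis(url); // a connection in subscriber mode can't run other commands
// sockets connected to THIS node, by room id (filled on connection, as in the single-server example)
const localRooms = new Map<string, Set<WebSocket>>();

sub.subscribe('room:42'); // in practice: subscribe when a room gets its first local member
sub.on('message', (channel, msg) => {
  const room = channel.slice('room:'.length);
  for (const peer of localRooms.get(room) ?? []) {
    if (peer.readyState === WebSocket.OPEN) peer.send(msg);
  }
});
// when a client message arrives: publish instead of sending locally;
// every node (this one included) then delivers it to its own members
function broadcast(room: string, message: unknown) {
  return pub.publish(`room:${room}`, JSON.stringify(message));
}
```

### Heartbeat (two-way ping/pong)

```ts
import { WebSocket } from 'ws';

// The server pings every 30s; browsers answer ping frames with a pong automatically.
function heartbeat(ws: WebSocket) {
  let alive = true;
  ws.on('pong', () => { alive = true; });
  const t = setInterval(() => {
    if (!alive) { ws.terminate(); return; } // no pong since the previous ping: treat as dead
    alive = false;
    ws.ping();
  }, 30_000);
  ws.on('close', () => clearInterval(t)); // terminate() also ends up here
}
```

### Reconnect (client) with backoff

```ts
class ReconnectingWS {
  private ws?: WebSocket;
  private retries = 0;
  constructor(private readonly url: string) {}

  connect() {
    this.ws = new WebSocket(this.url);
    this.ws.onopen = () => { this.retries = 0; }; // reset backoff once connected
    this.ws.onclose = () => {
      // exponential backoff capped at 30s, plus up to 1s of random jitter
      const delay = Math.min(1000 * 2 ** this.retries, 30_000) + Math.random() * 1000;
      this.retries++;
      setTimeout(() => this.connect(), delay);
    };
  }
}
```

### Backpressure

```ts
import { WebSocket } from 'ws';

// Check before each send: a large outbound backlog means the client reads too slowly
function safeSend(ws: WebSocket, data: string) {
  if (ws.bufferedAmount > 1_000_000) {
    // drop or disconnect; otherwise one slow client can exhaust server memory
    ws.close(1008, 'too slow'); // 1008 = policy violation (1009 means "message too big")
    return;
  }
  ws.send(data);
}
```

### Sticky load balancing (cookie or IP hash)

```nginx
upstream ws_backend {
  ip_hash;  # simplest option: same client IP -> same backend
  server ws1:8080;
  server ws2:8080;
}
server {
  location /ws {
    proxy_pass http://ws_backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;
  }
}
```

### Auth (during handshake)

```ts
// Reject during the HTTP upgrade, before a WebSocket exists. Assumes an http `server`,
// `wss = new WebSocketServer({ noServer: true })`, and your own verifyJwt().
server.on('upgrade', (req, socket, head) => {
  const token = new URL(req.url!, 'http://x').searchParams.get('token');
  const user = token ? verifyJwt(token) : null;
  if (!user) {
    socket.write('HTTP/1.1 401 Unauthorized\r\n\r\n');
    return socket.destroy();
  }
  // the handshake completes only for authenticated users; `user` is passed on to 'connection'
  wss.handleUpgrade(req, socket, head, (ws) => wss.emit('connection', ws, req, user));
});
```

## 🤔 Decision criteria

| Scale / requirement | Solution |
|---|---|
| <10K connections | Single Node + ws |
| <100K | Multiple Node instances + Redis pub/sub |
| 100K+ | NATS / Kafka / dedicated service (Centrifugo, Soketi) |
| Pub/sub + message persistence | Kafka / Redis Streams |
| Games (low latency) | UDP / WebRTC |
| Chat / notifications | WebSocket / SSE |

## ❌ Anti-patterns

- **No heartbeat**: after a NAT timeout (60s+), dead connections linger on the server.
- **Immediate, unlimited reconnects**: when a server goes down, every client retries at once (thundering herd).
- **Auth only in the first message**: unauthenticated connections occupy resources until then. Authenticate during the handshake.
- **Ignoring send backpressure**: server memory blows up.
- **Single-node assumption**: once two servers are running, broadcasts no longer reach everyone.
- **PII sent as-is in messages**: TLS (wss) is mandatory.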
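- **Ignoring messages missed during reconnect**: keep a server-side queue keyed by message id and replay from the last id the client saw.

## 🤖 LLM usage hints

- Heartbeat (30s) + reconnect (exponential backoff + jitter) + sticky sessions (ip_hash).
- N nodes = Redis pub/sub.

## 🔗 Related documents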
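The anti-pattern list above recommends a server-side message-id queue with replay on reconnect. Below is a minimal single-node sketch of that idea under stated assumptions: one buffer per room, ids that increase by one per message, and a fixed capacity. `ReplayBuffer`, `append`, and `since` are hypothetical names, not an API from `ws` or Redis; across multiple nodes the buffer would need to live in shared storage (for example Redis Streams, as listed in the decision table).

```typescript
// Hypothetical per-room replay buffer (single node, in memory).
interface Stored {
  id: number;
  payload: string;
}

class ReplayBuffer {
  private nextId = 1;
  private items: Stored[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Call on every broadcast: tag the message with the next id and remember it.
  append(payload: string): Stored {
    const msg = { id: this.nextId++, payload };
    this.items.push(msg);
    if (this.items.length > this.capacity) this.items.shift(); // evict the oldest
    return msg;
  }

  // Call when a client reconnects with the last id it received.
  // Returns the missed messages, or null if some of them were already evicted
  // (the client then needs a full resync instead of a replay).
  since(lastSeenId: number): Stored[] | null {
    const oldest = this.items[0]?.id ?? this.nextId;
    if (lastSeenId + 1 < oldest) return null;
    return this.items.filter((m) => m.id > lastSeenId);
  }
}
```

How the client reports its last seen id is left open here (sending it alongside the auth token on reconnect is one assumed option). Returning `null` rather than a partial list keeps the client from silently skipping a gap it cannot recover.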