---
id: backend-graceful-shutdown
title: "Graceful Shutdown: Drain / SIGTERM / Job Completion"
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [backend, shutdown, vibe-coding]
tech_stack: { language: "TS / Node", applicable_to: ["Backend"] }
applied_in: []
aliases: [graceful shutdown, SIGTERM, drain, terminationGracePeriod, request bleeding]
---

# Graceful Shutdown

> Don't lose in-flight requests on deploy / scale-down. **SIGTERM → readiness off → drain in-flight → close DB → exit**. In K8s, the whole sequence is bounded by `terminationGracePeriodSeconds`.

## 📖 Core Concepts

- SIGTERM: termination signal (graceful).
- SIGKILL: forced kill. In K8s it is sent when the grace period (30s by default) runs out.
- Drain: block new requests and wait for existing ones to complete.
- Readiness off: the LB stops sending traffic.

## 💻 Code Patterns

### Basic (Express)

```ts
import { createServer } from 'node:http';
import express from 'express';

// `db` (e.g. a Prisma client) and `redis` are assumed to be created elsewhere.
const app = express();
let shuttingDown = false;
const inflight = new Set<Promise<void>>();

// Register before the routes so every request is tracked.
app.use((req, res, next) => {
  if (shuttingDown) {
    res.set('Connection', 'close');
    return res.status(503).end('Shutting down');
  }
  // Resolves once the response has finished or the connection was dropped.
  const done = new Promise<void>((resolve) => res.once('close', () => resolve()));
  inflight.add(done);
  done.then(() => inflight.delete(done));
  next();
});

const server = createServer(app);
server.listen(3000);

async function shutdown(signal: string) {
  if (shuttingDown) return; // ignore repeated signals
  console.log(`Received ${signal}, shutting down`);
  shuttingDown = true;

  // 1. Stop accepting new connections
  server.close((err) => {
    if (err) console.error('Server close error', err);
  });

  // 2. Wait for in-flight requests (forced exit after 25s)
  const timeout = setTimeout(() => {
    console.error('Forced shutdown: in-flight requests not done');
    process.exit(1);
  }, 25000);
  await Promise.allSettled(inflight);
  clearTimeout(timeout);

  // 3. Close DB / external connections
  await db.$disconnect();
  await redis.quit();
  console.log('Bye');
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
```

### K8s pod lifecycle

```yaml
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"]  # wait for endpoint propagation
      readinessProbe:
        httpGet: { path: /readyz, port: 3000 }
        periodSeconds: 5
```

```
K8s shutdown sequence:
1. Pod is marked Terminating
2. preStop hook runs (5s sleep, waiting for the LB to remove the endpoint)
3. SIGTERM is sent
4. App performs its graceful shutdown
5. SIGKILL once terminationGracePeriodSeconds (30s, counted from step 1) expires
```

### Readiness-off pattern

```ts
import { setTimeout as sleep } from 'node:timers/promises';

app.get('/readyz', (req, res) => {
  if (shuttingDown) return res.status(503).end();
  res.status(200).end();
});

async function shutdown() {
  shuttingDown = true; // K8s detects the readiness failure → stops routing traffic
  // Wait for the next readiness probe + endpoint propagation.
  // Requests that still arrive in this window should be served, not rejected.
  await sleep(10000);
  // No new traffic arrives from here on, so it is safe to close
  server.close();
  // ...
}
```

→ Use both the preStop sleep and readiness off to be safe.
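The preStop sleep, the readiness delay, and the forced-exit timer all draw on the same `terminationGracePeriodSeconds` window, because Kubernetes starts the grace-period countdown when the pod is marked Terminating, before the preStop hook runs. The following is a minimal sketch of a budget check, not part of any library: `ShutdownBudget`, `BudgetResult`, and `checkShutdownBudget` are hypothetical names, and the input numbers are the ones used in the snippets above.

```typescript
// Hypothetical helper: checks that all shutdown phases fit inside the K8s grace period.
interface ShutdownBudget {
  gracePeriodSec: number;    // terminationGracePeriodSeconds
  preStopSec: number;        // preStop sleep
  readinessDelaySec: number; // sleep after /readyz starts returning 503
  drainTimeoutSec: number;   // forced-exit timer for in-flight requests
}

interface BudgetResult {
  totalSec: number; // worst-case time the app needs
  slackSec: number; // negative: SIGKILL can arrive before the drain finishes
  ok: boolean;
}

function checkShutdownBudget(b: ShutdownBudget): BudgetResult {
  const totalSec = b.preStopSec + b.readinessDelaySec + b.drainTimeoutSec;
  const slackSec = b.gracePeriodSec - totalSec;
  return { totalSec, slackSec, ok: slackSec > 0 };
}

// Values from the snippets above: 5s preStop, 10s readiness delay, 25s drain timer.
const combined = checkShutdownBudget({
  gracePeriodSec: 30,
  preStopSec: 5,
  readinessDelaySec: 10,
  drainTimeoutSec: 25,
});
console.log(combined); // totalSec is 40, which exceeds 30, so ok is false

// The same phases with a 60s grace period leave 20s of slack.
const relaxed = checkShutdownBudget({
  gracePeriodSec: 60,
  preStopSec: 5,
  readinessDelaySec: 10,
  drainTimeoutSec: 25,
});
console.log(relaxed);
```

Stacking all three values as written needs up to 40s, so when the patterns are combined either `terminationGracePeriodSeconds` has to go up or one of the phases has to get shorter.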
### In-flight tracking

```ts
import { setTimeout as sleep } from 'node:timers/promises';

let inflightCount = 0;

app.use((req, res, next) => {
  inflightCount++;
  // 'close' fires after 'finish' and also when the connection is dropped,
  // so decrementing on both events would count a request twice.
  res.once('close', () => { inflightCount--; });
  next();
});

async function waitForInflight(timeoutMs: number) {
  const start = Date.now();
  while (inflightCount > 0 && Date.now() - start < timeoutMs) {
    await sleep(100);
  }
  if (inflightCount > 0) {
    console.error(`${inflightCount} requests still in flight after ${timeoutMs}ms`);
  }
}
```

### Job worker: drain

```ts
let processing = false;

async function workerLoop() {
  while (!shuttingDown) {
    const job = await queue.fetchNext();
    if (!job) { await sleep(1000); continue; }
    processing = true;
    try {
      await processJob(job);
      await queue.ack(job);
    } catch (e) {
      await queue.nack(job, e);
    } finally {
      processing = false;
    }
  }
}

async function shutdown() {
  shuttingDown = true;
  // Wait for the current job to finish
  while (processing) await sleep(100);
  // Close the queue connection
  await queue.close();
}
```

### Connection drain (DB)

```ts
async function shutdown() {
  // ...
  // Blocks new connection acquisition from the pool
  await db.$disconnect();
  // → existing connections commit, then close
}
```

### Closing WebSocket connections

```ts
async function shutdown() {
  shuttingDown = true;
  // Notify every client, then close each one 5s later
  for (const ws of wss.clients) {
    ws.send(JSON.stringify({ type: 'server.shutdown' }));
    setTimeout(() => ws.close(1001, 'going away'), 5000);
  }
  // Alternative: skip the notice and close right away
  // wss.close();
}
```

→ Clients reconnect (to another instance).
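The notify-then-close flow above is easier to test when it depends only on the small slice of the socket API it actually uses. The following is a sketch under assumptions: `SocketLike`, `Scheduler`, and `notifyAndClose` are hypothetical names introduced here, the injected `schedule` parameter exists only so a test can avoid real timers, and `readyState === 1` corresponds to the OPEN state of the WebSocket API.

```typescript
// Minimal socket shape used by this sketch (readyState, send, close).
interface SocketLike {
  readyState: number;
  send(data: string): void;
  close(code?: number, reason?: string): void;
}

const OPEN = 1; // WebSocket OPEN state

type Scheduler = (fn: () => void, ms: number) => void;

// Sends a shutdown notice to each open socket and schedules its close after `graceMs`.
// Returns the number of sockets that were notified.
function notifyAndClose(
  clients: Iterable<SocketLike>,
  graceMs: number,
  schedule: Scheduler = (fn, ms) => { setTimeout(fn, ms); },
): number {
  let notified = 0;
  for (const ws of clients) {
    if (ws.readyState !== OPEN) continue; // already closing or closed
    ws.send(JSON.stringify({ type: 'server.shutdown' }));
    schedule(() => ws.close(1001, 'going away'), graceMs);
    notified++;
  }
  return notified;
}

// Usage with fake sockets and a scheduler that records callbacks instead of waiting.
const log: string[] = [];
const fakeSocket = (name: string, readyState: number): SocketLike => ({
  readyState,
  send: (data) => { log.push(`${name} send ${data}`); },
  close: (code, reason) => { log.push(`${name} close ${code} ${reason}`); },
});

const pending: Array<() => void> = [];
const notifiedCount = notifyAndClose(
  [fakeSocket('a', OPEN), fakeSocket('b', 3)], // 'b' is already closed
  5000,
  (fn) => { pending.push(fn); },
);
pending.forEach((fn) => fn()); // simulate the grace period elapsing
console.log(notifiedCount, log);
```

In a real server the call would be along the lines of `notifyAndClose(wss.clients, 5000)`, relying on the default `setTimeout` scheduler.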
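### Long-running request

```ts
// Very long streams are hard to shut down gracefully
// → enforce an explicit limit

// shutdown() calls shutdownAbortController.abort()
const shutdownAbortController = new AbortController();

app.post('/long', async (req, res) => {
  const ac = new AbortController();
  // Client disconnected (listen on res: req 'close' can fire as soon as the body is read)
  res.on('close', () => ac.abort());
  // Abort on shutdown
  shutdownAbortController.signal.addEventListener('abort', () => ac.abort());
  try {
    // stream() stands for any async iterable that honors the signal
    for await (const chunk of stream(ac.signal)) {
      res.write(chunk);
    }
  } catch (err) {
    if (!ac.signal.aborted) console.error('stream failed', err);
  } finally {
    res.end();
  }
});
```

### Fastify + onClose

```ts
import Fastify from 'fastify';
import gracefulShutdown from 'fastify-graceful-shutdown';

const app = Fastify();
app.register(gracefulShutdown);

// The decorator is available only once the plugin has loaded, hence app.after()
app.after(() => {
  app.gracefulShutdown((signal, next) => {
    console.log('shutting down', signal);
    next();
  });
});
```

### Telling health checks apart

```
Liveness:  is the process alive?
Readiness: can it take traffic? (state)

During shutdown:
- Liveness stays OK (still running)
- Readiness fails (shutting down)
→ The pod is not restarted, but the LB stops sending traffic.
```

### Test

```ts
import { setTimeout as sleep } from 'node:timers/promises';

// Assumes the server runs inside the test process and process.exit is stubbed;
// otherwise the shutdown handler would terminate the test runner.
test('graceful shutdown completes inflight', async () => {
  const longReq = fetch('http://localhost:3000/slow');
  await sleep(100); // let the request start
  // Send SIGTERM
  process.emit('SIGTERM');
  // longReq completes normally
  const r = await longReq;
  expect(r.ok).toBe(true);
});
```

### Common gotchas

```
1. K8s endpoint propagation takes ~5-10s. Use a preStop sleep.
2. Without a Connection: close header, keep-alive clients reuse the same connection.
3. Idle database connections: drain the pool.
4. Long-polling / SSE: close explicitly.
5. Async work after the response: track it.
```

### terminationGracePeriodSeconds

```
Default: 30s.
Long task: 60-120s.
Background job: 300s (5min).
→ If the app does not finish within this time, it gets SIGKILL.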
```

### Forced shutdown

```ts
let forceTimer: NodeJS.Timeout;

async function shutdown() {
  forceTimer = setTimeout(() => {
    console.error('Force exit');
    process.exit(1);
  }, 25000); // shorter than the K8s grace period
  await graceful(); // placeholder for the drain + close steps
  clearTimeout(forceTimer);
  process.exit(0);
}
```

### Logging

```ts
log.info('shutdown.start', { signal });
log.info('shutdown.readiness-off');
log.info('shutdown.draining', { inflightCount });
log.info('shutdown.db-closed');
log.info('shutdown.complete');
```

## 🤔 Decision Criteria

| Workload | Recommendation |
|---|---|
| HTTP API | preStop sleep + readiness off + drain |
| Worker | Drain current job + close queue |
| WebSocket | Notify client + close |
| Long stream | Abort signal + close |
| Cron job | Finish the run, then exit |
| DB connection | Pool drain |

## ❌ Anti-patterns

- **Ignoring SIGTERM**: K8s sends SIGKILL and in-flight requests are lost.
- **Leaving readiness on**: traffic keeps arriving, so the pod keeps taking new requests.
- **No preStop**: the app exits before endpoint removal has propagated.
- **Immediate forced exit**: in-progress work is lost.
- **No in-flight tracking**: there is no way to know when draining has finished.
- **No DB close**: connections are left stuck.
- **No test, first tried in prod**: it breaks.

## 🤖 LLM Usage Hints

- The four pieces: preStop sleep (5-10s), readiness off, drain in-flight, DB close.
- terminationGracePeriodSeconds is typically 30-60s.
- Force timeout < grace period.

## 🔗 Related Documents

- [[Backend_Health_Check_Patterns]]
- [[DevOps_Kubernetes_Basics]]
- [[Backend_Service_Discovery]]