---
id: backend-graceful-shutdown
title: Graceful Shutdown - Drain / SIGTERM / Work Completion
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [backend, shutdown, vibe-coding]
tech_stack: { language: "TS / Node", applicable_to: ["Backend"] }
applied_in: []
aliases: [graceful shutdown, SIGTERM, drain, terminationGracePeriod, request bleeding]
---
# Graceful Shutdown
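> Avoid losing in-flight requests during a deploy or scale-down. **SIGTERM → readiness off → drain in-flight → close DB → exit**. In K8s, `terminationGracePeriodSeconds` bounds the whole sequence.
## 📖 Core Concepts
- SIGTERM: the termination signal a process can catch, which makes a graceful exit possible.
- SIGKILL: forced termination that cannot be caught; K8s sends it when the grace period (30s by default) runs out.
- Drain: stop accepting new requests and wait for existing ones to finish.
- Readiness off: the LB stops sending traffic to the instance.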
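
The ordering above (readiness off, drain, close DB, exit) can be sketched as a list of named steps run strictly one after another. `Step` and `runShutdownSteps` are hypothetical names, and the steps are empty stand-ins rather than real server or database calls.

```ts
// Hypothetical helper: runs named shutdown steps strictly in order.
type Step = { name: string; run: () => Promise<void> | void };

async function runShutdownSteps(steps: Step[]): Promise<string[]> {
  const completed: string[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step.name);
    } catch (err) {
      // A failing step is logged but does not block the remaining cleanup.
      console.error(`shutdown step "${step.name}" failed`, err);
    }
  }
  return completed;
}

// Stand-in steps; real ones would flip readiness, close the server, and so on.
const steps: Step[] = [
  { name: 'readiness-off', run: () => {} },
  { name: 'drain-in-flight', run: async () => {} },
  { name: 'close-db', run: () => { throw new Error('db already closed'); } },
  { name: 'exit', run: () => {} },
];
```

Continuing past a failed step is a design choice in this sketch: a broken DB close should not keep the process from reaching its exit step.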
## 💻 Code Patterns
### Basic (Express)
```ts
import express from 'express';
import { createServer } from 'node:http';

// `db` (e.g. a Prisma client) and `redis` are assumed to be created elsewhere.
const app = express();
const server = createServer(app);
server.listen(3000);

let shuttingDown = false;
const inflight = new Set<Promise<void>>();

// Register before the routes so every request is gated and tracked.
app.use((req, res, next) => {
  if (shuttingDown) {
    res.set('Connection', 'close');
    return res.status(503).end('Shutting down');
  }
  // 'close' fires once the response has finished or the socket has dropped.
  const done = new Promise<void>((resolve) => res.once('close', () => resolve()));
  inflight.add(done);
  void done.then(() => inflight.delete(done));
  next();
});

async function shutdown(signal: string) {
  console.log(`Received ${signal}, shutting down`);
  shuttingDown = true;

  // 1. Stop accepting new connections
  server.close((err) => {
    if (err) console.error('Server close error', err);
  });

  // 2. Wait for in-flight requests (timeout 25s)
  const timeout = setTimeout(() => {
    console.error('Forced shutdown: in-flight requests did not finish');
    process.exit(1);
  }, 25000);
  await Promise.allSettled(inflight);
  clearTimeout(timeout);

  // 3. Close DB / external connections
  await db.$disconnect();
  await redis.quit();

  console.log('Bye');
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
```
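
Both SIGTERM and SIGINT are wired to the same handler, so a second signal arriving mid-drain would start the sequence again. One possible guard, sketched here with a hypothetical `once` helper, is to make the handler idempotent so that every caller shares the first run's promise.

```ts
// Hypothetical helper: wraps an async function so it runs at most once and
// every caller receives the same promise.
function once<T>(fn: () => Promise<T>): () => Promise<T> {
  let result: Promise<T> | undefined;
  return () => {
    if (result === undefined) result = fn();
    return result;
  };
}

let runs = 0;
const shutdownOnce = once(async () => {
  runs += 1; // stand-in for the real drain-and-close sequence
});

// Simulate SIGTERM followed by SIGINT (for example Ctrl+C during a deploy).
const first = shutdownOnce();
const second = shutdownOnce();
```

In a real service the signal handlers would call `shutdownOnce` instead of the raw shutdown function.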
### K8s pod lifecycle
```yaml
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"]  # wait for endpoint removal to propagate
      readinessProbe:
        httpGet: { path: /readyz, port: 3000 }
        periodSeconds: 5
```
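```
K8s shutdown sequence:
1. Pod is marked Terminating (endpoint removal starts in parallel)
2. preStop hook runs (5s sleep, giving the LB time to remove the endpoint)
3. SIGTERM is sent to the container
4. App performs its graceful shutdown
5. SIGKILL once terminationGracePeriodSeconds (30s, counted from step 1) has elapsed
```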
### Readiness-off pattern
```ts
import { setTimeout as sleep } from 'node:timers/promises';

app.get('/readyz', (req, res) => {
  if (shuttingDown) return res.status(503).end();
  res.status(200).end();
});

async function shutdown() {
  shuttingDown = true;
  // K8s detects the readiness failure and stops routing traffic
  await sleep(10000); // covers the next readiness probe plus endpoint propagation
  // No new traffic arrives from here on, so it is safe to close
  server.close();
  // ...
}
```
→ Combining the preStop sleep with readiness off is the safe choice.
### Tracking in-flight requests
```ts
import { setTimeout as sleep } from 'node:timers/promises';

let inflightCount = 0;

app.use((req, res, next) => {
  inflightCount++;
  // 'close' fires after a normal finish and after an aborted connection,
  // so decrementing here alone avoids counting a request twice.
  res.once('close', () => { inflightCount--; });
  next();
});

async function waitForInflight(timeoutMs: number) {
  const start = Date.now();
  while (inflightCount > 0 && Date.now() - start < timeoutMs) {
    await sleep(100);
  }
  if (inflightCount > 0) {
    console.error(`${inflightCount} requests still in flight after ${timeoutMs}ms`);
  }
}
```
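
Because a response can emit both `finish` and `close`, decrementing on each event risks counting one request twice. A possible alternative to a bare counter, sketched below with a hypothetical `InflightTracker` class, hands out a release function that is safe to call repeatedly and resolves a promise when the count reaches zero, so no polling loop is needed.

```ts
class InflightTracker {
  private count = 0;
  private waiters: Array<() => void> = [];

  // Returns a release function that is safe to call more than once.
  begin(): () => void {
    this.count += 1;
    let released = false;
    return () => {
      if (released) return;
      released = true;
      this.count -= 1;
      if (this.count === 0) {
        for (const resolve of this.waiters.splice(0)) resolve();
      }
    };
  }

  get size(): number {
    return this.count;
  }

  // Resolves when no request is in flight (immediately if already idle).
  idle(): Promise<void> {
    if (this.count === 0) return Promise.resolve();
    return new Promise((resolve) => this.waiters.push(resolve));
  }
}

const tracker = new InflightTracker();
const releaseA = tracker.begin();
const releaseB = tracker.begin();
releaseA();
releaseA(); // second call is ignored, e.g. 'finish' followed by 'close'
const sizeAfterDoubleRelease = tracker.size;
releaseB();
```

In middleware, `begin()` would be called per request and the returned function attached to the response events; shutdown would then await `idle()` alongside a timeout.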
### Job worker — drain
```ts
// `shuttingDown`, `queue`, `processJob`, and `sleep` are assumed to be defined elsewhere.
let processing = false;

async function workerLoop() {
  while (!shuttingDown) {
    const job = await queue.fetchNext();
    if (!job) {
      await sleep(1000);
      continue;
    }
    processing = true;
    try {
      await processJob(job);
      await queue.ack(job);
    } catch (e) {
      await queue.nack(job, e);
    } finally {
      processing = false;
    }
  }
}

async function shutdown() {
  shuttingDown = true;
  // Wait for the current job to finish
  while (processing) await sleep(100);
  // Close the queue connection
  await queue.close();
}
```
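
A variation on the worker drain, sketched here with an in-memory array standing in for a real queue client: instead of polling a `processing` flag, shutdown awaits the promise returned by the loop itself. This also covers the gap between fetching a job and setting the flag. All names below are hypothetical.

```ts
import { setTimeout as sleep } from 'node:timers/promises';

// In-memory stand-in for a real queue client (hypothetical).
const pending: string[] = ['job-1', 'job-2', 'job-3'];
const finished: string[] = [];
let stopRequested = false;

async function processJob(job: string): Promise<void> {
  await sleep(20); // simulate work
  finished.push(job);
}

// Resolves only after the loop has exited, so shutdown can await it directly.
async function workerLoop(): Promise<void> {
  while (!stopRequested) {
    const job = pending.shift();
    if (job === undefined) {
      await sleep(10);
      continue;
    }
    await processJob(job);
  }
}

async function demo(): Promise<{ finished: string[]; pending: string[] }> {
  const loop = workerLoop();
  await sleep(5);       // job-1 is now in progress
  stopRequested = true; // what a SIGTERM handler would do
  await loop;           // waits for job-1 to finish, then the loop exits
  return { finished, pending };
}
```

The job that was already running completes, and the jobs still in the queue are left for another worker.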
### Connection drain (DB)
```ts
async function shutdown() {
  // ...
  // Stop new connections from being acquired from the pool
  await db.$disconnect();
  // → existing connections commit, then close
}
```
### Closing WebSocket connections
```ts
async function shutdown() {
  shuttingDown = true;
  // Notify every client, then close each socket after a 5s grace window
  for (const ws of wss.clients) {
    ws.send(JSON.stringify({ type: 'server.shutdown' }));
    setTimeout(() => ws.close(1001, 'going away'), 5000);
  }
  // Or immediately: stop accepting new connections
  wss.close();
}
```
→ Clients reconnect to another instance.
### Long-running request
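```ts
// Very long streams are hard to drain gracefully,
// so bound them explicitly.
// `shutdownAbortController` and `stream` are assumed to be defined elsewhere.
app.post('/long', async (req, res) => {
  const ac = new AbortController();
  // Listen on res: since Node 16, req 'close' also fires once the request body has been read
  res.on('close', () => ac.abort());
  // Abort on shutdown
  shutdownAbortController.signal.addEventListener('abort', () => ac.abort());
  for await (const chunk of stream(ac.signal)) {
    res.write(chunk);
  }
  res.end();
});
```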
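
The handler above reacts to two independent stop conditions: the client going away and the server shutting down. One way to express that, sketched with a hypothetical `anySignal` helper, is to merge several signals into one. Recent Node.js releases also ship a built-in `AbortSignal.any` for the same purpose.

```ts
// Hypothetical helper: returns a signal that aborts as soon as any input does.
function anySignal(signals: AbortSignal[]): AbortSignal {
  const combined = new AbortController();
  for (const signal of signals) {
    if (signal.aborted) {
      combined.abort(signal.reason);
      break;
    }
    signal.addEventListener('abort', () => combined.abort(signal.reason), { once: true });
  }
  return combined.signal;
}

const clientGone = new AbortController();     // would be aborted when the client disconnects
const serverStopping = new AbortController(); // would be aborted on SIGTERM
const streamSignal = anySignal([clientGone.signal, serverStopping.signal]);

const abortedBefore = streamSignal.aborted;
serverStopping.abort(new Error('server shutting down'));
const abortedAfter = streamSignal.aborted;
```

A streaming route would pass `streamSignal` to its data source and stop writing once it aborts.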
### Fastify + onClose
```ts
import Fastify from 'fastify';
import gracefulShutdown from 'fastify-graceful-shutdown';

const app = Fastify();
app.register(gracefulShutdown);

// The decorator only exists once the plugin has loaded, so register the handler in after()
app.after(() => {
  app.gracefulShutdown((signal, next) => {
    console.log('shutting down', signal);
    next();
  });
});
```
### Distinguishing health checks
```
Liveness: is the process alive?
Readiness: can it accept traffic? (state)
During shutdown:
- Liveness stays OK (still running)
- Readiness fails (shutting down)
→ The Pod is not restarted, but the LB stops sending traffic.
```
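
The same distinction can be written as a small pure function that maps probe type and application state to an HTTP status. The `AppState` values and the `probeStatus` name are assumptions made for this sketch.

```ts
type AppState = 'starting' | 'ready' | 'shutting-down';
type Probe = 'liveness' | 'readiness';

// Liveness reports on the process; readiness reports on willingness to take traffic.
function probeStatus(probe: Probe, state: AppState): 200 | 503 {
  if (probe === 'liveness') return 200; // the process is running in every state
  return state === 'ready' ? 200 : 503;
}

const duringShutdown = {
  liveness: probeStatus('liveness', 'shutting-down'),
  readiness: probeStatus('readiness', 'shutting-down'),
};
```

Keeping this logic separate from the route handlers makes the shutdown behaviour of both probes easy to unit test.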
### Test
```ts
test('graceful shutdown completes inflight', async () => {
  const longReq = fetch('http://localhost:3000/slow');
  await sleep(100); // let the request start
  // Send SIGTERM (stub process.exit first, or shutdown() will end the test run)
  process.emit('SIGTERM');
  // longReq should complete normally
  const r = await longReq;
  expect(r.ok).toBe(true);
});
```
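
A shutdown routine that calls `process.exit` directly is awkward to test, because the exit also ends the test runner. One possible shape, sketched with a hypothetical `createShutdown` factory, injects the exit function so a test can record the exit code instead.

```ts
type Deps = { drain: () => Promise<void>; exit: (code: number) => void };

// Hypothetical factory: builds a shutdown handler from injectable dependencies.
function createShutdown(deps: Deps): (signal: string) => Promise<void> {
  return async (signal) => {
    try {
      await deps.drain();
      deps.exit(0);
    } catch (err) {
      console.error(`shutdown after ${signal} failed`, err);
      deps.exit(1);
    }
  };
}

// In a test, record exit codes instead of terminating the process.
const exitCodes: number[] = [];
const shutdown = createShutdown({
  drain: async () => {},
  exit: (code) => { exitCodes.push(code); },
});
```

Production code would pass `process.exit` as `exit` and the real drain sequence as `drain`.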
### Common gotchas
```
1. K8s endpoint propagation takes roughly 5-10s. Cover it with a preStop sleep.
2. Without a Connection: close header, keep-alive clients reuse the same connection.
3. Idle database connections: drain the pool.
4. Long-polling / SSE: close explicitly.
5. Async work after the response: track it.
```
### terminationGracePeriodSeconds
```
Default: 30s.
Long task: 60-120s.
Background job: 300s (5min).
→ If the app cannot finish within this window, it gets SIGKILL.
```
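
Since the grace period starts when the Pod is marked Terminating, a preStop sleep uses up part of it. A rough way to derive the time left for draining after SIGTERM is sketched below; `forceTimeoutMs` and the 2s safety margin are assumptions chosen for illustration.

```ts
// Hypothetical helper: how long the app may spend draining after SIGTERM.
// Assumes the preStop sleep counts against terminationGracePeriodSeconds.
function forceTimeoutMs(graceSeconds: number, preStopSeconds: number, marginSeconds = 2): number {
  const budget = graceSeconds - preStopSeconds - marginSeconds;
  if (budget <= 0) {
    throw new RangeError(
      `grace period ${graceSeconds}s leaves no time to drain after a ${preStopSeconds}s preStop`,
    );
  }
  return budget * 1000;
}

const defaultBudget = forceTimeoutMs(30, 5);    // 30s grace, 5s preStop
const longTaskBudget = forceTimeoutMs(120, 10); // 120s grace, 10s preStop
```

With the default 30s grace period and a 5s preStop sleep, this leaves 23s for draining, so a force timer should be set at or below that value.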
### Forced shutdown
```ts
let forceTimer: NodeJS.Timeout;

async function shutdown() {
  forceTimer = setTimeout(() => {
    console.error('Force exit');
    process.exit(1);
  }, 25000); // must be shorter than the K8s grace period, which also covers the preStop sleep
  await graceful();
  clearTimeout(forceTimer);
  process.exit(0);
}
```
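
The force timer can also be written as a race between the graceful work and a deadline, which keeps the exit decision in one place. `withDeadline` is a hypothetical helper and only reports the outcome; choosing the exit code is left to the caller.

```ts
// Hypothetical helper: resolves 'done' if work finishes in time, 'timeout' otherwise.
async function withDeadline(work: Promise<unknown>, ms: number): Promise<'done' | 'timeout'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), ms);
  });
  try {
    return await Promise.race([work.then(() => 'done' as const), deadline]);
  } finally {
    clearTimeout(timer); // avoid a stray timer keeping the process alive
  }
}

// Work that finishes immediately versus work that outlives its deadline.
const fast = withDeadline(Promise.resolve(), 50);
const slow = withDeadline(new Promise((resolve) => setTimeout(resolve, 200)), 20);
```

A caller would map `'done'` to `process.exit(0)` and `'timeout'` to `process.exit(1)`.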
### Logging
```ts
log.info('shutdown.start', { signal });
log.info('shutdown.readiness-off');
log.info('shutdown.draining', { inflightCount });
log.info('shutdown.db-closed');
log.info('shutdown.complete');
```
## 🤔 Decision Criteria
| Workload | Recommendation |
|---|---|
| HTTP API | preStop sleep + readiness off + drain |
| Worker | Drain current job + close queue |
| WebSocket | Notify client + close |
| Long stream | Abort signal + close |
| Cron job | Exit after the current run completes |
| DB connection | Pool drain |
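## ❌ Anti-patterns
- **Ignoring SIGTERM**: K8s follows up with SIGKILL and in-flight requests are lost.
- **Leaving readiness on**: traffic keeps arriving, so the instance keeps taking new requests.
- **No preStop hook**: the process exits before endpoint removal has propagated.
- **Immediate force exit**: in-progress work is lost.
- **No in-flight tracking**: there is no way to know when draining has finished.
- **No DB close**: connections are left stuck.
- **No test before the first production run**: it breaks.
## 🤖 Hints for LLM Use
- The four pieces: preStop sleep of 5-10s, readiness off, in-flight drain, DB close.
- terminationGracePeriodSeconds is usually 30-60s.
- Force timeout < grace period.
## 🔗 Related Documents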
- [[Backend_Health_Check_Patterns]]
- [[DevOps_Kubernetes_Basics]]
- [[Backend_Service_Discovery]]