---
id: backend-graceful-shutdown
title: Graceful Shutdown - Drain / SIGTERM / Finishing In-Flight Work
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [backend, shutdown, vibe-coding]
tech_stack: { language: "TS / Node", applicable_to: ["Backend"] }
applied_in: []
aliases: [graceful shutdown, SIGTERM, drain, terminationGracePeriod, request bleeding]
---

# Graceful Shutdown

> Don't lose in-flight requests during a deploy or scale-down. **SIGTERM → readiness off → drain in-flight → close DB → exit**. The whole sequence must fit inside K8s `terminationGracePeriodSeconds`.

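## 📖 Core Concepts
- SIGTERM: the termination request. The process can catch it and shut down gracefully.
- SIGKILL: a forced kill that cannot be caught. Kubernetes sends it when the grace period (30s by default) runs out.
- Drain: reject new requests and wait for the existing ones to finish.
- Readiness off: the LB stops sending traffic to this instance.
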
## 💻 Code Patterns

### Basic (Express)
```ts
import { createServer } from 'node:http';
import express from 'express';

// `db` (e.g. a Prisma client) and `redis` are assumed to be created elsewhere.
const app = express();

let shuttingDown = false;
const inflight = new Set<Promise<void>>();

// Register before any route so every request is covered.
app.use((req, res, next) => {
  if (shuttingDown) {
    res.set('Connection', 'close'); // tell keep-alive clients to reconnect elsewhere
    return res.status(503).end('Shutting down');
  }
  // Track the request until its response closes (normal finish or client abort).
  const done = new Promise<void>((resolve) => res.once('close', () => resolve()));
  inflight.add(done);
  void done.then(() => inflight.delete(done));
  next();
});

const server = createServer(app);
server.listen(3000);

async function shutdown(signal: string) {
  if (shuttingDown) return; // a second signal must not restart the sequence
  console.log(`Received ${signal}, shutting down`);
  shuttingDown = true;

  // 1. Stop accepting new connections
  server.close((err) => {
    if (err) console.error('Server close error', err);
  });

  // 2. Wait for in-flight requests (timeout 25s). The timer must expire before
  //    SIGKILL; the K8s grace period also covers the preStop hook, so subtract
  //    that sleep when sizing this value.
  const timeout = setTimeout(() => {
    console.error('Forced shutdown: in-flight requests not done');
    process.exit(1);
  }, 25_000);

  await Promise.allSettled(inflight);
  clearTimeout(timeout);

  // 3. Close DB / external connections
  await db.$disconnect();
  await redis.quit();

  console.log('Bye');
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
```

### K8s pod lifecycle
```yaml
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"]  # wait for endpoint propagation
      readinessProbe:
        httpGet: { path: /readyz, port: 3000 }
        periodSeconds: 5
```

```
K8s shutdown sequence:
1. Pod marked Terminating (the grace-period countdown starts here)
2. preStop hook runs (5s sleep, giving the LB time to remove the endpoint)
3. SIGTERM sent
4. App shuts down gracefully
5. SIGKILL once terminationGracePeriodSeconds (30s, counted from step 1) has elapsed
```

### Readiness-off pattern
```ts
import { setTimeout as sleep } from 'node:timers/promises';

app.get('/readyz', (req, res) => {
  if (shuttingDown) return res.status(503).end();
  res.status(200).end();
});

async function shutdown() {
  shuttingDown = true;

  // K8s detects the failing readiness probe and stops routing traffic here.
  // Keep serving normal requests during this wait; only /readyz should fail.
  await sleep(10_000); // covers the next readiness probe plus endpoint propagation

  // No new traffic should arrive from this point, so closing is safe.
  server.close();
  // ...
}
```

→ Using both the preStop sleep and readiness off is the safe choice.

### In-flight tracking
```ts
import { setTimeout as sleep } from 'node:timers/promises';

let inflightCount = 0;

app.use((req, res, next) => {
  inflightCount++;
  // 'close' fires once per response, after a normal finish as well as after an
  // aborted connection. Listening to 'finish' too would decrement twice.
  res.once('close', () => { inflightCount--; });
  next();
});

async function waitForInflight(timeoutMs: number) {
  const start = Date.now();
  // Poll until everything has finished or the time budget is spent.
  while (inflightCount > 0 && Date.now() - start < timeoutMs) {
    await sleep(100);
  }
  if (inflightCount > 0) {
    console.error(`${inflightCount} requests still in flight after ${timeoutMs}ms`);
  }
}
```

### Job worker: drain
```ts
import { setTimeout as sleep } from 'node:timers/promises';

// `queue` and `processJob` stand in for your queue client and job handler.
let processing = false;

async function workerLoop() {
  while (!shuttingDown) {
    const job = await queue.fetchNext();
    if (!job) {
      await sleep(1000);
      continue;
    }

    processing = true;
    try {
      await processJob(job);
      await queue.ack(job);
    } catch (e) {
      await queue.nack(job, e);
    } finally {
      processing = false;
    }
  }
}

async function shutdown() {
  shuttingDown = true; // the loop stops picking up jobs after the current iteration

  // Wait for the current job to finish
  while (processing) await sleep(100);

  // Close the queue connection
  await queue.close();
}
```

### Connection drain (DB)
```ts
async function shutdown() {
  // ... stop traffic and drain in-flight requests first

  // Stop handing out new connections from the pool
  await db.$disconnect();
  // → existing connections finish their work and are then closed
}
```

### Closing WebSocket connections
```ts
async function shutdown() {
  shuttingDown = true;

  // Notify every client, then close each connection after 5s
  for (const ws of wss.clients) {
    ws.send(JSON.stringify({ type: 'server.shutdown' }));
    setTimeout(() => ws.close(1001, 'going away'), 5000);
  }

  // Stop accepting new connections. Note: in ws v8+, close() leaves existing
  // clients open, so closing immediately means calling ws.close() without the delay.
  wss.close();
}
```

→ Clients reconnect (to a different instance).

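### Long-running request
```ts
// Created once at startup; shutdown() calls shutdownAbortController.abort().
const shutdownAbortController = new AbortController();

// Very long streams are hard to drain gracefully,
// so give them an explicit way to stop.
app.post('/long', async (req, res) => {
  const ac = new AbortController();
  const abort = () => ac.abort();
  // Listen on the response: since Node 16, the request's 'close' event fires
  // when the request completes, not only when the client disconnects.
  res.on('close', abort);

  // Abort on shutdown as well
  shutdownAbortController.signal.addEventListener('abort', abort, { once: true });

  try {
    // `stream` is a placeholder for an async generator that stops when the signal aborts.
    for await (const chunk of stream(ac.signal)) {
      res.write(chunk);
    }
  } finally {
    shutdownAbortController.signal.removeEventListener('abort', abort); // avoid leaking listeners
    res.end();
  }
});
```
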
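### Fastify + onClose
```ts
import Fastify from 'fastify';
import gracefulShutdown from 'fastify-graceful-shutdown';

const app = Fastify();
app.register(gracefulShutdown);

// The decorator exists only once the plugin has loaded, hence app.after().
app.after(() => {
  // Handler signature as in the original note; check the README of the
  // plugin version you install, since it may differ.
  app.gracefulShutdown((signal, next) => {
    console.log('shutting down', signal);
    next();
  });
});
```
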
### Liveness vs. readiness
```
Liveness:  is the process alive?
Readiness: can it take traffic? (state)

During shutdown:
- Liveness stays OK (still running)
- Readiness fails (shutting down)
→ The pod is not restarted, but the LB stops sending it traffic.
```

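The liveness/readiness split can be sketched as a pure function that maps a probe path and the current lifecycle state to a status code. This is only an illustration: `LifecycleState`, `probeStatus`, and the `/livez` path are names assumed here (the note itself defines only `/readyz`).

```typescript
// Hypothetical lifecycle states for a single instance.
type LifecycleState = 'starting' | 'ready' | 'draining';

// Map a probe path and the current state to an HTTP status code.
function probeStatus(path: string, state: LifecycleState): number {
  if (path === '/livez') return 200; // the process is up in every state
  if (path === '/readyz') return state === 'ready' ? 200 : 503;
  return 404;
}

console.log(probeStatus('/livez', 'draining'));  // 200
console.log(probeStatus('/readyz', 'draining')); // 503
console.log(probeStatus('/readyz', 'ready'));    // 200
```

Keeping liveness green while draining matters: a failing liveness probe would make the kubelet restart the container in the middle of the drain.
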
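### Test
```ts
import { setTimeout as sleep } from 'node:timers/promises';

// Assumes the app runs in this process and that shutdown()'s process.exit()
// is stubbed in tests; otherwise the test runner itself would exit.
test('graceful shutdown completes inflight', async () => {
  const longReq = fetch('http://localhost:3000/slow');
  await sleep(100); // let the request start

  // Send SIGTERM (emitting on `process` only invokes the registered handlers)
  process.emit('SIGTERM');

  // longReq should still complete normally
  const r = await longReq;
  expect(r.ok).toBe(true);
});
```
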
### Common gotchas
```
1. K8s endpoint propagation takes roughly 5-10s. Cover it with a preStop sleep.
2. Without a `Connection: close` header, keep-alive clients reuse the same connection.
3. Idle database connections: drain the pool.
4. Long-polling / SSE: close explicitly.
5. Async work that continues after the response: track it.
```

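Gotcha 5 is easy to miss because the response has already been sent, so request-level tracking no longer sees the work. One possible shape for tracking it is sketched below; `BackgroundTasks` and its methods are hypothetical names, not part of any library.

```typescript
// Hypothetical helper: remembers work that keeps running after the response is sent.
class BackgroundTasks {
  private readonly pending = new Set<Promise<unknown>>();

  // Register fire-and-forget work so shutdown can wait for it.
  track<T>(task: Promise<T>): Promise<T> {
    this.pending.add(task);
    const forget = () => { this.pending.delete(task); };
    // Remove the task on success or failure. Attaching a rejection handler here
    // also means a failure will not surface as an unhandled rejection, so log
    // errors inside the task itself.
    task.then(forget, forget);
    return task;
  }

  get size(): number {
    return this.pending.size;
  }

  // Resolves once everything tracked so far, plus anything added meanwhile, has settled.
  async drain(): Promise<void> {
    while (this.pending.size > 0) {
      await Promise.allSettled(Array.from(this.pending));
    }
  }
}

const tasks = new BackgroundTasks();
void tasks.track(new Promise<void>((resolve) => setTimeout(resolve, 10)));
console.log(tasks.size); // 1
```

During shutdown, `await tasks.drain()` would run after the in-flight requests have drained and before the DB connection is closed.
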
### terminationGracePeriodSeconds
```
Default: 30s.
Long task: 60-120s.
Background job: 300s (5 min).

→ If the app cannot finish within this time, it gets SIGKILL.
```

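Because the grace-period countdown starts when the pod is marked Terminating, the preStop sleep and any in-app readiness wait are both spent from the same budget. A minimal sketch of that arithmetic follows; `drainBudgetMs` and its field names are made up for illustration.

```typescript
// Hypothetical helper: derive the in-app force-exit timeout from the pod settings.
interface ShutdownBudgetInput {
  gracePeriodSec: number;   // terminationGracePeriodSeconds
  preStopSec: number;       // duration of the preStop sleep
  readinessWaitSec: number; // in-app wait after turning readiness off (0 if none)
  safetyMarginSec: number;  // slack to leave before SIGKILL
}

function drainBudgetMs(input: ShutdownBudgetInput): number {
  const { gracePeriodSec, preStopSec, readinessWaitSec, safetyMarginSec } = input;
  const remainingSec = gracePeriodSec - preStopSec - readinessWaitSec - safetyMarginSec;
  if (remainingSec <= 0) {
    throw new RangeError(
      `No time left to drain: raise terminationGracePeriodSeconds (${gracePeriodSec}s)`,
    );
  }
  return remainingSec * 1000;
}

const forceTimeoutMs = drainBudgetMs({
  gracePeriodSec: 30,
  preStopSec: 5,
  readinessWaitSec: 0,
  safetyMarginSec: 5,
});
console.log(forceTimeoutMs); // 20000
```

With a 30s grace period, a 5s preStop sleep, and a 10s readiness wait, only 15s remain before any margin is subtracted, which is a common reason to raise the grace period to 60s.
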
### Forced shutdown
```ts
let forceTimer: NodeJS.Timeout;

async function shutdown() {
  forceTimer = setTimeout(() => {
    console.error('Force exit');
    process.exit(1);
  }, 25_000); // shorter than the K8s grace period (minus any preStop sleep)

  await graceful(); // placeholder for the drain and cleanup steps shown earlier
  clearTimeout(forceTimer);
  process.exit(0);
}
```

### Logging
```ts
log.info('shutdown.start', { signal });
log.info('shutdown.readiness-off');
log.info('shutdown.draining', { inflightCount });
log.info('shutdown.db-closed');
log.info('shutdown.complete');
```

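One way to keep these log events consistent is to run the shutdown steps through a small phase runner that logs each step as it finishes. The sketch below is an assumption about how that could look; `Phase` and `runShutdownPhases` are hypothetical names, and the phase bodies are left as stubs.

```typescript
type Phase = { name: string; run: () => Promise<void> | void };
type LogFn = (event: string, fields?: Record<string, unknown>) => void;

// Run phases in order, logging one event per phase plus a final summary.
async function runShutdownPhases(phases: Phase[], log: LogFn): Promise<string[]> {
  const completed: string[] = [];
  for (const phase of phases) {
    const startedAt = Date.now();
    try {
      await phase.run();
      completed.push(phase.name);
      log(`shutdown.${phase.name}`, { ms: Date.now() - startedAt });
    } catch (err) {
      // Keep going: a failed drain should not prevent the DB from being closed.
      log(`shutdown.${phase.name}.failed`, { error: String(err) });
    }
  }
  log('shutdown.complete', { completed });
  return completed;
}

void runShutdownPhases(
  [
    { name: 'readiness-off', run: () => { /* flip the readiness flag */ } },
    { name: 'draining', run: async () => { /* wait for in-flight requests */ } },
    { name: 'db-closed', run: async () => { /* close DB and cache clients */ } },
  ],
  (event, fields) => console.log(event, fields ?? {}),
);
```

Continuing after a failed phase is a design choice, not a rule; a stricter variant could stop at the first failure and rely on the force timer instead.
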
## 🤔 Decision Criteria
| Workload | Recommendation |
|---|---|
| HTTP API | preStop sleep + readiness off + drain |
| Worker | Drain current job + close queue |
| WebSocket | Notify client + close |
| Long stream | Abort signal + close |
| Cron job | Finish the current run, then exit |
| DB connection | Pool drain |

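## ❌ Anti-patterns
- **Ignoring SIGTERM**: K8s follows up with SIGKILL and in-flight requests are lost.
- **Leaving readiness on**: traffic keeps arriving, so new requests are still accepted during shutdown.
- **No preStop**: the app exits before endpoint removal has propagated.
- **Forcing exit immediately**: work in progress is lost.
- **No in-flight tracking**: there is no way to tell when draining is done.
- **Not closing the DB**: connections are left stuck.
- **No test, first attempt in prod**: it breaks.
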
## 🤖 Hints for LLM Use
- Four pieces: preStop sleep 5-10s + readiness off + drain in-flight + DB close.
- terminationGracePeriodSeconds is usually 30-60s.
- Force timeout < grace period.

## 🔗 Related Documents
- [[Backend_Health_Check_Patterns]]
- [[DevOps_Kubernetes_Basics]]
- [[Backend_Service_Discovery]]