id: backend-graceful-shutdown
title: Graceful Shutdown: Drain / SIGTERM / Completing In-Flight Work
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: backend, shutdown, vibe-coding
tech_stack: language: TS / Node; applicable_to: Backend
applied_in:
aliases: graceful shutdown, SIGTERM, drain, terminationGracePeriod, request bleeding

Graceful Shutdown

Avoid dropping in-flight requests during a deploy or scale-down. The sequence is SIGTERM → readiness off → drain in-flight requests → close the DB → exit, all within the Kubernetes terminationGracePeriodSeconds.

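📖 Core concepts

  • SIGTERM: the termination signal; it can be caught, so the app can shut down gracefully.
  • SIGKILL: forced termination that cannot be caught; usually sent 30s after SIGTERM.
  • Drain: reject new requests and wait for existing ones to complete.
  • Readiness off: the LB stops sending traffic to this instance.

💻 Code patterns

Basic (Express)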

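import { createServer } from 'node:http';
import express from 'express';

// Assumed to exist elsewhere: `db` (e.g. a Prisma client) and `redis` (a Redis client)
const app = express();
const server = createServer(app);
server.listen(3000);

let shuttingDown = false;
const inflight = new Set<Promise<void>>();

// Register this middleware before any routes so every request is tracked
app.use((req, res, next) => {
  if (shuttingDown) {
    res.set('Connection', 'close');
    return res.status(503).end('Shutting down');
  }
  // Track the request until its response finishes or the connection drops
  const done = new Promise<void>((resolve) => {
    res.once('finish', () => resolve());
    res.once('close', () => resolve());
  });
  inflight.add(done);
  done.then(() => inflight.delete(done));
  next();
});

async function shutdown(signal: string) {
  if (shuttingDown) return;  // ignore repeated signals
  console.log(`Received ${signal}, shutting down`);
  shuttingDown = true;

  // Force exit if the whole shutdown takes longer than 25s
  const timeout = setTimeout(() => {
    console.error('Forced shutdown: in-flight requests not done');
    process.exit(1);
  }, 25000);

  // 1. Stop accepting new connections
  server.close((err) => {
    if (err) console.error('Server close error', err);
  });

  // 2. Wait for in-flight requests
  await Promise.allSettled(inflight);

  // 3. Close DB / external connections
  await db.$disconnect();
  await redis.quit();

  clearTimeout(timeout);
  console.log('Bye');
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));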

K8s pod lifecycle

spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"]  # wait for endpoint removal to propagate
      readinessProbe:
        httpGet: { path: /readyz, port: 3000 }
        periodSeconds: 5
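
K8s shutdown sequence:
1. Pod is marked Terminating; the grace period timer starts and endpoint removal begins
2. preStop hook runs (5s sleep, giving the LB time to remove the endpoint)
3. SIGTERM is sent to the container
4. App performs its graceful shutdown
5. SIGKILL once terminationGracePeriodSeconds (30s, counted from step 1) has elapsed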
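
Because the grace period is counted from the moment the pod is marked Terminating, the preStop sleep is spent out of the same 30s. The arithmetic can be sketched as follows; `forceExitBudgetMs` and `safetyMarginMs` are hypothetical names used only for illustration, not part of Kubernetes or Node.

```typescript
// Hypothetical helper: how long the app may drain after SIGTERM before it
// should force-exit on its own, ahead of the kubelet's SIGKILL.
interface ShutdownBudgetInput {
  terminationGracePeriodSeconds: number; // pod spec value (Kubernetes default: 30)
  preStopSleepSeconds: number;           // time spent in the preStop hook
  safetyMarginMs: number;                // assumed headroom for final cleanup
}

function forceExitBudgetMs(input: ShutdownBudgetInput): number {
  const graceMs = input.terminationGracePeriodSeconds * 1000;
  const preStopMs = input.preStopSleepSeconds * 1000;
  const budget = graceMs - preStopMs - input.safetyMarginMs;
  if (budget <= 0) {
    throw new RangeError('preStop sleep plus margin uses up the whole grace period');
  }
  return budget;
}

// Values from the manifest above: 30s grace, 5s preStop, plus an assumed 2s margin.
const budgetMs = forceExitBudgetMs({
  terminationGracePeriodSeconds: 30,
  preStopSleepSeconds: 5,
  safetyMarginMs: 2000,
});
console.log(budgetMs); // 23000
```

Under these assumptions, a 25s force-exit timer armed at SIGTERM would expire at about the same moment as SIGKILL, so a somewhat shorter timer or a longer grace period may be the safer choice.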

Readiness-off pattern

import { setTimeout as sleep } from 'node:timers/promises';

app.get('/readyz', (req, res) => {
  if (shuttingDown) return res.status(503).end();
  res.status(200).end();
});

async function shutdown() {
  shuttingDown = true;

  // K8s detects the failing readiness probe and stops routing traffic
  await sleep(10000);  // covers the next readiness probe plus endpoint propagation

  // No new traffic arrives from here on, so it is safe to close
  server.close();
  // ...
}

→ Combining the preStop sleep with readiness off is the safe choice.

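In-flight tracking

import { setTimeout as sleep } from 'node:timers/promises';

let inflightCount = 0;

app.use((req, res, next) => {
  inflightCount++;
  let counted = true;
  // 'finish' and 'close' can both fire for one response, so decrement only once
  const done = () => {
    if (!counted) return;
    counted = false;
    inflightCount--;
  };
  res.on('finish', done);
  res.on('close', done);
  next();
});

async function waitForInflight(timeoutMs: number) {
  const start = Date.now();
  // Poll until everything has drained or the timeout elapses
  while (inflightCount > 0 && Date.now() - start < timeoutMs) {
    await sleep(100);
  }
  if (inflightCount > 0) {
    console.error(`${inflightCount} requests still in flight after ${timeoutMs}ms`);
  }
}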
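
As an alternative to polling, the same counter can resolve a promise when it reaches zero. This is only a sketch: `InflightTracker` and `Closable` are hypothetical names, and it assumes a Node version where the response emits 'close' both after a normal finish and on a client disconnect.

```typescript
import { EventEmitter } from 'node:events';

// Minimal shape needed from a response object; http.ServerResponse satisfies it.
interface Closable {
  once(event: 'close', listener: () => void): unknown;
}

// Hypothetical helper: counts open responses and wakes waiters at zero.
class InflightTracker {
  private count = 0;
  private waiters: Array<() => void> = [];

  track(res: Closable): void {
    this.count++;
    // once() guarantees a single decrement per tracked response
    res.once('close', () => {
      this.count--;
      if (this.count === 0) {
        for (const wake of this.waiters.splice(0)) wake();
      }
    });
  }

  get size(): number {
    return this.count;
  }

  // Resolves true when drained, false if the timeout elapsed first.
  drain(timeoutMs: number): Promise<boolean> {
    if (this.count === 0) return Promise.resolve(true);
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}

// Usage with a stand-in emitter; a real app would call tracker.track(res) in middleware.
const tracker = new InflightTracker();
const fakeRes = new EventEmitter();
tracker.track(fakeRes);
console.log(tracker.size); // 1
fakeRes.emit('close');
console.log(tracker.size); // 0
```

The shutdown handler would then `await tracker.drain(timeoutMs)` and log a warning when it resolves to `false`.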

Job worker — drain

let processing = false;

async function workerLoop() {
  while (!shuttingDown) {
    const job = await queue.fetchNext();
    if (!job) {
      await sleep(1000);
      continue;
    }
    
    processing = true;
    try {
      await processJob(job);
      await queue.ack(job);
    } catch (e) {
      await queue.nack(job, e);
    } finally {
      processing = false;
    }
  }
}

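// Start the loop and keep its promise so shutdown can wait for the loop itself
const workerDone = workerLoop();

async function shutdown() {
  shuttingDown = true;

  // Wait for the current job to finish
  while (processing) await sleep(100);
  // A fetch may have been pending while `processing` was false, so also wait for the loop to exit
  await workerDone;

  // Close the queue connection
  await queue.close();
}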
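
The drain idea can be tried without a real broker. The sketch below uses a hypothetical in-memory `MemoryQueue` and a `DrainableWorker` class (both names are illustrative assumptions) and waits on the loop's own promise, so a job fetched just before shutdown is still finished and acknowledged.

```typescript
import { setTimeout as sleep } from 'node:timers/promises';

// Hypothetical in-memory queue standing in for a real broker client.
class MemoryQueue<T> {
  readonly acked: T[] = [];
  closed = false;
  constructor(private readonly pending: T[]) {}
  async fetchNext(): Promise<T | undefined> { return this.pending.shift(); }
  async ack(job: T): Promise<void> { this.acked.push(job); }
  async close(): Promise<void> { this.closed = true; }
}

class DrainableWorker<T> {
  private stopping = false;
  private loop: Promise<void> | undefined;

  constructor(
    private readonly queue: MemoryQueue<T>,
    private readonly handle: (job: T) => Promise<void>,
  ) {}

  start(): void {
    this.loop = this.run();
  }

  private async run(): Promise<void> {
    while (!this.stopping) {
      const job = await this.queue.fetchNext();
      if (job === undefined) {
        await sleep(50); // idle poll interval (arbitrary)
        continue;
      }
      await this.handle(job);
      await this.queue.ack(job);
    }
  }

  // Stop fetching, wait for the loop (and any current job) to finish, then close.
  async stop(): Promise<void> {
    this.stopping = true;
    await this.loop;
    await this.queue.close();
  }
}

async function demo(): Promise<void> {
  const queue = new MemoryQueue(['a', 'b']);
  const worker = new DrainableWorker(queue, async () => { await sleep(20); });
  worker.start();
  await sleep(5); // job 'a' is now in progress
  await worker.stop();
  console.log(queue.acked, queue.closed);
}

demo().catch(console.error);
```

In this run the worker finishes the job it already holds and leaves the rest in the queue for another instance to pick up.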

Connection drain (DB)

async function shutdown() {
  // ...

  // Stop the pool from handing out new connections
  await db.$disconnect();
  // → existing connections commit their work, then close
}

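Closing WebSocket connections

async function shutdown() {
  shuttingDown = true;

  // Notify every client, then close each one after a 5s grace period
  for (const ws of wss.clients) {
    ws.send(JSON.stringify({ type: 'server.shutdown' }));
    setTimeout(() => ws.close(1001, 'going away'), 5000);
  }

  // Stop accepting new connections; to drop clients immediately instead,
  // call ws.close(1001, 'going away') in the loop without the delay
  wss.close();
}

→ Clients reconnect (to a different instance).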

Long-running request

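// Very long streams are hard to shut down gracefully,
// so give them an explicit abort path
const shutdownAbortController = new AbortController();  // shutdown() calls .abort()

app.post('/long', async (req, res) => {
  const ac = new AbortController();
  // Listen on res: on recent Node versions req 'close' can fire as soon as the body is read
  res.on('close', () => ac.abort());

  // Abort the stream when shutdown begins
  const onShutdown = () => ac.abort();
  shutdownAbortController.signal.addEventListener('abort', onShutdown, { once: true });

  try {
    // `stream` is an app-specific async iterable that stops when the signal aborts
    for await (const chunk of stream(ac.signal)) {
      res.write(chunk);
    }
  } finally {
    shutdownAbortController.signal.removeEventListener('abort', onShutdown);
    res.end();  // always finish the response so the connection can drain
  }
});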
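
The two abort sources (client disconnect and shutdown) can also be merged into one signal. The sketch below is self-contained and uses hypothetical names (`anySignal`, `chunks`); newer Node versions ship a built-in `AbortSignal.any` that serves the same purpose, so the hand-written helper is only for illustration.

```typescript
import { setTimeout as sleep } from 'node:timers/promises';

// Hypothetical helper: returns a signal that aborts when any input signal aborts.
function anySignal(signals: AbortSignal[]): AbortSignal {
  const controller = new AbortController();
  for (const signal of signals) {
    if (signal.aborted) {
      controller.abort();
      break;
    }
    signal.addEventListener('abort', () => controller.abort(), { once: true });
  }
  return controller.signal;
}

// Stand-in for an app-specific stream source: yields chunks until aborted.
async function* chunks(signal: AbortSignal, limit: number): AsyncGenerator<string> {
  for (let i = 0; i < limit && !signal.aborted; i++) {
    yield `chunk-${i}\n`;
    await sleep(10);
  }
}

async function demo(): Promise<string[]> {
  const shutdown = new AbortController(); // aborted by the shutdown handler
  const client = new AbortController();   // aborted when the client disconnects
  const signal = anySignal([shutdown.signal, client.signal]);

  const received: string[] = [];
  setTimeout(() => shutdown.abort(), 25); // simulate SIGTERM arriving mid-stream
  for await (const chunk of chunks(signal, 1000)) received.push(chunk);
  return received; // ends early, well short of 1000 chunks
}

demo().then((received) => console.log(`stopped after ${received.length} chunks`));
```

The generator checks the signal between chunks, so the stream stops at a chunk boundary instead of being cut off mid-write.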

Fastify + onClose

import Fastify from 'fastify';
import gracefulShutdown from 'fastify-graceful-shutdown';

const app = Fastify();
app.register(gracefulShutdown);

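// The decorator exists only once the plugin has loaded, so register the handler in after()
app.after(() => {
  app.gracefulShutdown((signal, next) => {
    console.log('shutting down', signal);
    next();
  });
});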

Liveness vs. readiness checks

Liveness:   is the process alive?
Readiness:  can it accept traffic? (state)

During shutdown:
- Liveness stays OK (still running)
- Readiness fails (shutting down)
→ The pod is not restarted, but the LB stops sending traffic.
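
The split above can be sketched with plain `node:http`. The `/livez` path and the `probeStatus` and `handleProbe` names are assumptions for illustration; only `/readyz` appears in the manifest earlier in this note.

```typescript
import { createServer, IncomingMessage, ServerResponse } from 'node:http';

let shuttingDown = false;

// Pure decision function, so it can be unit-tested without opening a socket.
function probeStatus(path: string, isShuttingDown: boolean): number {
  if (path === '/livez') return 200;                         // process is alive
  if (path === '/readyz') return isShuttingDown ? 503 : 200; // stop receiving traffic
  return 404;
}

function handleProbe(req: IncomingMessage, res: ServerResponse): void {
  res.statusCode = probeStatus(req.url ?? '', shuttingDown);
  res.end();
}

// Call probeServer.listen(port) to serve the probes.
const probeServer = createServer(handleProbe);

// The shutdown handler would flip the flag as its first step.
function beginShutdown(): void {
  shuttingDown = true;
}

console.log(probeStatus('/livez', true));   // 200
console.log(probeStatus('/readyz', true));  // 503
console.log(probeStatus('/readyz', false)); // 200
```

Keeping liveness green during shutdown avoids a restart in the middle of draining, while the failing readiness check takes the pod out of rotation.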

Test

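import { setTimeout as sleep } from 'node:timers/promises';

test('graceful shutdown completes inflight', async () => {
  // shutdown() ends with process.exit(), so stub process.exit with the test
  // framework's mocking API first, or the test runner itself will exit
  const longReq = fetch('http://localhost:3000/slow');
  await sleep(100);  // let the request start

  // Trigger the SIGTERM handler in-process
  process.emit('SIGTERM');

  // The long request should still complete normally
  const r = await longReq;
  expect(r.ok).toBe(true);
});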

Common gotchas

1. K8s endpoint propagation takes roughly 5-10s; cover it with a preStop sleep.
2. Without a Connection: close header, keep-alive clients keep reusing the same connection.
3. Idle database connections: drain the pool.
4. Long-polling / SSE: close them explicitly.
5. Async work that continues after the response is sent: track it as well.
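
For gotcha 5, one possible approach is a small registry for promises that outlive the response. The names `backgroundTasks`, `trackBackground`, and `drainBackground` are hypothetical; this is a minimal sketch, not an established API.

```typescript
// Hypothetical registry for work that continues after the response is sent.
const backgroundTasks = new Set<Promise<unknown>>();

function trackBackground<T>(task: Promise<T>): Promise<T> {
  backgroundTasks.add(task);
  // Remove the task once it settles, whether it fulfilled or rejected
  const forget = () => { backgroundTasks.delete(task); };
  task.then(forget, forget);
  return task;
}

// Waits for everything registered so far; returns how many tasks were awaited.
async function drainBackground(): Promise<number> {
  const pending = Array.from(backgroundTasks);
  await Promise.allSettled(pending);
  return pending.length;
}

// Usage: wrap fire-and-forget work instead of dropping the promise.
trackBackground(Promise.resolve('audit log written'));
console.log(backgroundTasks.size); // 1
```

The shutdown handler would call `drainBackground()` after the in-flight requests have drained and before closing the DB, since these tasks often still need a connection.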

terminationGracePeriodSeconds

Default: 30s.
Long task: 60-120s.
Background job: 300s (5min).

→ If the app cannot finish within this window, it receives SIGKILL.

Forced shutdown

let forceTimer: NodeJS.Timeout;

async function shutdown() {
  forceTimer = setTimeout(() => {
    console.error('Force exit');
    process.exit(1);
  }, 25000);  // must be shorter than the K8s grace period (minus any preStop sleep)
  
  await graceful();
  clearTimeout(forceTimer);
  process.exit(0);
}

Logging

log.info('shutdown.start', { signal });
log.info('shutdown.readiness-off');
log.info('shutdown.draining', { inflightCount });
log.info('shutdown.db-closed');
log.info('shutdown.complete');
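
To keep these events consistent, each phase can be wrapped in a helper that logs its outcome and duration. `loggedStep` and `LogFn` are hypothetical names; the sketch assumes a logger that accepts an event name and optional fields, and it continues past a failed phase so later cleanup still runs.

```typescript
type LogFn = (event: string, fields?: Record<string, unknown>) => void;

// Hypothetical wrapper: runs one shutdown phase and logs its outcome and duration.
async function loggedStep(log: LogFn, name: string, fn: () => Promise<void>): Promise<boolean> {
  const startedAt = Date.now();
  try {
    await fn();
    log(`shutdown.${name}`, { ms: Date.now() - startedAt });
    return true;
  } catch (err) {
    // Report and continue: a failed phase should not block the remaining cleanup
    log(`shutdown.${name}.failed`, { ms: Date.now() - startedAt, error: String(err) });
    return false;
  }
}

const events: string[] = [];
const log: LogFn = (event) => { events.push(event); };

async function demo(): Promise<string[]> {
  await loggedStep(log, 'draining', async () => {});
  await loggedStep(log, 'db-closed', async () => { throw new Error('boom'); });
  await loggedStep(log, 'complete', async () => {});
  return events; // ['shutdown.draining', 'shutdown.db-closed.failed', 'shutdown.complete']
}

demo().then((result) => console.log(result));
```

Per-phase durations make it easier to see which step is consuming the grace period when a pod ends up being killed.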

🤔 Decision criteria

| Workload | Recommendation |
| --- | --- |
| HTTP API | preStop sleep + readiness off + drain |
| Worker | Drain current job + close queue |
| WebSocket | Notify client + close |
| Long stream | Abort signal + close |
| Cron job | Exit after completion |
| DB connection | Pool drain |

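Anti-patterns

  • Ignoring SIGTERM: K8s follows up with SIGKILL and in-flight requests are lost.
  • Leaving readiness unchanged: traffic keeps arriving and new requests keep being accepted.
  • No preStop hook: the app exits before endpoint removal has propagated.
  • Forcing exit immediately: in-progress work is lost.
  • No in-flight tracking: there is no way to know when draining has finished.
  • No DB close: connections are left stuck.
  • No test, first attempt in prod: it breaks.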

🤖 Hints for LLM use

  • Four parts: preStop sleep of 5-10s + readiness off + drain in-flight requests + DB close.
  • terminationGracePeriodSeconds is usually 30-60s.
  • Force timeout < grace period.

🔗 Related documents