2026-05-09 21:08:02 +09:00


id: backend-maintenance-mode
title: "Maintenance Mode: Gradual / Read-only / Banner"
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: backend, maintenance, vibe-coding
tech_stack: language TS, applicable_to Backend
applied_in:
aliases: maintenance mode, read-only mode, downtime, planned outage, kill switch

Maintenance Mode

Migrations and other large changes call for a temporary block. Escalate gradually: banner → read-only → full block. This beats taking the whole service down. Integrate with kill switches and feature flags.

📖 Core concepts

  • Banner only: show a notice such as "Maintenance scheduled at X".
  • Read-only: GET/HEAD allowed; POST/PUT/PATCH/DELETE blocked.
  • Restricted: admins only.
  • Full block: 503 for all traffic.
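The four levels above can be expressed as one pure decision function, which keeps the policy testable apart from any web framework. This is a minimal sketch: the names `MaintenanceMode`, `RequestFacts`, and `isAllowed` are assumptions made for illustration, not part of any library.

```typescript
// Mode names follow the flag values used elsewhere in this note.
type MaintenanceMode = 'off' | 'banner' | 'readonly' | 'admin-only' | 'full';

// Hypothetical minimal view of a request, just enough to decide.
interface RequestFacts {
  method: string;   // HTTP method, e.g. 'GET'
  isAdmin: boolean; // whether the caller is an authenticated admin
}

// Methods that do not modify state and stay available in read-only mode.
const SAFE_METHODS = new Set(['GET', 'HEAD']);

// Returns true when the request should be served, false when it should get a 503.
function isAllowed(mode: MaintenanceMode, req: RequestFacts): boolean {
  switch (mode) {
    case 'off':
    case 'banner':
      return true; // a banner only informs, it never blocks
    case 'readonly':
      return SAFE_METHODS.has(req.method.toUpperCase());
    case 'admin-only':
      return req.isAdmin;
    case 'full':
      return false;
  }
}
```

A middleware can then call `isAllowed(mode, { method: req.method, isAdmin })` and answer 503 + Retry-After when it returns false.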

💻 Code patterns

Feature flag based

// 'off' | 'banner' | 'readonly' | 'admin-only' | 'full'
app.use(async (req, res, next) => {
  // Read the flag on every request so a toggle takes effect without a restart
  const mode = await flags.get('maintenance');
  switch (mode) {
    case 'off':
      return next();
    case 'banner':
      res.setHeader('X-Maintenance-Banner', 'Scheduled at 2026-05-10 02:00 UTC');
      return next();
    case 'readonly':
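      // Safe methods pass through; every other method is rejected
      if (req.method !== 'GET' && req.method !== 'HEAD') {
        return res.status(503).set('Retry-After', '1800').json({
          type: '...',
          title: 'Read-only mode',
          status: 503,
          detail: 'Writes are temporarily disabled',
          retryAfter: 1800,
        });
      }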
      return next();
    case 'admin-only':
      if (!req.user?.isAdmin) {
        return res.status(503).json({
          type: '...', title: 'Maintenance', status: 503,
        });
      }
      return next();
    case 'full':
      return res.status(503).set('Retry-After', '1800').json({
        type: '...', title: 'Maintenance', status: 503,
      });
    default:
      // Unknown or missing flag value: fail open so the request never hangs
      return next();
  }
});

Reverse proxy blocking (nginx)

# If the maintenance file exists, return 503 to everyone except allowlisted admin IPs
server {
  error_page 503 /maintenance.html;

  location = /maintenance.html {
    root /var/www;
    internal;
  }

  location / {
    set $maintenance 0;
    if (-f /var/www/maintenance.html) {
      set $maintenance 1;
    }
    # Admin IP allowlist. nginx does not allow nested "if",
    # so the allowlist resets the variable instead.
    if ($remote_addr ~ ^(10\.0\.0\.1|10\.0\.0\.2)$) {
      set $maintenance 0;
    }
    if ($maintenance) {
      return 503;
    }
    proxy_pass http://app;
  }
}
# Toggle
touch /var/www/maintenance.html  # ON
rm /var/www/maintenance.html     # OFF

CDN level (Cloudflare Worker)

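// Binding name KV matches the original; the type comes from @cloudflare/workers-types
interface Env {
  KV: KVNamespace;
}

// Assumed helper: placeholder allowlist mirroring the nginx example.
// Cloudflare passes the client address in the CF-Connecting-IP header.
const ADMIN_IPS = new Set(['10.0.0.1', '10.0.0.2']);
function isAdminIp(req: Request): boolean {
  return ADMIN_IPS.has(req.headers.get('CF-Connecting-IP') ?? '');
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {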
    const mode = await env.KV.get('maintenance');
    
    if (mode === 'full' && !isAdminIp(req)) {
      return new Response('Maintenance', {
        status: 503,
        headers: { 'Retry-After': '1800', 'Content-Type': 'text/html' },
      });
    }
    
    return fetch(req);
  },
};

Banner UI

function App() {
  const { data: status } = useQuery(['maintenance'], fetchStatus);
  
  return (
    <>
      {status?.maintenance?.scheduled && (
        <div className="bg-yellow-100 border-b border-yellow-300 px-4 py-2 text-sm">
          ⚠️ Scheduled maintenance: {format(status.maintenance.start)} - {format(status.maintenance.end)}
        </div>
      )}
      {status?.maintenance?.readonly && (
        <div className="bg-orange-100 border-b border-orange-400 px-4 py-2 text-sm">
          🔒 Read-only mode active. Writes are temporarily disabled.
        </div>
      )}
      <Routes>...</Routes>
    </>
  );
}
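The banner component reads `status.maintenance.scheduled`, `readonly`, `start`, and `end`, so the backend needs to produce that shape. The following is a rough sketch of one way to build it; `buildStatus`, `MaintenanceWindow`, and the one-hour notice default (taken from the "1h before + during" guidance in this note) are assumptions, not an existing API.

```typescript
type MaintenanceMode = 'off' | 'banner' | 'readonly' | 'admin-only' | 'full';

// Hypothetical description of a planned window.
interface MaintenanceWindow {
  start: Date;
  end: Date;
}

// Shape consumed by the banner component.
interface StatusPayload {
  maintenance: { scheduled: boolean; readonly: boolean; start?: string; end?: string };
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Builds the payload for the status endpoint.
// "scheduled" is true from noticeMs before the window starts until it ends.
function buildStatus(
  mode: MaintenanceMode,
  win: MaintenanceWindow | null,
  now: Date,
  noticeMs: number = ONE_HOUR_MS,
): StatusPayload {
  const scheduled =
    win !== null &&
    now.getTime() >= win.start.getTime() - noticeMs &&
    now.getTime() <= win.end.getTime();

  return {
    maintenance: {
      scheduled,
      readonly: mode === 'readonly',
      // Only expose the window while the banner should be visible
      ...(scheduled && win
        ? { start: win.start.toISOString(), end: win.end.toISOString() }
        : {}),
    },
  };
}
```

Passing `now` in explicitly keeps the function deterministic, so the banner timing can be unit-tested without faking the system clock.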

DB migration with read-only

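# 1. Read-only mode ON (block writes)
# 2. Wait for in-flight writes to complete
# 3. Run the migration (large backfill, partition rebuild)
# 4. Verify
# 5. Read-only mode OFF
-- PG read-only role
CREATE ROLE readonly;
-- Make new sessions of app_user default to read-only transactions.
-- Existing connections are unaffected, and a session can still override the setting.
ALTER USER app_user SET default_transaction_read_only = on;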
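Step 2 ("wait for in-flight writes") needs some way to know when writes that started before the switch have finished. One possible approach, sketched below, is an in-process counter that write handlers wrap themselves in. The class name `InflightWrites` and its methods are hypothetical, and the counter only sees writes in the current process, so each app instance would have to drain separately.

```typescript
// Counts writes that have started but not yet finished in this process.
class InflightWrites {
  private count = 0;
  private waiters: Array<() => void> = [];

  // Wrap a write so it is counted for its whole duration,
  // whether it succeeds or throws.
  async track<T>(write: () => Promise<T>): Promise<T> {
    this.count++;
    try {
      return await write();
    } finally {
      this.count--;
      if (this.count === 0) {
        // Wake everyone waiting for the drain
        const toWake = this.waiters;
        this.waiters = [];
        toWake.forEach((resolve) => resolve());
      }
    }
  }

  // Resolves once no tracked write is running.
  drained(): Promise<void> {
    if (this.count === 0) return Promise.resolve();
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  // Number of writes currently running.
  get pending(): number {
    return this.count;
  }
}
```

With read-only mode already on, no new writes enter `track`, so `await writes.drained()` returns once the remaining ones complete. A timeout around it is still advisable in case a write hangs.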

Kill switch (emergency)

// Controlled from an external KV store or config
async function checkKillSwitch(feature: string): Promise<boolean> {
  return (await redis.get(`kill:${feature}`)) === '1';
}

app.post('/api/payments', async (req, res) => {
  if (await checkKillSwitch('payments')) {
    return res.status(503).json({
      title: 'Payments temporarily unavailable',
      detail: 'We are working to restore service. Try again in a few minutes.',
    });
  }
  // ...
});

→ Turn the feature off the moment a bug is found, without waiting for a deploy.
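Checking the store on every request adds latency and makes the store itself a dependency of every protected route. A short-lived cache that falls back to the last known value is one way to soften that. The sketch below is an assumption-laden illustration: `createKillSwitch`, the injected `read` function, and the fail-open default are choices made here, not something this note prescribes.

```typescript
// Reads a raw flag value from some external store (Redis, KV, config service).
type FlagReader = (key: string) => Promise<string | null>;

// Returns an isKilled(feature) function that caches each answer for ttlMs.
// "now" is injectable so the cache can be tested without real time passing.
function createKillSwitch(
  read: FlagReader,
  ttlMs: number,
  now: () => number = Date.now,
): (feature: string) => Promise<boolean> {
  const cache = new Map<string, { killed: boolean; expires: number }>();

  return async function isKilled(feature: string): Promise<boolean> {
    const hit = cache.get(feature);
    if (hit && hit.expires > now()) return hit.killed;

    let killed: boolean;
    try {
      killed = (await read(`kill:${feature}`)) === '1';
    } catch {
      // Store unreachable: reuse the last known value,
      // otherwise fail open (the feature stays on).
      killed = hit ? hit.killed : false;
    }
    cache.set(feature, { killed, expires: now() + ttlMs });
    return killed;
  };
}
```

The TTL bounds how long a flip takes to propagate, so it should stay small (seconds, not minutes). For a feature such as payments, failing closed on store errors may be the safer default.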

Status page

status.acme.com: visible to users.
- Statuspage.io / Better Stack / self-hosted.
- Announce in advance: "Scheduled maintenance: 2026-05-10 02:00 UTC".

Communication (users)

1. Email (24h+ before): for major maintenance.
2. Banner (web): from 1h before and during.
3. API response (header): on every response.
4. Status page: always.
5. Twitter / social media: during incidents.

Retry-After per API

res.status(503).set({
  'Retry-After': '300',
  'X-Maintenance-Mode': 'true',
}).json({
  type: 'https://api.acme.com/errors/maintenance',
  title: 'Maintenance',
  detail: 'API temporarily unavailable, retry in 5 minutes',
  retryAfter: 300,
});

→ Clients can retry automatically.
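On the client side, automatic retry means reading `Retry-After` and waiting that long before trying again. Below is a minimal sketch under a few assumptions: `parseRetryAfter` and `retryOn503` are hypothetical helpers, the response is reduced to a `MinimalResponse` interface so the logic does not depend on a particular HTTP client, and the wait is capped so a long maintenance window does not stall the caller indefinitely.

```typescript
// Parses a Retry-After value (delay in seconds or an HTTP date) into milliseconds.
// Returns null when the header is missing or unparseable.
function parseRetryAfter(value: string | null, nowMs: number = Date.now()): number | null {
  if (value === null) return null;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const date = Date.parse(trimmed);
  if (Number.isNaN(date)) return null;
  return Math.max(0, date - nowMs);
}

// The subset of a fetch Response this helper needs.
interface MinimalResponse {
  status: number;
  headers: { get(name: string): string | null };
}

interface RetryOptions {
  maxAttempts: number;                  // total tries, including the first
  maxWaitMs: number;                    // upper bound for a single wait
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls send() until it returns something other than 503 or attempts run out.
async function retryOn503<R extends MinimalResponse>(
  send: () => Promise<R>,
  opts: RetryOptions,
): Promise<R> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    if (res.status !== 503 || attempt >= opts.maxAttempts) return res;
    // Fall back to 1s when the server sent no usable Retry-After
    const wait = parseRetryAfter(res.headers.get('Retry-After')) ?? 1000;
    await sleep(Math.min(wait, opts.maxWaitMs));
  }
}
```

Usage could look like `retryOn503(() => fetch(url), { maxAttempts: 3, maxWaitMs: 60_000 })`. Retrying writes automatically is only safe when the request is idempotent or carries an idempotency key.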

Soft launch (visible to admins only)

// The new feature is deployed to prod, but only admins and beta testers can use it
if (newFeature.enabled) {
  if (!req.user?.isAdmin && !req.user?.betaTester) {
    return res.status(404).end();  // to regular users it looks like it does not exist
  }
}

→ Stealth deploy + soft test.

Database maintenance

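-- Keep locks short during large migrations
-- Use zero-downtime tools such as pg_repack (PostgreSQL) or gh-ost (MySQL)

-- Or switch the database to read-only (takes effect for new sessions only)
ALTER DATABASE app SET default_transaction_read_only = on;
-- Run the migration (its own session must opt back in,
-- e.g. SET default_transaction_read_only = off)
ALTER DATABASE app SET default_transaction_read_only = off;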

Rolling restart

# K8s
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1

→ Pods are replaced one at a time: a new pod starts before an old one is terminated, so the service stays up.

Runbook (written in advance)

# Maintenance Runbook — DB Schema Migration v2

## Pre-checks
- [ ] Backup latest snapshot taken
- [ ] Migration tested on staging
- [ ] Rollback script ready
- [ ] Status page updated
- [ ] On-call notified

## Steps (estimated 30min)
1. Enable read-only mode at 02:00 UTC
2. Wait for write queue drain (5 min)
3. Run migration: `pnpm migrate:up`
4. Verify schema: `pnpm verify`
5. Disable read-only mode
6. Monitor errors for 30 min

## Rollback
1. Enable read-only mode
2. Run rollback: `pnpm migrate:down`
3. Disable read-only mode
4. Investigate

## Communication
- Status page: "Scheduled maintenance" 24h before
- Email: 24h before
- During: hourly status updates
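The Steps and Rollback sections of the runbook can also be encoded as a small orchestration function, so that read-only mode is always switched back off and a failed verification triggers the rollback automatically. This is only a sketch: the `MigrationSteps` interface and `runMigrationWindow` are hypothetical, and each step would be wired to the real commands (for example `pnpm migrate:up`) by the team.

```typescript
// Hypothetical hooks, one per runbook step.
interface MigrationSteps {
  setReadOnly(on: boolean): Promise<void>;
  drainWrites(): Promise<void>;
  migrateUp(): Promise<void>;
  verify(): Promise<boolean>;
  migrateDown(): Promise<void>;
}

interface WindowResult {
  outcome: 'migrated' | 'rolled-back';
  error?: unknown; // why the rollback happened, kept for the investigation step
}

// Runs: read-only ON -> drain -> migrate -> verify -> read-only OFF.
// If the migration throws or verification fails, the rollback runs
// while read-only mode is still on.
async function runMigrationWindow(steps: MigrationSteps): Promise<WindowResult> {
  await steps.setReadOnly(true);
  try {
    await steps.drainWrites();
    let error: unknown;
    try {
      await steps.migrateUp();
      if (await steps.verify()) return { outcome: 'migrated' };
      error = new Error('verification failed');
    } catch (e) {
      error = e;
    }
    await steps.migrateDown();
    return { outcome: 'rolled-back', error };
  } finally {
    // Runs on success, rollback, and unexpected failure alike
    await steps.setReadOnly(false);
  }
}
```

If `migrateDown` itself throws, the error propagates to the caller after read-only mode is turned off. Whether writes should really be re-enabled in that state is a judgment call the runbook should spell out.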

Test maintenance mode

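// Assumed address of the app under test; Node's fetch needs an absolute URL
const BASE_URL = 'http://localhost:3000';

test('maintenance read-only blocks writes', async () => {
  await flags.set('maintenance', 'readonly');
  try {
    const r = await fetch(`${BASE_URL}/api/orders`, { method: 'POST', body: '...' });
    expect(r.status).toBe(503);

    const get = await fetch(`${BASE_URL}/api/orders`);
    expect(get.status).toBe(200);
  } finally {
    // Always restore the flag so later tests are unaffected
    await flags.set('maintenance', 'off');
  }
});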

🤔 Decision criteria

| Task | Mode |
| --- | --- |
| Schema migration (safe) | None; use zero-downtime tools |
| Schema migration (risky) | Read-only, 5-30 min |
| Major refactor | Banner + monitor |
| Emergency bug | Kill switch (specific feature) |
| Pricing change | Banner only |
| DB hardware change | Full maintenance window |

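Anti-patterns

  • Sudden maintenance (no advance notice): users get frustrated.
  • HTTP 200 + maintenance message: clients cannot tell they should retry. Return 503 + Retry-After.
  • Blocking admins / staff too: debugging becomes impossible.
  • No kill switch: a serious bug has to wait for a deploy.
  • Banner only, with no actual blocking: users keep trying and things break.
  • DB read-only with some write paths missed: partial breakage.
  • No rollback plan: forward-only, so a failure turns into a bigger incident.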

🤖 LLM usage hints

  • Escalate gradually (banner → read-only → block).
  • Kill switch per feature.
  • Status page + user communication.
  • Prepare the runbook and rollback in advance.

🔗 Related documents