| id | title | category | status | source_trust_level | verification_status | created_at | updated_at | tags | tech_stack | applied_in | aliases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| backend-maintenance-mode | Maintenance Mode: Gradual / Read-only / Banner | Coding | draft | B | conceptual | 2026-05-09 | 2026-05-09 | | | | |
Maintenance Mode
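Migrations and other large changes sometimes require blocking traffic temporarily. Escalate gradually (banner → read-only → full block) rather than taking the service fully down. Integrate the mode with kill switches and feature flags.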
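📖 Core concepts
- Banner only: shows a notice such as "Maintenance scheduled at X"; all requests still succeed.
- Read-only: GET succeeds; POST/PUT/PATCH/DELETE are blocked.
- Restricted: only admins are allowed through.
- Full block: 503 for all traffic.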
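The levels listed above can be expressed as one pure decision function, which keeps the policy testable apart from any HTTP framework. This is a minimal sketch: the names `MaintenanceMode`, `IncomingInfo`, and `isAllowed` are illustrative, and treating OPTIONS as a safe method is an assumption made so that CORS preflights keep working.

```typescript
// Hypothetical names for illustration; the mode strings mirror the levels listed above.
type MaintenanceMode = 'off' | 'banner' | 'readonly' | 'admin-only' | 'full';

interface IncomingInfo {
  method: string;
  isAdmin: boolean;
}

// Assumption: OPTIONS counts as safe so that CORS preflights are not rejected.
const SAFE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

// Returns true when the request may proceed, false when it should receive a 503.
function isAllowed(mode: MaintenanceMode, req: IncomingInfo): boolean {
  switch (mode) {
    case 'off':
    case 'banner':
      return true; // banner mode only informs, it never blocks
    case 'readonly':
      return SAFE_METHODS.has(req.method.toUpperCase());
    case 'admin-only':
      return req.isAdmin;
    case 'full':
      return false;
  }
}
```

Keeping the decision separate from the response code means the same function can back an Express middleware, an edge worker, or a unit test.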
💻 Code patterns
Feature flag based
// 'off' | 'banner' | 'readonly' | 'admin-only' | 'full'
app.use(async (req, res, next) => {
  // Read the flag on every request so that a toggle takes effect without a restart.
  const mode = await flags.get('maintenance');
  switch (mode) {
    case 'banner':
      res.setHeader('X-Maintenance-Banner', 'Scheduled at 2026-05-10 02:00 UTC');
      return next();
    case 'readonly':
      // Safe methods pass through; OPTIONS is included so CORS preflights still work.
      if (!['GET', 'HEAD', 'OPTIONS'].includes(req.method)) {
        return res.status(503).set('Retry-After', '1800').json({
          type: '...',
          title: 'Read-only mode',
          status: 503,
          detail: 'Writes are temporarily disabled',
          retryAfter: 1800,
        });
      }
      return next();
    case 'admin-only':
      if (!req.user?.isAdmin) {
        return res.status(503).set('Retry-After', '1800').json({
          type: '...', title: 'Maintenance', status: 503,
        });
      }
      return next();
    case 'full':
      return res.status(503).set('Retry-After', '1800').json({
        type: '...', title: 'Maintenance', status: 503,
      });
    case 'off':
    default:
      // A missing or unknown flag value fails open instead of leaving the request hanging.
      return next();
  }
});
Reverse proxy blocking (nginx)
# Return 503 while the maintenance file exists, except for allowlisted admin IPs
server {
    error_page 503 @maintenance;
    location @maintenance {
        root /var/www;
        rewrite ^ /maintenance.html break;
    }
    location / {
        # nginx does not allow nested `if`, so build the decision in a variable
        set $maintenance 0;
        if (-f /var/www/maintenance.html) {
            set $maintenance 1;
        }
        # Admin IP allowlist
        if ($remote_addr ~ ^(10\.0\.0\.1|10\.0\.0\.2)$) {
            set $maintenance 0;
        }
        if ($maintenance) {
            return 503;
        }
        proxy_pass http://app;
    }
}
# Toggle
touch /var/www/maintenance.html # ON
rm /var/www/maintenance.html # OFF
CDN level (Cloudflare Worker)
export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const mode = await env.KV.get('maintenance');
    if (mode === 'full' && !isAdminIp(req)) {
      return new Response('Maintenance', {
        status: 503,
        headers: { 'Retry-After': '1800', 'Content-Type': 'text/html' },
      });
    }
    return fetch(req);
  },
};
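The Worker calls `isAdminIp` without defining it. One possible shape is sketched below, assuming admins are identified by a fixed IP allowlist and that the request passed through Cloudflare, which sets the `CF-Connecting-IP` header. The `ADMIN_IPS` values are placeholders copied from the nginx example, and the parameter is typed structurally so that a Worker `Request` or a plain test object both fit.

```typescript
// Assumption: placeholder allowlist reusing the addresses from the nginx example.
const ADMIN_IPS = new Set(['10.0.0.1', '10.0.0.2']);

// Anything exposing headers.get(), such as a Worker Request, satisfies this shape.
interface HasHeaders {
  headers: { get(name: string): string | null };
}

function isAdminIp(req: HasHeaders): boolean {
  // Cloudflare sets CF-Connecting-IP to the client address seen at the edge.
  const ip = req.headers.get('CF-Connecting-IP');
  return ip !== null && ADMIN_IPS.has(ip);
}
```

An IP allowlist is only one option; a signed admin cookie or an access-proxy header would follow the same pattern with a different check.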
Banner UI
function App() {
  const { data: status } = useQuery(['maintenance'], fetchStatus);
  return (
    <>
      {status?.maintenance?.scheduled && (
        <div className="bg-yellow-100 border-b border-yellow-300 px-4 py-2 text-sm">
          ⚠️ Scheduled maintenance: {format(status.maintenance.start)} - {format(status.maintenance.end)}
        </div>
      )}
      {status?.maintenance?.readonly && (
        <div className="bg-orange-100 border-b border-orange-400 px-4 py-2 text-sm">
          🔒 Read-only mode active. Writes are temporarily disabled.
        </div>
      )}
      <Routes>...</Routes>
    </>
  );
}
DB migration with read-only
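# 1. Turn read-only mode ON (block writes)
# 2. Wait for in-flight writes to complete
# 3. Run the migration (large backfill, partition rebuild)
# 4. Verify
# 5. Turn read-only mode OFF
-- PG read-only role (it has no privileges until you grant it SELECT)
CREATE ROLE readonly;
-- Or make new sessions of app_user read-only by default.
-- This is a default, not a hard limit: a session can still override it.
ALTER USER app_user SET default_transaction_read_only = on;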
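Step 2, waiting for in-flight writes, needs some way to know when the last write has finished. A minimal in-process sketch follows; `WriteTracker` is a hypothetical helper, and it only sees writes in the current process, so a multi-instance deployment would need to drain each instance.

```typescript
// Hypothetical helper: counts writes in progress so a migration can wait for zero.
class WriteTracker {
  private inFlight = 0;
  private waiters: Array<() => void> = [];

  // Wrap the body of every write handler in track().
  async track<T>(write: () => Promise<T>): Promise<T> {
    this.inFlight++;
    try {
      return await write();
    } finally {
      this.inFlight--;
      if (this.inFlight === 0) {
        // Wake everyone waiting in drained().
        this.waiters.splice(0).forEach((wake) => wake());
      }
    }
  }

  // Resolves true once no writes are in flight, or false if timeoutMs elapses first.
  drained(timeoutMs: number): Promise<boolean> {
    if (this.inFlight === 0) return Promise.resolve(true);
    return new Promise((resolve) => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}
```

A `false` result from `drained` is a signal to stop and investigate rather than to start the migration with writes still running.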
Kill switch (emergency)
// Controlled from an external KV store or config
async function checkKillSwitch(feature: string): Promise<boolean> {
  return (await redis.get(`kill:${feature}`)) === '1';
}
app.post('/api/payments', async (req, res) => {
  if (await checkKillSwitch('payments')) {
    return res.status(503).json({
      title: 'Payments temporarily unavailable',
      detail: 'We are working to restore service. Try again in a few minutes.',
    });
  }
  // ...
});
→ Turn the feature off the moment a bug is found, without waiting for a deploy.
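Checking the store on every request adds a round trip and makes the route depend on the store being reachable. The sketch below caches each answer for a short TTL and falls back to the last known value if the lookup fails. `makeKillSwitch` and its `lookup` parameter are hypothetical, and failing open when nothing is known is an assumption that some teams would reverse for sensitive features.

```typescript
// Hypothetical lookup signature; in the example above it would wrap redis.get.
type Lookup = (key: string) => Promise<string | null>;

function makeKillSwitch(
  lookup: Lookup,
  ttlMs = 5_000,
  now: () => number = Date.now,
) {
  const cache = new Map<string, { killed: boolean; expires: number }>();

  return async function isKilled(feature: string): Promise<boolean> {
    const hit = cache.get(feature);
    if (hit && hit.expires > now()) return hit.killed;
    try {
      const killed = (await lookup(`kill:${feature}`)) === '1';
      cache.set(feature, { killed, expires: now() + ttlMs });
      return killed;
    } catch {
      // Store unreachable: reuse the last known value, or fail open if there is none.
      return hit?.killed ?? false;
    }
  };
}
```

With the Redis client from the example, this could be wired as `const isKilled = makeKillSwitch((key) => redis.get(key));`. The trade-off is that a flipped switch can take up to `ttlMs` to reach every instance.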
Status page
status.acme.com: shown to users.
- Statuspage.io / Better Stack / self-hosted.
- Announce in advance: "Scheduled maintenance: 2026-05-10 02:00 UTC".
Communication (users)
1. Email (24h+ before): for major maintenance.
2. Banner (web): from 1h before and during.
3. API response (header): on every response.
4. Status page: always.
5. Twitter / social media: during incidents.
Per-API Retry-After
res.status(503).set({
  'Retry-After': '300',
  'X-Maintenance-Mode': 'true',
}).json({
  type: 'https://api.acme.com/errors/maintenance',
  title: 'Maintenance',
  detail: 'API temporarily unavailable, retry in 5 minutes',
  retryAfter: 300,
});
→ Lets clients schedule an automatic retry.
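On the client side, the header has to be parsed before it can drive a retry. `Retry-After` may carry either a number of seconds or an HTTP date, so the sketch below handles both. The names `retryDelayMs` and `withMaintenanceRetry` are hypothetical, and the attempt and delay caps are arbitrary defaults.

```typescript
// Minimal response shape, so the helper works with fetch's Response or a test double.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

// Converts a Retry-After value (delay in seconds or HTTP date) to milliseconds.
function retryDelayMs(header: string | null, nowMs: number = Date.now()): number | null {
  if (header === null) return null;
  const value = header.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const date = Date.parse(value);
  if (Number.isNaN(date)) return null;
  return Math.max(0, date - nowMs);
}

// Re-sends while the server answers 503 with a usable Retry-After hint.
async function withMaintenanceRetry<R extends ResponseLike>(
  send: () => Promise<R>,
  maxAttempts = 3,
  maxDelayMs = 600_000,
): Promise<R> {
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    if (res.status !== 503 || attempt >= maxAttempts) return res;
    const delay = retryDelayMs(res.headers.get('Retry-After'));
    if (delay === null) return res; // no hint: leave the decision to the caller
    await new Promise((resolve) => setTimeout(resolve, Math.min(delay, maxDelayMs)));
  }
}
```

A call might look like `withMaintenanceRetry(() => fetch(url, init))`. Capping the delay keeps a misconfigured header from stalling the client for hours.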
Soft launch (visible to admins only)
// The new feature is deployed to prod, but only admins and beta testers can use it
if (newFeature.enabled) {
  if (!req.user?.isAdmin && !req.user?.betaTester) {
    return res.status(404).end(); // looks nonexistent to regular users
  }
}
→ Stealth deploy + soft test.
Database maintenance
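-- Keep locks short during large migrations
-- Use zero-downtime tools such as pg_repack (PostgreSQL) or gh-ost (MySQL)
-- Or switch the database to read-only
ALTER DATABASE app SET default_transaction_read_only = on;
-- Applies to new sessions only; existing connections must reconnect to pick it up
-- Run the migration (its own session needs SET default_transaction_read_only = off)
ALTER DATABASE app SET default_transaction_read_only = off;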
Rolling restart
# K8s
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
→ Pods are replaced one at a time, and each new pod starts before an old one is terminated, so the service stays up.
Runbook (written in advance)
# Maintenance Runbook — DB Schema Migration v2
## Pre-checks
- [ ] Backup latest snapshot taken
- [ ] Migration tested on staging
- [ ] Rollback script ready
- [ ] Status page updated
- [ ] On-call notified
## Steps (estimated 30min)
1. Enable read-only mode at 02:00 UTC
2. Wait for write queue drain (5 min)
3. Run migration: `pnpm migrate:up`
4. Verify schema: `pnpm verify`
5. Disable read-only mode
6. Monitor errors for 30 min
## Rollback
1. Enable read-only mode
2. Run rollback: `pnpm migrate:down`
3. Disable read-only mode
4. Investigate
## Communication
- Status page: "Scheduled maintenance" 24h before
- Email: 24h before
- During: hourly status updates
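The Steps and Rollback sections of the runbook can also be encoded as a small orchestration function, so that read-only mode is always switched off again and a failed verification triggers the rollback. This is a sketch under assumptions: `MigrationSteps` and `runMigration` are hypothetical, and each step would wrap whatever the team actually runs (for example the `pnpm migrate:up` command).

```typescript
// Hypothetical step interface; each method wraps one line of the runbook.
interface MigrationSteps {
  setReadOnly(on: boolean): Promise<void>;
  drainWrites(): Promise<void>;
  migrateUp(): Promise<void>;
  verify(): Promise<boolean>;
  migrateDown(): Promise<void>;
}

async function runMigration(steps: MigrationSteps): Promise<'migrated' | 'rolled-back'> {
  await steps.setReadOnly(true);
  try {
    await steps.drainWrites();
    try {
      await steps.migrateUp();
      if (!(await steps.verify())) throw new Error('verification failed');
      return 'migrated';
    } catch (err) {
      // Roll back while still read-only, then leave the investigation to a human.
      console.error('migration failed, rolling back:', err);
      await steps.migrateDown();
      return 'rolled-back';
    }
  } finally {
    // Runs on success, rollback, and unexpected errors alike.
    await steps.setReadOnly(false);
  }
}
```

The 30-minute monitoring step is left out on purpose, since it is a human task rather than something to block a script on.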
Test maintenance mode
test('maintenance read-only blocks writes', async () => {
  await flags.set('maintenance', 'readonly');
  try {
    const r = await fetch('/api/orders', { method: 'POST', body: '...' });
    expect(r.status).toBe(503);
    const get = await fetch('/api/orders');
    expect(get.status).toBe(200);
  } finally {
    // Reset even when an assertion fails, so later tests do not run in read-only mode.
    await flags.set('maintenance', 'off');
  }
});
🤔 Decision criteria
| Task | Mode |
|---|---|
| Schema migration (safe) | None: use zero-downtime tools |
| Schema migration (risky) | Read-only, 5-30 min |
| Major refactor | Banner + monitor |
| Emergency bug | Kill switch (specific feature) |
| Pricing change | Banner only |
| DB hardware change | Full maintenance window |
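❌ Anti-patterns
- Unannounced maintenance (no advance notice): users complain.
- HTTP 200 + maintenance message: clients cannot tell they should retry. Use 503 + Retry-After.
- Blocking admins / staff too: debugging becomes impossible.
- No kill switch: a serious bug has to wait for a deploy.
- Banner only, with no actual blocking: users keep trying and hit failures.
- DB read-only with some write paths missed: partial breakage.
- No rollback plan: forward-only, so a failure turns into a bigger incident.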
🤖 LLM usage hints
- Gradual escalation (banner → read-only → block).
- Kill switch per feature.
- Status page + user communication.
- Runbook + rollback prepared in advance.