---
id: wiki-2026-0508-backups
title: Backups
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Backup Strategy, Disaster Recovery, 백업]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [backup, dr, ops, sre]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: applied
tech_stack:
  language: Bash/Python
  framework: restic/borg/AWS Backup
---

# Backups

## In One Line

> **"The only real backup is one whose restore has been verified."** Backups rest on three things: the 3-2-1 rule (3 copies, 2 media, 1 offsite), RTO/RPO targets, and regular restore drills. The 2026 standard: incremental, deduplicated backups (restic/borg), immutable object lock (S3 Object Lock, Azure immutable Blob storage), and a ransomware-resistant air gap.

## Core Concepts

### 3-2-1-1-0 Rule (modern)

- **3** copies of data.
- **2** different media types.
- **1** offsite copy.
- **1** immutable or air-gapped copy (anti-ransomware; added to the rule around 2020).
- **0** errors after restore verification.

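
As a rough illustration of the rule above, the following Python sketch checks an inventory of copies against each clause. The `BackupCopy` fields and the `check_3_2_1_1_0` function are hypothetical names introduced here, and the sketch assumes the primary (production) data is listed as one of the copies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data; all field names are illustrative."""
    name: str
    media: str                            # e.g. "disk", "object-store", "tape"
    offsite: bool = False                 # stored away from the primary site
    immutable: bool = False               # object-locked or air-gapped
    is_primary: bool = False              # the live production data itself
    restore_errors: Optional[int] = None  # None = restore never tested


def check_3_2_1_1_0(copies: list) -> dict:
    """Evaluate each clause of the rule; True means the clause is satisfied."""
    backups = [c for c in copies if not c.is_primary]
    return {
        "3 copies": len(copies) >= 3,
        "2 media": len({c.media for c in copies}) >= 2,
        "1 offsite": any(c.offsite for c in backups),
        "1 immutable": any(c.immutable for c in backups),
        # An untested backup (None) does not count as error-free.
        "0 errors": bool(backups) and all(c.restore_errors == 0 for c in backups),
    }


inventory = [
    BackupCopy("primary", "disk", is_primary=True),
    BackupCopy("nas", "disk", restore_errors=0),
    BackupCopy("s3-locked", "object-store", offsite=True, immutable=True, restore_errors=0),
]
for clause, ok in check_3_2_1_1_0(inventory).items():
    print(f"{clause}: {'ok' if ok else 'FAIL'}")
```

Treating a never-tested backup as a failure of the "0 errors" clause is a design choice of this sketch; it mirrors the note's premise that an unverified backup should not be counted on.
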
### RTO vs RPO

- **RTO (Recovery Time Objective)**: maximum acceptable time from outage to service restoration.
- **RPO (Recovery Point Objective)**: maximum acceptable data-loss window.
- With RTO = 1 h and RPO = 15 min, periodic backups alone fall short; a hot standby is needed.

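
To make the RTO/RPO definitions concrete, here is a minimal sketch that estimates worst-case data loss for a periodic backup schedule and compares it with the targets. It assumes a backup captures state at its start time and is usable only once it has finished; the function names and the example numbers are illustrative, not measurements.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, duration: timedelta) -> timedelta:
    """Worst case for periodic backups.

    Assumption: a backup captures the state at its start and is usable only
    once it finishes. A failure just before the next backup finishes then
    loses one full interval plus the backup duration.
    """
    return interval + duration


def meets_targets(interval, duration, restore_time, rpo, rto) -> dict:
    """Compare a backup schedule against RPO/RTO targets."""
    return {
        "rpo_ok": worst_case_data_loss(interval, duration) <= rpo,
        "rto_ok": restore_time <= rto,
    }


# Targets from the text: RTO = 1 h, RPO = 15 min.
targets = dict(rpo=timedelta(minutes=15), rto=timedelta(hours=1))

# Hypothetical nightly full backup: 30 min to take, 3 h to restore.
nightly = meets_targets(
    timedelta(hours=24), timedelta(minutes=30), timedelta(hours=3), **targets
)
print("nightly full backup:", nightly)
```

Under these assumed numbers the nightly schedule misses both targets, which is the situation where the text calls for a hot standby.
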
### Backup Types

- **Full**: everything; slow and large, but simple to restore.
- **Incremental**: changes since the last backup of any type; fast and small, but restore needs the whole chain.
- **Differential**: changes since the last full backup; a middle ground.
- **Snapshot (CoW)**: ZFS/btrfs/LVM/EBS; instant and space-efficient.
- **Continuous (CDC)**: every transaction; Postgres WAL, MySQL binlog.

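
The difference between the incremental and differential restore paths can be sketched as follows. This is a simplified model with hypothetical names (`Backup`, `restore_chain`); it assumes an incremental is relative to the previous backup of any type and a differential is relative to the last full.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    id: int
    kind: str  # "full", "incremental", or "differential"


def restore_chain(history: list) -> list:
    """Return the backups needed to restore the newest state, in apply order.

    `history` is ordered oldest to newest.
    """
    chain = []
    skip_to_full = False
    for backup in reversed(history):          # walk back from the newest backup
        if backup.kind == "full":
            chain.append(backup)
            return list(reversed(chain))
        if skip_to_full:
            continue                          # a differential only needs the last full
        chain.append(backup)
        if backup.kind == "differential":
            skip_to_full = True
    raise ValueError("no full backup in history; nothing to restore from")


incremental = [Backup(1, "full"), Backup(2, "incremental"), Backup(3, "incremental")]
differential = [Backup(1, "full"), Backup(2, "differential"), Backup(3, "differential")]
print([b.id for b in restore_chain(incremental)])   # [1, 2, 3]
print([b.id for b in restore_chain(differential)])  # [1, 3]
```

The incremental chain grows with every backup taken since the last full, while the differential chain stays at two entries; that is the trade-off between backup size and restore complexity described in the list.
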
### Applications

1. DB backup (pg_basebackup + WAL archive).
2. File backup (restic, borg, Time Machine).
3. VM/disk snapshot (EBS, GCP PD, ZFS).
4. Object store replication (S3 CRR).
5. App-level (export-import, logical dump).

## 💻 Patterns

### restic encrypted incremental backup

```bash
# Repository comes from the environment so every command targets the same repo.
export RESTIC_REPOSITORY=s3:s3.amazonaws.com/my-backup-bucket
# RESTIC_PASSWORD_FILE (or RESTIC_PASSWORD) must also be set; restic encrypts all data.

# Initialize the repo (one-time)
restic init

# Daily backup
restic backup /var/data --exclude '*.tmp' --tag daily --host "$(hostname)"

# Retention: keep 7 daily, 4 weekly, 12 monthly snapshots, then prune unreferenced data
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune

# Verify repository structure and read back a random 10% of the data
restic check --read-data-subset=10%
```

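
To show what a 7 daily / 4 weekly / 12 monthly policy keeps, here is a simplified Python model of bucket-based retention. It is an illustration under stated assumptions, not restic's actual implementation: each policy keeps the newest snapshot in each of its most recent non-empty buckets (calendar day, ISO week, calendar month), and a snapshot survives if any policy keeps it. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def newest_per_bucket(snapshots, bucket_key, n):
    """Keep the newest snapshot in each of the n most recent non-empty buckets."""
    kept = {}
    for ts in sorted(snapshots, reverse=True):   # newest first
        key = bucket_key(ts)
        if key in kept:
            continue                              # bucket already holds a newer snapshot
        if len(kept) == n:
            break                                 # enough buckets filled
        kept[key] = ts
    return set(kept.values())


def select_kept(snapshots, daily=7, weekly=4, monthly=12):
    """A snapshot survives if any of the three policies keeps it."""
    return (
        newest_per_bucket(snapshots, lambda ts: ts.date(), daily)
        | newest_per_bucket(snapshots, lambda ts: tuple(ts.isocalendar())[:2], weekly)
        | newest_per_bucket(snapshots, lambda ts: (ts.year, ts.month), monthly)
    )


# 60 nightly snapshots, 2026-03-01 through 2026-04-29 (made-up schedule)
snapshots = [datetime(2026, 3, 1, 2, 0) + timedelta(days=i) for i in range(60)]
kept = select_kept(snapshots)
print(f"kept {len(kept)} of {len(snapshots)} snapshots")
for ts in sorted(kept):
    print(ts.date())
```

Running a model like this against a real snapshot list before enabling `--prune` is one way to sanity-check a policy; restic's own `forget --dry-run` remains the authoritative answer for a given repository.
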
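### Postgres PITR setup

```bash
# --- postgresql.conf (server settings, not shell commands) ---
wal_level = replica
archive_mode = on
# %p = path of the WAL file to archive, %f = its file name
archive_command = 'aws s3 cp %p s3://pg-wal/%f'

# --- shell: take a base backup (tar format, gzip, with progress) ---
pg_basebackup -D /backup/base -Ft -z -P -U replicator

# --- restore (PostgreSQL 12+) ---
# Set restore_command and recovery_target_time in postgresql.conf, then create an
# empty recovery.signal file in the data directory. Before version 12 these
# settings lived in recovery.conf.
```
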
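
When restoring to a `recovery_target_time`, the first step is choosing which base backup to start from. A minimal sketch, assuming you record the end time of each base backup yourself (the function name and the timestamps are hypothetical):

```python
from datetime import datetime
from typing import Optional


def pick_base_backup(backup_end_times: list, target: datetime) -> Optional[datetime]:
    """Return the end time of the newest base backup usable for `target`.

    PITR can only recover to a point after the base backup finished, so
    backups that ended at or after the target are excluded.
    """
    usable = [t for t in backup_end_times if t < target]
    return max(usable) if usable else None


backups = [datetime(2026, 5, 1, 3), datetime(2026, 5, 8, 3)]
print(pick_base_backup(backups, datetime(2026, 5, 7, 12)))  # 2026-05-01 03:00:00
```

Choosing the newest usable base backup keeps the amount of WAL to replay as small as possible; the archive must still hold every WAL segment between that backup and the target.
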
### S3 Object Lock (immutable, ransomware-resistant)

```bash
# The bucket must have versioning and Object Lock enabled.
# COMPLIANCE mode: no user, including root, can shorten or remove the retention.
aws s3api put-object-lock-configuration \
  --bucket my-backup-bucket \
  --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":30}}}'
```

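
Hand-writing the JSON argument is error-prone, so one option is to generate it. The sketch below builds the same configuration document and computes when an object written under a default retention would become deletable. The helper names are hypothetical, and the retain-until calculation assumes the default retention is counted in whole days from the object's creation time.

```python
import json
from datetime import datetime, timedelta, timezone


def object_lock_config(mode: str, days: int) -> dict:
    """Build the Object Lock configuration document for a default retention."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError(f"unknown retention mode: {mode}")
    if days < 1:
        raise ValueError("retention must be at least 1 day")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


def retain_until(created_at: datetime, days: int) -> datetime:
    """Earliest time an object version can be deleted under the default retention."""
    return created_at + timedelta(days=days)


config = object_lock_config("COMPLIANCE", 30)
# Compact JSON, suitable for passing to --object-lock-configuration
print(json.dumps(config, separators=(",", ":")))
print(retain_until(datetime(2026, 5, 10, tzinfo=timezone.utc), 30).isoformat())
```

A lock period longer than the backup retention window means pruned snapshots cannot actually be deleted until the lock expires, so the two numbers are worth choosing together.
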
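### Restore drill automation

```bash
#!/usr/bin/env bash
# Nightly drill: restore the latest snapshot to a scratch dir and verify checksums.
set -euo pipefail

SCRATCH=$(mktemp -d)
trap 'rm -rf "$SCRATCH"' EXIT   # clean up even if a step fails

# Resolve the manifest path before changing directory.
MANIFEST=$(realpath expected_checksums.sha256)

restic -r s3:.../backup restore latest --target "$SCRATCH"

# Paths in the manifest are assumed to be relative to the restore root.
(cd "$SCRATCH" && sha256sum --check --strict --quiet "$MANIFEST")

echo "drill ok: $(date -Iseconds)" | tee -a /var/log/restore-drill.log
```
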
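
The checksum step of the drill can also be done in Python when a structured report is more useful than a pass/fail exit code. This is a minimal sketch with hypothetical function names; it assumes the manifest maps paths relative to the restore root to SHA-256 hex digests.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_restore(root: Path, expected: dict) -> list:
    """Return a list of problems; an empty list means the drill passed."""
    problems = []
    for rel_path, digest in expected.items():
        restored = root / rel_path
        if not restored.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(restored) != digest:
            problems.append(f"mismatch: {rel_path}")
    return problems
```

A typical use would be to call `build_manifest` on the source tree at backup time, store the result alongside the backup, and call `verify_restore` on the scratch directory during the drill. Note that this checks only files listed in the manifest; unexpected extra files in the restore are not reported.
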
### ZFS snapshot + send

```bash
# Instant CoW snapshot, named by timestamp
zfs snapshot "tank/data@$(date +%Y%m%d-%H%M)"

# Incremental send to a remote host; @yesterday and @today are placeholder snapshot names
zfs send -i tank/data@yesterday tank/data@today | ssh backup-host zfs recv tank/data
```

## Decision Criteria

| Situation | Approach |
|---|---|
| Files, small-mid | restic / borg |
| Postgres prod | pg_basebackup + WAL archive (PITR) |
| MySQL prod | xtrabackup + binlog |
| VM | snapshot + offsite replica |
| Multi-cloud | S3-compatible + CRR |
| Compliance (WORM) | S3 Object Lock COMPLIANCE mode |

**Default**: restic to S3 with Object Lock, plus a nightly restore drill.

## 🔗 Graph

- Parent: [[SRE]]
- Variant: [[CI_CD_Pipeline]]
- Application: [[카오스 몽키(Chaos Monkey)]]
- Adjacent: [[Secret_Management]] · [[Logging_and_Error_Handling]]

## 🤖 LLM Usage

**When**: backup script generation, restore runbook drafting, log anomaly summarization.

**When not**: actual restore execution; that requires a manual gate.

## ❌ Anti-patterns

- **No restore test**: the most common failure; backups succeed but restores do not.
- **Single copy**: one disk failure loses everything.
- **No encryption**: the backup itself becomes an attack vector; encryption at rest is mandatory.
- **No immutability**: ransomware encrypts the backups too.
- **Forever retention**: costs explode, and it may violate GDPR.

## 🧪 Verification / Duplicates

- Verified: restic docs; AWS Backup whitepaper; Veeam 3-2-1-1-0 guide; PostgreSQL PITR docs.
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: 3-2-1-1-0 + restic/PG PITR/S3 Object Lock |