| id | title | category | status | source_trust_level | verification_status | created_at | updated_at | tags | tech_stack | applied_in | aliases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| devops-argo-rollouts | Argo Rollouts - Canary / Blue-Green deploy | Coding | draft | B | conceptual | 2026-05-09 | 2026-05-09 | | | | |
Argo Rollouts
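A Kubernetes Deployment only offers rolling updates (plus Recreate), with no fine-grained control over how traffic shifts. Argo Rollouts adds canary, blue-green, and experiment strategies, plus automatic rollback driven by metrics.
📖 Core concepts
- Canary: shift traffic gradually, e.g. 1% → 10% → 100%.
- Blue-green: run two versions side by side and swap all traffic at once.
- Analysis: promote or abort based on Prometheus / Datadog metrics.
- Service mesh + Argo Rollouts = fine-grained traffic shifting.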
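💻 Code patterns
Rollout (instead of Deployment)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:                # required: must match the pod template labels
    matchLabels:
      app: my-app
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: myapp:v2
→ The Rollouts controller (not kubectl) steps through the weights automatically; automatic rollback additionally needs an analysis step.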
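Without a traffic router, the canary weight can only be approximated by pod counts. The sketch below shows that arithmetic for the 5-replica example above. It assumes the canary count is rounded up, which is my reading of the controller's behaviour and not something the note states; `canary_replicas` and `plan` are hypothetical helper names.

```python
def canary_replicas(total_replicas: int, weight_percent: int) -> int:
    """Ceiling of total_replicas * weight / 100, using integer math only."""
    if not 0 <= weight_percent <= 100:
        raise ValueError("weight must be between 0 and 100")
    # -((-a) // b) is ceiling division for positive b.
    return -((-total_replicas * weight_percent) // 100)


def plan(total_replicas: int, weights: list[int]) -> list[tuple[int, int]]:
    """Return (weight, canary pod count) for each setWeight step."""
    return [(w, canary_replicas(total_replicas, w)) for w in weights]


for weight, canary in plan(5, [20, 50, 100]):
    print(f"setWeight {weight:>3}: canary pods = {canary}")
```

With only 5 replicas, a 50% step cannot be hit exactly, which is one reason to pair small replica counts with a traffic router.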
Manual promote
kubectl argo rollouts get rollout my-app
# → Visual progress
kubectl argo rollouts promote my-app
kubectl argo rollouts abort my-app
Pause + manual
strategy:
  canary:
    steps:
      - setWeight: 10
      - pause: {}   # indefinite: waits until a manual promote
      - setWeight: 100
→ For the first production deploy, require manual approval.
Blue-green
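strategy:
  blueGreen:
    activeService: my-app-active
    previewService: my-app-preview
    autoPromotionEnabled: false
1. A new ReplicaSet is created for the new version only (preview).
2. The preview Service points at the new version.
3. Test / verify against the preview Service.
4. Promote: the active Service switches to the new version.
5. The old version stays idle for a while, so rollback is possible.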
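The five steps above can be sketched as a tiny state model of which version each Service selects. This is a toy illustration only, not the controller's implementation; the class `BlueGreen` and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlueGreen:
    """Toy model of which version each Service points at."""
    active: str
    preview: Optional[str] = None
    previous: Optional[str] = None

    def deploy(self, version: str) -> None:
        # Steps 1-2: the new version is reachable via the preview Service only.
        self.preview = version

    def promote(self) -> None:
        # Step 4: the active Service switches; the old version is kept (step 5).
        if self.preview is None:
            raise RuntimeError("nothing to promote")
        self.previous, self.active = self.active, self.preview

    def rollback(self) -> None:
        # Point the active Service back at the idle old version.
        if self.previous is None:
            raise RuntimeError("no previous version to roll back to")
        self.active, self.previous = self.previous, None


state = BlueGreen(active="v1")
state.deploy("v2")
before = state.active      # still v1 while v2 is being verified
state.promote()
after = state.active
state.rollback()
restored = state.active
```

The key property is that production traffic never sees the new version until `promote`, and the old one remains available for an immediate switch back.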
Analysis (Prometheus)
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      count: 5               # finite count so the inline analysis step can complete
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.example.com
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
# Rollout fragment that runs the template as a canary step
spec:
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: { duration: 1m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: my-app
        - setWeight: 50
→ A measurement below 95% counts as a failure; once failures exceed failureLimit, the rollout aborts and rolls back.
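To make the pass/fail rule concrete, here is a minimal sketch of the ratio the query computes and of how failed measurements accumulate. It assumes the analysis fails only once the number of failed measurements exceeds `failureLimit`; the function names are hypothetical and the real controller also handles errors and missing data, which this ignores.

```python
from typing import Iterable


def success_ratio(ok_requests: float, total_requests: float) -> float:
    """Same shape as the query: 2xx request rate divided by total request rate."""
    if total_requests <= 0:
        raise ValueError("no traffic: ratio is undefined")
    return ok_requests / total_requests


def analysis_passes(ratios: Iterable[float], threshold: float = 0.95,
                    failure_limit: int = 3) -> bool:
    """Return False as soon as failed measurements exceed failure_limit."""
    failures = 0
    for ratio in ratios:
        if ratio < threshold:
            failures += 1
            if failures > failure_limit:
                return False
    return True
```

For example, three bad measurements out of four still pass with `failure_limit=3`, while a fourth bad one fails the analysis.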
Traffic management (Istio)
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      istio:
        virtualServices:
          - name: my-app-vsvc
        destinationRule:
          name: my-app-destrule
          canarySubsetName: canary
          stableSubsetName: stable
    steps:
      - setWeight: 5
      - pause: { duration: 10m }
      - setWeight: 25
→ Istio performs the weighted routing.
Header-based routing
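steps:
  - setHeaderRoute:
      name: beta-route      # must also be listed under trafficRouting.managedRoutes
      match:
        - headerName: X-Canary
          headerValue:
            exact: "true"
  - pause: {}   # only beta users reach v2
  - setWeight: 50
→ Only requests carrying the X-Canary: true header reach the canary.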
NGINX / ALB ingress
trafficRouting:
  nginx:
    stableIngress: my-app-stable-ingress
    annotationPrefix: nginx.ingress.kubernetes.io
→ Works without a service mesh.
Experiment (long-running A/B)
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: my-experiment
spec:
  duration: 1h
  templates:
    - name: baseline
      replicas: 1
      template: ...
    - name: canary
      replicas: 1
      template: ...
  analyses:
    - name: success-rate
      templateName: success-rate
→ Runs for 1 hour and compares metrics between the two.
Flagger (alternative)
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  analysis:
    interval: 1m
    threshold: 5
    iterations: 10
    metrics:
      - name: request-success-rate
        thresholdRange: { min: 99 }
→ Flux / Helm friendly. Similar in scope to Argo Rollouts.
Rollback
kubectl argo rollouts undo my-app
# or revert the image in the spec to the previous version
→ The previous ReplicaSet becomes active again.
Auto-rollback (metric)
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - analysis: { templates: [{templateName: error-rate}] }
        # a failed error-rate analysis triggers an automatic rollback
→ Safe even with nobody watching.
Multiple analysis
analysis:
  templates:
    - templateName: success-rate
    - templateName: latency-p99
    - templateName: error-rate
→ Promote only when all of them pass.
Explicit failure / inconclusive conditions
metrics:
  - name: success-rate
    successCondition: result[0] >= 0.95
    failureCondition: result[0] < 0.9
    failureLimit: 3
    inconclusiveLimit: 5   # a result matching neither condition (0.9 to 0.95) is inconclusive
→ Failure and inconclusive outcomes are defined explicitly.
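The three-way outcome can be sketched as follows. This is an illustration under stated assumptions: a measurement that satisfies neither condition is treated as inconclusive, and each limit is exceeded only when the count goes above it. `classify` and `evaluate` are hypothetical names, not Argo Rollouts APIs.

```python
from collections import Counter
from typing import Iterable


def classify(result: float, success_min: float = 0.95,
             failure_below: float = 0.90) -> str:
    """Map one measurement to Successful / Failed / Inconclusive."""
    if result >= success_min:
        return "Successful"
    if result < failure_below:
        return "Failed"
    return "Inconclusive"  # matched neither condition


def evaluate(results: Iterable[float], failure_limit: int = 3,
             inconclusive_limit: int = 5) -> str:
    """Aggregate measurements using the limits from the snippet above."""
    counts = Counter(classify(r) for r in results)
    if counts["Failed"] > failure_limit:
        return "Failed"
    if counts["Inconclusive"] > inconclusive_limit:
        return "Inconclusive"
    return "Successful"
```

Separating the two thresholds leaves a grey zone where a human decides instead of the controller aborting outright.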
Web metric (external system)
metrics:
  - name: web-test
    provider:
      web:
        url: https://my-api.example.com/health
        jsonPath: '{$.status}'
        method: GET
    successCondition: result == "healthy"
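Notification (Slack)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
  annotations:
    # subscriptions are annotations, not a spec field; "my-channel" is a placeholder
    notifications.argoproj.io/subscribe.on-rollout-aborted.slack: my-channel
    notifications.argoproj.io/subscribe.on-rollout-completed.slack: my-channel
spec:
  ...
→ Sends a Slack notification when the rollout completes or aborts (the notifications Slack service must be configured separately).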
GitOps (ArgoCD + Argo Rollouts)
1. Push a new image tag to git.
2. ArgoCD syncs, which updates the Rollout spec.
3. The Rollout starts the canary.
4. Metrics pass: promote.
5. Metrics fail: automatic rollback at the Kubernetes level (no git revert).
→ Every deploy becomes progressive.
Cost / overhead
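- Every canary runs extra replicas (e.g. about 50% more during the rollout).
- Metric queries add load and cost to the cluster.
- Engineering time to build and maintain.
→ Worth it when each deploy carries significant risk.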
Real-world
- Intuit (the original maintainer of Argo)
- Adobe: heavy Argo user
- GitHub: a similar internal system
- Spotify: Flagger
- Most SaaS companies: some form of progressive delivery
When NOT?
- Small internal tools: a rolling deploy is enough.
- Stateful services: blue-green is hard (because of the DB).
- Cron / batch jobs: canary is meaningless.
→ Critical-path APIs and web services are the sweet spot.
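Stateful pitfalls
DB schema changes:
- v1 and v2 run at the same time, so the schema must be compatible with both.
- Backward-compatible migrations are mandatory.
→ "expand-contract":
1. Add the new column (v1 still works).
2. v2 starts using the new column.
3. Retire v1.
4. Drop the old column.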
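The four expand-contract steps can be walked through with an in-memory SQLite database. This is a sketch under assumptions: the `users` table, the `name` → `display_name` rename, and the `v1_insert` / `v2_insert` / `v2_read` helpers are all hypothetical, and the final step rebuilds the table instead of using DROP COLUMN so it runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def v1_insert(conn, name):
    # v1 only knows the old column.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def v2_insert(conn, display_name):
    # v2 writes both columns while v1 is still serving traffic.
    conn.execute("INSERT INTO users (name, display_name) VALUES (?, ?)",
                 (display_name, display_name))


def v2_read(conn):
    # Fall back to the old column for rows written by v1.
    rows = conn.execute(
        "SELECT COALESCE(display_name, name) FROM users ORDER BY id")
    return [r[0] for r in rows]


v1_insert(conn, "alice")
# 1. Expand: add the new nullable column; v1 keeps working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
v1_insert(conn, "bob")
# 2. v2 (the canary) starts using the new column.
v2_insert(conn, "carol")
names_during_canary = v2_read(conn)
# 3. v1 is retired: backfill rows it wrote.
conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")
# 4. Contract: rebuild the table without the old column.
conn.executescript("""
CREATE TABLE users_new (id INTEGER PRIMARY KEY, display_name TEXT NOT NULL);
INSERT INTO users_new (id, display_name) SELECT id, display_name FROM users;
DROP TABLE users;
ALTER TABLE users_new RENAME TO users;
""")
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

At every point before step 4, both versions can read and write, which is what makes it safe to run them side by side during a canary.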
Header-based testing
The QA team adds a header so that only they hit the canary.
"X-Canary: true" → served by v2 only.
→ A true canary at 0% of production traffic.
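The routing decision described here can be sketched as a small function: a header match wins first, and everything else falls back to the weighted split. In practice the mesh or ingress makes this decision, not application code; `choose_backend` and its parameters are hypothetical.

```python
import random
from typing import Callable, Mapping


def choose_backend(headers: Mapping[str, str], canary_weight: float,
                   rng: Callable[[], float] = random.random) -> str:
    """Return "canary" or "stable" for one request."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {k.lower(): v for k, v in headers.items()}
    if normalised.get("x-canary") == "true":
        return "canary"  # header match bypasses the weight entirely
    # rng() is in [0, 1), so weight 0 never selects canary and 100 always does.
    return "canary" if rng() * 100 < canary_weight else "stable"
```

With `canary_weight=0`, only requests carrying the header ever reach v2, which matches the "0% of production traffic" case above.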
LaunchDarkly + Argo
Feature flags (LD) + gradual rollout (Argo).
- Argo: % of traffic going to the new version.
- LD: % of users who see the new feature.
→ The two layer on top of each other.
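A quick way to see how the two layers combine: if the flag is evaluated only inside the new version, the share of requests that see the new feature is roughly the product of the two percentages. This assumes flag targeting is independent of which version serves the request; `feature_exposure_percent` is a hypothetical helper.

```python
def feature_exposure_percent(canary_weight: float, flag_percent: float) -> float:
    """Percent of all requests that hit the new version AND have the flag on."""
    for value in (canary_weight, flag_percent):
        if not 0 <= value <= 100:
            raise ValueError("percentages must be between 0 and 100")
    return canary_weight * flag_percent / 100


# 20% of traffic on the new version, flag on for 10% of users.
exposure = feature_exposure_percent(20, 10)
```

This is why the combination is conservative: both dials have to be turned up before most users see the change.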
🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| High traffic | Canary + analysis |
| Critical | Blue-green |
| Beta / A/B | Experiment |
| GitOps | ArgoCD + Rollouts |
| Flux | Flagger |
| Service mesh available | Istio + Argo |
| Small system | Helm rolling |
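❌ Anti-patterns
- Auto-promote without analysis: dangerous.
- First deploy straight to 100%: use pause + manual promote instead.
- Breaking DB schema change + canary: corrupts data.
- Metric query too narrow: false signals.
- Manual promote only: nothing ships without a human present.
- No rollback test: it will not work when actually needed.
- No resource limits: a canary can take down the cluster.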
🤖 LLM usage hints
- Canary + metric analysis is the modern progressive-delivery pattern.
- Blue-green is hard for stateful services.
- ArgoCD + Argo Rollouts = GitOps + progressive delivery.
- Flagger is the alternative.