---
id: devops-argo-rollouts
title: Argo Rollouts - Canary / Blue-Green deploy
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [devops, deployment, vibe-coding]
tech_stack: { language: "YAML", applicable_to: ["DevOps"] }
applied_in: []
aliases: [Argo Rollouts, canary, blue-green, progressive delivery, Flagger, AnalysisRun]
---

# Argo Rollouts

> A plain K8s Deployment only offers a rolling update, with no fine-grained control over how traffic shifts. **Argo Rollouts adds canary / blue-green / experiment strategies**, plus automatic rollback driven by metrics.

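## 📖 Core concepts
- Canary: shift traffic gradually, e.g. 1% → 10% → 100%.
- Blue-green: run two versions side by side and swap all traffic at once.
- Analysis: promote / abort based on metrics from Prometheus, Datadog, and similar providers.
- Service mesh + Argo Rollouts = precise traffic shifting.
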
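## 💻 Code patterns

### Rollout (instead of Deployment)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  # selector and matching template labels are required, as in a Deployment
  selector:
    matchLabels:
      app: my-app
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: myapp:v2
```

→ The Argo Rollouts controller (not kubectl itself) walks through the steps and promotes automatically; it aborts back to the stable version when an analysis fails. Without a traffic router, weights are approximated by the ratio of canary to stable pods, so with 5 replicas a 20% weight means 1 canary pod.
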
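### Manual promote
```bash
# Requires the kubectl-argo-rollouts plugin
kubectl argo rollouts get rollout my-app
# → shows a visual tree of the rollout's progress (add --watch to follow it live)

kubectl argo rollouts promote my-app   # advance past the current pause
kubectl argo rollouts abort my-app     # abort the update and return traffic to stable
```
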
### Pause + manual
```yaml
strategy:
  canary:
    steps:
      - setWeight: 10
      - pause: {}   # no duration: waits indefinitely until a manual promote
      - setWeight: 100
```

→ For a first production deploy, gate promotion on manual approval.

### Blue-green
```yaml
strategy:
  blueGreen:
    activeService: my-app-active
    previewService: my-app-preview
    autoPromotionEnabled: false
```

```
1. The new ReplicaSet comes up alongside the old one (preview only).
2. The preview Service points at the new version.
3. Test / verify against preview.
4. Promote = the active Service switches to the new version.
5. The old version stays idle until `scaleDownDelaySeconds` elapses (fast rollback during that window).
```

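The blue-green strategy above refers to two Services by name, and both have to exist before the Rollout is applied. A minimal sketch of what they might look like, assuming the pods carry an `app: my-app` label and listen on port 8080 (both are assumptions, not taken from the original):

```yaml
# Hypothetical Services referenced by activeService / previewService.
# Both start with the same selector; the Rollouts controller adds a
# rollouts-pod-template-hash entry to each selector so that they point
# at different ReplicaSets.
apiVersion: v1
kind: Service
metadata:
  name: my-app-active
spec:
  selector:
    app: my-app        # assumed pod label
  ports:
    - port: 80
      targetPort: 8080 # assumed container port
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```

Production traffic should only ever target `my-app-active`; `my-app-preview` is for smoke tests before promotion.
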
### Analysis (Prometheus)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      count: 5   # without count, an interval-based metric keeps measuring and an inline step never finishes
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.example.com
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```

```yaml
spec:
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: { duration: 1m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: my-app
        - setWeight: 50
```

→ A success rate below 95% counts as a failed measurement; more than `failureLimit` (3) failed measurements fails the analysis and aborts the rollout (rollback to stable).

### Traffic management (Istio)
```yaml
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      istio:
        virtualServices:
          - name: my-app-vsvc
        destinationRule:
          name: my-app-destrule
          canarySubsetName: canary
          stableSubsetName: stable
    steps:
      - setWeight: 5
      - pause: { duration: 10m }
      - setWeight: 25
```

→ Istio performs the weighted routing; the controller rewrites the VirtualService weights at each step.

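The Rollout above only references `my-app-vsvc` and `my-app-destrule`; the Istio objects themselves must already exist. A minimal sketch of a matching pair, assuming a Service host named `my-app` and pods labelled `app: my-app` (both hypothetical):

```yaml
# Hypothetical Istio objects for subset-level traffic splitting.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app-vsvc
spec:
  hosts:
    - my-app                 # assumed Service host
  http:
    - name: primary
      route:
        - destination:
            host: my-app
            subset: stable   # matches stableSubsetName
          weight: 100        # the controller adjusts these weights per step
        - destination:
            host: my-app
            subset: canary   # matches canarySubsetName
          weight: 0
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app-destrule
spec:
  host: my-app
  subsets:
    - name: canary
      labels:
        app: my-app          # the controller adds the pod-template-hash label
    - name: stable
      labels:
        app: my-app
```

The initial weights of 100 / 0 are only a starting point; after the first rollout the controller owns those fields.
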
### Header-based routing
```yaml
steps:
  - setHeaderRoute:
      name: beta-route   # must also be listed under trafficRouting.managedRoutes
      match:
        - headerName: X-Canary
          headerValue:
            exact: "true"
  - pause: {}   # only beta users reach v2 until promoted
  - setWeight: 50
```

→ Only requests carrying the `X-Canary: true` header reach the canary.

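A `setHeaderRoute` step only takes effect when a traffic router is configured and the route name is declared as a managed route. A sketch of the surrounding strategy block, reusing the Istio VirtualService name from the earlier example as an assumption:

```yaml
# Sketch: the trafficRouting section that a setHeaderRoute step relies on.
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      managedRoutes:
        - name: beta-route        # same name as in setHeaderRoute
      istio:
        virtualServices:
          - name: my-app-vsvc     # assumed to exist already
    steps:
      - setHeaderRoute:
          name: beta-route
          match:
            - headerName: X-Canary
              headerValue:
                exact: "true"
      - pause: {}
```

Listing the route under `managedRoutes` lets the controller create and remove it without touching routes it does not own.
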
### NGINX / ALB ingress
```yaml
trafficRouting:
  nginx:
    stableIngress: my-app-stable-ingress
    annotationPrefix: nginx.ingress.kubernetes.io   # optional
```

→ Works without a service mesh. `canaryService` and `stableService` still have to be set on the canary strategy.

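The heading mentions ALB, but only the NGINX form is shown. For comparison, a sketch of the AWS Load Balancer Controller variant; the Ingress name and port are hypothetical:

```yaml
# Sketch: ALB-based traffic routing (names are placeholders).
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      alb:
        ingress: my-app-ingress   # assumed existing ALB Ingress
        servicePort: 80           # port the Ingress forwards to
    steps:
      - setWeight: 10
      - pause: { duration: 5m }
```

In both variants the ingress controller splits the traffic, so the weights do not depend on the pod ratio.
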
### Experiment (long-running A/B)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: my-experiment
spec:
  duration: 1h
  templates:
    - name: baseline
      replicas: 1
      template: ...
    - name: canary
      replicas: 1
      template: ...
  analyses:
    - name: success-rate
      templateName: success-rate
```

→ Runs for 1 hour and compares metrics between the two variants.

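An Experiment does not have to be a standalone object; it can also run as a step inside a canary Rollout, reusing the Rollout's own pod specs through `specRef`. A sketch under the assumption that the `success-rate` AnalysisTemplate from earlier exists:

```yaml
# Sketch: an experiment as a canary step.
strategy:
  canary:
    steps:
      - experiment:
          duration: 1h
          templates:
            - name: baseline
              specRef: stable   # pod spec of the current stable version
            - name: canary
              specRef: canary   # pod spec of the new version
          analyses:
            - name: success-rate
              templateName: success-rate
              args:
                - name: service-name
                  value: my-app
      - setWeight: 20
```

If the experiment's analysis fails, the rollout aborts before any weight is shifted.
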
### Flagger (alternative)
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 80          # assumed port; Flagger generates the Services from this
  analysis:
    interval: 1m
    threshold: 5      # failed checks before rollback
    iterations: 10    # checks to run before promotion
    metrics:
      - name: request-success-rate
        thresholdRange: { min: 99 }
        interval: 1m
```

→ Fits naturally with Flux / Helm. Similar to Argo Rollouts, but it wraps an existing Deployment instead of replacing it.

### Rollback
```bash
kubectl argo rollouts undo my-app

# or revert the image in the spec to the previous version
```

→ The previous ReplicaSet becomes active again.

### Auto-rollback (metric)
```yaml
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - analysis: { templates: [{ templateName: error-rate }] }
        # a failed error-rate analysis aborts the rollout automatically
```

→ Safe even with nobody watching.

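The step above references an `error-rate` template that is not defined anywhere in this note. A sketch of what it could look like, assuming the same `http_requests_total` metric and Prometheus address as the success-rate example, and a 5% error budget chosen purely for illustration:

```yaml
# Hypothetical error-rate template; the threshold and default arg are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate
spec:
  args:
    - name: service-name
      value: my-app          # default, so the step can omit args
  metrics:
    - name: error-rate
      interval: 1m
      count: 5               # take 5 measurements, then finish
      successCondition: result[0] <= 0.05
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.example.com
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```

Giving the argument a default value is what allows the compact `{ templates: [...] }` form without an `args` list.
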
### Multiple analysis
```yaml
analysis:
  templates:
    - templateName: success-rate
    - templateName: latency-p99
    - templateName: error-rate
```

→ Promote only if all of them pass.

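Of the three templates listed, `latency-p99` has no definition in this note. A sketch, assuming the service exports a Prometheus histogram named `http_request_duration_seconds` and that 500 ms is an acceptable p99 (both are assumptions):

```yaml
# Hypothetical latency template; metric name and threshold are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-p99
spec:
  args:
    - name: service-name
      value: my-app
  metrics:
    - name: latency-p99
      interval: 1m
      count: 5
      successCondition: result[0] <= 0.5   # seconds
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.example.com
          query: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{service="{{args.service-name}}"}[5m])) by (le))
```

Pairing a latency check with the success-rate check catches regressions where requests still succeed but become slow.
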
### Explicit failure / inconclusive conditions
```yaml
metrics:
  - name: success-rate
    successCondition: result[0] >= 0.95
    failureCondition: result[0] < 0.9
    failureLimit: 3
    inconclusiveLimit: 5   # results in [0.9, 0.95) match neither condition = inconclusive
```

→ Failure and inconclusive outcomes are both defined explicitly.

### Web hook (external system)
```yaml
metrics:
  - name: web-test
    provider:
      web:
        url: https://my-api.example.com/health
        jsonPath: '{$.status}'
        method: GET
    successCondition: result == "healthy"
```

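### Notification (Slack)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
  annotations:
    # subscriptions are annotations, not a spec field; "my-channel" is a placeholder
    notifications.argoproj.io/subscribe.on-rollout-aborted.slack: my-channel
    notifications.argoproj.io/subscribe.on-rollout-completed.slack: my-channel
spec:
  ...
```

→ Slack notification when a rollout completes or is aborted.
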
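Slack notifications also need the Slack service configured on the controller side. A sketch of that configuration, assuming the controller runs in the `argo-rollouts` namespace and that the notification triggers and templates have been installed separately; the token value is a placeholder:

```yaml
# Sketch: controller-side Slack configuration (token is a placeholder).
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-notification-configmap
  namespace: argo-rollouts      # assumed controller namespace
data:
  service.slack: |
    token: $slack-token         # resolved from the Secret below
---
apiVersion: v1
kind: Secret
metadata:
  name: argo-rollouts-notification-secret
  namespace: argo-rollouts
stringData:
  slack-token: <your-slack-bot-token>
```

Keeping the token in a Secret and referencing it as `$slack-token` avoids committing credentials alongside the Rollout manifests.
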
### GitOps (ArgoCD + Argo Rollouts)
```
1. Push the new image tag to Git.
2. ArgoCD sync = the Rollout spec is updated.
3. The Rollout starts a canary.
4. Metrics pass = promote.
5. Metrics fail = automatic rollback at the K8s level (no git revert), so Git still points at the failed version until someone reverts or fixes it.
```

→ Every deploy becomes progressive.

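### Cost / overhead
```
- Every canary needs extra replicas (e.g. about 50% extra at a 50% weight while the stable ReplicaSet stays fully scaled)
- Metric queries add load on the cluster and the monitoring stack
- Engineering time

→ Worth it when each deploy carries significant risk.
```
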
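When the extra replicas are the main cost concern, one option is to let the stable ReplicaSet shrink as the canary grows. A sketch using `dynamicStableScale`, which only applies when a traffic router is configured; the NGINX Ingress name is reused from the earlier example as an assumption:

```yaml
# Sketch: trade rollback speed for lower replica overhead.
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    dynamicStableScale: true      # scale stable down as canary weight goes up
    trafficRouting:
      nginx:
        stableIngress: my-app-stable-ingress
    steps:
      - setWeight: 20
      - pause: { duration: 5m }
      - setWeight: 50
```

The trade-off is that an abort has to scale the stable ReplicaSet back up before it can take all traffic again, so rollback is slower than with a fully scaled stable set.
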
### Real-world
- **Intuit**: the main company behind Argo (now a CNCF project)
- **Adobe**: large-scale Argo user
- **GitHub**: a similar in-house system
- **Spotify**: Flagger
- **Most SaaS companies**: some form of progressive delivery

### When NOT?
```
- Small internal tools: a rolling deploy is enough.
- Stateful workloads: blue-green is hard (DB).
- Cron / batch jobs: a canary is meaningless.

→ Critical-path APIs and web frontends are the sweet spot.
```

### Stateful pitfalls
```
DB schema changes:
- v1 and v2 run at the same time = the schema must be compatible with both.
- Backward-compatible migrations are mandatory.

→ "expand-contract":
1. Add the new column (v1 keeps working).
2. v2 starts using the new column.
3. Retire v1.
4. Drop the old column.
```

### Header-based testing
```
The QA team adds a header → only their requests hit the canary.
"X-Canary: true" → served by v2 only.

→ A true canary at 0% of production traffic.
```

### LaunchDarkly + Argo
```
Feature flags (LD) + gradual rollout (Argo).
- Argo: % of traffic on the new version.
- LD: % of users who see the new feature.

→ Two separate layers that work together.
```

## 🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| High traffic | Canary + analysis |
| Critical | Blue-green |
| Beta / A/B | Experiment |
| GitOps | ArgoCD + Rollouts |
| Flux | Flagger |
| Service mesh available | Istio + Argo |
| Small system | Helm rolling |

## ❌ Anti-patterns
- **Auto-promote only, with no analysis**: dangerous.
- **First deploy straight to 100%**: use pause + manual promotion instead.
- **Breaking DB schema change + canary**: corrupts data.
- **Metric query too narrow**: false signals.
- **Manual promote only**: nothing ships without a human.
- **No rollback test**: it will not work when you really need it.
- **No resource limits**: a canary can take down the cluster.

## 🤖 LLM usage hints
- Canary + metric analysis is the modern form of progressive delivery.
- Blue-green is hard for stateful workloads.
- ArgoCD + Argo Rollouts = GitOps + delivery.
- Flagger is the alternative.

## 🔗 Related documents
- [[DevOps_Deployment_Strategies]]
- [[DevOps_ArgoCD_GitOps]]
- [[DevOps_Service_Mesh_Deep]]