[G1-Sync] Manual knowledge update
@@ -0,0 +1,390 @@
---
id: devops-argo-rollouts
title: Argo Rollouts - Canary / Blue-Green deploy
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [devops, deployment, vibe-coding]
tech_stack: { language: "YAML", applicable_to: ["DevOps"] }
applied_in: []
aliases: [Argo Rollouts, canary, blue-green, progressive delivery, Flagger, AnalysisRun]
---

# Argo Rollouts

> A plain Kubernetes Deployment offers only rolling updates (or Recreate), with no fine-grained control over how traffic shifts. **Argo Rollouts adds canary, blue-green, and experiment strategies**, plus metric-based automatic rollback.

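## 📖 Core concepts
- Canary: shift traffic gradually, e.g. 1% → 10% → 100%.
- Blue-green: run two versions side by side and switch all traffic at once.
- Analysis: promote or abort based on Prometheus / Datadog metrics.
- Service mesh + Argo Rollouts = precise traffic shifting (without a traffic router, weights are only approximated by pod counts).

## 💻 Code patterns
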
### Rollout (instead of a Deployment)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:                # required; must match the pod template labels
    matchLabels:
      app: my-app          # label name is an assumption
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: { duration: 5m }
      - setWeight: 50
      - pause: { duration: 10m }
      - setWeight: 100
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: myapp:v2
```

→ The Argo Rollouts controller (not kubectl) advances through the steps on its own. Automatic rollback additionally requires an analysis step.
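
If the application already runs as a Deployment, one option is to leave the pod template where it is and have the Rollout reference it. A minimal sketch, assuming an existing Deployment named `my-app` in the same namespace and an `app: my-app` pod label (both names are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app          # assumed label on the existing pods
  workloadRef:             # reuse the pod template of an existing Deployment
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: { duration: 5m }
```

With this approach the referenced Deployment is typically scaled down to zero once the Rollout takes over, so the two controllers do not run duplicate pods.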

### Manual promote
```bash
# Requires the kubectl-argo-rollouts plugin
kubectl argo rollouts get rollout my-app
# → Visual progress (add --watch to follow it live)

kubectl argo rollouts promote my-app   # continue past the current pause
kubectl argo rollouts abort my-app     # stop and shift traffic back to stable
```

### Pause + manual
```yaml
strategy:
  canary:
    steps:
    - setWeight: 10
    - pause: {}   # indefinite, until a manual promote
    - setWeight: 100
```

→ For the first production deploy, require manual approval.

### Blue-green
```yaml
strategy:
  blueGreen:
    activeService: my-app-active
    previewService: my-app-preview
    autoPromotionEnabled: false
```

```
1. Only the new ReplicaSet is created (preview).
2. The preview Service points at the new version.
3. Test / verify.
4. Promote = the active Service switches to the new version.
5. The old version stays up briefly (scaleDownDelaySeconds, 30s by default), so rollback is fast.
```
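
The two Services named in the strategy have to exist before the Rollout is applied. A minimal sketch, assuming the pods carry an `app: my-app` label and listen on port 8080 (both are assumptions, not taken from the original):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-active
spec:
  selector:
    app: my-app        # assumed pod label
  ports:
  - port: 80
    targetPort: 8080   # assumed container port
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
```

Both Services start with the same selector. The controller then adds a `rollouts-pod-template-hash` entry to each selector so that active and preview point at different ReplicaSets.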

### Analysis (Prometheus)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 1m
    count: 5               # bounded run, so an inline analysis step can finish
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```

```yaml
spec:
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: { duration: 1m }
      - analysis:
          templates:
          - templateName: success-rate
          args:
          - name: service-name
            value: my-app
      - setWeight: 50
```

→ If the success rate falls below 95% more than `failureLimit` times, the rollout aborts and traffic returns to stable.
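
An inline `analysis` step blocks the rollout until the run finishes. Analysis can also run in the background while the steps progress. A sketch of that variant, reusing the `success-rate` template above (the step weights and durations are illustrative):

```yaml
spec:
  strategy:
    canary:
      analysis:                    # background analysis, runs alongside the steps
        templates:
        - templateName: success-rate
        startingStep: 2            # wait until the step at index 2 (setWeight: 40)
        args:
        - name: service-name
          value: my-app
      steps:
      - setWeight: 20
      - pause: { duration: 10m }
      - setWeight: 40
      - pause: { duration: 10m }
```

A failed background run aborts the rollout in the same way as a failed inline step.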

### Traffic management (Istio)
```yaml
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      istio:
        virtualServices:
        - name: my-app-vsvc
        destinationRule:
          name: my-app-destrule
          canarySubsetName: canary
          stableSubsetName: stable
    steps:
    - setWeight: 5
    - pause: { duration: 10m }
    - setWeight: 25
```

→ Argo Rollouts rewrites the VirtualService weights; Istio performs the weighted routing.
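
The Rollout only references the Istio objects; they have to be created separately. A sketch of what `my-app-vsvc` and `my-app-destrule` might look like for subset-level splitting, assuming a single Service host named `my-app` and an `app: my-app` pod label (both hypothetical):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app-vsvc
spec:
  hosts:
  - my-app                 # assumed Service host
  http:
  - name: primary
    route:
    - destination:
        host: my-app
        subset: stable
      weight: 100          # the controller adjusts these weights per step
    - destination:
        host: my-app
        subset: canary
      weight: 0
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-app-destrule
spec:
  host: my-app
  subsets:
  - name: canary
    labels:
      app: my-app          # the controller adds the pod-template-hash label
  - name: stable
    labels:
      app: my-app
```

The initial weights of 100 / 0 keep all traffic on stable until the first `setWeight` step runs.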

### Header-based routing
```yaml
steps:
- setHeaderRoute:
    name: beta-route        # must also be listed under trafficRouting.managedRoutes
    match:
    - headerName: X-Canary
      headerValue:
        exact: "true"
- pause: {}   # only beta users reach v2
- setWeight: 50
```

→ Only requests carrying `X-Canary: true` reach the canary.
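
Header routes only work with a traffic router that supports them, and the route name has to be declared as a managed route. A sketch of the surrounding strategy, assuming Istio and the `my-app-vsvc` VirtualService from the previous section:

```yaml
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      managedRoutes:
      - name: beta-route       # must match setHeaderRoute.name
      istio:
        virtualService:
          name: my-app-vsvc
    steps:
    - setHeaderRoute:
        name: beta-route
        match:
        - headerName: X-Canary
          headerValue:
            exact: "true"
    - pause: {}
```

The order of `managedRoutes` determines route precedence when several managed routes are defined.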

### NGINX / ALB ingress
```yaml
trafficRouting:
  nginx:
    stableIngress: my-app-stable-ingress
    annotationPrefix: nginx.ingress.kubernetes.io
```

→ Works without a service mesh.
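
The `stableIngress` is an ordinary Ingress that routes to the Rollout's stable Service. A sketch, assuming a hypothetical host `my-app.example.com` and a `stableService` named `my-app-stable` exposed on port 80:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-stable-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: my-app.example.com     # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-stable  # must be the Rollout's stableService
            port:
              number: 80
```

During a rollout the controller creates a second, canary Ingress from this one and sets the NGINX canary annotations (such as `canary-weight`) on it.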

### Experiment (long-running A/B)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: my-experiment
spec:
  duration: 1h
  templates:
  - name: baseline
    replicas: 1
    template: ...
  - name: canary
    replicas: 1
    template: ...
  analyses:
  - name: success-rate
    templateName: success-rate
```

→ Runs for 1 hour and compares metrics.
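
An Experiment can also be launched as a canary step, so the pod templates do not need to be copied by hand. A sketch, reusing the `success-rate` template and its `service-name` argument from the Analysis section:

```yaml
strategy:
  canary:
    steps:
    - experiment:
        duration: 1h
        templates:
        - name: baseline
          specRef: stable      # pod spec of the current stable version
        - name: canary
          specRef: canary      # pod spec of the new version
        analyses:
        - name: success-rate
          templateName: success-rate
          args:
          - name: service-name
            value: my-app
    - setWeight: 20
```

For a real baseline-versus-canary comparison, the metric query would need a label that distinguishes the two pod sets; the query above does not do that as written.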

### Flagger (alternative)
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  analysis:
    interval: 1m      # how often checks run
    threshold: 5      # failed checks tolerated before rollback
    iterations: 10    # number of checks before promotion
    metrics:
    - name: request-success-rate
      thresholdRange: { min: 99 }   # percent
```

→ Fits well with Flux / Helm. Similar in scope to Argo Rollouts.

### Rollback
```bash
kubectl argo rollouts undo my-app

# Or revert the image in the spec to the previous version
```

→ The previous ReplicaSet becomes active again.

### Auto-rollback (metric)
```yaml
spec:
  strategy:
    canary:
      steps:
      - setWeight: 10
      - analysis: { templates: [{templateName: error-rate}] }
      # a failed error-rate analysis = automatic rollback
```

→ Safe even with nobody watching.
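
The `error-rate` template is referenced above but not defined in this note. One possible definition, modeled on the `success-rate` template; the 1% threshold, the `5..` status matcher, and the default argument value are assumptions:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate
spec:
  args:
  - name: service-name
    value: my-app          # default value, so the step above needs no args
  metrics:
  - name: error-rate
    interval: 1m
    count: 5
    successCondition: result[0] <= 0.01   # assumed threshold: at most 1% errors
    failureLimit: 1
    provider:
      prometheus:
        address: http://prometheus.example.com
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```

Giving the argument a default keeps the inline `analysis` step short; a Rollout can still override it with its own `args`.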

### Multiple analysis
```yaml
analysis:
  templates:
  - templateName: success-rate
  - templateName: latency-p99
  - templateName: error-rate
```

→ All must pass before promotion.
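
Of the three templates, `latency-p99` has no definition in this note. A sketch of how it might look, assuming the service exports a Prometheus histogram named `http_request_duration_seconds` and that 500 ms is an acceptable p99 (both are assumptions):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-p99
spec:
  args:
  - name: service-name
  metrics:
  - name: latency-p99
    interval: 1m
    count: 5
    successCondition: result[0] <= 0.5   # seconds; assumed threshold
    failureLimit: 1
    provider:
      prometheus:
        address: http://prometheus.example.com
        query: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket{service="{{args.service-name}}"}[5m])) by (le))
```

When several templates are listed in one step, their metrics run in a single AnalysisRun, and one failed metric fails the whole run.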

### Explicit failure / inconclusive conditions
```yaml
metrics:
- name: success-rate
  successCondition: result[0] >= 0.95
  failureCondition: result[0] < 0.9
  failureLimit: 3
  inconclusiveLimit: 5   # 0.90 <= result < 0.95 matches neither condition = inconclusive
```

→ Failure and inconclusive outcomes are declared explicitly. An inconclusive run pauses the rollout until a human decides.

### Web hook (external system)
```yaml
metrics:
- name: web-test
  provider:
    web:
      url: https://my-api.example.com/health
      jsonPath: '{$.status}'
      method: GET
  successCondition: result == "healthy"
```
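
### Notification (Slack)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  annotations:
    # Subscriptions are annotations, not a spec field; "my-channel" is a placeholder
    notifications.argoproj.io/subscribe.on-rollout-aborted.slack: my-channel
    notifications.argoproj.io/subscribe.on-rollout-completed.slack: my-channel
spec:
  ...
```

→ Slack notification on completion / abort. The Slack service itself must be configured in the controller's notification settings.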

### GitOps (ArgoCD + Argo Rollouts)
```
1. Push the new image tag to git.
2. ArgoCD syncs = the Rollout spec is updated.
3. The Rollout starts the canary.
4. Metrics pass = promote.
5. Failure = automatic rollback at the Kubernetes level (no git revert; git still points at the failed version until you revert or fix forward).
```

→ Every deploy becomes progressive.

### Cost / overhead
```
- Each canary runs extra replicas (e.g. up to ~50% extra at a 50% step when traffic routing keeps the stable ReplicaSet at full size)
- Metric queries consume cluster resources
- Engineering time

→ Worth it when every deploy carries significant risk.
```

### Real-world
- **Intuit** (the company behind Argo, now a CNCF project)
- **Adobe**: large-scale Argo user
- **GitHub**: a similar internal system
- **Spotify**: Flagger
- **Most SaaS companies**: some form of progressive delivery

### When NOT?
```
- Small internal tools: a rolling deploy is enough.
- Stateful workloads: blue-green is hard (DB).
- Cron / batch jobs: canary is meaningless.

→ Critical-path APIs / web frontends are the sweet spot.
```

### Pitfalls with stateful workloads
```
DB schema changes:
- v1 and v2 run at the same time = the schema must be compatible with both.
- Backward-compatible migrations are mandatory.

→ "expand-contract":
1. Add the new column (v1 still works).
2. v2 uses the new column.
3. Retire v1.
4. Drop the old column.
```

### Header-based testing
```
The QA team adds a header → only their requests hit the canary.
"X-Canary: true" → served by v2 only.

→ A true canary with 0% of production traffic.
```

### LaunchDarkly + Argo
```
Feature flags (LD) + gradual rollout (Argo).
- Argo: % of traffic on the new version.
- LD: % of users who see the new feature.

→ Use both as separate layers.
```

## 🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| High traffic | Canary + analysis |
| Critical | Blue-green |
| Beta / A/B | Experiment |
| GitOps | ArgoCD + Rollouts |
| Flux | Flagger |
| Service mesh in place | Istio + Argo |
| Small system | Plain rolling update (Helm) |

## ❌ Anti-patterns
- **Auto-promote only, no analysis**: risky.
- **First deploy goes straight to 100%**: use pause + manual promote.
- **Breaking DB schema change + canary**: data corruption.
- **Metric query too narrow**: false signals.
- **Manual promote only**: nothing ships without a human.
- **No rollback test**: it will not work when you really need it.
- **No resource limits**: the canary can take down the cluster.

## 🤖 LLM usage hints
- Canary + metric analysis is the modern progressive-delivery default.
- Blue-green is hard for stateful workloads.
- ArgoCD + Argo Rollouts = GitOps + progressive delivery.
- Flagger is the alternative.

## 🔗 Related documents
- [[DevOps_Deployment_Strategies]]
- [[DevOps_ArgoCD_GitOps]]
- [[DevOps_Service_Mesh_Deep]]