---
id: devops-argo-rollouts
title: Argo Rollouts - Canary / Blue-Green deploy
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [devops, deployment, vibe-coding]
tech_stack: { language: "YAML", applicable_to: ["DevOps"] }
applied_in: []
aliases: [Argo Rollouts, canary, blue-green, progressive delivery, Flagger, AnalysisRun]
---

# Argo Rollouts

> A plain Kubernetes Deployment only offers rolling updates (plus Recreate), with no fine-grained control over traffic. **Argo Rollouts adds canary, blue-green, and experiment strategies**, plus automatic rollback driven by metrics.

## 📖 Core concepts
- Canary: shift traffic in stages, e.g. 1% → 10% → 100%.
- Blue-green: run two versions side by side and swap all traffic at once.
- Analysis: promote or abort based on Prometheus / Datadog metrics.
- Service mesh + Argo Rollouts = precise traffic shifting.

## 💻 Code patterns

### Rollout (instead of a Deployment)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:                 # required; must match the pod template labels
    matchLabels:
      app: my-app
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
  template:
    metadata:
      labels:
        app: my-app         # label value is illustrative
    spec:
      containers:
        - name: app
          image: myapp:v2
```
→ The Argo Rollouts controller (not kubectl) steps through the weights and pauses automatically. Automatic rollback additionally requires an analysis step, as in the Analysis section.

### Manual promote
```bash
kubectl argo rollouts get rollout my-app   # shows rollout progress
kubectl argo rollouts promote my-app       # advance past the current pause
kubectl argo rollouts abort my-app         # stop and shift traffic back to stable
```

### Pause + manual
```yaml
strategy:
  canary:
    steps:
      - setWeight: 10
      - pause: {}        # indefinite, until a manual promote
      - setWeight: 100
```
→ For the first production deploy, require manual approval.

### Blue-green
```yaml
strategy:
  blueGreen:
    activeService: my-app-active
    previewService: my-app-preview
    autoPromotionEnabled: false
```

```
1. Only the new ReplicaSet is created (preview).
2. The preview service points at the new version.
3. Test / verify.
4. Promote = the active service switches to the new version.
5. The old version stays idle (rollback is still possible).
```

### Analysis (Prometheus)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      count: 5             # assumed value; with interval but no count the metric keeps measuring indefinitely
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.example.com
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```

```yaml
spec:
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: { duration: 1m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: my-app
        - setWeight: 50
```
→ If the success rate is below 95% on more than `failureLimit` measurements, the rollout aborts and traffic returns to the stable version.

### Traffic management (Istio)
```yaml
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      istio:
        virtualServices:
          - name: my-app-vsvc
        destinationRule:
          name: my-app-destrule
          canarySubsetName: canary
          stableSubsetName: stable
    steps:
      - setWeight: 5
      - pause: { duration: 10m }
      - setWeight: 25
```
→ Istio performs the weighted routing.

### Header-based routing
```yaml
steps:
  - setHeaderRoute:
      name: beta-route          # must also be listed under trafficRouting.managedRoutes
      match:
        - headerName: X-Canary
          headerValue:
            exact: "true"
  - pause: {}                   # only beta users reach v2 until promoted
  - setWeight: 50
```
→ Only requests carrying `X-Canary: true` reach the canary.

### NGINX / ALB ingress
```yaml
trafficRouting:
  nginx:
    stableIngress: my-app-stable-ingress
    annotationPrefix: nginx.ingress.kubernetes.io
```
→ Traffic shifting works without a service mesh.

### Experiment (long-running A/B)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: my-experiment
spec:
  duration: 1h
  templates:
    - name: baseline
      replicas: 1
      template: ...
    - name: canary
      replicas: 1
      template: ...
  analyses:
    - name: success-rate
      templateName: success-rate
```
→ Runs for 1 hour and compares metrics between the two templates.
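
An Experiment can also be started from inside a canary as a step, rather than as a standalone object. The fragment below is a minimal sketch of that pattern, assuming the `success-rate` AnalysisTemplate from the Analysis section exists in the same namespace. The 20% weight and the template names `baseline` and `canary` are illustrative choices, not values from a real deployment.

```yaml
strategy:
  canary:
    steps:
      - setWeight: 20
      - experiment:
          duration: 1h
          templates:
            - name: baseline
              specRef: stable    # pod spec taken from the current stable version
            - name: canary
              specRef: canary    # pod spec taken from the new version
          analyses:
            - name: success-rate
              templateName: success-rate
              args:
                - name: service-name   # argument declared by the AnalysisTemplate
                  value: my-app
      - setWeight: 100
```

With this layout the rollout should only move on to the final weight once the experiment and its analysis finish successfully; a failed analysis fails the experiment and aborts the rollout.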
### Flagger (alternative)
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 80               # assumed port; Flagger needs the service port
  analysis:
    interval: 1m
    threshold: 5
    iterations: 10         # fixed number of checks; use maxWeight / stepWeight for weighted canary
    metrics:
      - name: request-success-rate
        thresholdRange: { min: 99 }
        interval: 1m
```
→ Fits well with Flux / Helm. Similar in scope to Argo Rollouts.

### Rollback
```bash
kubectl argo rollouts undo my-app
# or revert the image in the spec to the previous version
```
→ The previous ReplicaSet becomes active again.

### Auto-rollback (metric)
```yaml
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - analysis: { templates: [{templateName: error-rate}] }
        # if error-rate fails, the rollout rolls back automatically
```
→ Safe even with no one watching.

### Multiple analysis
```yaml
analysis:
  templates:
    - templateName: success-rate
    - templateName: latency-p99
    - templateName: error-rate
```
→ Promote only when all of them pass.

### Explicit failure / inconclusive conditions
```yaml
metrics:
  - name: success-rate
    successCondition: result[0] >= 0.95
    failureCondition: result[0] < 0.9
    failureLimit: 3
    inconclusiveLimit: 5
    # a result that meets neither condition (0.9 <= result < 0.95) counts as inconclusive
```
→ Failure and inconclusive outcomes are defined explicitly.

### Web hook (external system)
```yaml
metrics:
  - name: web-test
    provider:
      web:
        url: https://my-api.example.com/health
        jsonPath: '{$.status}'
        method: GET
    successCondition: result == "healthy"
```

### Notification (Slack)
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
  annotations:
    # Subscriptions are annotations, not a spec field.
    # "my-channel" is a placeholder; Slack must be configured in the notifications ConfigMap.
    notifications.argoproj.io/subscribe.on-rollout-aborted.slack: my-channel
    notifications.argoproj.io/subscribe.on-rollout-completed.slack: my-channel
spec: ...
```
→ Sends a Slack message when the rollout completes or aborts.

### GitOps (ArgoCD + Argo Rollouts)
```
1. Push the new image tag to Git.
2. ArgoCD sync = the Rollout spec is updated.
3. The Rollout starts a canary.
4. Metrics pass = promote.
5. Fail = automatic rollback at the Kubernetes level (Git is not reverted, so the repo still points at the failed tag).
```
→ Every deploy becomes progressive.

### Cost / overhead
```
- Each canary runs extra replicas (around 50% extra during a rollout)
- Metric queries add cluster cost
- Engineering time
→ Worth it when each deploy carries significant risk.
```

### Real-world
- **Intuit**: where Argo Rollouts originated (Argo is now a CNCF project)
- **Adobe**: large-scale Argo user
- **GitHub**: similar in-house tooling
- **Spotify**: Flagger
- **Most SaaS companies**: some form of progressive delivery

### When NOT?
```
- Small internal tool: a rolling deploy is enough.
- Stateful workloads: blue-green is hard (database).
- Cron / batch jobs: canary has no meaning.
→ Critical-path APIs and web frontends are the sweet spot.
```

### Stateful pitfalls
```
DB schema changes:
- v1 and v2 run at the same time, so the schema must work for both.
- Backward-compatible migrations are required.
→ "expand-contract":
  1. Add the new column (v1 still works).
  2. v2 starts using the new column.
  3. Retire v1.
  4. Drop the old column.
```

### Header-based testing
```
The QA team adds a header → only they hit the canary.
"X-Canary: true" → served by v2 only.
→ A true canary at 0% of regular production traffic.
```

### LaunchDarkly + Argo
```
Feature flags (LD) + gradual rollout (Argo).
- Argo: percentage of traffic on the new version.
- LD: percentage of users who see the new feature.
→ The two are separate layers and can be combined.
```

## 🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| High traffic | Canary + analysis |
| Critical | Blue-green |
| Beta / A/B | Experiment |
| GitOps | ArgoCD + Rollouts |
| Flux | Flagger |
| Service mesh in place | Istio + Argo |
| Small system | Helm with a plain rolling update |

## ❌ Anti-patterns
- **Auto-promote with no analysis**: risky; nothing stops a bad version.
- **First deploy straight to 100%**: add a pause and promote manually instead.
- **Breaking DB schema change + canary**: data gets corrupted.
- **Metric query too narrow**: produces false signals.
- **Manual promote only**: deploys stall whenever no one is available.
- **No rollback testing**: rollback fails when it is actually needed.
- **No resource limits**: a canary can take down the cluster.

## 🤖 LLM usage hints
- Canary + metric analysis is the modern form of progressive delivery.
- Blue-green is hard for stateful workloads.
- ArgoCD + Argo Rollouts = GitOps + delivery.
- Flagger is the alternative.

## 🔗 Related documents
- [[DevOps_Deployment_Strategies]]
- [[DevOps_ArgoCD_GitOps]]
- [[DevOps_Service_Mesh_Deep]]