2026-05-10 22:08:15 +09:00


| Field | Value |
|---|---|
| id | devops-argo-rollouts |
| title | Argo Rollouts — Canary / Blue-Green deploy |
| category | Coding |
| status | draft |
| source_trust_level | B |
| verification_status | conceptual |
| created_at | 2026-05-09 |
| updated_at | 2026-05-09 |
| tags | devops, deployment, vibe-coding |
| tech_stack | language: YAML; applicable_to: DevOps |
| applied_in | |
| aliases | Argo Rollouts, canary, blue-green, progressive delivery, Flagger, AnalysisRun |

Argo Rollouts

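A plain Kubernetes Deployment only offers rolling updates (or Recreate), with no fine-grained control over traffic or pace. Argo Rollouts adds canary, blue-green, and experiment strategies, plus automatic rollback driven by metrics.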

📖 Core concepts

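  • Canary: shift traffic in stages, e.g. 1% → 10% → 100%.
  • Blue-green: run two versions side by side and switch all traffic at once.
  • Analysis: promote or abort based on Prometheus / Datadog metrics.
  • Service mesh (or ingress) + Argo Rollouts = precise traffic shifting.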
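The 1% → 10% → 100% progression above could be written as Rollout steps like this minimal sketch. The pause durations are arbitrary example values. Without `trafficRouting`, the controller approximates the weight by scaling canary vs stable pod counts, so a true 1% step needs a service mesh or ingress doing the split (or a very large replica count).

```yaml
# Sketch: the 1% -> 10% -> 100% canary from the list above (durations are example values)
strategy:
  canary:
    steps:
      - setWeight: 1              # a precise 1% needs trafficRouting (mesh / ingress)
      - pause: { duration: 10m }
      - setWeight: 10
      - pause: { duration: 30m }
      - setWeight: 100
```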

💻 Code patterns

Rollout (instead of Deployment)

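```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  # Like a Deployment, a Rollout needs a selector matching the pod labels
  # (app: my-app is an example label)
  selector:
    matchLabels:
      app: my-app
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: myapp:v2
```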

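→ The Argo Rollouts controller (not kubectl) steps through this automatically: each timed pause expires and the next weight is applied. With no analysis attached, it only advances; nothing rolls back on its own.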

Manual promote

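```bash
kubectl argo rollouts get rollout my-app
# → shows steps, weights and ReplicaSets (add --watch for a live view)

kubectl argo rollouts promote my-app   # move past the current pause
kubectl argo rollouts abort my-app     # send traffic back to the stable version
```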

Indefinite pause + manual promote

```yaml
strategy:
  canary:
    steps:
      - setWeight: 10
      - pause: {}    # indefinite: waits until a manual promote
      - setWeight: 100
```

→ Good for the first production deploy: a human approves before traffic goes to 100%.

Blue-green

```yaml
strategy:
  blueGreen:
    activeService: my-app-active
    previewService: my-app-preview
    autoPromotionEnabled: false
```
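1. A new ReplicaSet is created alongside the old one (preview).
2. The preview service points at the new version.
3. Test / verify through the preview service.
4. Promote = the active service switches to the new version.
5. The old ReplicaSet is kept for `scaleDownDelaySeconds` (default 30s) before it is scaled down, so switching back in that window is instant.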
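The two Services named in `activeService` / `previewService` are plain Kubernetes Services. A minimal sketch, assuming pods labeled `app: my-app` and container port 8080 (both assumptions). The controller adds a `rollouts-pod-template-hash` entry to each selector, which is how active keeps pointing at the old ReplicaSet while preview points at the new one.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-active
spec:
  selector:
    app: my-app          # controller appends rollouts-pod-template-hash here
  ports:
    - port: 80
      targetPort: 8080   # assumed container port
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview
spec:
  selector:
    app: my-app          # controller appends the new ReplicaSet's hash here
  ports:
    - port: 80
      targetPort: 8080
```

Step 3 can also be automated: the `blueGreen` strategy has a `prePromotionAnalysis` field that runs an AnalysisTemplate against the preview before the switch.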

Analysis (Prometheus)

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      # count: 5 is an example value; with interval but no count,
      # measurements continue indefinitely and an analysis step never completes
      count: 5
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.example.com
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```

```yaml
# Rollout that runs the template as a canary step
spec:
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: { duration: 1m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: my-app
        - setWeight: 50
```

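→ If the success rate is below 95% on more than 3 measurements (`failureLimit: 3`), the analysis fails and the rollout is aborted, sending traffic back to stable.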
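Besides an analysis step, a canary can run analysis in the background for the whole rollout through `strategy.canary.analysis`; `startingStep` delays the start until that (0-based) step index is reached. A minimal sketch reusing the `success-rate` template; the step index and durations are example values.

```yaml
spec:
  strategy:
    canary:
      # Background analysis: runs alongside the steps and aborts the rollout on failure
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 1          # start at step index 1 (the first pause)
        args:
          - name: service-name
            value: my-app
      steps:
        - setWeight: 20
        - pause: { duration: 10m }
        - setWeight: 50
        - pause: { duration: 10m }
```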

Traffic management (Istio)

```yaml
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      istio:
        virtualServices:
          - name: my-app-vsvc
        destinationRule:
          name: my-app-destrule
          canarySubsetName: canary
          stableSubsetName: stable
    steps:
      - setWeight: 5
      - pause: { duration: 10m }
      - setWeight: 25
```

→ The controller rewrites the route weights; Istio does the actual weighted routing.
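Argo Rollouts edits the weights of an existing VirtualService; it does not create one. A sketch of the Istio objects that the subset-level settings above (`destinationRule`, `canarySubsetName`, `stableSubsetName`) would point at, assuming a Service named `my-app` and pods labeled `app: my-app` (assumptions).

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app-vsvc
spec:
  hosts:
    - my-app               # assumed Service name
  http:
    - name: primary
      route:
        - destination:
            host: my-app
            subset: stable
          weight: 100      # controller rewrites these weights at each setWeight step
        - destination:
            host: my-app
            subset: canary
          weight: 0
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app-destrule
spec:
  host: my-app
  subsets:
    - name: stable
      labels:
        app: my-app        # controller injects the stable rollouts-pod-template-hash
    - name: canary
      labels:
        app: my-app        # controller injects the canary rollouts-pod-template-hash
```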

Header-based routing

```yaml
steps:
  - setHeaderRoute:
      name: beta-route
      match:
        - headerName: X-Canary
          headerValue:
            exact: "true"
  - pause: {}   # only requests with X-Canary: true reach v2
  - setWeight: 50
```

→ Only requests carrying the `X-Canary: true` header are routed to the canary.
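As described in the Argo Rollouts traffic-management docs, header routes such as `beta-route` are also declared under `trafficRouting.managedRoutes`, and the order of that list sets their precedence. A sketch assuming the Istio VirtualService `my-app-vsvc` with an HTTP route named `primary` (assumptions).

```yaml
strategy:
  canary:
    trafficRouting:
      managedRoutes:
        - name: beta-route     # must match setHeaderRoute.name
      istio:
        virtualService:
          name: my-app-vsvc
          routes:
            - primary          # existing http route in the VirtualService
    steps:
      - setHeaderRoute:
          name: beta-route
          match:
            - headerName: X-Canary
              headerValue:
                exact: "true"
      - pause: {}
```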

NGINX / ALB ingress

```yaml
trafficRouting:
  nginx:
    stableIngress: my-app-stable-ingress
    annotationPrefix: nginx.ingress.kubernetes.io
```

→ Works without a service mesh: the controller creates a canary Ingress and sets its canary-weight annotation.
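The NGINX integration needs a canary Service, a stable Service, and an existing Ingress for the stable one. A sketch assuming Services `my-app-canary` / `my-app-stable`, host `my-app.example.com`, and port 80 (all assumptions); the first document is a Rollout strategy fragment, the second a complete Ingress.

```yaml
# Rollout strategy fragment
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      nginx:
        stableIngress: my-app-stable-ingress
    steps:
      - setWeight: 10
      - pause: { duration: 10m }
---
# The stable Ingress must already exist and point at the stable Service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-stable-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-stable
                port:
                  number: 80
```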

Experiment (long-running A/B)

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: my-experiment
spec:
  duration: 1h
  templates:
    - name: baseline
      replicas: 1
      template: ...
    - name: canary
      replicas: 1
      template: ...
  analyses:
    - name: success-rate
      templateName: success-rate
```

→ Runs baseline and canary side by side for 1 hour while the analysis compares their metrics.
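An experiment can also run as one step of a canary Rollout, where `specRef` reuses the stable and canary pod templates instead of repeating them. A sketch reusing the `success-rate` template; the arg value `my-app-canary` is a hypothetical placeholder.

```yaml
strategy:
  canary:
    steps:
      - experiment:
          duration: 1h
          templates:
            - name: baseline
              specRef: stable      # pods built from the current stable template
              replicas: 1
            - name: canary
              specRef: canary      # pods built from the new version's template
              replicas: 1
          analyses:
            - name: success-rate
              templateName: success-rate
              args:
                - name: service-name
                  value: my-app-canary   # hypothetical value for the template arg
      - setWeight: 20
```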

Flagger (alternative)

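```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 80            # Flagger needs the service port; 80 is an example value
  analysis:
    interval: 1m
    threshold: 5        # failed checks before rollback
    iterations: 10      # iterations without stepWeight = blue/green style analysis
    metrics:
      - name: request-success-rate
        thresholdRange: { min: 99 }
        interval: 1m
```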

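→ Fits naturally with Flux / Helm and is conceptually similar to Argo Rollouts. Unlike a Rollout, it wraps an existing Deployment (`targetRef`) instead of replacing it.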
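Flagger runs a weighted canary when `maxWeight` / `stepWeight` are set instead of `iterations`; this needs a mesh or ingress controller that Flagger supports. A sketch of just the `analysis` block, with example thresholds.

```yaml
analysis:
  interval: 1m
  threshold: 5            # failed checks before rollback
  maxWeight: 50           # stop shifting at 50%, then promote
  stepWeight: 10          # +10% traffic per interval
  metrics:
    - name: request-success-rate
      thresholdRange: { min: 99 }
      interval: 1m
    - name: request-duration
      thresholdRange: { max: 500 }   # P99 latency in milliseconds
      interval: 30s
```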

Rollback

```bash
kubectl argo rollouts undo my-app

# or revert the image in the spec to the old version
```

→ The previous revision is rolled out again, and its ReplicaSet becomes active.

Auto-rollback (metric)

```yaml
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - analysis: { templates: [{templateName: error-rate}] }
        # if error-rate fails, the rollout aborts and rolls back automatically
```

→ Safe even with nobody watching.
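The `error-rate` template referenced above is not defined in this note. A possible sketch, assuming the same `http_requests_total` metric and labels as the `success-rate` example, a hard-coded service `my-app` (the step passes no args), and an example 5% threshold.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate
spec:
  metrics:
    - name: error-rate
      interval: 1m
      count: 5
      # only failureCondition: a measurement fails when the 5xx ratio exceeds 5%;
      # failureLimit defaults to 0, so a single failure fails the analysis
      failureCondition: result[0] > 0.05
      provider:
        prometheus:
          address: http://prometheus.example.com
          query: |
            sum(rate(http_requests_total{service="my-app",status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="my-app"}[5m]))
```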

Multiple analysis

```yaml
analysis:
  templates:
    - templateName: success-rate
    - templateName: latency-p99
    - templateName: error-rate
```

→ Promote only if all of them pass; a failure in any one aborts the rollout.
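Likewise, a `latency-p99` template could look like this sketch, assuming a Prometheus histogram named `http_request_duration_seconds`, a service label `my-app`, and a 500 ms budget (all assumptions).

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-p99
spec:
  metrics:
    - name: latency-p99
      interval: 1m
      count: 5
      successCondition: result[0] < 0.5   # seconds
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.example.com
          query: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{service="my-app"}[5m])) by (le))
```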

Success / failure / inconclusive conditions

```yaml
metrics:
  - name: success-rate
    successCondition: result[0] >= 0.95
    failureCondition: result[0] < 0.9
    failureLimit: 3
    inconclusiveLimit: 5    # 0.9 <= result < 0.95 matches neither condition = inconclusive
```

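→ Explicit fail band (< 0.9) and an inconclusive band in between. Exceeding `failureLimit` aborts the rollout; exceeding `inconclusiveLimit` makes the run Inconclusive, which pauses the rollout for a human decision.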

Webhook (external system)

```yaml
metrics:
  - name: web-test
    provider:
      web:
        url: https://my-api.example.com/health
        jsonPath: '{$.status}'
        method: GET
    successCondition: result == "healthy"
```

Notification (Slack)

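```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
  annotations:
    # subscriptions use annotations: subscribe.<trigger>.<service>: <recipient>
    # my-channel is a placeholder Slack channel
    notifications.argoproj.io/subscribe.on-rollout-aborted.slack: my-channel
    notifications.argoproj.io/subscribe.on-rollout-completed.slack: my-channel
spec:
  ...
```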

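→ Slack message when the rollout completes or is aborted. The Slack service itself (token) is configured once in the controller's notification ConfigMap / Secret.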
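A sketch of that controller-side Slack configuration, assuming the default `argo-rollouts` namespace and a bot token (placeholder value) in the notification secret. Built-in triggers such as `on-rollout-aborted` come from the notification catalog shipped with Argo Rollouts; depending on the version and install method, the catalog may need to be applied separately.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-notification-configmap
  namespace: argo-rollouts
data:
  service.slack: |
    token: $slack-token        # resolved from the secret below
---
apiVersion: v1
kind: Secret
metadata:
  name: argo-rollouts-notification-secret
  namespace: argo-rollouts
stringData:
  slack-token: xoxb-placeholder  # placeholder; never commit a real token
```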

GitOps (ArgoCD + Argo Rollouts)

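1. Push the new image tag to git.
2. ArgoCD syncs = the Rollout spec is updated.
3. The Rollout starts the canary.
4. Metrics pass = promote.
5. Failure = automatic rollback at the Kubernetes level. Git is not reverted, so the repo still points at the failed image until someone reverts or fixes it.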

→ Every deploy becomes a progressive rollout.
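Step 2 assumes an Argo CD Application that syncs the manifests containing the Rollout. A minimal sketch; the repo URL, path, and namespace are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/org/my-app-deploy.git   # hypothetical repo
    targetRevision: main
    path: k8s                                                # hypothetical path holding the Rollout
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```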

Cost / overhead

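- Every canary adds replicas (e.g. ~50% extra pods at a 50% step when the stable side stays fully scaled; blue-green runs a full second copy)
- Metric queries add load and cost on the cluster / monitoring stack
- Engineering time to write and tune templates and queries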

→ Worth it when each deploy carries real risk.
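With traffic routing, the stable ReplicaSet stays at full size during a canary by default. The `dynamicStableScale` option scales stable down as traffic moves to the canary, trading lower extra cost for a slower abort (stable must scale back up). A sketch reusing the NGINX names from above as assumptions.

```yaml
strategy:
  canary:
    dynamicStableScale: true   # shrink stable as canary weight grows (requires trafficRouting)
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      nginx:
        stableIngress: my-app-stable-ingress
    steps:
      - setWeight: 25
      - pause: { duration: 10m }
      - setWeight: 75
```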

Real-world

  • Intuit (main maintainer of Argo)
  • Adobe: heavy Argo user
  • GitHub: similar in-house tooling
  • Spotify: Flagger
  • Practically every SaaS: some form of progressive delivery

When NOT to use it?

- Small internal tools: a rolling deploy is enough.
- Stateful workloads: blue-green is hard (the database is shared).
- Cron / batch jobs: canary is meaningless (there is no request traffic to split).

→ Sweet spot: critical-path APIs and web services.

The stateful trap

DB schema changes:
- v1 and v2 run at the same time, so the schema must work for both.
- Backward-compatible migrations are mandatory.

→ "Expand-contract":
  1. Add the new column (v1 still works).
  2. v2 starts using the new column.
  3. Retire v1.
  4. Drop the old column.

Header-based testing

The QA team adds a header so that only their requests hit the canary.
"X-Canary: true" → only those requests get v2.

→ A true canary with 0% of production traffic.

LaunchDarkly + Argo

Feature flag (LD) + progressive rollout (Argo).
- Argo: % of traffic sent to the new version.
- LD: % of users who see the new feature.

→ Two complementary layers.

🤔 Decision criteria

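| Situation | Recommendation |
|---|---|
| High traffic | Canary + analysis |
| Critical service | Blue-green |
| Beta / A/B test | Experiment |
| GitOps | ArgoCD + Rollouts |
| Flux | Flagger |
| Service mesh in place | Istio + Argo |
| Small system | Helm rolling update |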

Anti-patterns

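  • Auto-promote only, with no analysis: risky.
  • First deploy goes straight to 100%: use a pause + manual promote.
  • Breaking DB schema change + canary: corrupted data.
  • Metric query too narrow: false signals.
  • Manual promote only: nothing ships when nobody is around.
  • Rollback never tested: it won't work when you actually need it.
  • No resource limits: the canary can take down the cluster.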
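For the last point, resource requests and limits go in the Rollout's pod template exactly as in a Deployment. A fragment sketch; the numbers are placeholders to size per service.

```yaml
spec:
  template:
    spec:
      containers:
        - name: app
          image: myapp:v2
          resources:
            requests:
              cpu: 250m        # placeholder values
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
```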

🤖 LLM usage hints

  • Canary + metric analysis is the modern progressive-delivery pattern.
  • Blue-green is hard for stateful workloads.
  • ArgoCD + Argo Rollouts = GitOps + progressive delivery.
  • Flagger is the main alternative.

🔗 Related documents