2nd/10_Wiki/Topics/DevOps_and_Security/Engineering Metrics (DORA).md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
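Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / incomplete stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: ~2,400 fixes (broken links repaired, redirects linked directly, concepts mapped)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections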

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


---
id: wiki-2026-0508-engineering-metrics-dora
title: Engineering Metrics (DORA)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases:
  - DORA
  - DORA Metrics
  - Four Keys
  - DevOps Research and Assessment
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags:
  - devops
  - metrics
  - dora
  - sre
  - engineering
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: yaml
  framework: github-actions
---

Engineering Metrics (DORA)

One-line summary

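"Four metrics (deployment frequency, lead time for changes, change failure rate, MTTR) that measure the health of an engineering org." The metrics come out of the DORA research program (State of DevOps reports since 2014; DORA joined Google Cloud in 2018), were complemented by the SPACE framework in 2021, and by 2026 are a default view in the native dashboards of GitHub, GitLab, and Datadog.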

Core concepts

Four Keys

  • Deployment Frequency (DF): how often code is deployed to production. Elite = on-demand (multiple deploys per day).
  • Lead Time for Changes (LT): time from commit to running in production. Elite = < 1 day.
  • Change Failure Rate (CFR): share of deploys that cause an incident in production. Elite = 0–15%.
  • Mean Time to Recovery (MTTR): time from incident start to service restoration. Elite = < 1 hour.

Performance tiers

  • Elite: DF on-demand · LT < 1 day · CFR 0–15% · MTTR < 1 h.
  • High: DF weekly to daily · LT 1 day–1 wk · CFR 16–30% · MTTR < 1 day.
  • Medium: DF monthly · LT 1 wk–1 mo · CFR 16–30% · MTTR 1 day–1 wk.
  • Low: DF < monthly · LT > 1 mo.
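To make the tier boundaries concrete, here is a minimal sketch that maps lead time and MTTR onto the tiers listed above. The function names are hypothetical, "1 month" is approximated as 30 days, and because the Low row gives no MTTR bound, anything slower than Medium is assumed to be Low. Rating a team by its weakest metric is also an assumption for illustration, not a DORA rule.

```python
from datetime import timedelta

TIERS = ["Low", "Medium", "High", "Elite"]  # ordered worst to best

def lead_time_tier(lt: timedelta) -> str:
    if lt < timedelta(days=1):
        return "Elite"
    if lt <= timedelta(weeks=1):
        return "High"
    if lt <= timedelta(days=30):  # "1 month" approximated as 30 days
        return "Medium"
    return "Low"

def mttr_tier(mttr: timedelta) -> str:
    if mttr < timedelta(hours=1):
        return "Elite"
    if mttr < timedelta(days=1):
        return "High"
    if mttr <= timedelta(weeks=1):
        return "Medium"
    return "Low"  # assumption: the tier list gives no MTTR bound for Low

def overall_tier(lt: timedelta, mttr: timedelta) -> str:
    # assumed policy: a team is rated by its weakest metric
    return min(lead_time_tier(lt), mttr_tier(mttr), key=TIERS.index)
```

For example, a 5-hour lead time (Elite) with a 3-hour MTTR (High) rates as High overall under this policy.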

Applications

  1. Review every week in the sprint retro.
  2. Use as quarterly engineering OKR targets.
  3. Hiring/promotion signal (at team level only, never per individual).

💻 Patterns

GitHub Actions deployment frequency

# .github/workflows/deploy.yml
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh
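      - name: Emit DORA event
        # runs only if ./deploy.sh succeeded; the collector URL is an internal, org-specific endpoint
        run: |
          curl -fsS -X POST https://api.dora-collector.internal/events \
            -H "Authorization: Bearer ${{ secrets.DORA_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{"type":"deploy","sha":"${{ github.sha }}","ts":"'$(date -u +%FT%TZ)'"}'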
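The workflow only emits raw deploy events; deployment frequency is derived downstream. A minimal sketch, assuming the collector hands back the `ts` strings in the `date -u +%FT%TZ` form the step produces; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def parse_ts(s: str) -> datetime:
    # matches the `date -u +%FT%TZ` output, e.g. 2026-05-10T12:00:00Z
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def deployment_frequency(deploy_ts: list[str], now: datetime, days: int = 30) -> float:
    """Average production deploys per day over the trailing `days` window."""
    start = now - timedelta(days=days)
    in_window = [t for t in map(parse_ts, deploy_ts) if start <= t <= now]
    return len(in_window) / days
```

A result above 1.0 means more than one deploy per day on average, which is the Elite "on-demand" range.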

Lead time calculation (SQL)

-- assumes dora_events holds one row per shipped commit: its commit_ts plus the deploy_ts of the deploy that shipped it
SELECT
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM (deploy_ts - commit_ts))/3600) AS p50_hours,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM (deploy_ts - commit_ts))/3600) AS p95_hours
FROM dora_events
WHERE deploy_ts >= NOW() - INTERVAL '30 days';
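The query assumes `dora_events` already pairs each commit with the deploy that shipped it. A minimal Python sketch of building those pairs and the same two percentiles, assuming each deploy record carries the timestamps of the commits it shipped for the first time (for example collected from `git log <previous_sha>..<sha>`); the `Deploy` type and function names are hypothetical.

```python
import statistics
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Deploy:
    deploy_ts: datetime
    # timestamps of the commits this deploy shipped for the first time
    commit_ts: list[datetime] = field(default_factory=list)

def lead_times_hours(deploys: list[Deploy]) -> list[float]:
    """One sample per commit: deploy time minus commit time, in hours."""
    return [(d.deploy_ts - c).total_seconds() / 3600 for d in deploys for c in d.commit_ts]

def p50_p95_hours(deploys: list[Deploy]) -> tuple[float, float]:
    samples = lead_times_hours(deploys)
    # method="inclusive" interpolates linearly between ranks, as PERCENTILE_CONT does
    p95 = statistics.quantiles(samples, n=20, method="inclusive")[-1]
    return statistics.median(samples), p95
```

Measuring per commit rather than per deploy keeps a large batched deploy from hiding how long its oldest commits waited.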

Change failure rate from incidents

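# rolling 30d CFR: deploys and incidents must already be filtered to the same 30-day window
def cfr(deploys: list[dict], incidents: list[dict]) -> float:
    deploy_shas = {d["sha"] for d in deploys}
    # a deploy counts once even if it caused several incidents; ignore shas outside the window
    bad_deploys = {i["deploy_sha"] for i in incidents
                   if i["caused_by_deploy"] and i["deploy_sha"] in deploy_shas}
    return len(bad_deploys) / max(len(deploys), 1)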

MTTR via PagerDuty

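import httpx, statistics
from datetime import datetime

def _ts(s: str) -> datetime:  # PagerDuty timestamps end in "Z"; Python < 3.11 needs "+00:00"
    return datetime.fromisoformat(s.replace("Z", "+00:00"))
def mttr(api_key: str, since: str) -> float:
    r = httpx.get("https://api.pagerduty.com/incidents",
                  headers={"Authorization": f"Token token={api_key}"},
                  params={"since": since, "statuses[]": "resolved", "limit": 100})  # first page only
    r.raise_for_status()
    # for a resolved incident, last_status_change_at is when it was resolved
    durations = [(_ts(i["last_status_change_at"]) - _ts(i["created_at"])).total_seconds()
                 for i in r.json()["incidents"]]
    return statistics.median(durations) / 60  # median minutes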
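The `mttr` function above reads only the first page of `/incidents`. PagerDuty's list endpoints use offset pagination with a `more` flag in each response. The sketch below walks the pages through an injected `get_page(offset, limit)` callable (a hypothetical name), which keeps the loop testable without network access.

```python
from typing import Callable, Iterator

def paginate(get_page: Callable[[int, int], dict], key: str = "incidents",
             limit: int = 100) -> Iterator[dict]:
    """Yield every item across offset-paginated pages until the response says `more` is false."""
    offset = 0
    while True:
        page = get_page(offset, limit)
        items = page[key]
        yield from items
        if not page.get("more") or not items:
            break  # last page (an empty page also stops the loop)
        offset += len(items)
```

With httpx, `get_page` could wrap the same request as `mttr`, adding `offset` and `limit` to `params` and returning `r.json()`.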

Four Keys dashboard (Datadog)

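# datadog-dora.yaml (simplified widget list, not Datadog's dashboard JSON schema)
# dora.* are custom metrics your pipeline must submit; p50 queries need distribution metrics
widgets:
  - title: Deployment Frequency
    query: "sum:dora.deploy{*}.as_count().rollup(sum, 86400)"
  - title: Lead Time p50
    query: "p50:dora.lead_time_seconds{*}"
  - title: CFR
    query: "sum:dora.deploy_failed{*}.as_count() / sum:dora.deploy{*}.as_count()"
  - title: MTTR p50
    query: "p50:dora.incident_resolve_seconds{*}"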
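The dashboard queries assume that something submits the custom `dora.*` metrics. One lightweight option is to send DogStatsD datagrams (`name:value|type|#tags`) directly. This sketch assumes a Datadog Agent with DogStatsD is listening on its default UDP port 8125 on localhost, and the helper names are hypothetical. Lead time and recovery time would be sent as distributions (`d`) so the `p50:` queries can aggregate them.

```python
import socket
from typing import Optional

def dogstatsd_line(name: str, value: float, kind: str,
                   tags: Optional[dict[str, str]] = None) -> str:
    """Format one DogStatsD datagram: name:value|type, plus optional |#key:value tags."""
    line = f"{name}:{value}|{kind}"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line

def emit(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    # fire-and-forget UDP: delivery is not confirmed
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode(), (host, port))
```

After each production deploy, for example: `emit(dogstatsd_line("dora.deploy", 1, "c", {"service": "api"}))`.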

Trunk-based config (shorter lead time)

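# .github/branch-protection.yml
# Not read by GitHub itself: these keys mirror the REST "update branch protection" payload
# (applied by a script or settings-sync tool) for the main branch
required_status_checks:
  strict: true          # branch must be up to date with main before merging
  contexts: [ci/test, ci/lint]
enforce_admins: true    # required by the API; true applies the rules to admins as well
required_pull_request_reviews:
  required_approving_review_count: 1
  dismiss_stale_reviews: true
restrictions: null      # no push allow-list; PR-only merging comes from required_pull_request_reviews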
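GitHub does not read a branch-protection YAML file on its own, so these settings have to be applied through the REST API (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`) or a settings-sync tool. Below is a minimal standard-library sketch that only builds the request. The function name is hypothetical. The endpoint requires an `enforce_admins` key, which is set to `True` here as an assumed choice.

```python
import json
import urllib.request

def branch_protection_request(owner: str, repo: str, branch: str,
                              token: str) -> urllib.request.Request:
    """Build (but do not send) the REST call that applies the protection settings above."""
    body = {
        "required_status_checks": {"strict": True, "contexts": ["ci/test", "ci/lint"]},
        "enforce_admins": True,  # key is required by the endpoint; True is an assumed choice
        "required_pull_request_reviews": {
            "required_approving_review_count": 1,
            "dismiss_stale_reviews": True,
        },
        "restrictions": None,  # serialized as JSON null: no user/team push allow-list
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection",
        data=json.dumps(body).encode(),
        headers={"Accept": "application/vnd.github+json",
                 "Authorization": f"Bearer {token}",
                 "X-GitHub-Api-Version": "2022-11-28"},
        method="PUT",
    )
```

To send it, call `urllib.request.urlopen(req)` with a token that has admin rights on the repository.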

Decision criteria

| Situation | Approach |
|---|---|
| Startup (<20 engineers) | Prioritize DF + LT; MTTR is secondary |
| Regulated industry | CFR is primary (release safety) |
| Platform team | All four, reviewed weekly |
| Individual performance review | Don't; team metrics only |

Default: self-host four-keys-platform (Google's open-source Four Keys project) with Grafana.

🔗 Graph

🤖 Using LLMs

When to use: extracting metrics from deploy logs; classifying incident root causes (whether a deploy triggered the incident). When not to: scoring individual contributors, since DORA metrics are team-level only.
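Before handing incidents to an LLM, a cheap deterministic pre-filter can flag the most likely culprit deploy. This sketch uses a time-window heuristic, which is an assumption and not part of DORA. It marks the latest deploy to the same service within one hour before the incident start as the suspect. Its output could populate the `deploy_sha` / `caused_by_deploy` fields that `cfr` reads, with ambiguous cases left to the LLM or a human. The function name and tuple layout are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

def suspect_deploy(incident_start: datetime, service: str,
                   deploys: list[tuple[str, str, datetime]],
                   window: timedelta = timedelta(hours=1)) -> Optional[str]:
    """Return the sha of the latest deploy to `service` in the `window` before the incident."""
    # deploys are (sha, service, deployed_at) tuples; the 1-hour window is an assumed default
    candidates = [(ts, sha) for sha, svc, ts in deploys
                  if svc == service and incident_start - window <= ts <= incident_start]
    return max(candidates)[1] if candidates else None
```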

Anti-patterns

  • Goodharting: chasing DF alone while ignoring quality → CFR spikes.
  • Individual scoring: measuring LT per developer → gaming (nothing but tiny commits).
  • Vanity rollups: an org-wide average hides the distribution across teams.
  • No CFR: counting deploys without tracking failures → a false elite signal.
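Here is a small illustration of the vanity-rollup problem using made-up lead-time samples in hours. The org-wide mean blends very different teams into one number, while per-team medians keep the spread visible. The function name and data are hypothetical.

```python
import statistics

def org_vs_team(lead_times_by_team: dict[str, list[float]]) -> tuple[float, dict[str, float]]:
    """Return (org-wide mean, per-team median) of lead-time samples in hours."""
    all_samples = [x for samples in lead_times_by_team.values() for x in samples]
    org_mean = statistics.fmean(all_samples)
    per_team = {team: statistics.median(s) for team, s in lead_times_by_team.items()}
    return org_mean, per_team
```

With `{"payments": [2, 3, 4], "search": [3, 5, 4], "legacy": [300, 320, 340]}`, the org mean is 109 h (about 4.5 days, a High-tier lead time). The per-team medians are 3 h, 4 h, and 320 h, so two teams are Elite and one is Medium.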

🧪 Verification / Duplicates

  • Verified against the DORA "State of DevOps" reports (2014–2024) and official Google Cloud material.
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: DORA Four Keys definitions + dashboard patterns |