---
id: wiki-2026-0508-dora-metrics
title: DORA Metrics
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [dora, four-keys, accelerate-metrics]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [dora, devops, metrics, delivery-performance]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: sql
  framework: github-actions/grafana
---
# DORA Metrics
## One-line summary
> **"The four numbers of software delivery performance"**: deployment frequency, lead time for changes, change failure rate, mean time to restore (MTTR). The sources of truth are Forsgren/Humble/Kim, "Accelerate" (2018), and the yearly DORA report. As of 2026, a fifth metric (reliability) is official.
## Core concepts
### The four (now five) metrics
1. **Deployment Frequency (DF)**: production deploys per period. Elite: on demand (multiple per day).
2. **Lead Time for Changes (LT)**: time from commit to production. Elite: < 1 day.
3. **Change Failure Rate (CFR)**: percentage of deploys that cause an incident or rollback. Elite: 0–5%.
4. **Mean Time to Restore (MTTR)**: time from incident detection to resolution. Elite: < 1 hour.
5. **Reliability** (added in the 2021/2022 reports): meeting or exceeding SLO targets.
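
The definitions above can be sketched as one pure function over deploy and incident records. This is a minimal sketch, not a DORA-defined schema: the `Deploy` and `Incident` shapes, their field names, and the choice of median for lead time are all assumptions made for illustration.

```ts
// Hypothetical record shapes; the field names are assumptions for this sketch.
interface Deploy {
  deployedAt: Date;
  firstCommitAt: Date;     // earliest commit shipped by this deploy
  causedIncident: boolean; // true if the deploy led to an incident or rollback
}

interface Incident {
  detectedAt: Date;
  resolvedAt: Date;
}

interface FourKeys {
  deploysPerDay: number;
  medianLeadTimeHours: number;
  changeFailureRate: number;      // fraction in [0, 1]
  meanTimeToRestoreHours: number;
}

const HOUR_MS = 60 * 60 * 1000;

function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Computes the four metrics for one team over a window of `windowDays` days.
// Empty inputs yield NaN for the affected metric rather than a misleading 0.
function fourKeys(deploys: Deploy[], incidents: Incident[], windowDays: number): FourKeys {
  const leadTimes = deploys.map(
    d => (d.deployedAt.getTime() - d.firstCommitAt.getTime()) / HOUR_MS);
  const restoreTimes = incidents.map(
    i => (i.resolvedAt.getTime() - i.detectedAt.getTime()) / HOUR_MS);
  const failed = deploys.filter(d => d.causedIncident).length;
  return {
    deploysPerDay: deploys.length / windowDays,
    medianLeadTimeHours: median(leadTimes),
    changeFailureRate: deploys.length === 0 ? NaN : failed / deploys.length,
    meanTimeToRestoreHours: restoreTimes.length === 0
      ? NaN
      : restoreTimes.reduce((a, b) => a + b, 0) / restoreTimes.length,
  };
}
```

Returning all four values from one call keeps callers from reporting a single metric on its own.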
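### Performance bands (2024 report)
| Band | Deployment frequency | Lead time | Change failure rate | MTTR |
|---|---|---|---|---|
| **Elite** | on demand (frequent deploys) | < 1 day | 0–5% | < 1 hour |
| **High** | weekly to monthly | 1 day to 1 week | 0–10% | < 1 day |
| **Medium** | monthly to once per 6 months | 1 week to 1 month | 0–15% | < 1 week |
| **Low** | less than once per 6 months | > 6 months | > 64% | > 1 week |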
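
A rough way to apply the bands above is to classify each metric separately and report the weakest result. This is a sketch under stated assumptions: the cut-offs close the gaps between the listed ranges (for example, any lead time of 30 days or more counts as low), and the weakest-metric rule is a simplification chosen here, not how the DORA reports derive their clusters. All names are hypothetical.

```ts
type Band = "elite" | "high" | "medium" | "low";

interface Metrics {
  deploysPerDay: number;
  leadTimeHours: number;
  changeFailureRate: number; // fraction in [0, 1]
  restoreHours: number;
}

interface Rule { band: Band; matches: (value: number) => boolean }

// Return the best band whose rule the value satisfies; anything else is "low".
function bandFor(value: number, rules: Rule[]): Band {
  for (const rule of rules) {
    if (rule.matches(value)) return rule.band;
  }
  return "low";
}

const DAY = 24;         // hours
const WEEK = 7 * DAY;
const MONTH = 30 * DAY; // assumption: a month is treated as 30 days

const frequencyRules: Rule[] = [
  { band: "elite", matches: v => v >= 1 },        // at least daily
  { band: "high", matches: v => v >= 1 / 30 },    // at least monthly
  { band: "medium", matches: v => v >= 1 / 180 }, // at least once per 6 months
];
const leadTimeRules: Rule[] = [
  { band: "elite", matches: v => v < DAY },
  { band: "high", matches: v => v < WEEK },
  { band: "medium", matches: v => v < MONTH },
];
const failureRateRules: Rule[] = [
  { band: "elite", matches: v => v <= 0.05 },
  { band: "high", matches: v => v <= 0.10 },
  { band: "medium", matches: v => v <= 0.15 },
];
const restoreRules: Rule[] = [
  { band: "elite", matches: v => v < 1 },
  { band: "high", matches: v => v < DAY },
  { band: "medium", matches: v => v < WEEK },
];

const RANK: Band[] = ["elite", "high", "medium", "low"];

// Overall band is the weakest of the four, so one bad metric cannot hide.
function classify(m: Metrics): Band {
  const bands: Band[] = [
    bandFor(m.deploysPerDay, frequencyRules),
    bandFor(m.leadTimeHours, leadTimeRules),
    bandFor(m.changeFailureRate, failureRateRules),
    bandFor(m.restoreHours, restoreRules),
  ];
  return bands.reduce<Band>(
    (worst, b) => (RANK.indexOf(b) > RANK.indexOf(worst) ? b : worst), "elite");
}
```

Taking the weakest band means a team that deploys many times a day but fails 30% of its changes is reported as low, not elite.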
### Accelerators (capabilities)
- Trunk-based development.
- Continuous integration.
- Test automation.
- Loosely coupled architecture.
- Generative culture (Westrum).
- Database change automation.
- Empowered teams.
## 💻 Patterns
### Lead time SQL (GitHub + production deploy events)
```sql
-- Lead time: first commit in a PR to production deploy.
-- Assumes each deploy records the PR's merge commit SHA.
WITH deploys AS (
  SELECT service, deploy_id, deployed_at, sha
  FROM deployments
  WHERE environment = 'production'
),
pr_first_commits AS (  -- earliest commit per merged PR, keyed by merge commit
  SELECT pr.merge_commit_sha AS sha, MIN(c.committed_at) AS first_commit_at
  FROM pull_requests pr
  JOIN commits c ON c.pr_id = pr.id
  GROUP BY pr.merge_commit_sha
)
SELECT d.service,
       PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM (d.deployed_at - c.first_commit_at))) AS lt_p50_seconds,
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM (d.deployed_at - c.first_commit_at))) AS lt_p95_seconds
FROM deploys d
JOIN pr_first_commits c USING (sha)
WHERE d.deployed_at > NOW() - INTERVAL '90 days'
GROUP BY d.service;
```
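### Emitting a deploy event from GitHub Actions
```yaml
- name: Record deployment
  if: success()
  env:
    DORA_API: ${{ vars.DORA_API }}    # assumed repository variable name
    TOKEN: ${{ secrets.DORA_TOKEN }}  # assumed secret name
  run: |
    curl --fail -X POST "$DORA_API/deployments" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d "{\"service\":\"orders\",\"sha\":\"${{ github.sha }}\",\"environment\":\"production\",\"deployed_at\":\"$(date -u +%FT%TZ)\"}"
```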
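
The inline JSON in the curl call above is easy to break with quoting. As a minimal sketch, assuming the deploy step could run a small Node script instead, the same payload can be built in TypeScript. `buildDeploymentEvent` and `toRequestBody` are hypothetical helpers; the field names mirror the curl body.

```ts
// Hypothetical payload type; fields mirror the JSON body posted by the curl step.
interface DeploymentEvent {
  service: string;
  sha: string;
  environment: string;
  deployed_at: string; // UTC, second precision, e.g. 2024-01-02T03:04:05Z
}

function buildDeploymentEvent(
  service: string,
  sha: string,
  environment: string,
  now: Date,
): DeploymentEvent {
  // toISOString() includes milliseconds; strip them to match `date -u +%FT%TZ`.
  const deployed_at = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  return { service, sha, environment, deployed_at };
}

// JSON.stringify handles escaping, so unusual service names cannot break the body.
function toRequestBody(event: DeploymentEvent): string {
  return JSON.stringify(event);
}
```

Passing `now` in as a parameter keeps the helper deterministic, which makes the timestamp format easy to check in a unit test.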
### Change failure (PagerDuty + deploy correlation)
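```ts
// `db` and `daysAgo` are assumed helpers from the surrounding codebase.
async function changeFailureRate(service: string, days = 30): Promise<number> {
  const deploys = await db.deployments.count({ service, since: daysAgo(days) });
  if (deploys === 0) return 0; // no deploys in the window: avoid dividing by zero
  const failedDeploys = await db.deployments.count({
    service, since: daysAgo(days),
    correlatedIncidentWithin: '4h', // a deploy counts as failed if an incident opened within 4h of it
  });
  return failedDeploys / deploys;
}
```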
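
The `correlatedIncidentWithin: '4h'` filter above hides the actual correlation step. One way it could work, shown as a minimal in-memory sketch, is to attribute each incident to the most recent deploy of the same service inside the window. The record shapes and the function name `correlatedFailureRate` are assumptions, not part of any real client library.

```ts
// Hypothetical record shapes for this sketch.
interface DeployEvent { id: string; service: string; deployedAt: Date }
interface IncidentEvent { service: string; createdAt: Date }

// Attribute each incident to the latest deploy of the same service that
// happened at most `windowHours` before it, then count distinct failed deploys.
function correlatedFailureRate(
  deploys: DeployEvent[],
  incidents: IncidentEvent[],
  windowHours = 4,
): number {
  if (deploys.length === 0) return NaN;
  const windowMs = windowHours * 60 * 60 * 1000;
  const failedIds = new Set<string>();

  for (const incident of incidents) {
    let culprit: DeployEvent | undefined;
    for (const d of deploys) {
      const gap = incident.createdAt.getTime() - d.deployedAt.getTime();
      // Skip other services, deploys after the incident, and deploys too far back.
      if (d.service !== incident.service || gap < 0 || gap > windowMs) continue;
      if (!culprit || d.deployedAt.getTime() > culprit.deployedAt.getTime()) culprit = d;
    }
    if (culprit) failedIds.add(culprit.id);
  }
  return failedIds.size / deploys.length;
}
```

Using a set of deploy IDs means several incidents traced to the same deploy count as one failed change, which matches the per-deploy definition of CFR.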
### MTTR from PagerDuty
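```ts
import { api } from '@pagerduty/pdjs';

const pd = api({ token: process.env.PD_TOKEN! });
// ISO 8601 timestamp for N days ago (helper the snippet relies on)
const daysAgo = (n: number) => new Date(Date.now() - n * 86_400_000).toISOString();

// SVC_ID is the PagerDuty service ID, assumed to be defined elsewhere.
// GET filters go in the query string; only the first page of results is read here.
const { data } = await pd.get('/incidents', {
  queryParameters: { since: daysAgo(30), 'service_ids[]': [SVC_ID], 'statuses[]': ['resolved'] },
});
const durations: number[] = data.incidents.map((i: { created_at: string; resolved_at: string }) =>
  (new Date(i.resolved_at).getTime() - new Date(i.created_at).getTime()) / 1000);
// Mean restore time in seconds; null when nothing was resolved in the window
const mttr = durations.length ? durations.reduce((a, b) => a + b, 0) / durations.length : null;
```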
### Four Keys (Google) on BigQuery
```sql
-- Reference: https://github.com/dora-team/fourkeys
-- Schematic single-table query; the column names are illustrative.
SELECT
  COUNTIF(event_type = 'deployment') AS deploys,
  AVG(TIMESTAMP_DIFF(deploy_time, first_commit_time, MINUTE)) AS lt_minutes,
  COUNTIF(failed) / NULLIF(COUNTIF(event_type = 'deployment'), 0) AS cfr,
  AVG(TIMESTAMP_DIFF(resolved_time, incident_time, MINUTE)) AS mttr_minutes
FROM `project.four_keys.events_raw`
WHERE deploy_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY);
```
### Grafana dashboard panel (PromQL-style)
```promql
# Deployment frequency (deploys per day per service)
sum by (service) (rate(deployments_total{env="production"}[7d])) * 86400
# Change failure rate (increase() turns each counter into a 30-day total)
sum by (service) (increase(deployments_failed_total[30d]))
  / sum by (service) (increase(deployments_total[30d]))
# Lead time p50
histogram_quantile(0.5, sum by (le, service) (rate(deploy_lead_time_seconds_bucket[30d])))
```
### Reliability (SLO-aligned 5th metric)
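```ts
// Percentage of measurement windows in which the service met or exceeded its SLO.
// Assumes `prom` is a Prometheus client and slo_compliance is a 0/1 gauge per window.
const svc = 'orders';
const reliability = await prom.query(`
  (sum_over_time(slo_compliance{service="${svc}"}[30d]) /
   count_over_time(slo_compliance{service="${svc}"}[30d])) * 100
`);
```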
### Anti-gaming guardrails
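```ts
// Any single metric can be gamed in isolation, so always read the four together.
// Units: df in deploys per day, lt and mttr in seconds, cfr as a fraction.
const HOUR = 3600, DAY = 24 * HOUR;
const elite = df > 1 && lt < DAY && cfr < 0.05 && mttr < HOUR;
// A team is not elite if a high CFR is hiding behind a high deploy frequency.
```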
## Decision criteria
| Situation | Approach |
|---|---|
| Greenfield team | Adopt Four Keys (open source) on BigQuery |
| GitHub-centric | dora-team/fourkeys + Cloud Run pipelines |
| Multi-tool | LinearB / Sleuth / Faros AI / Jellyfish (SaaS) |
| Self-host | Apache DevLake (Apache Software Foundation) |
| Enterprise governance | Faros + custom dashboards |
**Default**: Apache DevLake (open source) or the Four Keys reference implementation; review weekly with the team; show all 4 (5) metrics together, never a single metric.
## 🔗 Graph
- Parent: [[DevOps]] · [[Continuous-Delivery]]
- Variant: [[Engineering-Metrics]]
- Application: [[Trunk-Based-Development]] · [[Continuous-Integration]] · [[SRE]]
- Adjacent: [[Postmortem-Culture]]
## 🤖 LLM usage
**When to use**: explaining metric definitions, authoring SQL/PromQL queries, interpreting trends, generating talking points for retrospectives.
**When not to use**: evaluating individual performance (DORA metrics are team-level, never individual), or tuning metrics to look good (gaming).
## ❌ Anti-patterns
- **Single-metric optimization**: pushing only deployment frequency up makes CFR explode. Always read the four together.
- **Individual performance ranking**: explicitly an anti-pattern in DORA research. Team level only.
- **Vanity deploys**: counting empty commits or config-only changes makes the number meaningless.
- **MTTR from "ticket close"**: measure the end of customer impact, not the administrative closing of the ticket.
- **Comparing teams in different domains**: a fintech service and an internal tool have different baselines.
- **No deployment instrumentation**: do not rely on a manual spreadsheet; emit deploy events automatically.
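
The vanity-deploy item above can be enforced before counting. This is a minimal sketch, assuming each deploy record carries the list of files it changed; the `DeployRecord` shape and the `NON_PRODUCT_PATTERNS` list are hypothetical and would need tuning per repository.

```ts
// Hypothetical deploy record: the files changed since the previous deploy.
interface DeployRecord {
  sha: string;
  changedFiles: string[];
}

// Assumed patterns for files that do not change shipped behavior.
const NON_PRODUCT_PATTERNS: RegExp[] = [
  /\.md$/i,     // documentation
  /^docs\//,    // docs directory
  /\.ya?ml$/i,  // config-only changes
  /^\.github\//, // CI metadata
];

// A deploy is "vanity" if it is empty or touches only non-product files.
function isVanityDeploy(deploy: DeployRecord): boolean {
  return deploy.changedFiles.every(
    file => NON_PRODUCT_PATTERNS.some(pattern => pattern.test(file)));
}

// Deploys that should count toward deployment frequency.
function meaningfulDeploys(deploys: DeployRecord[]): DeployRecord[] {
  return deploys.filter(d => !isVanityDeploy(d));
}
```

Because `every` returns true for an empty array, a deploy with no changed files is treated as vanity without a special case.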
## 🧪 Verification / duplicates
- Verified against: Forsgren/Humble/Kim "Accelerate" (2018), Google DORA 2024 State of DevOps report, dora-team/fourkeys, Apache DevLake.
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: DORA 4(5) metrics, queries, anti-patterns |