| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-텔레메트리-telemetry | Telemetry | 10_Wiki/Topics | verified | self | | none | A | 0.9 | applied | | | 2026-05-10 | pending | |
Telemetry
One-line summary
"A system emitting its own internal state to the outside: the trinity of metrics, traces, and logs." The word comes from the Greek 'tele' (remote) + 'metron' (measure). The de facto standard in the 2026 stack is OpenTelemetry 2.x, which provides a vendor-neutral instrumentation API and the OTLP wire protocol.
Core concepts
Three Pillars
- Metrics: numeric aggregations (counter, gauge, histogram). Low cardinality. The source for alerting.
- Traces: the causal chain of a distributed request, represented as a span tree. High cardinality.
- Logs: discrete event records. Structured (JSON) output is recommended.
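To make "numeric aggregation" concrete, here is a minimal sketch of how an explicit-bucket histogram might fold raw measurements into a few counters. The `BucketHistogram` class and its boundaries are hypothetical and only illustrate the idea; they are not the OpenTelemetry SDK's implementation.

```typescript
// Hypothetical explicit-bucket histogram; each boundary is an inclusive upper bound.
class BucketHistogram {
  readonly counts: number[];
  sum = 0;
  count = 0;

  constructor(readonly boundaries: number[]) {
    // One bucket per boundary plus a final overflow bucket.
    this.counts = new Array(boundaries.length + 1).fill(0);
  }

  record(value: number): void {
    let i = this.boundaries.findIndex((b) => value <= b);
    if (i === -1) i = this.boundaries.length; // larger than every boundary
    this.counts[i] += 1;
    this.sum += value;
    this.count += 1;
  }
}

const hist = new BucketHistogram([10, 50, 100]);
for (const ms of [3, 12, 48, 75, 250]) hist.record(ms);
console.log(hist.counts); // [1, 2, 1, 1]
```

Five measurements collapse into four bucket counts plus a sum and a count, which is why metrics stay cheap as long as the attribute set is small.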
Additional pillar (2026)
- Profiles (continuous profiling): sampled CPU / memory flame graphs, built on an eBPF + pprof stack. Pyroscope / Parca / Grafana Profiles.
Push vs Pull
- Push: agent → collector (OTLP, statsd). Suits ephemeral workloads.
- Pull: scraper → endpoint (Prometheus). Suits long-running services.
Applications
- SLO/SLI measurement: computing the error budget.
- Distributed debugging: following cross-service latency through traces.
- Capacity planning: forecasting from historical metrics.
- Security audit: reconstructing incidents from logs + traces.
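The error budget calculation in the first item is simple arithmetic over two counters. A minimal sketch, assuming a request-based availability SLO; the `SloWindow` shape, the `errorBudget` helper, and the sample numbers are all hypothetical.

```typescript
interface SloWindow {
  target: number;         // e.g. 0.999 for a 99.9% availability SLO
  totalRequests: number;  // requests observed in the SLO window
  failedRequests: number; // requests counted as errors in the same window
}

function errorBudget(w: SloWindow) {
  // The budget is the number of failures the SLO tolerates in this window.
  const allowedFailures = (1 - w.target) * w.totalRequests;
  const budgetConsumed = w.failedRequests / allowedFailures;
  return {
    sli: 1 - w.failedRequests / w.totalRequests,
    allowedFailures,
    budgetConsumed,
    budgetRemaining: 1 - budgetConsumed,
  };
}

const report = errorBudget({ target: 0.999, totalRequests: 1_000_000, failedRequests: 250 });
console.log(report);
```

With these inputs the window tolerates roughly 1,000 failures, so 250 failures consume about a quarter of the budget. Both counters should come from metrics rather than sampled traces, since the ratio is only meaningful on complete counts.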
💻 Patterns
Pattern 1 — OpenTelemetry SDK setup (Node)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { resourceFromAttributes } from '@opentelemetry/resources';

const sdk = new NodeSDK({
  resource: resourceFromAttributes({
    'service.name': 'order-api',
    'service.version': '1.4.0',
    'deployment.environment': process.env.ENV ?? 'dev',
  }),
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4318/v1/traces' }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({ url: 'http://otel-collector:4318/v1/metrics' }),
    exportIntervalMillis: 10_000,
  }),
});

sdk.start();
Pattern 2 — Manual span
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-api');

// chargeCard is assumed to be defined elsewhere in the service.
async function placeOrder(orderId: string) {
  return tracer.startActiveSpan('placeOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      const result = await chargeCard(orderId);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      // Record the failure on the span, then rethrow so callers still see it.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      // Always end the span, on success and on failure.
      span.end();
    }
  });
}
Pattern 3 — Counter / Histogram
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('order-api');

const orderCounter = meter.createCounter('orders.placed', {
  description: 'Total orders placed',
});
const latencyHist = meter.createHistogram('order.latency_ms', {
  description: 'Order placement latency',
  unit: 'ms',
});

// Top-level await requires an ES module; id is the order ID supplied by the caller.
const start = performance.now();
await placeOrder(id);
// Keep attribute values low-cardinality (region, tier, route), never per-user IDs.
orderCounter.add(1, { region: 'kr', tier: 'premium' });
latencyHist.record(performance.now() - start, { route: 'POST /orders' });
Pattern 4 — Structured logging with trace correlation
import { trace } from '@opentelemetry/api';
import pino from 'pino';

const logger = pino({
  // mixin runs on every log call and merges its result into the log record.
  mixin: () => {
    const span = trace.getActiveSpan();
    if (!span) return {};
    const ctx = span.spanContext();
    return { trace_id: ctx.traceId, span_id: ctx.spanId };
  },
});

logger.info({ orderId: '123' }, 'order placed');
// Logs can now be joined to traces on trace_id.
Pattern 5 — Sampling (head-based)
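import { NodeSDK } from '@opentelemetry/sdk-node';
import { TraceIdRatioBasedSampler, ParentBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  // ParentBasedSampler follows the parent's decision; root spans use the ratio sampler.
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.1), // sample 10% of root traces
  }),
  // ...
});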
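Head-based ratio sampling has to reach the same decision for every span of a trace, so the decision is derived from the trace ID rather than from a fresh random draw. A minimal sketch of that idea, assuming a hypothetical `shouldSample` helper that compares the low 32 bits of the trace ID against a threshold; this illustrates the principle and is not the algorithm `TraceIdRatioBasedSampler` uses internally.

```typescript
const TWO_POW_32 = 2 ** 32;

// Hypothetical helper: deterministic keep/drop decision from a 32-hex-char trace ID.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio <= 0) return false;
  if (ratio >= 1) return true;
  // Interpret the last 8 hex characters as an unsigned 32-bit integer.
  const low32 = parseInt(traceId.slice(-8), 16);
  return low32 < Math.floor(ratio * TWO_POW_32);
}

const exampleTraceId = '4bf92f3577b34da6a3ce929d0e0e4736';
const first = shouldSample(exampleTraceId, 0.1);
const second = shouldSample(exampleTraceId, 0.1);
console.log(first === second); // true: the same trace ID always gets the same decision
```

Because every service computes the same answer from the same trace ID, a trace is either kept whole or dropped whole, provided all services use the same ratio.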
Pattern 6 — Exemplar (metric → trace link)
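// An exemplar attaches the active trace_id to a recorded metric point, enabling metric → trace drill-down in Grafana.
// latencyHist is the histogram from Pattern 3; latency and attrs are supplied by the caller.
latencyHist.record(latency, attrs);
// Where the SDK supports exemplars, it takes the trace context from the active span automatically; confirm support in your SDK version.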
Pattern 7 — Context propagation (HTTP header)
import { propagation, context } from '@opentelemetry/api';

// Outbound: inject the active context into the request headers.
const headers: Record<string, string> = {};
propagation.inject(context.active(), headers);
fetch('https://api.example.com', { headers });

// Inbound: extract the context from the incoming headers (app is assumed to be an Express app).
app.use((req, res, next) => {
  const ctx = propagation.extract(context.active(), req.headers);
  context.with(ctx, () => next());
});
// The context travels in the W3C traceparent / tracestate headers.
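For reference, the W3C `traceparent` header carries four dash-separated lowercase hex fields: version, trace ID (32 chars), parent span ID (16 chars), and flags (2 chars). A minimal parsing sketch, assuming only the version `00` layout; `parseTraceParent` and the `TraceParent` type are hypothetical names, and in practice `propagation.extract` does this work.

```typescript
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Hypothetical helper: returns null for anything that is not a valid version-00 header.
function parseTraceParent(header: string): TraceParent | null {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return null;
  const [, version, traceId, parentId, flags] = m;
  if (version !== '00') return null;
  // All-zero trace IDs and parent IDs are invalid per the W3C Trace Context spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  const sampled = (parseInt(flags, 16) & 0x01) === 1;
  return { version, traceId, parentId, sampled };
}

const parsed = parseTraceParent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(parsed?.traceId, parsed?.sampled);
```

The sampled flag is what lets a downstream `ParentBasedSampler` follow the decision made at the root of the trace.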
Pattern 8 — RED method instrumentation
// Rate, Errors, Duration: the minimum service-level instrumentation.
// meter comes from Pattern 3; app is assumed to be an Express app.
const reqCounter = meter.createCounter('http.requests');
const errCounter = meter.createCounter('http.errors');
const durHist = meter.createHistogram('http.duration_ms');

app.use((req, res, next) => {
  const start = performance.now();
  res.on('finish', () => {
    // Use the route template, not the raw URL, to keep cardinality bounded.
    const labels = {
      route: req.route?.path ?? 'unknown', // unmatched routes have no req.route
      method: req.method,
      status: res.statusCode,
    };
    reqCounter.add(1, labels);
    if (res.statusCode >= 500) errCounter.add(1, labels);
    durHist.record(performance.now() - start, labels);
  });
  next();
});
Decision criteria
| Situation | Telemetry choice |
|---|---|
| Service-level alerting | Metrics (RED / USE) |
| Cross-service latency analysis | Traces |
| Incident forensics | Logs + Traces |
| CPU hotspots | Profiles (continuous) |
| High-cardinality dimensions | Traces (NOT metrics) |
| Cost-sensitive | Sampling ratio 0.01–0.1 |
Default: OpenTelemetry SDK + OTLP exporter → Collector → Grafana / Datadog / Honeycomb. This avoids vendor lock-in.
🔗 Graph
- Variant: Tracing (Jaeger / Tempo)
- Adjacent: OpenTelemetry Collector
🤖 LLM usage
When: designing instrumentation for a production service, OTel migration, cardinality analysis.
When not: dev-only scripts, or putting high-cardinality dimensions into metrics, which causes a cost explosion.
❌ Anti-patterns
- High cardinality on metrics: using user_id as a metric label makes storage explode.
- Relying on traces alone: traces are sampled, so absolute counts derived from them are not reliable.
- Unstructured logs: string concatenation produces logs that cannot be queried.
- Vendor SDK lock-in: using the Datadog SDK directly instead of OTel raises migration cost.
- No sampling: sending 100% of traces adds cost and latency overhead.
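The storage explosion in the first item follows from multiplication: in the worst case a metric produces one time series per combination of label values. A rough upper-bound sketch, with a hypothetical `seriesCount` helper and made-up cardinalities:

```typescript
// Hypothetical helper: worst-case series count is the product of per-label cardinalities.
function seriesCount(labelCardinalities: Record<string, number>): number {
  return Object.values(labelCardinalities).reduce((acc, n) => acc * n, 1);
}

const bounded = seriesCount({ route: 20, method: 5, status: 6 });
const withUserId = seriesCount({ route: 20, method: 5, status: 6, user_id: 100_000 });

console.log(bounded);    // 600
console.log(withUserId); // 60000000
```

Real systems rarely hit every combination, but adding a single unbounded label still multiplies the series count by the number of distinct values, which is why such dimensions belong on spans or logs.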
🧪 Verification / duplicates
- Verified (OpenTelemetry 2.x docs 2026, CNCF observability whitepaper).
- Trust level: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — Three Pillars + Profiles + 8 OTel patterns |