---
id: wiki-2026-0508-텔레메트리-telemetry
title: 텔레메트리 (Telemetry)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Telemetry, Observability data, Metrics + Traces + Logs]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [observability, otel, metrics, traces, logs, architecture]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: TypeScript
  framework: OpenTelemetry 2.x
---

# 텔레메트리 (Telemetry)

## One-line summary

> **"Telemetry is the act of a system emitting its own internal state to the outside, through the trinity of metrics, traces, and logs."**
> The word comes from the Greek *tele* (remote) + *metron* (measure). In the 2026 stack the de facto standard is OpenTelemetry 2.x: a vendor-neutral instrumentation API plus the OTLP wire protocol.

## Core concepts

### Three Pillars

- **Metrics**: numeric aggregations (counter, gauge, histogram). Low cardinality. The source for alerting.
- **Traces**: the causal chain of a distributed request, represented as a span tree. High cardinality.
- **Logs**: discrete event records. Structured (JSON) output is recommended.

### The additional pillar in 2026

- **Profiles** (continuous profiling): sampled CPU / memory flame graphs, built on an eBPF + pprof stack. Tools: Pyroscope / Parca / Grafana Profiles.

### Push vs Pull

- **Push**: agent → collector (OTLP, statsd). Suited to ephemeral workloads.
- **Pull**: scraper → endpoint (Prometheus). Suited to long-running services.

### Applications

1. SLO/SLI measurement: computing the error budget.
2. Distributed debugging: following cross-service latency through traces.
3. Capacity planning: forecasting from historical metrics.
4. Security audit: reconstructing incidents from logs + traces.

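The error-budget arithmetic behind the first application can be sketched as follows. This is a minimal illustration that assumes a request-based SLI (good requests / total requests). The names `errorBudget`, `SliWindow`, and `ErrorBudget`, as well as the sample numbers, are hypothetical and not part of any OpenTelemetry API.

```typescript
// Hypothetical helper types: illustrative only, not an OpenTelemetry API.
interface SliWindow {
  totalRequests: number;  // all requests observed in the SLO window
  failedRequests: number; // requests that violated the SLI (e.g. 5xx responses)
}

interface ErrorBudget {
  allowedFailures: number;   // failures the SLO tolerates in this window
  consumedFraction: number;  // share of the budget already spent (may exceed 1)
  remainingFailures: number; // failures still affordable, floored at 0
}

function errorBudget(sloTarget: number, sli: SliWindow): ErrorBudget {
  if (sloTarget <= 0 || sloTarget >= 1) {
    throw new RangeError('sloTarget must be strictly between 0 and 1');
  }
  // The budget is the complement of the SLO target, scaled by traffic.
  const allowedFailures = (1 - sloTarget) * sli.totalRequests;
  const consumedFraction =
    allowedFailures === 0 ? 0 : sli.failedRequests / allowedFailures;
  return {
    allowedFailures,
    consumedFraction,
    remainingFailures: Math.max(0, allowedFailures - sli.failedRequests),
  };
}

// Assumed sample: a 99.9% SLO over one million requests with 250 failures.
const budget = errorBudget(0.999, { totalRequests: 1_000_000, failedRequests: 250 });
console.log(budget);
```

With these assumed inputs the budget is roughly 1,000 failed requests, of which about a quarter is consumed (floating-point rounding makes the values approximate). In practice the two counts would come from metrics such as the request and error counters, not from traces, because sampled traces do not give reliable absolute counts.
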
## 💻 Patterns

### Pattern 1: OpenTelemetry SDK setup (Node)

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { resourceFromAttributes } from '@opentelemetry/resources';

const sdk = new NodeSDK({
  // Resource attributes identify this service in every exported signal.
  resource: resourceFromAttributes({
    'service.name': 'order-api',
    'service.version': '1.4.0',
    'deployment.environment': process.env.ENV ?? 'dev',
  }),
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4318/v1/traces' }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({ url: 'http://otel-collector:4318/v1/metrics' }),
    exportIntervalMillis: 10_000, // push metrics every 10 seconds
  }),
});

sdk.start();
```

### Pattern 2: Manual span

```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-api');

// `chargeCard` is the application's own payment call, defined elsewhere.
async function placeOrder(orderId: string) {
  return tracer.startActiveSpan('placeOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      const result = await chargeCard(orderId);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      // Record the exception on the span, mark it failed, then rethrow.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      // Always end the span, otherwise it is never exported.
      span.end();
    }
  });
}
```

### Pattern 3: Counter / Histogram

```typescript
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('order-api');

const orderCounter = meter.createCounter('orders.placed', {
  description: 'Total orders placed',
});
const latencyHist = meter.createHistogram('order.latency_ms', {
  description: 'Order placement latency',
  unit: 'ms',
});

// `placeOrder` is the function from Pattern 2; `id` is the order being processed.
const start = performance.now();
await placeOrder(id);
orderCounter.add(1, { region: 'kr', tier: 'premium' });
latencyHist.record(performance.now() - start, { route: 'POST /orders' });
```

### Pattern 4: Structured logging with trace correlation

```typescript
import { trace } from '@opentelemetry/api';
import pino from 'pino';

const logger = pino({
  // The mixin runs on every log call and merges its result into the record.
  mixin: () => {
    const span = trace.getActiveSpan();
    if (!span) return {};
    const ctx = span.spanContext();
    return { trace_id: ctx.traceId, span_id: ctx.spanId };
  },
});

logger.info({ orderId: '123' }, 'order placed');
// Logs can now be joined to traces on trace_id.
```

### Pattern 5: Sampling (head-based)

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { TraceIdRatioBasedSampler, ParentBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  // Child spans follow the parent's decision; only root spans are sampled by ratio.
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.1), // keep 10% of root traces
  }),
  // ...
});
```

### Pattern 6: Exemplar (metric → trace link)

```typescript
// An exemplar attaches a trace_id to a recorded metric point, which is what
// enables metric → trace drill-down in Grafana.
// `latencyHist` is the histogram from Pattern 3; `latency` and `attrs` are the
// measured value and its attributes.
latencyHist.record(latency, attrs);
// No trace_id is passed here: where exemplars are supported and enabled, the SDK
// takes it from the active span. Support varies by SDK version and backend, so
// verify it for your setup.
```

### Pattern 7: Context propagation (HTTP header)

```typescript
import express from 'express';
import { propagation, context } from '@opentelemetry/api';

const app = express();

// Outbound: inject the active trace context into the request headers.
const headers: Record<string, string> = {};
propagation.inject(context.active(), headers);
fetch('https://api.example.com', { headers });

// Inbound: extract the trace context from the incoming headers.
app.use((req, res, next) => {
  const ctx = propagation.extract(context.active(), req.headers);
  context.with(ctx, () => next());
});
// The headers carried are the W3C traceparent / tracestate.
```

### Pattern 8: RED method instrumentation

```typescript
import { metrics } from '@opentelemetry/api';

// Rate, Errors, Duration: the service-level minimum.
// `app` is the Express app, as in Pattern 7.
const meter = metrics.getMeter('order-api');
const reqCounter = meter.createCounter('http.requests');
const errCounter = meter.createCounter('http.errors');
const durHist = meter.createHistogram('http.duration_ms');

app.use((req, res, next) => {
  const start = performance.now();
  res.on('finish', () => {
    // Use the route template, not the raw URL, to keep label cardinality low.
    const labels = {
      route: req.route?.path ?? 'unmatched',
      method: req.method,
      status: res.statusCode,
    };
    reqCounter.add(1, labels);
    if (res.statusCode >= 500) errCounter.add(1, labels);
    durHist.record(performance.now() - start, labels);
  });
  next();
});
```

## Decision criteria

| Situation | Telemetry choice |
|---|---|
| Service-level alerting | Metrics (RED / USE) |
| Cross-service latency analysis | Traces |
| Incident forensics | Logs + Traces |
| CPU hotspots | Profiles (continuous) |
| High-cardinality dimensions | Traces (NOT metrics) |
| Cost-sensitive workloads | Sampling ratio of 0.01 to 0.1 |

**Default**: OpenTelemetry SDK + OTLP exporter → Collector → Grafana / Datadog / Honeycomb. This avoids vendor lock-in.

## 🔗 Graph

- Parent: [[관측가능성 (Observability)]] · [[SRE 원칙]]
- Variants: [[Metrics (Prometheus)]] · [[Tracing (Jaeger / Tempo)]] · [[Logging (Loki / ELK)]] · [[Continuous Profiling]]
- Applications: [[SLO / SLI]] · [[분산 디버깅]] · [[Capacity Planning]]
- Adjacent: [[OpenTelemetry Collector]] · [[eBPF Observability]]

## 🤖 LLM usage

**When**: designing instrumentation for a production service, migrating to OTel, analyzing cardinality.

**When not**: dev-only scripts. Also do not use it to justify putting high-cardinality dimensions on metrics, which leads to a cost explosion.

## ❌ Anti-patterns

- **High cardinality on metrics**: using user_id as a metric label makes storage explode.
- **Relying on traces alone**: traces are sampled, so absolute counts derived from them cannot be trusted.
- **Unstructured logs**: logs built by string concatenation cannot be queried.
- **Vendor SDK lock-in**: using the Datadog SDK directly instead of OTel incurs migration cost later.
- **No sampling**: sending 100% of traces adds cost and latency overhead.

## 🧪 Verification / Duplicates

- Verified (OpenTelemetry 2.x docs 2026, CNCF observability whitepaper).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Three Pillars + Profiles + 8 OTel patterns |