2nd/10_Wiki/Topics/Architecture/텔레메트리_(Telemetry).md

---
id: wiki-2026-0508-텔레메트리-telemetry
title: 텔레메트리 (Telemetry)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Telemetry, Observability data, Metrics + Traces + Logs]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [observability, otel, metrics, traces, logs, architecture]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: TypeScript
  framework: OpenTelemetry 2.x
---

# Telemetry (텔레메트리)

## In one line
> **"Telemetry is a system emitting its internal state to the outside, as the trinity of metrics, traces, and logs."** The word comes from the Greek *tele* (remote) + *metron* (measure). In the 2026 stack the de facto standard is OpenTelemetry 2.x: a vendor-neutral instrumentation API plus the OTLP wire protocol.

## Core concepts

### Three Pillars
- **Metrics**: Numeric aggregations (counter, gauge, histogram). Low cardinality. The source for alerting.
- **Traces**: The causal chain of a distributed request, represented as a span tree. High cardinality.
- **Logs**: Discrete event records. Structured (JSON) output is recommended.

### Additional pillar in 2026
- **Profiles** (continuous profiling): Sampled CPU / memory flame graphs, built on an eBPF + pprof stack. Tools: Pyroscope / Parca / Grafana Profiles.

### Push vs Pull
- **Push**: agent → collector (OTLP, statsd). Suits ephemeral workloads.
- **Pull**: scraper → endpoint (Prometheus). Suits long-running services.
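
To make the pull side concrete, the sketch below renders in-memory counters in the Prometheus text exposition format, which is what a scraper would read from a pull endpoint. `CounterSample` and `renderExposition` are hypothetical names, only counters are covered, and samples are assumed to arrive grouped by metric name.

```typescript
// Hypothetical in-memory sample: one counter value per label set.
interface CounterSample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Escape label values as the text exposition format requires.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Render counters as exposition text (assumes samples are grouped by name).
function renderExposition(samples: CounterSample[]): string {
  const lines: string[] = [];
  const declared = new Set<string>();
  for (const s of samples) {
    if (!declared.has(s.name)) {
      lines.push(`# TYPE ${s.name} counter`);
      declared.add(s.name);
    }
    const labelText = Object.entries(s.labels)
      .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
      .join(',');
    lines.push(labelText ? `${s.name}{${labelText}} ${s.value}` : `${s.name} ${s.value}`);
  }
  return lines.join('\n') + '\n';
}

const body = renderExposition([
  { name: 'orders_placed_total', labels: { region: 'kr' }, value: 3 },
  { name: 'orders_placed_total', labels: { region: 'us' }, value: 5 },
]);
console.log(body);
```

In a push setup the same samples would instead be sent to a collector on the process's own schedule, which is why push tends to suit workloads that may exit before a scraper reaches them.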

### Applications
1. SLO/SLI measurement: computing the error budget.
2. Distributed debugging: following cross-service latency through traces.
3. Capacity planning: forecasting from historical metrics.
4. Security audit: reconstructing incidents from logs + traces.
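
The error budget in item 1 is simple arithmetic over request counts. A minimal sketch, assuming a request-based SLO; `errorBudget`, `BudgetReport`, and the numbers used are illustrative, not taken from any particular tool.

```typescript
// Error budget for a request-based SLO: the number of failures the SLO tolerates.
interface BudgetReport {
  allowedFailures: number;   // failures permitted in the window
  remainingFailures: number; // allowed minus observed (negative when overspent)
  consumedFraction: number;  // observed / allowed
}

function errorBudget(
  sloTarget: number,
  totalRequests: number,
  failedRequests: number,
): BudgetReport {
  if (sloTarget <= 0 || sloTarget >= 1) {
    throw new RangeError('sloTarget must be strictly between 0 and 1');
  }
  const allowedFailures = (1 - sloTarget) * totalRequests;
  return {
    allowedFailures,
    remainingFailures: allowedFailures - failedRequests,
    consumedFraction: failedRequests / allowedFailures,
  };
}

// Hypothetical window: 99.9% target, one million requests, 250 failures.
const report = errorBudget(0.999, 1_000_000, 250);
console.log(report);
```

With these inputs the budget is about 1,000 failed requests, of which roughly a quarter has been spent. The inputs should come from metrics rather than sampled traces, since sampled data undercounts.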

## 💻 Patterns

### Pattern 1 — OpenTelemetry SDK setup (Node)
```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { resourceFromAttributes } from '@opentelemetry/resources';

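const sdk = new NodeSDK({
  // Resource attributes identify this service on every exported signal.
  resource: resourceFromAttributes({
    'service.name': 'order-api',
    'service.version': '1.4.0',
    'deployment.environment': process.env.ENV ?? 'dev',
  }),
  // Traces and metrics both go to the Collector's OTLP/HTTP port (4318).
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4318/v1/traces' }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({ url: 'http://otel-collector:4318/v1/metrics' }),
    exportIntervalMillis: 10_000, // export metrics every 10 s
  }),
});

// Start the SDK as early as possible in process startup.
sdk.start();

// Flush buffered telemetry before the process exits.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});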
```

### Pattern 2 — Manual span
```typescript
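import { trace, SpanStatusCode } from '@opentelemetry/api';

// chargeCard is assumed to be defined elsewhere in the application.
declare function chargeCard(orderId: string): Promise<unknown>;

const tracer = trace.getTracer('order-api');

async function placeOrder(orderId: string) {
  // startActiveSpan makes the span current, so nested spans become its children.
  return tracer.startActiveSpan('placeOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      const result = await chargeCard(orderId);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      // Record the exception as a span event and mark the span as failed.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      throw err;
    } finally {
      span.end(); // always end the span, on success and on failure
    }
  });
}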
```

### Pattern 3 — Counter / Histogram
```typescript
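import { metrics } from '@opentelemetry/api';

// placeOrder is the function from Pattern 2, assumed to be in scope.
declare function placeOrder(orderId: string): Promise<unknown>;

const meter = metrics.getMeter('order-api');
const orderCounter = meter.createCounter('orders.placed', {
  description: 'Total orders placed',
});
const latencyHist = meter.createHistogram('order.latency_ms', {
  description: 'Order placement latency',
  unit: 'ms',
});

async function placeOrderMeasured(id: string) {
  const start = performance.now();
  await placeOrder(id);
  // Keep attribute values low-cardinality (no user or order IDs).
  orderCounter.add(1, { region: 'kr', tier: 'premium' });
  latencyHist.record(performance.now() - start, { route: 'POST /orders' });
}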
```
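
To show what a histogram keeps from each `record` call, here is a simplified model of an explicit-bucket histogram. It is a sketch, not the SDK's implementation: `BucketHistogram` and the boundary values are hypothetical, and boundaries are assumed to be sorted in ascending order.

```typescript
// Simplified explicit-bucket histogram: counts[i] holds values in
// (bounds[i-1], bounds[i]], and the last slot holds everything above the final bound.
class BucketHistogram {
  readonly counts: number[];
  sum = 0;
  count = 0;

  constructor(readonly bounds: number[]) {
    this.counts = new Array(bounds.length + 1).fill(0);
  }

  record(value: number): void {
    let i = this.bounds.findIndex((b) => value <= b);
    if (i === -1) i = this.bounds.length; // overflow bucket
    this.counts[i] += 1;
    this.sum += value;
    this.count += 1;
  }
}

const hist = new BucketHistogram([10, 50, 100, 500]);
for (const ms of [4, 12, 48, 50, 120, 900]) hist.record(ms);
console.log(hist.counts, hist.sum, hist.count); // [1, 3, 0, 1, 1] 1134 6
```

Only bucket counts, the sum, and the count are exported, so storage cost depends on the number of buckets and attribute combinations rather than on the number of recorded values.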

### Pattern 4 — Structured logging with trace correlation
```typescript
import { trace } from '@opentelemetry/api';
import pino from 'pino';

const logger = pino({
  mixin: () => {
    const span = trace.getActiveSpan();
    if (!span) return {};
    const ctx = span.spanContext();
    return { trace_id: ctx.traceId, span_id: ctx.spanId };
  },
});

logger.info({ orderId: '123' }, 'order placed');
// Each log line carries trace_id / span_id, so logs can be joined to traces.
```

### Pattern 5 — Sampling (head-based)
```typescript
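import { NodeSDK } from '@opentelemetry/sdk-node';
import { TraceIdRatioBasedSampler, ParentBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  // Child spans follow the parent's decision; root spans are sampled by trace ID ratio.
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.1), // sample 10% of new traces
  }),
  // ...
});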
```
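
The idea behind ratio-based head sampling can be sketched without the SDK: the decision is a pure function of the trace ID, so the same trace always gets the same answer. The function below is a simplified illustration; `shouldSample` is a hypothetical name, and real SDKs may derive the number from the trace ID differently.

```typescript
// Simplified ratio sampler: map a trace ID to a number in [0, 2^32) and keep the
// trace when that number falls below ratio * 2^32.
function shouldSample(traceId: string, ratio: number): boolean {
  if (!/^[0-9a-f]{32}$/.test(traceId)) {
    throw new Error('traceId must be 32 lowercase hex characters');
  }
  if (ratio <= 0) return false;
  if (ratio >= 1) return true;
  const low32 = parseInt(traceId.slice(-8), 16); // last 4 bytes as an unsigned integer
  return low32 < ratio * 2 ** 32;
}

const idA = '4bf92f3577b34da6a3ce929d0e0e4736'; // low 32 bits: 0x0e0e4736
const idB = '4bf92f3577b34da6a3ce929dfe0e4736'; // low 32 bits: 0xfe0e4736
console.log(shouldSample(idA, 0.1), shouldSample(idB, 0.1)); // true false
```

At a ratio of 0.1 each kept trace stands in for roughly ten, which is why sampled traces are a weak source for absolute counts.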

### Pattern 6 — Exemplar (metric → trace link)
```typescript
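// latencyHist is the histogram from Pattern 3; latency and attrs are assumed to be in scope.
// Recording inside an active span lets an exemplar-capable SDK attach that span's
// trace_id to the data point, which enables metric → trace drill-down in Grafana.
latencyHist.record(latency, attrs);
// Exemplar support varies by SDK and version; confirm it for your SDK before relying on it.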
```

### Pattern 7 — Context propagation (HTTP header)
```typescript
import { propagation, context } from '@opentelemetry/api';

// Outbound: inject the active context into the request headers.
const headers: Record<string, string> = {};
propagation.inject(context.active(), headers);
fetch('https://api.example.com', { headers });

// Inbound: extract the caller's context and run the handler inside it.
// `app` is assumed to be an Express application.
app.use((req, res, next) => {
  const ctx = propagation.extract(context.active(), req.headers);
  context.with(ctx, () => next());
});
// With the SDK's default propagators this uses the W3C traceparent / tracestate headers.
```
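
What actually travels in the `traceparent` header is a four-field string: version, trace ID, parent span ID, and flags. The sketch below parses and formats version `00` values only; `parseTraceParent` and `formatTraceParent` are hypothetical helpers for illustration, not part of the OpenTelemetry API.

```typescript
interface TraceParent {
  version: string;
  traceId: string;  // 32 lowercase hex characters
  parentId: string; // 16 lowercase hex characters
  sampled: boolean; // bit 0x01 of the flags field
}

// Parse a version-00 traceparent value; returns null when it is malformed.
function parseTraceParent(header: string): TraceParent | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!m) return null;
  const [, version, traceId, parentId, flags] = m;
  if (version !== '00') return null; // this sketch handles only version 00
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null; // all-zero IDs are invalid
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 };
}

// Format back to a header value; only the sampled flag is preserved.
function formatTraceParent(tp: TraceParent): string {
  return `${tp.version}-${tp.traceId}-${tp.parentId}-${tp.sampled ? '01' : '00'}`;
}

const incoming = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
const parsed = parseTraceParent(incoming);
console.log(parsed);
```

In practice `propagation.inject` and `propagation.extract` handle this for you; the sketch is only meant to show why a downstream service can join the same trace from a single header.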

### Pattern 8 — RED method instrumentation
```typescript
import { metrics } from '@opentelemetry/api';

// Rate, Errors, Duration: the minimum set of service-level metrics.
// `app` is assumed to be an Express application.
const meter = metrics.getMeter('order-api');
const reqCounter = meter.createCounter('http.requests');
const errCounter = meter.createCounter('http.errors');
const durHist = meter.createHistogram('http.duration_ms');

app.use((req, res, next) => {
  const start = performance.now();
  res.on('finish', () => {
    // Use the route template, not the raw URL, to keep cardinality bounded.
    const labels = { route: req.route?.path ?? 'unmatched', method: req.method, status: res.statusCode };
    reqCounter.add(1, labels);
    if (res.statusCode >= 500) errCounter.add(1, labels);
    durHist.record(performance.now() - start, labels);
  });
  next();
});
```

## Decision criteria
| Situation | Telemetry choice |
|---|---|
| Service-level alerting | Metrics (RED / USE) |
| Cross-service latency analysis | Traces |
| Incident forensics | Logs + Traces |
| CPU hotspots | Profiles (continuous) |
| High-cardinality dimensions | Traces (NOT metrics) |
| Cost-sensitive workloads | Sampling ratio 0.01–0.1 |

**Default**: OpenTelemetry SDK + OTLP exporter → Collector → Grafana / Datadog / Honeycomb. This avoids vendor lock-in.

## 🔗 Graph
- Parent: [[관측가능성 (Observability)]] · [[SRE 원칙]]
- Variants: [[Metrics (Prometheus)]] · [[Tracing (Jaeger / Tempo)]] · [[Logging (Loki / ELK)]] · [[Continuous Profiling]]
- Applications: [[SLO / SLI]] · [[분산 디버깅]] · [[Capacity Planning]]
- Adjacent: [[OpenTelemetry Collector]] · [[eBPF Observability]]

## 🤖 LLM usage
**When to use**: Instrumentation design for production services, OTel migration, cardinality analysis.
**When not to use**: Dev-only scripts. Also avoid putting high-cardinality dimensions into metrics, because cost explodes.

## ❌ Anti-patterns
- **High cardinality on metrics**: Using user_id as a metric label makes storage explode.
- **Relying on traces alone**: Traces are sampled, so absolute counts derived from them cannot be trusted.
- **Unstructured logs**: Logs built by string concatenation cannot be queried.
- **Vendor SDK lock-in**: Using the Datadog SDK directly instead of OTel incurs migration cost later.
- **No sampling**: Sending 100% of traces adds cost and latency overhead.
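
The first anti-pattern is easiest to see as arithmetic: the worst-case number of time series for one metric is the product of the distinct values of each label. A minimal sketch with hypothetical label counts; `maxSeries` is an illustrative helper.

```typescript
// Upper bound on time series for one metric: the product of distinct values per label.
function maxSeries(distinctValuesPerLabel: Record<string, number>): number {
  return Object.values(distinctValuesPerLabel).reduce((acc, n) => acc * n, 1);
}

// Hypothetical label sets for an HTTP request counter.
const bounded = maxSeries({ route: 20, method: 5, status: 10 });
const withUserId = maxSeries({ route: 20, method: 5, status: 10, user_id: 100_000 });
console.log(bounded, withUserId); // 1000 100000000
```

Adding a single unbounded label multiplies the series count by its cardinality, which is why such dimensions belong on spans or logs instead.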

## 🧪 Verification / duplicates
- Verified (OpenTelemetry 2.x docs 2026, CNCF observability whitepaper).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — Three Pillars + Profiles + 8 OTel patterns |