Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
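Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: about 2,400 fixes (broken links repaired, redirects linked directly, concepts mapped)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections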

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-텔레메트리-telemetry
title: 텔레메트리 (Telemetry)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Telemetry, Observability data, Metrics + Traces + Logs
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: observability, otel, metrics, traces, logs, architecture
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language TypeScript; framework OpenTelemetry 2.x

Telemetry (텔레메트리)

In one line

"The act of a system emitting its internal state to the outside: the trinity of metrics, traces, and logs." The word comes from the Greek 'tele' (remote) + 'metron' (measure). In the 2026 stack, the de-facto standard is OpenTelemetry 2.x, which provides a vendor-neutral instrumentation API and the OTLP wire protocol.

Core concepts

Three Pillars

  • Metrics: numeric aggregations (counter, gauge, histogram). Low cardinality. The main source for alerting.
  • Traces: the causal chain of a distributed request, represented as a span tree. High cardinality.
  • Logs: discrete event records. Structured (JSON) output is recommended.

Additional pillar as of 2026

  • Profiles (continuous profiling): sampled CPU / memory flame graphs, built on an eBPF + pprof stack. Tools: Pyroscope / Parca / Grafana Profiles.

Push vs Pull

  • Push: agent → collector (OTLP, statsd). Suited to ephemeral workloads.
  • Pull: scraper → endpoint (Prometheus). Suited to long-running services.
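
To make the pull model concrete, the sketch below renders what a scrape endpoint might return: a plain-text snapshot in the Prometheus text exposition format. `CounterSample`, `escapeLabelValue`, and `renderExposition` are hypothetical names used only for illustration; a real service would normally rely on a Prometheus client library or an OTel Prometheus exporter rather than hand-rolling this.

```typescript
// Hypothetical in-memory counter sample; not a type from any library.
interface CounterSample {
  name: string;                   // e.g. "http_requests_total"
  labels: Record<string, string>; // keep these low-cardinality
  value: number;
}

// Escape a label value as the text exposition format requires.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Render counters as the plain-text body a scraper would pull from /metrics.
function renderExposition(samples: CounterSample[]): string {
  // Group by metric name so each metric family is emitted contiguously.
  const families = new Map<string, string[]>();
  for (const s of samples) {
    const labelText = Object.entries(s.labels)
      .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
      .join(',');
    const line = labelText ? `${s.name}{${labelText}} ${s.value}` : `${s.name} ${s.value}`;
    const lines = families.get(s.name) ?? [];
    lines.push(line);
    families.set(s.name, lines);
  }
  const out: string[] = [];
  families.forEach((lines, name) => {
    out.push(`# TYPE ${name} counter`, ...lines);
  });
  return out.join('\n') + '\n';
}
```

In the push model the same samples would instead be serialized and sent to a collector on a timer, which is why push tends to suit short-lived processes that may exit before any scraper reaches them.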

Applications

  1. Measuring SLOs/SLIs: computing the error budget.
  2. Distributed debugging: following cross-service latency through traces.
  3. Capacity planning: forecasting from historical metrics.
  4. Security audit: reconstructing incidents from logs + traces.
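
The error-budget calculation in item 1 is simple arithmetic. The sketch below assumes a request-based availability SLO (good requests / total requests); `BudgetReport` and `errorBudget` are illustrative names, and the numbers in the usage note are made up.

```typescript
// Error budget for a request-based availability SLO over one window.
interface BudgetReport {
  allowedFailures: number;   // failures the SLO tolerates in the window
  remainingFailures: number; // goes negative once the budget is overspent
  consumedRatio: number;     // 1.0 means the budget is exactly used up
}

function errorBudget(sloTarget: number, totalRequests: number, failedRequests: number): BudgetReport {
  if (sloTarget <= 0 || sloTarget >= 1) {
    throw new RangeError('sloTarget must be strictly between 0 and 1');
  }
  const allowedFailures = (1 - sloTarget) * totalRequests;
  return {
    allowedFailures,
    remainingFailures: allowedFailures - failedRequests,
    consumedRatio: allowedFailures > 0 ? failedRequests / allowedFailures : 0,
  };
}
```

For example, with a 99% target and 10,000 requests in the window, about 100 failures are allowed; 25 observed failures would consume roughly a quarter of the budget.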

💻 Patterns

Pattern 1 — OpenTelemetry SDK setup (Node)

import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { resourceFromAttributes } from '@opentelemetry/resources';

const sdk = new NodeSDK({
  resource: resourceFromAttributes({
    'service.name': 'order-api',
    'service.version': '1.4.0',
    'deployment.environment': process.env.ENV ?? 'dev',
  }),
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4318/v1/traces' }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({ url: 'http://otel-collector:4318/v1/metrics' }),
    exportIntervalMillis: 10_000,
  }),
});

sdk.start();

Pattern 2 — Manual span

import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-api');

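async function placeOrder(orderId: string) {
  // startActiveSpan makes the span current for everything awaited inside the callback.
  return tracer.startActiveSpan('placeOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      // chargeCard is the application's own payment call (not defined here).
      const result = await chargeCard(orderId);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      // Record the exception as a span event and mark the span as failed.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      throw err;
    } finally {
      // Always end the span, otherwise it is never exported.
      span.end();
    }
  });
}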

Pattern 3 — Counter / Histogram

import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('order-api');
const orderCounter = meter.createCounter('orders.placed', {
  description: 'Total orders placed',
});
const latencyHist = meter.createHistogram('order.latency_ms', {
  description: 'Order placement latency',
  unit: 'ms',
});

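// Time one order placement and record both instruments.
// placeOrder is the function from Pattern 2.
async function placeOrderInstrumented(id: string) {
  const start = performance.now();
  const result = await placeOrder(id);
  orderCounter.add(1, { region: 'kr', tier: 'premium' });
  latencyHist.record(performance.now() - start, { route: 'POST /orders' });
  return result;
}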
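
To show why a histogram stays cheap however many values are recorded, here is a minimal sketch of explicit-bucket aggregation: each recorded value only increments one bucket counter plus a running sum and count. `HistogramState`, `newHistogram`, and `recordValue` are hypothetical names, the boundaries are arbitrary, and this is not the SDK's implementation.

```typescript
// State kept per histogram series: fixed size regardless of sample count.
interface HistogramState {
  boundaries: number[]; // sorted upper bounds, e.g. [10, 50, 100]
  counts: number[];     // one count per bucket, plus a final overflow bucket
  sum: number;
  count: number;
}

function newHistogram(boundaries: number[]): HistogramState {
  return {
    boundaries,
    counts: new Array<number>(boundaries.length + 1).fill(0),
    sum: 0,
    count: 0,
  };
}

function recordValue(h: HistogramState, value: number): void {
  // Upper bounds are inclusive: a value equal to a boundary lands in that bucket.
  const idx = h.boundaries.findIndex((b) => value <= b);
  const bucket = idx === -1 ? h.boundaries.length : idx;
  h.counts[bucket] = (h.counts[bucket] ?? 0) + 1;
  h.sum += value;
  h.count += 1;
}
```

Percentiles read from such a histogram are estimates bounded by the bucket widths, so boundaries are usually chosen around the latency thresholds that matter for alerting.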

Pattern 4 — Structured logging with trace correlation

import { trace } from '@opentelemetry/api';
import pino from 'pino';

const logger = pino({
  mixin: () => {
    const span = trace.getActiveSpan();
    if (!span) return {};
    const ctx = span.spanContext();
    return { trace_id: ctx.traceId, span_id: ctx.spanId };
  },
});

logger.info({ orderId: '123' }, 'order placed');
// The shared trace_id lets each log line be joined to its trace.
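
As a rough illustration of that join, the sketch below groups JSON log lines by their `trace_id` field, which is roughly what a log backend does when it links logs to a trace. `LogRecord` and `groupByTraceId` are hypothetical names, and the input is assumed to be one JSON object per line, as pino emits by default.

```typescript
// Minimal view of one structured log line; extra fields are allowed.
interface LogRecord {
  trace_id?: string;
  msg?: string;
  [key: string]: unknown;
}

// Group parseable log lines by trace_id; lines without one are skipped.
function groupByTraceId(jsonLines: string[]): Map<string, LogRecord[]> {
  const groups = new Map<string, LogRecord[]>();
  for (const line of jsonLines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // not JSON: an unstructured line cannot be joined
    }
    if (typeof parsed !== 'object' || parsed === null) continue;
    const rec = parsed as LogRecord;
    if (typeof rec.trace_id !== 'string') continue;
    const bucket = groups.get(rec.trace_id) ?? [];
    bucket.push(rec);
    groups.set(rec.trace_id, bucket);
  }
  return groups;
}
```

Lines that fail to parse are dropped here, which mirrors the practical cost of unstructured logging: such lines cannot be correlated with a trace.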

Pattern 5 — Sampling (head-based)

import { NodeSDK } from '@opentelemetry/sdk-node';
import { TraceIdRatioBasedSampler, ParentBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  // Follow the parent's sampling decision; sample 10% of new root traces.
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.1),
  }),
  // ...
});
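
The idea behind head-based ratio sampling can be sketched without the SDK: the decision is a deterministic function of the trace ID, so every service that sees the same trace reaches the same answer. The functions below are an illustrative simplification, not the algorithm `TraceIdRatioBasedSampler` actually uses; `ratioDecision` and `shouldSample` are hypothetical names.

```typescript
// Head-based ratio decision derived only from the trace ID.
function ratioDecision(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Interpret the first 8 hex characters as an unsigned 32-bit integer.
  const prefix = parseInt(traceId.slice(0, 8), 16);
  if (Number.isNaN(prefix)) return false; // malformed ID: do not sample
  return prefix < ratio * 2 ** 32;
}

// Parent-based wrapper: honor the parent's decision when there is one,
// and apply the ratio only to new root traces.
function shouldSample(traceId: string, ratio: number, parentSampled?: boolean): boolean {
  return parentSampled ?? ratioDecision(traceId, ratio);
}
```

Because the decision is made when the trace starts, errors and slow requests are dropped at the same rate as everything else; keeping those selectively requires tail-based sampling in a collector.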
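
Pattern 6 — Exemplars

// Record the metric while a span is active so the data point can carry that
// span's trace_id as an exemplar, enabling metric → trace drill-down in Grafana.
// latency and attrs stand for the measured value and its attributes.
latencyHist.record(latency, attrs);
// Where the SDK supports exemplars, it extracts them from the active span automatically;
// check exemplar support in your SDK version.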

Pattern 7 — Context propagation (HTTP header)

import { propagation, context } from '@opentelemetry/api';

// Outbound: inject the active context into the request headers.
const headers: Record<string, string> = {};
propagation.inject(context.active(), headers);
fetch('https://api.example.com', { headers });

// Inbound: extract the context from the incoming headers (app is an Express app).
app.use((req, res, next) => {
  const ctx = propagation.extract(context.active(), req.headers);
  context.with(ctx, () => next());
});
// With the W3C propagator, the headers used are traceparent / tracestate.
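
For reference, the W3C `traceparent` header has the form `version-traceid-parentid-flags`. The sketch below parses and formats version `00` only; `TraceParent`, `parseTraceParent`, and `formatTraceParent` are hypothetical helper names, and in practice the registered propagator does this work.

```typescript
interface TraceParent {
  traceId: string;  // 32 lowercase hex characters
  spanId: string;   // 16 lowercase hex characters
  sampled: boolean; // lowest bit of the flags byte
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parse a version-00 traceparent value; returns undefined when it is invalid.
function parseTraceParent(header: string): TraceParent | undefined {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return undefined;
  const traceId = m[1] ?? '';
  const spanId = m[2] ?? '';
  const flags = parseInt(m[3] ?? '00', 16);
  // All-zero IDs are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { traceId, spanId, sampled: (flags & 0x01) === 0x01 };
}

// Format a traceparent value for an outbound request.
function formatTraceParent(tp: TraceParent): string {
  return `00-${tp.traceId}-${tp.spanId}-${tp.sampled ? '01' : '00'}`;
}
```

The sampled flag is how an upstream head-sampling decision reaches downstream services, which is what a parent-based sampler reads.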

Pattern 8 — RED method instrumentation

// Rate, Errors, Duration: the minimum set of service-level metrics.
// meter comes from Pattern 3; app is an Express app.
const reqCounter = meter.createCounter('http.requests');
const errCounter = meter.createCounter('http.errors');
const durHist = meter.createHistogram('http.duration_ms');

app.use((req, res, next) => {
  const start = performance.now();
  // 'finish' fires once the response has been handed off, so the status code is final.
  res.on('finish', () => {
    // Fall back to 'unknown' so an unmatched route never yields an undefined attribute.
    const labels = { route: req.route?.path ?? 'unknown', method: req.method, status: res.statusCode };
    reqCounter.add(1, labels);
    if (res.statusCode >= 500) errCounter.add(1, labels);
    durHist.record(performance.now() - start, labels);
  });
  next();
});
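
To show what the three RED signals mean numerically, the sketch below derives them from a list of raw request records for one time window. In practice the backend computes them from the counters and the histogram, and histogram percentiles are estimates; `RequestRecord`, `RedSummary`, and `summarizeRed` are hypothetical names for illustration.

```typescript
interface RequestRecord {
  status: number;     // HTTP status code
  durationMs: number; // request duration in milliseconds
}

interface RedSummary {
  ratePerSec: number; // Rate
  errorRatio: number; // Errors (share of 5xx responses)
  p95Ms: number;      // Duration (95th percentile)
}

function summarizeRed(records: RequestRecord[], windowSec: number): RedSummary {
  const n = records.length;
  if (n === 0 || windowSec <= 0) return { ratePerSec: 0, errorRatio: 0, p95Ms: 0 };
  const errors = records.filter((r) => r.status >= 500).length;
  const sorted = records.map((r) => r.durationMs).sort((a, b) => a - b);
  // Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
  const rank = Math.ceil((95 * n) / 100);
  return {
    ratePerSec: n / windowSec,
    errorRatio: errors / n,
    p95Ms: sorted[rank - 1] ?? 0,
  };
}
```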

Decision criteria

| Situation | Telemetry choice |
| --- | --- |
| Service-level alerting | Metrics (RED / USE) |
| Cross-service latency analysis | Traces |
| Incident forensics | Logs + Traces |
| CPU hotspots | Profiles (continuous) |
| High-cardinality dimensions | Traces (NOT metrics) |
| Cost-sensitive | Sampling ratio 0.01 to 0.1 |

Default: OpenTelemetry SDK + OTLP exporter → Collector → Grafana / Datadog / Honeycomb. This avoids vendor lock-in.

🔗 Graph

🤖 LLM usage

When to use: designing instrumentation for a production service, OTel migration, cardinality analysis. When not to use: dev-only scripts. Also avoid putting high-cardinality dimensions into metrics, which causes a cost explosion.

Anti-patterns

  • High cardinality on metrics: using user_id as a metric label makes storage explode.
  • Relying on traces alone: traces are sampled, so absolute counts taken from them cannot be trusted.
  • Unstructured logs: string concatenation produces logs that cannot be queried.
  • Vendor SDK lock-in: using the Datadog SDK directly instead of OTel raises migration cost.
  • No sampling: sending 100% of traces adds cost and latency.
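
The first anti-pattern follows from simple multiplication: in the worst case, the number of time series for one metric is the product of the distinct values of each label. A minimal sketch, with made-up label counts and a hypothetical `seriesCount` helper:

```typescript
// Worst-case number of time series for one metric: the product of the
// number of distinct values each label can take.
function seriesCount(labelCardinalities: Record<string, number>): number {
  return Object.values(labelCardinalities).reduce((acc, n) => acc * n, 1);
}

// Illustrative numbers only.
const bounded = seriesCount({ route: 20, method: 4, status: 5 });
const withUserId = seriesCount({ route: 20, method: 4, status: 5, user_id: 100_000 });
```

Here `bounded` is 400 series, while adding a `user_id` label with 100,000 distinct values raises the worst case to 40,000,000. That is why per-user detail belongs on span attributes or log fields rather than metric labels.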

🧪 Verification / duplicates

  • Verified (OpenTelemetry 2.x docs 2026, CNCF observability whitepaper).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Three Pillars + Profiles + 8 OTel patterns |