[G1-Sync] Manual knowledge update
@@ -0,0 +1,246 @@
---
id: devops-otel-collector
title: OTel Collector — Pipeline / Sampling / Routing
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [devops, otel, opentelemetry, observability, vibe-coding]
tech_stack: { language: "YAML / OTel", applicable_to: ["DevOps"] }
applied_in: []
aliases: [OpenTelemetry Collector, OTel, receivers, processors, exporters, tail sampling]
---

# OTel Collector

> Telemetry router: receive → process → export. **Apps speak only OTLP; the Collector swaps backends.** Tail sampling, attribute scrubbing, and multi-export all live in one place.

## 📖 Core Concepts
- Receivers: how data gets in (OTLP / Prometheus / Jaeger / Zipkin / Fluent Forward).
- Processors: what happens in between (batch / filter / sample / attribute).
- Exporters: where data is sent (Datadog / Honeycomb / Tempo).
- Pipeline: receivers → processors → exporters.

## 💻 Code Patterns

### Basic config
```yaml
# otel-config.yaml
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 100

  attributes:
    actions:
      - key: deployment.environment
        value: prod
        action: insert
      - key: user.email
        action: delete  # PII

  resource:
    attributes:
      - key: service.namespace
        value: acme
        action: insert

exporters:
  otlphttp/honeycomb:
    endpoint: https://api.honeycomb.io
    headers: { x-honeycomb-team: "${env:HC_KEY}" }  # read from the HC_KEY environment variable

  prometheus:
    endpoint: 0.0.0.0:8889

  debug:  # for troubleshooting; add it to a pipeline's exporters to use it
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter first, batch last
      processors: [memory_limiter, resource, attributes, batch]
      exporters: [otlphttp/honeycomb]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheus]
```

### Tail sampling (tail-based, not head-based)
```yaml
processors:
  tail_sampling:
    decision_wait: 30s  # wait for the trace's spans to arrive before deciding
    num_traces: 100000
    expected_new_traces_per_sec: 10
    policies:
      - name: error-traces
        type: status_code
        status_code: { status_codes: [ERROR] }

      - name: slow-traces
        type: latency
        latency: { threshold_ms: 1000 }

      - name: 1-percent-baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 1 }

      - name: high-value-customer
        type: string_attribute
        string_attribute:
          key: user.plan
          values: [enterprise]
```

→ Errors, slow traces, and VIP customers are always kept; everything else is sampled at 1%.

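The policies above only take effect once `tail_sampling` is listed in a traces pipeline. A minimal wiring sketch, reusing the `otlp` receiver and `otlphttp/honeycomb` exporter names from the basic config; the sizing arithmetic in the comments is a rule of thumb and an assumption, not a guarantee:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # tail_sampling runs before batch, so the decision is made before spans are regrouped
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlphttp/honeycomb]

# Sizing rule of thumb (assumption): num_traces >= new traces/sec x decision_wait.
# With decision_wait: 30s and num_traces: 100000, the buffer covers roughly
# 100000 / 30 ≈ 3333 new traces per second before undecided traces are evicted.
```

Tail sampling needs every span of a trace to reach the same Collector instance, so a horizontally scaled tier needs trace-aware load balancing in front of it.
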
### Attribute scrubbing (PII)
```yaml
processors:
  redaction:
    allow_all_keys: true
    blocked_values:
      - '\d{3}-\d{2}-\d{4}'        # SSN
      - '4[0-9]{12}(?:[0-9]{3})?'  # credit card (this pattern matches Visa numbers only)

  attributes:
    actions:
      - key: http.request.header.authorization
        action: delete
      - key: user.email
        action: hash  # SHA-256 in recent releases; verify for your Collector version
```

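For stricter privacy requirements, the redaction processor can also run as an allow-list instead of passing every key through. A hedged sketch: the `redaction/strict` instance name and the key list are illustrative assumptions, and the pipeline reuses component names from the basic config.

```yaml
processors:
  redaction/strict:
    allow_all_keys: false        # drop every attribute not listed below
    allowed_keys:                # illustrative allow-list; adjust to your schema
      - http.method
      - http.status_code
      - http.target
      - user.plan
    blocked_values:
      - '\d{3}-\d{2}-\d{4}'      # SSN-shaped values are masked even in allowed keys
    summary: debug               # annotate spans with a summary of what was redacted

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, redaction/strict, batch]
      exporters: [otlphttp/honeycomb]
```

An allow-list fails closed: an attribute nobody thought about is dropped rather than leaked, at the cost of maintaining the list.
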
### Multi-export (fan-out to several backends)
```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/datadog, otlphttp/honeycomb]
      # every exporter listed receives a full copy of the data;
      # each name must also be defined under the top-level `exporters:` key
```

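The pipeline above references two exporters that still have to be defined. A sketch of what those definitions might look like, with queueing and retry enabled so that a slow backend buffers instead of dropping immediately. `DD_OTLP_ENDPOINT` is a hypothetical environment variable, the queue and retry values are illustrative, and vendor-specific auth headers are left out on purpose.

```yaml
exporters:
  otlphttp/honeycomb:
    endpoint: https://api.honeycomb.io
    headers: { x-honeycomb-team: "${env:HC_KEY}" }
    sending_queue:
      enabled: true
      queue_size: 2000          # illustrative value
    retry_on_failure:
      enabled: true
      max_elapsed_time: 120s    # stop retrying a batch after two minutes

  otlphttp/datadog:
    # hypothetical env var; take the real OTLP intake URL and auth headers
    # from the vendor's documentation
    endpoint: "${env:DD_OTLP_ENDPOINT}"
    sending_queue:
      enabled: true
    retry_on_failure:
      enabled: true
```

Each exporter keeps its own queue, so an outage at one backend does not have to stall delivery to the other.
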
### Filter (drop)
```yaml
processors:
  filter:
    error_mode: ignore  # a failing condition is logged and skipped instead of failing the batch
    traces:
      span:
        # a span is dropped if ANY condition matches
        - 'attributes["http.target"] == "/health"'
        - 'attributes["http.target"] == "/metrics"'
```

→ Drops health-check spans, which cuts both noise and cost.

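The same processor can match with a regular expression and can filter logs as well as spans. A sketch under stated assumptions: the `filter/noise` instance name is made up, and `url.path` is the attribute name used by newer HTTP semantic conventions, so check which name your SDKs actually emit.

```yaml
processors:
  filter/noise:
    error_mode: ignore
    traces:
      span:
        # one regex instead of one equality check per path
        - 'IsMatch(attributes["url.path"], "^/(health|metrics)$")'
    logs:
      log_record:
        # drops DEBUG/TRACE records; records with unset severity (0) are dropped too
        - 'severity_number < SEVERITY_NUMBER_INFO'
```

Dropping a span does not drop its children, so filtering a parent span can leave orphaned spans in the trace.
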
### Routing (different backend per condition)
```yaml
processors:
  routing:
    from_attribute: deployment.environment
    attribute_source: resource  # default is `context` (request metadata), not telemetry attributes
    table:
      - value: prod
        exporters: [otlphttp/honeycomb]
      - value: dev
        exporters: [debug]
```

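Recent contrib releases steer users away from the routing processor and toward the routing connector, which routes between pipelines rather than directly to exporters. A sketch of the same prod/dev split in connector form; the pipeline names (`traces/in`, `traces/prod`, `traces/dev`) are assumptions chosen for illustration, and the statement assumes `deployment.environment` is set as a resource attribute.

```yaml
connectors:
  routing:
    default_pipelines: [traces/dev]   # anything that matches no rule
    table:
      - statement: route() where attributes["deployment.environment"] == "prod"
        pipelines: [traces/prod]

service:
  pipelines:
    traces/in:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [routing]            # the connector acts as this pipeline's exporter
    traces/prod:
      receivers: [routing]            # and as the receiver of the routed pipelines
      processors: [batch]
      exporters: [otlphttp/honeycomb]
    traces/dev:
      receivers: [routing]
      processors: [batch]
      exporters: [debug]
```

Because each route is a full pipeline, prod and dev can carry different processors (for example, sampling only in prod).
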
### Sidecar pattern (Kubernetes)
```yaml
# one collector container next to the app in each pod
spec:
  containers:
    - name: app
      image: myapp
      env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://localhost:4317  # gRPC port; use 4318 if the SDK exports OTLP over HTTP
    - name: otel-collector
      image: otel/opentelemetry-collector-contrib
      args: [--config=/etc/otel/config.yaml]
      volumeMounts:
        - name: otel-config  # hypothetical volume name
          mountPath: /etc/otel
  volumes:
    - name: otel-config
      configMap:
        name: otel-collector-config  # hypothetical ConfigMap that holds config.yaml
```

### Gateway pattern (cluster-level)
```
App → sidecar → gateway collector → backend
```

→ Central aggregation and policy enforcement.

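One way the sidecar side of this chain might look, assuming the gateway tier runs tail sampling and therefore needs all spans of a trace on the same instance. The `loadbalancing` exporter (contrib) keys on trace ID; the headless-service hostname, the memory limits, and the plaintext TLS setting are all assumptions for illustration.

```yaml
# sidecar config: receive locally, forward to the gateway tier
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 256
    spike_limit_mib: 64
  batch: {}

exporters:
  loadbalancing:
    routing_key: traceID          # the same trace ID always goes to the same gateway
    protocol:
      otlp:
        tls:
          insecure: true          # assumption: plaintext traffic inside the cluster
    resolver:
      dns:
        hostname: otel-gateway-headless.observability.svc.cluster.local  # hypothetical
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loadbalancing]
```

Sampling, redaction, and backend credentials then live only in the gateway config, so sidecars stay small and identical across services.
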
### Self-metrics (the Collector's own telemetry)
```yaml
service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888  # newer releases deprecate `address` in favor of `readers`; check your version
      level: detailed
```

→ Prometheus scrapes this endpoint to monitor the Collector itself.

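On the Prometheus side, scraping that endpoint could look like the sketch below. This is Prometheus configuration, not Collector configuration, and the `otel-collector` hostname is a placeholder.

```yaml
# prometheus.yml
scrape_configs:
  - job_name: otel-collector
    scrape_interval: 30s
    static_configs:
      # hypothetical hostname; the port matches the telemetry address above
      - targets: ["otel-collector:8888"]
```

Useful series to alert on include `otelcol_exporter_send_failed_spans` and `otelcol_receiver_refused_spans`; exact names may carry a `_total` suffix depending on the Collector version.
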
### Metrics (host / process)
```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      network: {}
```

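A receiver does nothing until a pipeline references it. A sketch that wires `hostmetrics` into its own metrics pipeline and turns on one optional metric; the `metrics/host` pipeline name is an assumption, and the processor and exporter names come from the basic config.

```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true   # optional metric, off by default
      memory: {}

service:
  pipelines:
    metrics/host:
      receivers: [hostmetrics]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
```
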
### Logs (Fluent Bit / Loki)
```yaml
receivers:
  filelog:
    include: [/var/log/app/*.log]
    operators:
      - type: json_parser  # parse each line as a JSON object
```

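To ship the parsed logs, the receiver has to sit in a `logs` pipeline. A sketch under stated assumptions: each JSON line carries a `level` field, and `LOGS_OTLP_ENDPOINT` is a hypothetical environment variable that points at an OTLP-capable log backend.

```yaml
receivers:
  filelog:
    include: [/var/log/app/*.log]
    start_at: end                       # only new lines; use `beginning` to backfill
    operators:
      - type: json_parser
        severity:
          parse_from: attributes.level  # assumes a `level` field in each line

exporters:
  otlphttp/logs:
    endpoint: "${env:LOGS_OTLP_ENDPOINT}"  # hypothetical env var

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/logs]
```
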
## 🤔 Decision Criteria
| Environment | Pattern |
|---|---|
| Single service | App → Collector → Backend |
| Multiple services on K8s | Sidecar + Gateway |
| High traffic | Gateway only |
| Multi-cloud | Gateway handles routing |
| Cost reduction | Tail sampling + filter |
| Strict privacy | redaction processor |

## ❌ Anti-patterns
- **App exports directly to Datadog / Honeycomb**: vendor lock-in. Use OTLP + Collector instead.
- **Tail sampling with a small buffer**: meaningful traces are evicted before a decision is made. Size `num_traces` generously.
- **Keeping 100% of traces**: costs explode. Combine probabilistic and tail sampling.
- **No PII redaction**: risks GDPR violations.
- **Sampling without a Collector**: only the SDK's head sampling is available, so error traces get lost.
- **No `memory_limiter`**: OOM.
- **Oversized batches (10K)**: adds latency.

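The last two anti-patterns come down to a few processor settings. A sketch with illustrative values, assuming the Collector runs in a container with a memory limit:

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80        # percent of available memory, instead of a fixed limit_mib
    spike_limit_percentage: 20
  batch:
    timeout: 5s                 # flush at least this often, even if the batch is not full
    send_batch_size: 1024       # flush once this many items are queued
    send_batch_max_size: 2048   # hard cap on a single outgoing batch
```

`send_batch_size` is only a trigger; without `send_batch_max_size` a burst can still produce a batch far larger than intended.
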
## 🤖 LLM Usage Hints
- App = OTLP only; the Collector does the routing.
- Tail sampling = prioritize error / slow / VIP traces.
- Always apply PII redaction + filter (health checks).

## 🔗 Related Documents
- [[DevOps_Observability_Stack]]
- [[Native_Crash_Reporting]]
- [[Observability_OpenTelemetry]]