[G1-Sync] Manual knowledge update

Antigravity Agent
2026-05-09 21:08:02 +09:00
parent f0befc887a
commit 93ec7e9056
363 changed files with 68333 additions and 64 deletions
@@ -0,0 +1,246 @@
---
id: devops-otel-collector
title: OTel Collector — Pipeline / Sampling / Routing
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [devops, otel, opentelemetry, observability, vibe-coding]
tech_stack: { language: "YAML / OTel", applicable_to: ["DevOps"] }
applied_in: []
aliases: [OpenTelemetry Collector, OTel, receivers, processors, exporters, tail sampling]
---
# OTel Collector
> Telemetry router: receive → process → export. **Apps emit only OTLP; the Collector swaps backends.** Tail sampling, attribute scrubbing, and multi-export all live in one place.
## 📖 Core Concepts
- Receivers: OTLP / Prometheus / Jaeger / Zipkin / Fluent Forward.
- Processors: batch / filter / sample / attribute.
- Exporters: where the data is sent (Datadog / Honeycomb / Tempo).
- Pipeline: receivers → processors → exporters.
## 💻 Code Patterns
### Basic config
```yaml
# otel-config.yaml
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }
processors:
  batch:
    timeout: 10s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 100
  attributes:
    actions:
      - key: deployment.environment
        value: prod
        action: insert
      - key: user.email
        action: delete # PII
  resource: # defined here but not referenced by a pipeline below
    attributes:
      - key: service.namespace
        value: acme
        action: insert
exporters:
  otlphttp/honeycomb:
    endpoint: https://api.honeycomb.io
    # read the API key from the environment instead of hard-coding it
    headers: { x-honeycomb-team: "${env:HC_KEY}" }
  prometheus:
    endpoint: 0.0.0.0:8889
  debug: # add to a pipeline's exporters when troubleshooting
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter goes first so it can push back before other work is done
      processors: [memory_limiter, attributes, batch]
      exporters: [otlphttp/honeycomb]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
```
### Tail sampling (tail-based, not head-based)
```yaml
processors:
  tail_sampling:
    decision_wait: 30s # wait for a trace's spans to arrive before deciding
    num_traces: 100000 # traces held in memory while waiting
    expected_new_traces_per_sec: 10
    policies:
      - name: error-traces
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-traces
        type: latency
        latency: { threshold_ms: 1000 }
      - name: 1-percent-baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 1 }
      - name: high-value-customer
        type: string_attribute
        string_attribute:
          key: user.plan
          values: [enterprise]
```
→ Always keep error, slow, and VIP traces; sample 1% of the rest.
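The processor only takes effect once it is listed in a traces pipeline. A minimal wiring sketch, assuming the `otlp` receiver, `memory_limiter`, `batch`, and `otlphttp/honeycomb` exporter from the basic config are defined in the same file:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter first; tail_sampling before batch so the sampling
      # decision is made before spans are regrouped into export batches
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlphttp/honeycomb]
```

Tail sampling can only judge a trace whose spans all reach the same Collector instance, so this pipeline usually belongs on a gateway rather than on per-pod sidecars.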
### Attribute scrubbing (PII)
```yaml
processors:
  redaction:
    allow_all_keys: true
    blocked_values: # regexes matched against attribute values
      - '\d{3}-\d{2}-\d{4}' # SSN
      - '4[0-9]{12}(?:[0-9]{3})?' # credit card (Visa pattern)
  attributes:
    actions:
      - key: http.request.header.authorization
        action: delete
      - key: user.email
        action: hash # replaced by its hash (SHA-1 or SHA-256, depending on Collector version)
```
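Both processors do nothing until a pipeline references them. One possible ordering, shown as a sketch that assumes the receiver and exporter names from the basic config: delete and hash known keys first, then let `redaction` mask whatever sensitive values remain.

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # scrub before batch so no unredacted span is ever queued for export
      processors: [memory_limiter, attributes, redaction, batch]
      exporters: [otlphttp/honeycomb]
```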
### Multi-export (fan-out to several backends)
```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      # both exporters must be defined under `exporters:`
      exporters: [otlphttp/datadog, otlphttp/honeycomb]
      # every span is sent to both
```
### Filter (drop)
```yaml
processors:
  filter:
    traces:
      span: # a span is dropped when any condition is true
        - 'attributes["http.target"] == "/health"'
        - 'attributes["http.target"] == "/metrics"'
```
→ Drops health-check spans, which only add noise and cost.
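### Routing (different backend per condition)
```yaml
processors:
  routing:
    from_attribute: deployment.environment
    attribute_source: resource # default is `context` (request metadata), not telemetry attributes
    default_exporters: [debug] # used when no table entry matches
    table:
      - value: prod
        exporters: [otlphttp/honeycomb]
      - value: dev
        exporters: [debug]
```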
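Recent Collector releases deprecate the routing processor in favor of the routing connector. The same prod/dev split might look like the sketch below; the pipeline names `traces/in`, `traces/prod`, and `traces/dev` are illustrative, and the exact fields should be checked against the connector README for the Collector version in use.

```yaml
connectors:
  routing:
    default_pipelines: [traces/dev] # anything that is not prod
    error_mode: ignore
    table:
      # OTTL statement evaluated against resource attributes
      - statement: route() where attributes["deployment.environment"] == "prod"
        pipelines: [traces/prod]
service:
  pipelines:
    traces/in:
      receivers: [otlp]
      exporters: [routing] # the connector acts as this pipeline's exporter
    traces/prod:
      receivers: [routing] # and as the receiver of the downstream pipelines
      exporters: [otlphttp/honeycomb]
    traces/dev:
      receivers: [routing]
      exporters: [debug]
```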
### Sidecar pattern (Kubernetes)
```yaml
# One Collector container next to the app in each pod
spec:
  containers:
    - name: app
      image: myapp
      env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://localhost:4317 # containers in a pod share localhost
    - name: otel-collector
      image: otel/opentelemetry-collector-contrib
      args: [--config=/etc/otel/config.yaml]
```
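The `--config` path has to exist inside the sidecar container. One common way to provide it is a ConfigMap mount; the sketch below assumes a hypothetical ConfigMap named `otel-collector-config` with a `config.yaml` key.

```yaml
spec:
  containers:
    - name: otel-collector
      image: otel/opentelemetry-collector-contrib
      args: [--config=/etc/otel/config.yaml]
      volumeMounts:
        - name: otel-config
          mountPath: /etc/otel # config.yaml appears as /etc/otel/config.yaml
          readOnly: true
  volumes:
    - name: otel-config
      configMap:
        name: otel-collector-config # hypothetical ConfigMap name
```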
### Gateway pattern (cluster-level)
```
App → sidecar → gateway collector → backend
```
→ Central aggregation plus policy enforcement.
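On the sidecar side, the hop to the gateway is just another OTLP exporter. A minimal sketch, assuming a hypothetical in-cluster Service `otel-gateway` in an `observability` namespace and plaintext traffic inside the cluster:

```yaml
# Sidecar config: do light processing locally, forward everything to the gateway
exporters:
  otlp/gateway:
    endpoint: otel-gateway.observability.svc.cluster.local:4317 # hypothetical Service
    tls:
      insecure: true # assumption: no TLS in-cluster; configure certificates otherwise
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/gateway]
```

Sampling, redaction, and routing policies then live in the gateway's config, so changing them does not require redeploying application pods.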
### Internal metrics (the Collector itself)
```yaml
service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
      level: detailed
```
→ Prometheus scrapes this endpoint to monitor the Collector itself.
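On the Prometheus side, a scrape job pointed at port 8888 is enough. The target hostname `otel-collector` below is an assumption; substitute the Service or pod address used in the deployment.

```yaml
# prometheus.yml
scrape_configs:
  - job_name: otel-collector
    scrape_interval: 30s
    static_configs:
      - targets: ["otel-collector:8888"] # hypothetical hostname
```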
### Metrics (host / process)
```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      network: {}
```
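The receiver needs a metrics pipeline of its own (or a slot in an existing one). A sketch that reuses the `prometheus` exporter from the basic config; the pipeline name `metrics/host` is illustrative:

```yaml
service:
  pipelines:
    metrics/host:
      receivers: [hostmetrics]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
```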
### Logs (Fluent Bit / Loki)
```yaml
receivers:
  filelog:
    include: [/var/log/app/*.log]
    operators:
      - type: json_parser # parse each line as a JSON object
```
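To ship the parsed logs somewhere, add a logs pipeline. The sketch below assumes a Loki instance reachable at the hypothetical address `loki:3100` that has OTLP ingestion enabled; for any other backend, swap in its exporter.

```yaml
exporters:
  otlphttp/loki:
    endpoint: http://loki:3100/otlp # hypothetical host; assumes Loki's OTLP endpoint is enabled
service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/loki]
```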
## 🤔 Decision Criteria
| Environment | Pattern |
|---|---|
| Single service | App → Collector → Backend |
| Multiple services on K8s | Sidecar + Gateway |
| High traffic | Gateway only |
| Multi-cloud | Gateway handles routing |
| Cost reduction | Tail sampling + filter |
| Strict privacy | redaction processor |
## ❌ Anti-patterns
- **App exports directly to Datadog / Honeycomb**: vendor lock-in. Use OTLP + Collector.
- **Tail sampling with a small buffer**: meaningful traces are evicted before a decision is made. Set `num_traces` high enough.
- **Keeping 100% of traces**: costs explode. Combine probabilistic and tail sampling.
- **No PII redaction**: risks a GDPR violation.
- **Sampling without a Collector**: only the SDK's head sampling is available, so error traces get dropped.
- **No `memory_limiter`**: OOM.
- **Batch too large (10K)**: adds export latency.
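The buffer and batch items above come down to simple sizing arithmetic. As a rough sketch, `num_traces` should exceed the number of traces in flight during `decision_wait`; the 2,000 traces/s load below is an assumed figure for illustration, not a measurement.

```yaml
processors:
  tail_sampling:
    decision_wait: 30s
    # Assumed load: ~2,000 new traces/s.
    # Traces in flight during the wait: 2,000 traces/s × 30 s = 60,000,
    # so 100,000 leaves about 40,000 slots of headroom for bursts.
    num_traces: 100000
    expected_new_traces_per_sec: 2000
    # policies: as in the tail sampling section
  batch:
    timeout: 10s
    send_batch_size: 1024 # flush once this many items are buffered
    send_batch_max_size: 2048 # hard cap on a single batch (must be >= send_batch_size)
```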
## 🤖 LLM Usage Hints
- App = OTLP only; the Collector does the routing.
- Tail sampling = prioritize error / slow / VIP traces.
- Always apply PII redaction and a filter for health checks.
## 🔗 Related Documents
- [[DevOps_Observability_Stack]]
- [[Native_Crash_Reporting]]
- [[Observability_OpenTelemetry]]