---
id: wiki-2026-0508-datacollector-knowledge-hub
title: Datacollector Knowledge Hub
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Web Analytics Collector, Frontend Telemetry Pipeline]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [analytics, telemetry, observability, data-collection]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: TypeScript
  framework: Browser/Edge
---

# Datacollector Knowledge Hub

## In one line

> **"A datacollector is a reliable pipeline: browser event → edge ingest → warehouse."**

A frontend datacollector captures user behavior, performance (Web Vitals), and errors, ships them in batches via sendBeacon to an edge function (Cloudflare Workers / Vercel), and lands them in ClickHouse / BigQuery / Snowflake. The 2026 perspective: 1st-party domain ingest + GDPR/ePrivacy consent + server-side GTM.

## Core concepts

### Collector responsibilities

- **Capture**: page view, click, custom event, performance, error.
- **Enrich**: session id, user id (consent-gated), referrer, UTM.
- **Batch & Buffer**: idle batching, sendBeacon on `pagehide`.
- **Privacy**: consent state, IP truncation, PII scrubbing.
- **Transport**: prefer a 1st-party endpoint over a 3rd-party one (avoids adblock).

### Web Vitals

- LCP, INP, CLS, TTFB, FCP, measured with the `web-vitals` library.
- For root-cause analysis, import from the attribution build (`web-vitals/attribution`), which adds a `metric.attribution` object. `reportAllChanges: true` is a separate option that reports every value change instead of only the final value.

### Applications

1. Product analytics (PostHog, Amplitude, Mixpanel, Segment).
2. RUM (Datadog, Sentry, New Relic, SpeedCurve).
3. A/B testing exposure logging.
4. Funnel & cohort analysis.

## 💻 Patterns

### 1. Minimal collector (sendBeacon + queue)

```typescript
// Named TrackEvent to avoid clashing with the DOM's built-in Event type.
type TrackEvent = { type: string; ts: number; props?: Record<string, unknown> };

const queue: TrackEvent[] = [];
const ENDPOINT = '/collect'; // 1st-party

function track(type: string, props?: TrackEvent['props']) {
  queue.push({ type, ts: Date.now(), props });
  if (queue.length >= 20) flush(); // size-based flush
}

function flush() {
  if (!queue.length) return;
  const batch = queue.splice(0, queue.length); // drain the queue
  const blob = new Blob([JSON.stringify(batch)], { type: 'application/json' });
  // sendBeacon returns false when the browser refuses to queue the payload;
  // fall back to a keepalive fetch so the request can outlive the page.
  if (!navigator.sendBeacon(ENDPOINT, blob)) {
    void fetch(ENDPOINT, { method: 'POST', body: blob, keepalive: true });
  }
}

addEventListener('pagehide', flush);
addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') flush();
});
```

### 2. Web Vitals capture

```typescript
import { onLCP, onINP, onCLS, onTTFB } from 'web-vitals';

// Forwards each metric to track() from pattern 1.
const send = (metric: { name: string; value: number; id: string }) =>
  track('vital', { name: metric.name, value: metric.value, id: metric.id });

onLCP(send);
onINP(send);
onCLS(send);
onTTFB(send);
```

### 3. Error capture (global)

```typescript
// Uncaught runtime errors.
window.addEventListener('error', (e) => {
  track('error', {
    msg: e.message,
    src: e.filename,
    line: e.lineno,
    col: e.colno,
    stack: e.error?.stack,
  });
});

// Promise rejections that no handler caught.
window.addEventListener('unhandledrejection', (e) => {
  track('promise_rejection', { reason: String(e.reason) });
});
```

### 4. Auto click tracking (delegated)

```typescript
// One listener on document; any element with data-track opts in.
document.addEventListener('click', (e) => {
  if (!(e.target instanceof Element)) return;
  const t = e.target.closest('[data-track]');
  if (t instanceof HTMLElement) {
    // Spread first so a stray data-id attribute cannot overwrite the tracking id.
    track('click', { ...t.dataset, id: t.dataset.track });
  }
});
```

### 5. Consent gating (TCFv2)

```typescript
function hasAnalyticsConsent(): boolean {
  // Simplified: production code should subscribe to consent changes
  // through the CMP's window.__tcfapi instead of reading a flag.
  return localStorage.getItem('consent.analytics') === '1';
}

const origTrack = track;
const trackGated = (type: string, props?: TrackEvent['props']) => {
  if (!hasAnalyticsConsent()) return; // drop the event without consent
  origTrack(type, props);
};
```

### 6. Edge ingest (Cloudflare Worker)

```typescript
// worker.ts
interface Env {
  QUEUE: Queue; // Queues producer binding (type from @cloudflare/workers-types)
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    if (req.method !== 'POST') return new Response('', { status: 405 });

    let events: unknown;
    try {
      events = await req.json();
    } catch {
      return new Response('', { status: 400 }); // body was not valid JSON
    }
    if (!Array.isArray(events)) return new Response('', { status: 400 });

    const enriched = events.map((e: Record<string, unknown>) => ({
      ...e,
      ip_country: req.cf?.country,
      ua: req.headers.get('user-agent'),
      ingest_ts: Date.now(),
    }));
    await env.QUEUE.send(enriched);
    // A 204 response must have a null body; an empty string throws.
    return new Response(null, { status: 204 });
  },
};
```

### 7. Server-side GTM forward

```typescript
// edge → GTM SS container
// The payload shape must match whatever client is configured in the
// server container; the JSON envelope here is only illustrative.
async function forwardToGtm(enriched: unknown[]): Promise<void> {
  await fetch('https://gtm.example.com/g/collect', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ events: enriched }),
  });
}
```

### 8. PII scrub

```typescript
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
// Caution: this also matches other long digit runs (e.g. millisecond timestamps).
const PHONE = /\+?\d[\d \-]{8,}\d/g;

// Serializes props, masks matches, and parses back. The replacement
// tokens contain no quotes, so the JSON stays valid.
function scrub<T extends Record<string, unknown>>(props: T): T {
  return JSON.parse(
    JSON.stringify(props).replace(EMAIL, '[email]').replace(PHONE, '[phone]'),
  );
}
```

### 9. Session identity

```typescript
const SESSION_KEY = 'sid';
const TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes of inactivity

type Session = { id: string; last: number };

function getSession(): string {
  let session: Session | null = null;
  try {
    const raw = sessionStorage.getItem(SESSION_KEY);
    session = raw ? JSON.parse(raw) : null;
  } catch {
    session = null; // corrupt value: start a fresh session
  }

  const now = Date.now();
  if (session && now - session.last < TIMEOUT_MS) {
    session.last = now; // still active: extend the session
  } else {
    session = { id: crypto.randomUUID(), last: now }; // missing or expired: rotate
  }
  sessionStorage.setItem(SESSION_KEY, JSON.stringify(session));
  return session.id;
}
```

## Decision criteria

| Situation | Approach |
|---|---|
| Product analytics | PostHog (OSS) or Amplitude. |
| RUM performance | Sentry / Datadog RUM. |
| Strict privacy (EU) | 1st-party endpoint + consent + IP truncation. |
| Adblock evasion | Reverse proxy on a 1st-party domain. |
| Self-built | sendBeacon + edge function + ClickHouse. |

**Default**: `web-vitals` + 1st-party `/collect` + sendBeacon batch + edge ingest.

## 🔗 Graph

- Parent: [[Observability]]
- Applied in: [[Sentry]]
- Adjacent: [[Web Vitals]] · [[GDPR]] · [[ClickHouse]]

## 🤖 LLM usage

**When to use**: collector skeleton, consent gating, batch / flush logic.

**When not to use**: detailed configuration of a specific vendor SDK; consult the vendor docs instead.

## ❌ Anti-patterns

- **`fetch` without `keepalive`**: the request is dropped when the page unloads.
- **No batching**: one request per event floods the network.
- **PII without consent**: a GDPR breach.
- **3rd-party domain only**: blocked by adblockers; use a 1st-party proxy.
- **Sync XHR on unload**: deprecated; use sendBeacon.

## 🧪 Verification / duplicates

- Verified (web.dev web-vitals, IAB TCFv2, MDN sendBeacon).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: datacollector pipeline + Web Vitals + consent |
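
## IP truncation sketch

The privacy responsibilities and the strict-privacy row of the decision table both name IP truncation, but none of the patterns show it. Below is a minimal sketch, assuming truncation means zeroing the last IPv4 octet and keeping only the first 48 bits of an IPv6 address. `truncateIp` is a hypothetical helper name, and the prefix lengths are a policy assumption rather than something this note specifies.

```typescript
// Hypothetical helper, not part of the original patterns.
// Returns a coarsened address, or '' when the input is not shaped like an IP.
// It checks structure only (group counts), not the validity of each group.
function truncateIp(ip: string): string {
  if (!ip.includes(':')) {
    // IPv4: keep the first three octets, zero the last one.
    const octets = ip.split('.');
    if (octets.length !== 4) return '';
    return `${octets[0]}.${octets[1]}.${octets[2]}.0`;
  }

  // IPv6: expand a "::" run into explicit zero groups, then keep 3 groups.
  const compressed = ip.includes('::');
  const [head = '', tail = ''] = ip.split('::');
  const headGroups = head ? head.split(':') : [];
  const tailGroups = tail ? tail.split(':') : [];
  const missing = 8 - headGroups.length - tailGroups.length;
  if (missing < 0 || (!compressed && missing !== 0)) return '';

  const zeros: string[] = new Array(compressed ? missing : 0).fill('0');
  const groups = [...headGroups, ...zeros, ...tailGroups];
  return `${groups.slice(0, 3).join(':')}::`;
}
```

In the edge worker this could run on the client address (on Cloudflare, the `CF-Connecting-IP` request header) before the event is queued, so the full address never reaches the warehouse.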