id: wiki-2026-0508-datacollector-knowledge-hub
title: Datacollector Knowledge Hub
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Web Analytics Collector, Frontend Telemetry Pipeline
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: analytics, telemetry, observability, data-collection
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language TypeScript, framework Browser/Edge

Datacollector Knowledge Hub

One-liner

"A datacollector is a reliable pipeline: browser event → edge ingest → warehouse." A frontend datacollector captures user behavior, performance (Web Vitals), and errors, ships them in batches (with sendBeacon on page hide) to an edge function (Cloudflare Workers / Vercel), and lands them in ClickHouse / BigQuery / Snowflake. From a 2026 perspective, the emphasis is on 1st-party domain ingest, GDPR/ePrivacy consent, and server-side GTM.

Core concepts

Collector responsibilities

  • Capture: page view, click, custom event, performance, error.
  • Enrich: session id, user id (consent-gated), referrer, UTM.
  • Batch & Buffer: idle batching, sendBeacon on pagehide.
  • Privacy: consent state, IP truncation, PII scrubbing.
  • Transport: 1st-party endpoint > 3rd-party (avoid adblock).
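The Enrich step can be sketched as a small pure function. This is only an illustration: the `enrich` name, the `Enrichment` shape, and the list of UTM keys are assumptions, not part of any SDK.

```typescript
// Hypothetical helper: names and shapes are illustrative only.
type Enrichment = {
  referrer: string | null;
  utm: Record<string, string>;
};

const UTM_KEYS = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];

// Extract the referrer and any UTM parameters present in the page URL.
function enrich(pageUrl: string, referrer: string): Enrichment {
  const params = new URL(pageUrl).searchParams;
  const utm: Record<string, string> = {};
  for (const key of UTM_KEYS) {
    const value = params.get(key);
    if (value) utm[key] = value; // skip absent or empty parameters
  }
  return { referrer: referrer || null, utm };
}
```

In a browser this would typically be called as `enrich(location.href, document.referrer)` and merged into each event's props, subject to the consent state.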

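Web Vitals

  • LCP, INP, CLS, TTFB, FCP: collected with the web-vitals library.
  • The attribution build (imported from 'web-vitals/attribution') attaches an attribution object to each metric for root-cause analysis. reportAllChanges is a separate option, passed as onINP(callback, { reportAllChanges: true }), that reports each change in value instead of only the final one.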
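To make the metric values actionable, each one can be bucketed against the published Core Web Vitals thresholds. The web-vitals library already sets a `rating` on each metric; the sketch below is a hypothetical `rateVital` helper for cases such as re-rating values on the ingest side, and its name and types are assumptions.

```typescript
type VitalName = 'LCP' | 'INP' | 'CLS' | 'TTFB' | 'FCP';
type Rating = 'good' | 'needs-improvement' | 'poor';

// [good, poor] boundaries; milliseconds for all metrics except the unitless CLS.
const THRESHOLDS: Record<VitalName, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  TTFB: [800, 1800],
  FCP: [1800, 3000],
};

// Values at or below the first boundary are good; above the second are poor.
function rateVital(name: VitalName, value: number): Rating {
  const [good, poor] = THRESHOLDS[name];
  if (value > poor) return 'poor';
  if (value > good) return 'needs-improvement';
  return 'good';
}
```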

Applications

  1. Product analytics (PostHog, Amplitude, Mixpanel, Segment).
  2. RUM (Datadog, Sentry, New Relic, SpeedCurve).
  3. A/B testing exposure logging.
  4. Funnel & cohort analysis.
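For A/B testing exposure logging, one common concern is logging each exposure only once per page session so repeated renders do not inflate counts. A minimal sketch, assuming a hypothetical `createExposureLogger` factory and an injected `log` callback (in the collector this could be a thin wrapper around `track`):

```typescript
type Exposure = { experiment: string; variant: string };

// Returns an expose() function that logs each experiment/variant pair at most once.
function createExposureLogger(log: (e: Exposure) => void) {
  const seen = new Set<string>();
  return function expose(experiment: string, variant: string): boolean {
    const key = `${experiment}:${variant}`;
    if (seen.has(key)) return false; // already logged in this page session
    seen.add(key);
    log({ experiment, variant });
    return true;
  };
}
```

The in-memory `Set` resets on every page load; whether exposures should also be deduplicated across page loads is a product decision this sketch does not make.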

💻 Patterns

1. Minimal collector (sendBeacon + queue)

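// Assumes module scope, where this alias shadows the DOM Event type.
type Event = { type: string; ts: number; props?: Record<string, unknown> };
const queue: Event[] = [];
const ENDPOINT = '/collect'; // 1st-party

// Queue an event and flush once the batch reaches 20 events.
function track(type: string, props?: Event['props']) {
  queue.push({ type, ts: Date.now(), props });
  if (queue.length >= 20) flush();
}

// Drain the queue into one request; fall back to fetch with keepalive if sendBeacon refuses the payload.
function flush() {
  if (!queue.length) return;
  const batch = queue.splice(0, queue.length);
  const blob = new Blob([JSON.stringify(batch)], { type: 'application/json' });
  if (!navigator.sendBeacon(ENDPOINT, blob)) {
    fetch(ENDPOINT, { method: 'POST', body: blob, keepalive: true }).catch(() => {});
  }
}

// Flush when the page is unloaded or hidden; visibilitychange is dispatched on document.
addEventListener('pagehide', flush);
document.addEventListener('visibilitychange', () => { if (document.visibilityState === 'hidden') flush(); });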
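The collector above flushes only when 20 events accumulate or the page is hidden, so a quiet page can hold events for a long time. One way to bound that delay is a batcher that also flushes after a maximum wait. This is a sketch under assumptions: `createBatcher`, `maxSize`, and `maxWaitMs` are hypothetical names, and `send` stands in for the sendBeacon/fetch call.

```typescript
type Batcher<T> = { add(item: T): void; flush(): void };

// Flushes when maxSize items are queued or maxWaitMs after the first queued item.
function createBatcher<T>(
  send: (batch: T[]) => void,
  maxSize = 20,
  maxWaitMs = 5000,
): Batcher<T> {
  let items: T[] = [];
  let timer: ReturnType<typeof setTimeout> | undefined;

  function flush(): void {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (items.length === 0) return;
    const batch = items;
    items = [];
    send(batch);
  }

  function add(item: T): void {
    items.push(item);
    if (items.length >= maxSize) {
      flush();
      return;
    }
    // Start the wait timer on the first item of a new batch.
    if (timer === undefined) timer = setTimeout(flush, maxWaitMs);
  }

  return { add, flush };
}
```

The page-hide listeners would still call `flush()` directly, since timers are not guaranteed to run once the page is being unloaded.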

2. Web Vitals capture

import { onLCP, onINP, onCLS, onTTFB } from 'web-vitals';
const send = (metric: { name: string; value: number; id: string }) =>
  track('vital', { name: metric.name, value: metric.value, id: metric.id });
onLCP(send); onINP(send); onCLS(send); onTTFB(send);

3. Error capture (global)

window.addEventListener('error', (e) => {
  track('error', { msg: e.message, src: e.filename, line: e.lineno, col: e.colno, stack: e.error?.stack });
});
window.addEventListener('unhandledrejection', (e) => {
  track('promise_rejection', { reason: String(e.reason) });
});

4. Auto click tracking (delegated)

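document.addEventListener('click', (e) => {
  // Walk up from the click target to the nearest element opted in via data-track.
  const t = e.target instanceof Element ? e.target.closest('[data-track]') : null;
  if (t instanceof HTMLElement) {
    track('click', { id: t.dataset.track, ...t.dataset });
  }
});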

5. Consent gating

// Simplified check. In production, subscribe to the CMP's TCF API (window.__tcfapi) instead of reading a flag.
function hasAnalyticsConsent(): boolean {
  return localStorage.getItem('consent.analytics') === '1';
}
// Wrap track so events are dropped until analytics consent is granted.
const origTrack = track;
const trackGated = (type: string, props?: Event['props']) => {
  if (!hasAnalyticsConsent()) return;
  origTrack(type, props);
};
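The consent check above notes that production code should subscribe to the TCF API rather than read a flag. A rough sketch of that subscription follows, using the IAB TCF v2 `addEventListener` command. The `watchConsent` name is hypothetical, only a subset of TCData is typed, and which purpose IDs count as analytics consent is an assumption that depends on the CMP configuration and legal guidance.

```typescript
// Subset of the IAB TCF v2 TCData object used by this sketch.
type TCData = {
  eventStatus: 'tcloaded' | 'cmpuishown' | 'useractioncomplete';
  gdprApplies?: boolean;
  purpose: { consents: Record<number, boolean> };
};

type TcfApi = (
  command: 'addEventListener',
  version: 2,
  callback: (tcData: TCData, success: boolean) => void,
) => void;

// Calls onChange with the current consent state whenever the CMP reports one.
function watchConsent(
  tcfapi: TcfApi,
  purposeIds: number[], // assumption: the caller decides which purposes apply
  onChange: (granted: boolean) => void,
): void {
  tcfapi('addEventListener', 2, (tcData, success) => {
    if (!success) return;
    // 'cmpuishown' means the banner is visible and no choice has been made yet.
    if (tcData.eventStatus === 'cmpuishown') return;
    const granted =
      tcData.gdprApplies === false ||
      purposeIds.every((id) => tcData.purpose.consents[id] === true);
    onChange(granted);
  });
}
```

In the browser, the first argument would be `window.__tcfapi` once the CMP stub has loaded, and `onChange` could set the flag that `hasAnalyticsConsent` reads.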

6. Edge ingest (Cloudflare Worker)

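// worker.ts
// Assumes a Cloudflare Queues producer binding named QUEUE; Event is the collector's event shape.
interface Env { QUEUE: Queue<unknown> }

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    if (req.method !== 'POST') return new Response('', { status: 405 });
    let events: Event[];
    try {
      events = await req.json<Event[]>();
    } catch {
      return new Response('', { status: 400 }); // malformed JSON
    }
    if (!Array.isArray(events)) return new Response('', { status: 400 });
    // Enrich at the edge with fields the browser should not be trusted to supply.
    const enriched = events.map((e) => ({
      ...e,
      ip_country: req.cf?.country,
      ua: req.headers.get('user-agent'),
      ingest_ts: Date.now(),
    }));
    await env.QUEUE.send(enriched);
    // 204 is a null-body status, so the body must be null rather than ''.
    return new Response(null, { status: 204 });
  },
};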
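The Worker above does not check the shape of each event before queueing it. Since `/collect` is a public endpoint, validating the payload first is usually worthwhile. A minimal sketch, assuming hypothetical `isIngestEvent` and `parseBatch` helpers and an arbitrary cap of 100 events per request:

```typescript
type IngestEvent = { type: string; ts: number; props?: Record<string, unknown> };

// Type guard for a single event in the collector's wire format.
function isIngestEvent(v: unknown): v is IngestEvent {
  if (typeof v !== 'object' || v === null || Array.isArray(v)) return false;
  const e = v as Record<string, unknown>;
  const propsOk =
    e.props === undefined ||
    (typeof e.props === 'object' && e.props !== null && !Array.isArray(e.props));
  return (
    typeof e.type === 'string' &&
    e.type.length > 0 &&
    typeof e.ts === 'number' &&
    Number.isFinite(e.ts) &&
    propsOk
  );
}

// Returns the validated batch, or null if the body is not a usable batch.
function parseBatch(body: unknown, maxEvents = 100): IngestEvent[] | null {
  if (!Array.isArray(body) || body.length === 0 || body.length > maxEvents) return null;
  return body.every(isIngestEvent) ? (body as IngestEvent[]) : null;
}
```

In the Worker, a `null` result would map to a 400 response before anything is sent to the queue.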

7. Server-side GTM forward

// edge → GTM SS container
await fetch('https://gtm.example.com/g/collect', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ events: enriched }),
});

8. PII scrub

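const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const PHONE = /\+?\d[\d \-]{8,}\d/g;
// Scrub string values only: replacing inside the raw JSON text would turn long
// numbers (e.g. timestamps) into the bare token [phone] and break JSON.parse.
function scrub<T extends Record<string, unknown>>(props: T): T {
  return JSON.parse(JSON.stringify(props), (_key, v) =>
    typeof v === 'string' ? v.replace(EMAIL, '[email]').replace(PHONE, '[phone]') : v);
}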
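The privacy responsibilities also list IP truncation. A minimal sketch for IPv4 follows, assuming a hypothetical `truncateIp` helper that zeroes the last octet and returns `null` for anything it does not recognize, so an unparsed address is dropped rather than stored whole. IPv6 handling is deliberately left out.

```typescript
// Zero the last octet of an IPv4 address; return null for any other input.
function truncateIp(ip: string): string | null {
  const parts = ip.trim().split('.');
  const isV4 =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!isV4) return null;
  return `${parts[0]}.${parts[1]}.${parts[2]}.0`;
}
```

Truncation would run at the edge, before the event is queued, so the full address never reaches the warehouse.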

9. Session identity

const SESSION_KEY = 'sid';
const TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes of inactivity
function getSession(): string {
  const raw = sessionStorage.getItem(SESSION_KEY);
  const parsed: { id: string; last: number } | null = raw ? JSON.parse(raw) : null;
  const now = Date.now();
  // Reuse the session while it is inside the inactivity window; otherwise rotate the id.
  const out = parsed && now - parsed.last < TIMEOUT_MS
    ? { id: parsed.id, last: now }
    : { id: crypto.randomUUID(), last: now };
  sessionStorage.setItem(SESSION_KEY, JSON.stringify(out));
  return out.id;
}

Decision criteria

| Situation | Approach |
| --- | --- |
| Product analytics | PostHog (OSS) or Amplitude. |
| RUM / performance | Sentry / Datadog RUM. |
| Privacy strict (EU) | 1st-party + consent + IP truncation. |
| Adblock evasion | Reverse-proxy through a 1st-party domain. |
| In-house build | sendBeacon + edge function + ClickHouse. |

Default: web-vitals + 1st-party /collect + sendBeacon batching + edge ingest.

🔗 Graph

🤖 LLM usage

When to use: collector skeleton, consent gating, batch / flush logic. When not to use: detailed configuration of a specific vendor SDK; consult the vendor docs for that.

Anti-patterns

  • fetch without keepalive: the request is dropped when the page unloads.
  • No batching: one request per event floods the endpoint.
  • PII without consent: a GDPR breach.
  • 3rd-party domain only: blocked by adblockers; proxy through a 1st-party domain.
  • Sync XHR on unload: deprecated; use sendBeacon.
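A related pitfall is batch size: browsers cap the body of keepalive requests (roughly 64 KiB across in-flight requests per the Fetch standard), and sendBeacon returns false when a payload cannot be queued. One way to stay under the cap is to split a large queue into several payloads. This is a sketch with assumed names (`chunkBySize`) and a conservative, arbitrary default of 60,000 bytes.

```typescript
const encoder = new TextEncoder();

// Split events into chunks whose serialized JSON array stays within maxBytes.
function chunkBySize<T>(events: T[], maxBytes = 60_000): T[][] {
  const chunks: T[][] = [];
  let current: T[] = [];
  let size = 2; // the enclosing "[" and "]"
  for (const event of events) {
    // UTF-8 byte length of the event plus one byte for the separating comma.
    const bytes = encoder.encode(JSON.stringify(event)).length + 1;
    if (current.length > 0 && size + bytes > maxBytes) {
      chunks.push(current);
      current = [];
      size = 2;
    }
    current.push(event);
    size += bytes;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

A single event larger than `maxBytes` still ends up alone in its own chunk; deciding whether to drop or truncate such events is left to the caller.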

🧪 Verification / Duplicates

  • Verified (web.dev web-vitals, IAB TCFv2, MDN sendBeacon).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: datacollector pipeline + Web Vitals + consent |