[G1-Sync] Manual knowledge update
@@ -2,131 +2,202 @@
|
||||
id: wiki-2026-0508-datacollector-knowledge-hub
|
||||
title: Datacollector Knowledge Hub
|
||||
category: 10_Wiki/Topics
|
||||
status: needs_review
|
||||
status: verified
|
||||
canonical_id: self
|
||||
aliases: []
|
||||
aliases: [Web Analytics Collector, Frontend Telemetry Pipeline]
|
||||
duplicate_of: none
|
||||
source_trust_level: A
|
||||
confidence_score: 0.92
|
||||
tags: [uncategorized]
|
||||
confidence_score: 0.85
|
||||
verification_status: applied
|
||||
tags: [analytics, telemetry, observability, data-collection]
|
||||
raw_sources: []
|
||||
last_reinforced: 2026-05-08
|
||||
last_reinforced: 2026-05-10
|
||||
github_commit: pending
|
||||
inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
|
||||
tech_stack:
|
||||
language: unspecified
|
||||
framework: unspecified
|
||||
language: TypeScript
|
||||
framework: Browser/Edge
|
||||
---
|
||||
|
||||
# 📡 Datacollector Project: Engineering Hub (MOC)
|
||||
# Datacollector Knowledge Hub
|
||||
|
||||
The core hub for managing data collection and automation processes.
|
||||
## In One Line
|
||||
> **"A datacollector is a reliable pipeline: browser event → edge ingest → warehouse."** A frontend datacollector captures user behavior, performance (Web Vitals), and errors, batches them, and sends them with sendBeacon to an edge function (Cloudflare Workers / Vercel), which forwards them to ClickHouse, BigQuery, or Snowflake. As of 2026, the emphasis is on 1st-party domain ingest, GDPR/ePrivacy consent, and server-side GTM.
|
||||
|
||||
---
|
||||
## Core Concepts
|
||||
|
||||
## 🏷️ Keyword Cluster: #Development_Logs (development and issue log)
|
||||
- - Datacollector/2026-04-25-Datacollector_Auto_Resume_After_Reauth_Fix
|
||||
- Datacollector/2026-04-25-Datacollector_Bridge_Connection_Refused_Run_Script_Fix
|
||||
- Datacollector/2026-04-25-Datacollector_Codebase_Structure_Review_and_Initial_Risk_Assessment
|
||||
- Datacollector/2026-04-25-Datacollector_Engine_Processed_Count_and_Stalled_Loop_Guard
|
||||
- Datacollector/2026-04-25-Datacollector_Local_Wiki_Save_Only_Output_Mode
|
||||
- Datacollector/2026-04-25-Datacollector_Mac_Windows_Launcher_Scripts
|
||||
- Datacollector/2026-04-25-Datacollector_NotebookLM_Auth_Browser_and_Stale_Env_Cookie_Fix
|
||||
- Datacollector/2026-04-25-Datacollector_NotebookLM_Automatic_Auth_Recovery
|
||||
- Datacollector/2026-04-25-Datacollector_NotebookLM_Automatic_Reauth_Verification_and_Lock
|
||||
- Datacollector/2026-04-25-Datacollector_NotebookLM_Connection_Guard_and_MCP_Restart_Fix
|
||||
- Datacollector/2026-04-25-Datacollector_NotebookLM_Progress_Visibility_and_Auth_Diagnosis
|
||||
- Frontend_Mastery/2026-04-25-Datacollector_Auto_Resume_After_Reauth_Fix
|
||||
- Frontend_Mastery/2026-04-25-Datacollector_Bridge_Connection_Refused_Run_Script_Fix
|
||||
- Frontend_Mastery/2026-04-25-Datacollector_Codebase_Structure_Review_and_Initial_Risk_Assessment
|
||||
- Frontend_Mastery/2026-04-25-Datacollector_Local_Wiki_Save_Only_Output_Mode
|
||||
- Frontend_Mastery/2026-04-25-Datacollector_Mac_Windows_Launcher_Scripts
|
||||
- Frontend_Mastery/2026-04-25-Datacollector_NotebookLM_Auth_Browser_and_Stale_Env_Cookie_Fix
|
||||
- Frontend_Mastery/2026-04-25-Datacollector_NotebookLM_Automatic_Auth_Recovery
|
||||
- Frontend_Mastery/2026-04-25-Datacollector_NotebookLM_Automatic_Reauth_Verification_and_Lock
|
||||
- Frontend_Mastery/2026-04-25-Datacollector_NotebookLM_Connection_Guard_and_MCP_Restart_Fix
|
||||
- Frontend_Mastery/2026-04-25-Datacollector_NotebookLM_Progress_Visibility_and_Auth_Diagnosis
|
||||
### Collector responsibilities
|
||||
- **Capture**: page view, click, custom event, performance, error.
|
||||
- **Enrich**: session id, user id (consent-gated), referrer, UTM.
|
||||
- **Batch & Buffer**: idle batching, sendBeacon on `pagehide`.
|
||||
- **Privacy**: consent state, IP truncation, PII scrubbing.
|
||||
- **Transport**: 1st-party endpoint > 3rd-party (avoid adblock).
|
||||
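The **Enrich** step above can be sketched as a pure function that extracts the referrer host and UTM parameters. This is a minimal sketch: `readAttribution`, `Attribution`, and the returned shape are hypothetical names chosen for illustration, not part of the original collector.

```typescript
// Hypothetical helper: the name and the returned shape are assumptions for illustration.
type Attribution = { referrerHost: string | null; utm: Record<string, string> };

const UTM_KEYS = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];

function readAttribution(pageUrl: string, referrer: string): Attribution {
  const params = new URL(pageUrl).searchParams;
  const utm: Record<string, string> = {};
  for (const key of UTM_KEYS) {
    const value = params.get(key);
    if (value) utm[key] = value; // skip absent or empty parameters
  }
  let referrerHost: string | null = null;
  try {
    // The referrer is an empty string for direct visits, which URL() rejects.
    referrerHost = referrer ? new URL(referrer).hostname : null;
  } catch {
    referrerHost = null; // malformed referrer: treat as unknown
  }
  return { referrerHost, utm };
}
```

In a browser this would typically be called once per page view as `readAttribution(location.href, document.referrer)`, with the result merged into the event props only when consent allows it.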
|
||||
### Web Vitals
|
||||
- LCP, INP, CLS, TTFB, FCP — `web-vitals` library.
|
||||
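- The attribution build (`import { onINP } from 'web-vitals/attribution'`) adds root-cause data to each metric; `reportAllChanges` is a separate option, passed as `onINP(callback, { reportAllChanges: true })`.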
|
||||
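Raw metric values are easier to aggregate once they are bucketed. The sketch below rates the three Core Web Vitals against the thresholds published on web.dev; `rateVital` and `THRESHOLDS` are hypothetical names, and the numbers should be re-checked against the current guidance. Recent versions of the `web-vitals` library also report a `rating` field on each metric, so a helper like this is mainly useful on the ingest side.

```typescript
// Thresholds are [good upper bound, poor lower bound]; treat them as assumptions to re-check.
const THRESHOLDS: Record<string, [number, number]> = {
  LCP: [2500, 4000], // milliseconds
  INP: [200, 500],   // milliseconds
  CLS: [0.1, 0.25],  // unitless layout-shift score
};

type Rating = 'good' | 'needs-improvement' | 'poor';

function rateVital(name: string, value: number): Rating | null {
  const bounds = THRESHOLDS[name];
  if (!bounds) return null; // unknown metric: no rating
  const [good, poor] = bounds;
  if (value <= good) return 'good';
  return value <= poor ? 'needs-improvement' : 'poor';
}
```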
|
||||
---
|
||||
**Status**: Managed by Antigravity AI
|
||||
### Applications
|
||||
1. Product analytics (PostHog, Amplitude, Mixpanel, Segment).
|
||||
2. RUM (Datadog, Sentry, New Relic, SpeedCurve).
|
||||
3. A/B testing exposure logging.
|
||||
4. Funnel & cohort analysis.
|
||||
|
||||
## 🔗 Knowledge Links (Graph)
|
||||
### Related Concepts (Auto-Linked)
|
||||
* [[2026-04-25-Datacollector_Auto_Resume_After_Reauth_Fix]]
|
||||
* [[2026-04-25-Datacollector_Bridge_Connection_Refused_Run_Script_Fix]]
|
||||
* [[2026-04-25-Datacollector_Codebase_Structure_Review_and_Initial_Risk_Assessment]]
|
||||
* [[2026-04-25-Datacollector_Engine_Processed_Count_and_Stalled_Loop_Guard]]
|
||||
* [[2026-04-25-Datacollector_Local_Wiki_Save_Only_Output_Mode]]
|
||||
* [[2026-04-25-Datacollector_Mac_Windows_Launcher_Scripts]]
|
||||
* [[2026-04-25-Datacollector_NotebookLM_Auth_Browser_and_Stale_Env_Cookie_Fix]]
|
||||
* [[2026-04-25-Datacollector_NotebookLM_Automatic_Auth_Recovery]]
|
||||
* [[2026-04-25-Datacollector_NotebookLM_Automatic_Reauth_Verification_and_Lock]]
|
||||
* [[2026-04-25-Datacollector_NotebookLM_Connection_Guard_and_MCP_Restart_Fix]]
|
||||
* [[2026-04-25-Datacollector_NotebookLM_Progress_Visibility_and_Auth_Diagnosis]]
|
||||
## 💻 Patterns
|
||||
|
||||
## 📌 One-Line Insight (The Karpathy Summary)
|
||||
### 1. Minimal collector (sendBeacon + queue)
|
||||
```typescript
|
||||
type Event = { type: string; ts: number; props?: Record<string, unknown> };
|
||||
const queue: Event[] = [];
|
||||
const ENDPOINT = '/collect'; // 1st-party
|
||||
|
||||
> *(TODO: state the key insight in one sentence. Suggested structure: "X produces effect Z under condition Y.")*
|
||||
function track(type: string, props?: Event['props']) {
|
||||
queue.push({ type, ts: Date.now(), props });
|
||||
if (queue.length >= 20) flush();
|
||||
}
|
||||
|
||||
## 📖 Structured Knowledge (Synthesized Content)
|
||||
function flush() {
|
||||
if (!queue.length) return;
|
||||
const batch = queue.splice(0, queue.length);
|
||||
const blob = new Blob([JSON.stringify(batch)], { type: 'application/json' });
|
||||
if (!navigator.sendBeacon(ENDPOINT, blob)) void fetch(ENDPOINT, { method: 'POST', body: blob, keepalive: true });
|
||||
}
|
||||
|
||||
**Extracted patterns:**
|
||||
> *(TODO)*
|
||||
|
||||
**Details:**
|
||||
- *(TODO)*
|
||||
|
||||
## 🤖 LLM Usage Hints (How to Use This Knowledge)
|
||||
|
||||
**When to use this knowledge:**
|
||||
- *(TODO)*
|
||||
|
||||
**When not to use it:**
|
||||
- *(TODO)*
|
||||
|
||||
## 🧪 Validation Status
|
||||
|
||||
- **Information status:** needs_review
|
||||
- **Source trust level:** A
|
||||
- **Review reason:** *(P-Reinforce Phase 1 automatic normalization. The body still needs verification.)*
|
||||
|
||||
## 🧬 Duplicate Check
|
||||
|
||||
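- **Existing similar documents:** *(TODO: see the indexer cluster report)*
|
||||
- **Handling:** UPDATE (automatic normalization)
|
||||
- **Reason:** Phase 1 normalization: replaced the old template and filled in missing fields.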
|
||||
|
||||
## ⚠️ Contradictions & Updates
|
||||
|
||||
- **Conflicts with past data:** none
|
||||
- **Policy changes:** none
|
||||
|
||||
## 🕓 Changelog
|
||||
|
||||
| Date | Change | Handling | Trust level |
|
||||
|------|-----------|-----------|--------|
|
||||
| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
|
||||
|
||||
## 💻 Code Patterns
|
||||
|
||||
**Pattern 1:** *(TODO: structural skeleton that reflects this project's conventions)*
|
||||
|
||||
```text
|
||||
# TODO
|
||||
addEventListener('pagehide', flush);
|
||||
addEventListener('visibilitychange', () => { if (document.visibilityState === 'hidden') flush(); });
|
||||
```
|
||||
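The minimal collector above flushes on a size threshold and on page hide. A variant that also flushes after a quiet period, with the transport injected so it can be exercised outside a browser, might look like the sketch below. `createBatcher`, `maxSize`, and `maxWaitMs` are hypothetical names and defaults, not part of the original collector.

```typescript
// Hypothetical names throughout: createBatcher, maxSize, and maxWaitMs are illustrative.
function createBatcher<T>(send: (batch: T[]) => void, maxSize = 20, maxWaitMs = 5000) {
  let queue: T[] = [];
  let timer: ReturnType<typeof setTimeout> | undefined;

  function flush(): void {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (queue.length === 0) return;
    const batch = queue;
    queue = []; // swap first so events pushed during send() land in the next batch
    send(batch);
  }

  function push(item: T): void {
    queue.push(item);
    if (queue.length >= maxSize) {
      flush(); // size threshold reached
    } else if (timer === undefined) {
      timer = setTimeout(flush, maxWaitMs); // time threshold for quiet pages
    }
  }

  return { push, flush };
}
```

In the browser, `send` would wrap the sendBeacon-with-fetch-fallback call, and `flush` would still be registered on `pagehide` and `visibilitychange`.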
|
||||
## 🤔 Decision Criteria
|
||||
### 2. Web Vitals capture
|
||||
```typescript
|
||||
import { onLCP, onINP, onCLS, onTTFB } from 'web-vitals';
|
||||
const send = (metric: { name: string; value: number; id: string }) =>
|
||||
track('vital', { name: metric.name, value: metric.value, id: metric.id });
|
||||
onLCP(send); onINP(send); onCLS(send); onTTFB(send);
|
||||
```
|
||||
|
||||
**When to choose option A:**
|
||||
- *(TODO)*
|
||||
### 3. Error capture (global)
|
||||
```typescript
|
||||
window.addEventListener('error', (e) => {
|
||||
track('error', { msg: e.message, src: e.filename, line: e.lineno, col: e.colno, stack: e.error?.stack });
|
||||
});
|
||||
window.addEventListener('unhandledrejection', (e) => {
|
||||
track('promise_rejection', { reason: String(e.reason) });
|
||||
});
|
||||
```
|
||||
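A global error handler can emit the same error many times, for example from a failing render loop. One possible guard is a per-fingerprint cap applied before calling `track`. This is a sketch under assumptions: `createErrorLimiter` and its default cap are hypothetical and do not appear in the original.

```typescript
// Hypothetical helper: createErrorLimiter and its default cap are illustrative.
function createErrorLimiter(maxPerFingerprint = 3) {
  const seen = new Map<string, number>();

  // Returns true while the error should still be reported.
  return function shouldReport(msg: string, src = '', line = 0): boolean {
    const fingerprint = `${msg}|${src}|${line}`;
    const count = (seen.get(fingerprint) ?? 0) + 1;
    seen.set(fingerprint, count);
    return count <= maxPerFingerprint;
  };
}
```

Inside the `error` listener this would read roughly as `if (shouldReport(e.message, e.filename, e.lineno)) track('error', ...)`.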
|
||||
**When to choose option B:**
|
||||
- *(TODO)*
|
||||
### 4. Auto click tracking (delegated)
|
||||
```typescript
|
||||
document.addEventListener('click', (e) => {
|
||||
const t = (e.target as Element).closest('[data-track]');
|
||||
if (t instanceof HTMLElement) {
|
||||
track('click', { id: t.dataset.track, ...t.dataset });
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
**Default:**
|
||||
> *(TODO)*
|
||||
### 5. Consent gating (TCFv2)
|
||||
```typescript
|
||||
function hasAnalyticsConsent(): boolean {
|
||||
// Simplified: reads a locally stored flag. In production, subscribe to the CMP
|
||||
// through window.__tcfapi instead of trusting localStorage.
|
||||
return localStorage.getItem('consent.analytics') === '1';
|
||||
}
|
||||
const origTrack = track;
|
||||
const trackGated = (type: string, props?: Event['props']) => {
|
||||
if (!hasAnalyticsConsent()) return;
|
||||
origTrack(type, props);
|
||||
};
|
||||
```
|
||||
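The comment in the snippet above points at subscribing to the TCF API in production. A sketch of that subscription follows, with the `__tcfapi` function passed in so the logic can be tested with a fake CMP. `watchConsent`, `TcData`, and `REQUIRED_PURPOSES` are hypothetical names; mapping analytics to purposes 1 and 8 is an assumption that needs confirming against your CMP configuration and legal basis.

```typescript
// Minimal slice of the TCF v2 data this sketch reads; the real TCData object has many more fields.
type TcData = {
  eventStatus?: string;
  purpose?: { consents?: Record<string, boolean> };
};
type TcfApi = (
  command: 'addEventListener',
  version: 2,
  callback: (tcData: TcData, success: boolean) => void,
) => void;

// Assumption: purposes 1 (device storage) and 8 (content measurement) gate analytics.
const REQUIRED_PURPOSES = [1, 8];

function watchConsent(tcfapi: TcfApi, onChange: (granted: boolean) => void): void {
  tcfapi('addEventListener', 2, (tcData, success) => {
    if (!success) {
      onChange(false); // CMP reported a failure: stay on the safe side
      return;
    }
    // Wait for a settled state; 'cmpuishown' means the user has not decided yet.
    if (tcData.eventStatus !== 'tcloaded' && tcData.eventStatus !== 'useractioncomplete') return;
    const consents = tcData.purpose?.consents ?? {};
    onChange(REQUIRED_PURPOSES.every((id) => consents[id] === true));
  });
}
```

The `onChange` callback would typically flip a flag that `trackGated` reads, so that events are dropped until consent is known to be granted.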
|
||||
## ❌ Anti-Patterns
|
||||
### 6. Edge ingest (Cloudflare Worker)
|
||||
```typescript
|
||||
// worker.ts
|
||||
export default {
|
||||
async fetch(req: Request, env: Env): Promise<Response> {
|
||||
if (req.method !== 'POST') return new Response('', { status: 405 });
|
||||
const events = await req.json<Event[]>();
|
||||
const enriched = events.map((e) => ({
|
||||
...e,
|
||||
ip_country: req.cf?.country,
|
||||
ua: req.headers.get('user-agent'),
|
||||
ingest_ts: Date.now(),
|
||||
}));
|
||||
await env.QUEUE.send(enriched);
|
||||
return new Response('', { status: 204 });
|
||||
},
|
||||
};
|
||||
```
|
||||
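The worker above forwards whatever JSON it receives. A hedged sketch of a validation step that could run before enrichment is shown below; `IngestEvent`, `parseBatch`, and the `MAX_EVENTS` cap are hypothetical names and values, not part of the original worker.

```typescript
// Hypothetical names: IngestEvent, parseBatch, and MAX_EVENTS are illustrative.
type IngestEvent = { type: string; ts: number; props?: Record<string, unknown> };

const MAX_EVENTS = 100; // assumed cap per request

function isIngestEvent(value: unknown): value is IngestEvent {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return false;
  const e = value as Record<string, unknown>;
  const propsOk =
    e.props === undefined ||
    (typeof e.props === 'object' && e.props !== null && !Array.isArray(e.props));
  return (
    typeof e.type === 'string' && e.type.length > 0 &&
    typeof e.ts === 'number' && Number.isFinite(e.ts) &&
    propsOk
  );
}

// Returns the valid events, or null when the body is not a usable batch.
function parseBatch(body: unknown): IngestEvent[] | null {
  if (!Array.isArray(body) || body.length === 0 || body.length > MAX_EVENTS) return null;
  const valid = body.filter(isIngestEvent);
  return valid.length > 0 ? valid : null;
}
```

In the worker, the body would be read inside a `try`/`catch` around `req.json()`, and a `null` result from `parseBatch` would map to a 400 response instead of a queue write.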
|
||||
- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
|
||||
### 7. Server-side GTM forward
|
||||
```typescript
|
||||
// edge → GTM SS container
|
||||
await fetch('https://gtm.example.com/g/collect', {
|
||||
method: 'POST',
|
||||
headers: { 'content-type': 'application/json' },
|
||||
body: JSON.stringify({ events: enriched }),
|
||||
});
|
||||
```
|
||||
|
||||
### 8. PII scrub
|
||||
```typescript
|
||||
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
|
||||
const PHONE = /\+?\d[\d \-]{8,}\d/g;
|
||||
function scrub<T extends Record<string, unknown>>(props: T): T {
|
||||
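// Scrub string values only: a regex over the raw JSON text would also rewrite long numbers and break JSON.parse.
|
||||
return JSON.parse(JSON.stringify(props, (_k, v) => (typeof v === 'string' ? v.replace(EMAIL, '[email]').replace(PHONE, '[phone]') : v)));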
|
||||
}
|
||||
```
|
||||
|
||||
### 9. Session identity
|
||||
```typescript
|
||||
const SESSION_KEY = 'sid';
|
||||
const TIMEOUT_MS = 30 * 60 * 1000;
|
||||
function getSession() {
|
||||
const raw = sessionStorage.getItem(SESSION_KEY);
|
||||
const parsed = raw ? JSON.parse(raw) : null;
|
||||
// Reuse the id while the session is still active; otherwise start a new session.
|
||||
const active = parsed && Date.now() - parsed.last < TIMEOUT_MS;
|
||||
const out = { id: active ? parsed.id : crypto.randomUUID(), last: Date.now() };
|
||||
sessionStorage.setItem(SESSION_KEY, JSON.stringify(out));
|
||||
return out.id;
|
||||
}
|
||||
```
|
||||
|
||||
## Decision Criteria
|
||||
| Situation | Approach |
|
||||
|---|---|
|
||||
| Product analytics | PostHog (OSS) or Amplitude. |
|
||||
| RUM performance | Sentry / Datadog RUM. |
|
||||
| Privacy strict (EU) | 1st-party + consent + IP truncation. |
|
||||
| Adblock evasion | 1st-party domain reverse proxy. |
|
||||
| Custom build | sendBeacon + edge function + ClickHouse. |
|
||||
|
||||
**Default**: `web-vitals` + 1st-party `/collect` + sendBeacon batch + edge ingest.
|
||||
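The "IP truncation" entry in the table can be illustrated for IPv4 by zeroing the last octet before the address is stored. This is only a sketch: `truncateIpv4` is a hypothetical helper, zeroing the last octet is a common convention and not something the original specifies, and IPv6 is deliberately left out (the function returns `null` for anything that is not a dotted-quad address).

```typescript
// Hypothetical helper; the last-octet convention is an assumption.
function truncateIpv4(ip: string): string | null {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((n) => Number.isNaN(n) || n > 255)) return null;
  return `${octets[0]}.${octets[1]}.${octets[2]}.0`; // drop the host-identifying last octet
}
```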
|
||||
## 🔗 Graph
|
||||
- Parents: [[Web Analytics]] · [[Observability]]
|
||||
- Variants: [[Server-Side Tracking]] · [[Privacy-first Analytics]]
|
||||
- Applications: [[PostHog]] · [[Sentry]] · [[Segment]]
|
||||
- Adjacent: [[Web Vitals]] · [[GDPR]] · [[ClickHouse]]
|
||||
|
||||
## 🤖 LLM Usage
|
||||
**When**: collector skeleton, consent gating, batch / flush logic.
|
||||
**When not**: detailed configuration of a specific vendor SDK; defer to the vendor docs.
|
||||
|
||||
## ❌ Anti-Patterns
|
||||
- **`fetch` without `keepalive`**: the request is dropped when the page unloads.
|
||||
- **No batching**: one request per event floods the endpoint.
|
||||
- **PII without consent**: a GDPR breach.
|
||||
- **3rd-party domain only**: blocked by adblockers; use a 1st-party proxy.
|
||||
- **Sync XHR on unload**: deprecated; use sendBeacon.
|
||||
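One related constraint: `fetch` with `keepalive` is limited to 64 KiB of in-flight request body, and browsers apply a comparable limit to sendBeacon, so a large batch can be rejected at unload time. A minimal sketch of size-aware chunking follows; `chunkBySize` and the 60 KiB budget are assumptions chosen to leave headroom.

```typescript
// Assumed budget: 60 KiB leaves headroom under the 64 KiB keepalive quota.
const MAX_BYTES = 60 * 1024;
const encoder = new TextEncoder();

// Splits events into batches whose JSON array encoding stays within maxBytes.
// A single event larger than maxBytes still gets a batch of its own.
function chunkBySize<T>(events: T[], maxBytes = MAX_BYTES): T[][] {
  const chunks: T[][] = [];
  let current: T[] = [];
  let size = 2; // the surrounding "[" and "]"
  for (const event of events) {
    const bytes = encoder.encode(JSON.stringify(event)).length;
    const separator = current.length > 0 ? 1 : 0; // comma between elements
    if (current.length > 0 && size + separator + bytes > maxBytes) {
      chunks.push(current);
      current = [];
      size = 2;
    }
    size += bytes + (current.length > 0 ? 1 : 0);
    current.push(event);
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk would then be sent as its own beacon, which keeps any single request under the assumed budget as long as individual events are small.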
|
||||
## 🧪 Validation / Duplicates
|
||||
- Verified (web.dev web-vitals, IAB TCFv2, MDN sendBeacon).
|
||||
- Trust level A.
|
||||
|
||||
## 🕓 Changelog
|
||||
| Date | Change |
|
||||
|---|---|
|
||||
| 2026-05-08 | Phase 1 |
|
||||
| 2026-05-10 | Manual cleanup — datacollector pipeline + Web Vitals + consent |
|
||||
|
||||