2nd/10_Wiki/Topics/AI_and_ML/V-component (Evaluation Interface).md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
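Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / incomplete stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed redirects directly at their targets, mapped concepts (~2,400 in total)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections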

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-v-component-evaluation-interface
title: V-component (Evaluation Interface)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Eval UI Component, V-component
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: applied
tags: llm-eval, ui, component, dashboard, observability
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: typescript; framework: React-19

V-component (Evaluation Interface)

One-liner

"A reusable UI primitive for inspecting, comparing, and annotating LLM eval results." It is the core building block of eval platforms such as Braintrust, Langfuse, and Phoenix (Arize): a trace tree, a score panel, and a diff view combined into one unit. Teams building a custom dashboard commonly create their own in-house V-component.

Core

V-component building blocks

  • Trace viewer: a tree of the LLM call chain (input → tool calls → output).
  • Score panel: each metric (accuracy, faithfulness, latency, cost) shown as a number plus a sparkline.
  • Diff view: side-by-side comparison of two runs.
  • Annotation: labels and comments from human reviewers.
  • Filter / search: isolate only the failing, slow, or expensive traces.

Data shapes

  • Trace: { id, name, input, output, children: Span[], metadata }.
  • Score: { name, value, type: "numeric" | "categorical", confidence }.
  • Annotation: { author, label, comment, ts }.
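The data shapes above can be written as TypeScript types. This is a minimal sketch: the field types (`unknown` for `input`/`output`, `Record<string, unknown>` for `metadata`, `number | string` for a score value, an ISO-8601 string for `ts`) are assumptions, since the note only names the fields, and a trace is modeled as its root span. The helpers `isNumericScore` and `countSpans` are hypothetical illustrations of how the types get used.

```typescript
// Minimal sketch of the V-component data shapes.
// Field types are assumptions; only the field names come from the note.

type Span = {
  id: string;
  name: string;
  input: unknown;
  output: unknown;
  children: Span[];
  metadata: Record<string, unknown>;
};

// Assumption: a trace is represented by its root span.
type Trace = Span;

type Score = {
  name: string;
  value: number | string; // numeric metric or categorical label
  type: "numeric" | "categorical";
  confidence: number; // assumed to lie in [0, 1]
};

type Annotation = {
  author: string;
  label: string;
  comment: string;
  ts: string; // ISO-8601 timestamp (assumption)
};

// Narrow a Score so numeric scores can be aggregated safely.
function isNumericScore(s: Score): s is Score & { value: number } {
  return s.type === "numeric" && typeof s.value === "number";
}

// Count every span in a trace tree, root included.
function countSpans(t: Trace): number {
  return 1 + t.children.reduce((n, c) => n + countSpans(c), 0);
}
```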

Design decisions

  • Virtualization: rendering 1000+ traces needs a virtualized list (react-virtuoso, TanStack Virtual).
  • Streaming: real-time updates for in-progress traces (SSE, WebSocket).
  • Diff algorithm: string-level (diff-match-patch) plus structural (json-diff).
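The structural half of the diff can be sketched without a library: walk two JSON values in parallel and report the paths that changed. This is an illustrative stand-in, not the json-diff package's API; the `JsonChange` shape and the `$.key[0]` path format are assumptions. Arrays are compared index by index, so an insertion in the middle shows up as a run of changes rather than a single insert.

```typescript
// Illustrative structural diff over JSON-like values (not json-diff's API).
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

type JsonChange =
  | { kind: "added"; path: string; after: Json }
  | { kind: "removed"; path: string; before: Json }
  | { kind: "changed"; path: string; before: Json; after: Json };

function isObject(v: Json): v is { [k: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Own-property check, so keys like "toString" are not confused with prototype members.
function has(o: { [k: string]: Json }, k: string): boolean {
  return Object.prototype.hasOwnProperty.call(o, k);
}

function jsonDiff(a: Json, b: Json, path = "$"): JsonChange[] {
  // Both arrays: compare index by index.
  if (Array.isArray(a) && Array.isArray(b)) {
    const out: JsonChange[] = [];
    const n = Math.max(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const p = `${path}[${i}]`;
      if (i >= a.length) out.push({ kind: "added", path: p, after: b[i] });
      else if (i >= b.length) out.push({ kind: "removed", path: p, before: a[i] });
      else out.push(...jsonDiff(a[i], b[i], p));
    }
    return out;
  }
  // Both plain objects: keys of `a` first, then keys only present in `b`.
  if (isObject(a) && isObject(b)) {
    const out: JsonChange[] = [];
    const keys = Object.keys(a).concat(Object.keys(b).filter(k => !has(a, k)));
    for (const k of keys) {
      const p = `${path}.${k}`;
      if (!has(a, k)) out.push({ kind: "added", path: p, after: b[k] });
      else if (!has(b, k)) out.push({ kind: "removed", path: p, before: a[k] });
      else out.push(...jsonDiff(a[k], b[k], p));
    }
    return out;
  }
  // Primitives or mismatched kinds: either equal or a single change.
  return a === b ? [] : [{ kind: "changed", path, before: a, after: b }];
}
```

In a diff view, the resulting paths can be listed next to the string-level diff so reviewers see which fields of a trace's input or output moved.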

Applications

  1. Internal eval dashboard: lets an ML team detect model regressions.
  2. PR review: before/after diff for a prompt change.
  3. Production monitoring: anomaly detection on live traces.
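For the regression-detection and before/after cases, the core comparison is small. A minimal sketch, assuming each run is summarized as a map from metric name to a mean score where higher is better; the `RunSummary` type, `findRegressions`, and the `tolerance` default of 0.02 are hypothetical. Lower-is-better metrics such as latency or cost would need their sign flipped before comparison.

```typescript
// Hypothetical per-run summary: metric name -> mean score (higher is better).
type RunSummary = Record<string, number>;

type Regression = { metric: string; before: number; after: number; delta: number };

// Report metrics that dropped by more than `tolerance` between two runs.
// Metrics missing from the newer run are skipped rather than guessed.
function findRegressions(before: RunSummary, after: RunSummary, tolerance = 0.02): Regression[] {
  const out: Regression[] = [];
  for (const metric of Object.keys(before)) {
    if (!Object.prototype.hasOwnProperty.call(after, metric)) continue;
    const delta = after[metric] - before[metric];
    if (delta < -tolerance) {
      out.push({ metric, before: before[metric], after: after[metric], delta });
    }
  }
  // Largest drop first, so the worst regression sits at the top of the list.
  return out.sort((x, y) => x.delta - y.delta);
}
```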

💻 Patterns

Trace tree component (React)

import { useState } from "react";

type Span = {
  id: string;
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number;
  children: Span[];
};

function TraceTree({ root }: { root: Span }) {
  return (
    <ul className="font-mono text-sm">
      <SpanNode span={root} depth={0} />
    </ul>
  );
}

function SpanNode({ span, depth }: { span: Span; depth: number }) {
  const [open, setOpen] = useState(depth < 2);
  return (
    <li style={{ paddingLeft: depth * 16 }}>
      <button onClick={() => setOpen(o => !o)}>
        {open ? "▼" : "▶"} {span.name} <span className="text-gray-500">{span.durationMs}ms</span>
      </button>
      {open && (
        <>
          <pre className="text-xs">{JSON.stringify(span.input, null, 2)}</pre>
          {span.children.map(c => <SpanNode key={c.id} span={c} depth={depth + 1} />)}
        </>
      )}
    </li>
  );
}
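The recursive `SpanNode` renders every open node eagerly, which stops scaling once one trace holds thousands of spans. One common alternative, sketched here under assumptions, is to flatten the visible part of the tree into rows with a depth, which a virtualized list can then render row by row. The `flattenVisible` function, the `VisibleRow` type, and the `expanded` set of span ids are hypothetical; `Span` repeats the shape from the component.

```typescript
type Span = {
  id: string;
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number;
  children: Span[];
};

// One row per visible span; `depth` drives indentation, like `paddingLeft` in SpanNode.
type VisibleRow = { span: Span; depth: number };

// Depth-first pre-order walk that only descends into expanded spans.
// An explicit stack avoids recursion limits on very deep traces.
function flattenVisible(root: Span, expanded: Set<string>): VisibleRow[] {
  const rows: VisibleRow[] = [];
  const stack: VisibleRow[] = [{ span: root, depth: 0 }];
  while (stack.length > 0) {
    const row = stack.pop()!;
    rows.push(row);
    if (expanded.has(row.span.id)) {
      // Push children in reverse so they pop in their original order.
      for (let i = row.span.children.length - 1; i >= 0; i--) {
        stack.push({ span: row.span.children[i], depth: row.depth + 1 });
      }
    }
  }
  return rows;
}
```

Toggling a node then means adding or removing its id from `expanded` and recomputing the rows, instead of holding per-node `open` state.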

Score panel

// `Sparkline` is assumed to be a small chart component defined elsewhere.
type Score = { name: string; value: number; series?: number[] };

function ScorePanel({ scores }: { scores: Score[] }) {
  return (
    <div className="grid grid-cols-3 gap-2">
      {scores.map(s => (
        <div key={s.name} className="rounded border p-2">
          <div className="text-xs text-gray-500">{s.name}</div>
          <div className="text-2xl font-bold">{s.value.toFixed(3)}</div>
          {s.series && <Sparkline data={s.series} />}
        </div>
      ))}
    </div>
  );
}
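`Sparkline` is not defined in the snippet. A minimal sketch of the geometry it needs, assuming an SVG `<polyline>` renderer: map a numeric series to a `points` string inside a `width` by `height` box, with larger values drawn higher. The name `sparklinePoints` and its signature are hypothetical.

```typescript
// Map a numeric series to an SVG polyline "points" string (hypothetical helper).
// x is spread evenly across `width`; y is inverted so larger values sit higher.
function sparklinePoints(series: number[], width: number, height: number): string {
  if (series.length === 0) return "";
  const min = Math.min(...series);
  const max = Math.max(...series);
  const range = max - min || 1; // flat series: avoid division by zero
  const step = series.length > 1 ? width / (series.length - 1) : 0;
  return series
    .map((v, i) => {
      const x = i * step;
      const y = height - ((v - min) / range) * height;
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(" ");
}
```

A `Sparkline` component could then render an `<svg>` of the same size containing a single `<polyline>` with `fill="none"` and this string as its `points`.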

Diff view (two runs)

import { diffLines } from "diff";

function DiffView({ a, b }: { a: string; b: string }) {
  const parts = diffLines(a, b);
  return (
    <pre className="text-xs">
      {parts.map((p, i) => (
        <span key={i} className={
          p.added ? "bg-green-100" : p.removed ? "bg-red-100" : ""
        }>{p.value}</span>
      ))}
    </pre>
  );
}
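If pulling in the `diff` package is not an option, a line-level diff can be sketched with a longest-common-subsequence table. This is a simple O(n·m) stand-in, not the O(ND) algorithm the `diff` package is based on, and it emits one part per line instead of merging consecutive lines as `diffLines` does. The `{ value, added?, removed? }` part shape is chosen so that `DiffView` could render the output unchanged; `lineDiff` and `splitLines` are hypothetical names.

```typescript
type Part = { value: string; added?: boolean; removed?: boolean };

// Split into lines, keeping each "\n" so joined values reproduce the input.
function splitLines(s: string): string[] {
  return s.match(/[^\n]*\n|[^\n]+$/g) ?? [];
}

function lineDiff(a: string, b: string): Part[] {
  const x = splitLines(a);
  const y = splitLines(b);
  const n = x.length;
  const m = y.length;
  // lcs[i][j] = length of the LCS of x[i..] and y[j..]
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = x[i] === y[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table: equal lines are kept, otherwise follow the longer LCS branch.
  const parts: Part[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (x[i] === y[j]) {
      parts.push({ value: x[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      parts.push({ value: x[i], removed: true });
      i++;
    } else {
      parts.push({ value: y[j], added: true });
      j++;
    }
  }
  while (i < n) parts.push({ value: x[i++], removed: true });
  while (j < m) parts.push({ value: y[j++], added: true });
  return parts;
}
```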

Virtualized trace list (1000+ items)

import { Virtuoso } from "react-virtuoso";
// `Trace` and `TraceRow` are assumed to be defined elsewhere in the app.

function TraceList({ traces }: { traces: Trace[] }) {
  return (
    <Virtuoso
      data={traces}
      itemContent={(_, t) => (
        <TraceRow trace={t} status={t.scores.faithfulness < 0.7 ? "fail" : "ok"} />
      )}
      style={{ height: "100vh" }}
    />
  );
}
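The row status is hard-coded as `faithfulness < 0.7`. A minimal sketch of factoring it into one function that also covers the "slow" and "expensive" cases from the filter/search building block; the `TraceStats` fields, the `flagsFor` name, and the threshold values in `DEFAULT_LIMITS` are hypothetical placeholders to be tuned per product.

```typescript
// Hypothetical per-trace stats used for row badges and quick filters.
type TraceStats = { faithfulness?: number; durationMs: number; costUsd: number };
type Flag = "fail" | "slow" | "expensive";

// Placeholder limits; 0.7 mirrors the threshold used in the list above.
const DEFAULT_LIMITS = { minFaithfulness: 0.7, maxDurationMs: 10000, maxCostUsd: 0.05 };

function flagsFor(t: TraceStats, limits = DEFAULT_LIMITS): Flag[] {
  const flags: Flag[] = [];
  // An unscored trace is not treated as failing.
  if (t.faithfulness !== undefined && t.faithfulness < limits.minFaithfulness) flags.push("fail");
  if (t.durationMs > limits.maxDurationMs) flags.push("slow");
  if (t.costUsd > limits.maxCostUsd) flags.push("expensive");
  return flags;
}
```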

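Streaming traces (SSE)

import { useEffect, useState } from "react";
// `Trace`, `Span`, and `mergeSpan` are assumed to be defined elsewhere.

function useLiveTraces(runId: string) {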
  const [traces, setTraces] = useState<Trace[]>([]);
  useEffect(() => {
    const es = new EventSource(`/api/runs/${runId}/stream`);
    es.onmessage = e => {
      const span: Span = JSON.parse(e.data);
      setTraces(prev => mergeSpan(prev, span));
    };
    return () => es.close();
  }, [runId]);
  return traces;
}
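`mergeSpan` is left undefined in the hook. A minimal sketch with its own hypothetical types: it assumes each streamed event carries its `traceId` and `parentId` (null for a root span) and that the client keeps each trace's spans in a flat map keyed by id. These field names are not any platform's wire format. Keeping spans flat makes each update cheap and lets a later event for the same `id` (for example, the span finishing with its output) overwrite the earlier snapshot; the tree is rebuilt from `parentId` at render time.

```typescript
// Hypothetical wire format for one streamed span event.
type StreamSpan = {
  id: string;
  traceId: string;
  parentId: string | null; // null marks the trace's root span
  name: string;
  output?: unknown;
  durationMs?: number;
};

// Hypothetical client-side trace: spans stored flat, keyed by id.
type LiveTrace = { id: string; rootId: string | null; spans: Record<string, StreamSpan> };

// Immutable update, suitable for React's setState(prev => ...).
function mergeSpan(prev: LiveTrace[], span: StreamSpan): LiveTrace[] {
  const idx = prev.findIndex(t => t.id === span.traceId);
  const base: LiveTrace = idx >= 0 ? prev[idx] : { id: span.traceId, rootId: null, spans: {} };
  const updated: LiveTrace = {
    ...base,
    rootId: span.parentId === null ? span.id : base.rootId,
    // Fields present in the newer event override the earlier snapshot of the same span.
    spans: { ...base.spans, [span.id]: { ...base.spans[span.id], ...span } },
  };
  if (idx < 0) return [...prev, updated];
  const next = prev.slice();
  next[idx] = updated;
  return next;
}

// Children of a span, for rebuilding the tree at render time.
function childrenOf(trace: LiveTrace, parentId: string): StreamSpan[] {
  return Object.values(trace.spans).filter(s => s.parentId === parentId);
}
```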

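Annotation (human-in-the-loop)

import { useState } from "react";
// `RadioGroup` and `currentUser` are assumed to come from the app's UI kit and auth context.

function AnnotationPanel({ traceId }: { traceId: string }) {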
  const [label, setLabel] = useState<"good" | "bad" | "unsure">();
  const [comment, setComment] = useState("");

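  const submit = async () => {
    if (!label) return; // nothing to save until a label is chosen
    const res = await fetch(`/api/traces/${traceId}/annotations`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ label, comment, author: currentUser.id }),
    });
    // Surface failures instead of silently losing the reviewer's label.
    if (!res.ok) throw new Error(`Saving annotation failed: ${res.status}`);
  };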

  return (
    <div className="space-y-2">
      <RadioGroup value={label} onChange={setLabel} options={["good", "bad", "unsure"]} />
      <textarea value={comment} onChange={e => setComment(e.target.value)} />
      <button onClick={submit} disabled={!label}>Save</button>
    </div>
  );
}
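Persisted annotations are what later become labeled data. A minimal sketch, assuming annotations are stored as records that pair a `traceId` with the annotation fields; the `AnnotationRecord` type, the helper names, and the JSON Lines export are hypothetical. It keeps only the newest label per (trace, reviewer) pair, so a reviewer who changed their mind does not leave two conflicting labels, and dropping `unsure` labels is a design choice, not a rule.

```typescript
type AnnotationRecord = {
  traceId: string;
  author: string;
  label: "good" | "bad" | "unsure";
  comment: string;
  ts: string; // assumes uniform UTC ISO-8601 strings, which sort lexically
};

// Keep the newest annotation per (traceId, author) pair.
function latestPerReviewer(records: AnnotationRecord[]): AnnotationRecord[] {
  const latest = new Map<string, AnnotationRecord>();
  for (const r of records) {
    const key = `${r.traceId}\u0000${r.author}`;
    const prev = latest.get(key);
    if (!prev || r.ts > prev.ts) latest.set(key, r);
  }
  return Array.from(latest.values());
}

// Serialize as JSON Lines, skipping "unsure" labels (design choice).
function toJsonl(records: AnnotationRecord[]): string {
  return latestPerReviewer(records)
    .filter(r => r.label !== "unsure")
    .map(r => JSON.stringify(r))
    .join("\n");
}
```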

Filter / query (Braintrust-style)

type FilterExpr = {
  field: "score.faithfulness" | "duration_ms" | "model";
  op: "<" | ">" | "==" | "contains";
  value: number | string;
};

// `Trace` and `evalExpr` are assumed to be defined elsewhere.
function applyFilters(traces: Trace[], filters: FilterExpr[]) {
  return traces.filter(t => filters.every(f => evalExpr(t, f)));
}

// UI: query builder + saved filters
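`evalExpr` is not defined in the snippet. A minimal sketch, assuming a trace carries `scores` (metric name to number), `durationMs`, and `model`; this `Trace` shape and the field-to-property mapping in `readField` are hypothetical. Type mismatches (for example `contains` on a number) and missing scores evaluate to false rather than throwing, so one malformed saved filter hides rows instead of crashing the list. `FilterExpr` and `applyFilters` are repeated so the block is self-contained.

```typescript
type FilterExpr = {
  field: "score.faithfulness" | "duration_ms" | "model";
  op: "<" | ">" | "==" | "contains";
  value: number | string;
};

// Hypothetical trace shape for this sketch.
type Trace = { id: string; model: string; durationMs: number; scores: Record<string, number> };

// Resolve a filter field to the trace value it refers to.
function readField(t: Trace, field: FilterExpr["field"]): number | string | undefined {
  switch (field) {
    case "score.faithfulness":
      return t.scores["faithfulness"];
    case "duration_ms":
      return t.durationMs;
    case "model":
      return t.model;
  }
}

function evalExpr(t: Trace, f: FilterExpr): boolean {
  const v = readField(t, f.field);
  if (v === undefined) return false; // missing score never matches
  switch (f.op) {
    case "<":
      return typeof v === "number" && typeof f.value === "number" && v < f.value;
    case ">":
      return typeof v === "number" && typeof f.value === "number" && v > f.value;
    case "==":
      return v === f.value;
    case "contains":
      return typeof v === "string" && typeof f.value === "string" && v.includes(f.value);
  }
}

function applyFilters(traces: Trace[], filters: FilterExpr[]) {
  return traces.filter(t => filters.every(f => evalExpr(t, f)));
}
```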

Decision criteria

| Situation | Approach |
| --- | --- |
| A hosted eval platform is an option | Braintrust / Langfuse / Phoenix (don't build) |
| Internal-only, specific domain | Custom V-component (Tailwind + TanStack) |
| Small team | Hosted; the build cost is prohibitive |
| 1000+ traces per day | Virtualization is mandatory |

Default: startups self-host Langfuse; enterprises use Braintrust / Arize.

🔗 Graph

🤖 Using LLMs

When to use: generating V-component boilerplate (trace tree, virtualized list), since it follows well-typed React + TanStack patterns. When not to: domain-specific scoring logic, which should be hand-authored.

Anti-patterns

  • No virtualization: rendering 5000 traces in a single pass freezes the browser.
  • Raw score numbers only: without sparklines or histograms, trends stay invisible.
  • Mixed run units: averaging scores across different prompt versions is misleading.
  • No annotation persistence: reviewer labels get lost and can no longer serve as a source of future training data.
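The mixed-run-units pitfall is avoided by grouping before aggregating. A minimal sketch, assuming each result row records the `promptVersion` it was produced with; the `ScoredResult` type and `meanByVersion` name are hypothetical. Means are computed per (prompt version, metric) pair and never across versions.

```typescript
type ScoredResult = { promptVersion: string; metric: string; value: number };

type Acc = { total: number; count: number };

// Mean per (promptVersion, metric) pair; scores from different versions never mix.
function meanByVersion(rows: ScoredResult[]): Map<string, Map<string, number>> {
  const sums = new Map<string, Map<string, Acc>>();
  for (const r of rows) {
    const byMetric = sums.get(r.promptVersion) ?? new Map<string, Acc>();
    const acc = byMetric.get(r.metric) ?? { total: 0, count: 0 };
    acc.total += r.value;
    acc.count += 1;
    byMetric.set(r.metric, acc);
    sums.set(r.promptVersion, byMetric);
  }
  const means = new Map<string, Map<string, number>>();
  sums.forEach((byMetric, version) => {
    const m = new Map<string, number>();
    byMetric.forEach((acc, metric) => m.set(metric, acc.total / acc.count));
    means.set(version, m);
  });
  return means;
}
```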

🧪 Verification / duplicates

  • Verified (Braintrust docs 2025; Langfuse v3 docs; Arize Phoenix 2024).
  • Confidence B (a design pattern; no standardized spec exists).

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: added trace tree, score panel, diff, and streaming patterns |