---
id: wiki-2026-0508-v-component-evaluation-interface
title: V-component (Evaluation Interface)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Eval UI Component, V-component]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: applied
tags: [llm-eval, ui, component, dashboard, observability]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: typescript
  framework: React-19
---

# V-component (Evaluation Interface)

## In one line
> **"A reusable UI primitive for inspecting, comparing, and annotating LLM eval results."** It is a core building block of eval platforms such as Braintrust, Langfuse, and Phoenix (Arize): a trace tree, a score panel, and a diff view combined into one component. When building a custom dashboard, creating an in-house V-component is the common pattern.

## Core

### Components of a V-component
- **Trace viewer**: a tree of the LLM call chain (input → tool calls → output).
- **Score panel**: each metric (accuracy, faithfulness, latency, cost) as a number plus a sparkline.
- **Diff view**: side-by-side comparison of two runs.
- **Annotation**: labels and comments from human reviewers.
- **Filter / search**: isolate only the failed, slow, or expensive traces.

### Data shape
- **Trace**: `{ id, name, input, output, children: Span[], metadata }`.
- **Score**: `{ name, value, type: "numeric" | "categorical", confidence }`.
- **Annotation**: `{ author, label, comment, ts }`.

### Design decisions
- **Virtualization**: rendering 1000+ traces, using react-virtuoso or TanStack Virtual.
- **Streaming**: real-time updates for in-progress traces over SSE or WebSocket.
- **Diff algorithm**: string-level (diff-match-patch) plus structural (json-diff).

### Applications
1. **Internal eval dashboard**: lets the ML team detect model regressions.
2. **PR review**: before/after diff of a prompt change.
3. **Production monitoring**: anomaly detection on live traces.

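The data shapes listed above translate fairly directly into TypeScript types. The following is a minimal sketch rather than a definitive schema: the concrete field types (`ts` as an ISO timestamp string, `metadata` as a free-form record, `confidence` as a number) and the `flattenSpans` helper are assumptions added for illustration. Flattening the span tree into depth-tagged rows is one way to feed a tree into a virtualized list.

```typescript
// Field names follow the data-shape list; the field types are assumptions.
type Span = {
  id: string;
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number;
  children: Span[];
};

type Trace = {
  id: string;
  name: string;
  input: unknown;
  output: unknown;
  children: Span[];
  metadata: Record<string, unknown>; // assumed free-form
};

type Score = {
  name: string;
  value: number | string; // assumed: number when numeric, string when categorical
  type: "numeric" | "categorical";
  confidence: number;
};

type Annotation = {
  author: string;
  label: string;
  comment: string;
  ts: string; // assumed ISO 8601 timestamp
};

// Hypothetical helper: one row per span, in depth-first order.
type Row = { span: Span; depth: number };

function flattenSpans(trace: Trace): Row[] {
  const rows: Row[] = [];
  const walk = (span: Span, depth: number): void => {
    rows.push({ span, depth });
    for (const child of span.children) walk(child, depth + 1);
  };
  for (const child of trace.children) walk(child, 0);
  return rows;
}
```

Each row carries its own depth, so a list component can indent it without recursing through the tree on every render.
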
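## 💻 Patterns

### Trace tree component (React)
```tsx
import { useState } from "react";

type Span = {
  id: string;
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number;
  children: Span[];
};

function TraceTree({ root }: { root: Span }) {
  return (
    <div>
      <SpanNode span={root} depth={0} />
    </div>
  );
}

function SpanNode({ span, depth }: { span: Span; depth: number }) {
  // The first two levels start expanded; deeper spans start collapsed.
  const [open, setOpen] = useState(depth < 2);
  return (
    <div style={{ marginLeft: depth * 16 }}>
      <button onClick={() => setOpen(o => !o)}>
        {span.name} ({span.durationMs} ms)
      </button>
      {open && (
        <>
          <pre>{JSON.stringify(span.input, null, 2)}</pre>
          {span.children.map(c => (
            <SpanNode key={c.id} span={c} depth={depth + 1} />
          ))}
        </>
      )}
    </div>
  );
}
```

### Score panel
```tsx
type Score = { name: string; value: number; series?: number[] };

// Sparkline is assumed to be a small chart component defined elsewhere.
function ScorePanel({ scores }: { scores: Score[] }) {
  return (
    <div>
      {scores.map(s => (
        <div key={s.name}>
          <div>{s.name}</div>
          <div>{s.value.toFixed(3)}</div>
          {s.series && <Sparkline data={s.series} />}
        </div>
      ))}
    </div>
  );
}
```

### Diff view (two runs)
```tsx
import { diffLines } from "diff";

function DiffView({ a, b }: { a: string; b: string }) {
  const parts = diffLines(a, b);
  return (
    <pre>
      {parts.map((p, i) => (
        // Each part is flagged as added, removed, or unchanged.
        <span key={i} className={p.added ? "added" : p.removed ? "removed" : ""}>
          {p.value}
        </span>
      ))}
    </pre>
  );
}
```

### Virtualized trace list (1000+ items)
```tsx
import { Virtuoso } from "react-virtuoso";

// Trace and TraceRow are assumed to be defined elsewhere.
function TraceList({ traces }: { traces: Trace[] }) {
  return (
    <Virtuoso
      data={traces}
      itemContent={(index, trace) => (
        <TraceRow trace={trace} />
      )}
      style={{ height: "100vh" }}
    />
  );
}
```

### Streaming trace (SSE)
```tsx
import { useEffect, useState } from "react";

// mergeSpan is assumed to be defined elsewhere.
function useLiveTraces(runId: string) {
  const [traces, setTraces] = useState<Trace[]>([]);
  useEffect(() => {
    const es = new EventSource(`/api/runs/${runId}/stream`);
    es.onmessage = e => {
      const span: Span = JSON.parse(e.data);
      setTraces(prev => mergeSpan(prev, span));
    };
    // Close the connection on unmount or when runId changes.
    return () => es.close();
  }, [runId]);
  return traces;
}
```

### Annotation (human-in-the-loop)
```tsx
import { useState } from "react";

// currentUser is assumed to come from the app's auth context.
function AnnotationPanel({ traceId }: { traceId: string }) {
  const [label, setLabel] = useState<"good" | "bad" | "unsure">();
  const [comment, setComment] = useState("");
  const submit = async () => {
    await fetch(`/api/traces/${traceId}/annotations`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ label, comment, author: currentUser.id }),
    });
  };
  return (
    <div>
      {(["good", "bad", "unsure"] as const).map(l => (
        <button key={l} onClick={() => setLabel(l)}>{l}</button>
      ))}
      <textarea value={comment} onChange={e => setComment(e.target.value)} />
      <button onClick={submit}>Save</button>
    </div>
  );
}
```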
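
The hook in the streaming (SSE) pattern above calls `mergeSpan` without defining it. One possible shape is sketched below. It assumes, hypothetically, that each streamed event carries a `parentId` naming the span it hangs under (absent for a root span) and that a span never changes parent; neither detail is stated in this note. The merge returns new objects along the changed path only, so React can detect the update while untouched subtrees keep their identity.

```typescript
type Span = {
  id: string;
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number;
  children: Span[];
};

// Hypothetical wire format: a span without children, plus its parent's id.
type StreamedSpan = Omit<Span, "children"> & { parentId?: string };

function mergeSpan(roots: Span[], incoming: StreamedSpan): Span[] {
  const { parentId, ...fields } = incoming;
  let matched = false;

  const visit = (node: Span): Span => {
    if (node.id === fields.id) {
      matched = true;
      return { ...node, ...fields }; // update fields, keep existing children
    }
    const children = node.children.map(visit);
    const changed = children.some((c, i) => c !== node.children[i]);
    const alreadyChild = node.children.some(c => c.id === fields.id);
    if (node.id === parentId && !alreadyChild) {
      matched = true;
      return { ...node, children: [...children, { ...fields, children: [] }] };
    }
    return changed ? { ...node, children } : node;
  };

  const next = roots.map(visit);
  // No known parent (or no parentId at all): treat the span as a new root.
  return matched ? next : [...roots, { ...fields, children: [] }];
}
```

A span whose parent has not arrived yet ends up as a root in this sketch; a production version would probably buffer such orphans instead.
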

### Filter / query (Braintrust-style)
```typescript
type FilterExpr = {
  field: "score.faithfulness" | "duration_ms" | "model";
  op: "<" | ">" | "==" | "contains";
  value: number | string;
};

function applyFilters(traces: Trace[], filters: FilterExpr[]) {
  return traces.filter(t => filters.every(f => evalExpr(t, f)));
}

// evalExpr and Trace are assumed to be defined elsewhere.
// UI: query builder + saved filters
```
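
The filter block above relies on an `evalExpr` function that this note does not show. A minimal sketch of one way to write it follows. The `TraceRecord` shape (a `scores` map, `duration_ms`, `model`) is a hypothetical stand-in for whatever the real `Trace` type exposes, and the rule that a missing score never matches is likewise an assumption.

```typescript
type FilterExpr = {
  field: "score.faithfulness" | "duration_ms" | "model";
  op: "<" | ">" | "==" | "contains";
  value: number | string;
};

// Hypothetical record: only the fields the filter needs to read.
type TraceRecord = {
  scores: Partial<Record<string, number>>;
  duration_ms: number;
  model: string;
};

function readField(t: TraceRecord, field: FilterExpr["field"]): number | string | undefined {
  switch (field) {
    case "score.faithfulness": return t.scores["faithfulness"];
    case "duration_ms": return t.duration_ms;
    case "model": return t.model;
  }
}

function evalExpr(t: TraceRecord, f: FilterExpr): boolean {
  const actual = readField(t, f.field);
  if (actual === undefined) return false; // assumed: a missing score never matches
  switch (f.op) {
    case "==":
      return actual === f.value;
    case "contains":
      return typeof actual === "string" && typeof f.value === "string"
        && actual.indexOf(f.value) !== -1;
    case "<":
      return typeof actual === "number" && typeof f.value === "number" && actual < f.value;
    case ">":
      return typeof actual === "number" && typeof f.value === "number" && actual > f.value;
  }
}

// All filters must hold (logical AND), as in the block above.
function applyFilters(traces: TraceRecord[], filters: FilterExpr[]): TraceRecord[] {
  return traces.filter(t => filters.every(f => evalExpr(t, f)));
}
```

Comparisons between mismatched types (for example `<` against a string value) evaluate to `false` here rather than throwing, which keeps a malformed saved filter from crashing the list.
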

## Decision criteria
| Situation | Approach |
|---|---|
| A hosted eval platform is an option | Braintrust / Langfuse / Phoenix (do not build) |
| Internal-only, specific domain | Custom V-component (Tailwind + TanStack) |
| Small team | Hosted; the build cost is prohibitive |
| 1000+ traces / day | Virtualization is required |

**Default**: startups self-host Langfuse; enterprises use Braintrust / Arize.

## 🔗 Graph
- Parent: [[LLM Eval]] · [[Observability]]
- Variant: [[Trace Viewer]]
- Application: [[Braintrust]] · [[Langfuse]]
- Adjacent: [[OpenTelemetry]]

## 🤖 LLM usage
**When**: generating V-component boilerplate (trace tree, virtualized list), which follows well-typed React + TanStack patterns.
**When not**: domain-specific scoring logic; write that by hand.

## ❌ Anti-patterns
- **No virtualization**: rendering 5000 traces in a single pass freezes the browser.
- **Raw score numbers only**: without a sparkline or histogram, trends are invisible.
- **Mixed run units**: averaging scores across different prompt versions is misleading.
- **No annotation persistence**: reviewer labels are lost, so they cannot serve as a source of future training data.
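
To make the "mixed run units" item concrete, the sketch below groups scores by prompt version before averaging, so that each version gets its own mean. The `ScoredRun` shape and the `promptVersion` field name are hypothetical; the note does not specify how runs record their prompt version.

```typescript
// Hypothetical shape: one score per run, tagged with the prompt version used.
type ScoredRun = { promptVersion: string; score: number };

function meanByPromptVersion(runs: ScoredRun[]): Record<string, number> {
  const acc: Record<string, { total: number; count: number }> = {};
  for (const r of runs) {
    const entry = acc[r.promptVersion] ?? { total: 0, count: 0 };
    entry.total += r.score;
    entry.count += 1;
    acc[r.promptVersion] = entry;
  }
  const means: Record<string, number> = {};
  for (const version of Object.keys(acc)) {
    means[version] = acc[version].total / acc[version].count;
  }
  return means;
}

const runs: ScoredRun[] = [
  { promptVersion: "v1", score: 0.5 },
  { promptVersion: "v1", score: 1.0 },
  { promptVersion: "v2", score: 0.25 },
];
const perVersion = meanByPromptVersion(runs);
```

With this sample data, `v1` and `v2` each get a separate mean, whereas a single pooled average would hide the gap between them.
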

## 🧪 Verification / duplicates
- Verified (Braintrust docs 2025; Langfuse v3 docs; Arize Phoenix 2024).
- Trust level B (a design pattern; no standardized spec exists).

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: added trace tree, score panel, diff, and streaming patterns |