---
id: wiki-2026-0508-v-component-evaluation-interface
title: V-component (Evaluation Interface)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Eval UI Component, V-component]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: applied
tags: [llm-eval, ui, component, dashboard, observability]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: typescript
  framework: React-19
---

# V-component (Evaluation Interface)

## In one line

> **"A reusable UI primitive for inspecting, comparing, and annotating LLM eval results."** It is a core building block of eval platforms such as Braintrust, Langfuse, and Phoenix (Arize), combining a trace tree, a score panel, and a diff view. When building a custom dashboard, creating an in-house V-component is the common pattern.

## Core

### Components of a V-component

- **Trace viewer**: a tree of the LLM call chain (input → tool calls → output).
- **Score panel**: each metric (accuracy, faithfulness, latency, cost) as a number plus a sparkline.
- **Diff view**: side-by-side comparison of two runs.
- **Annotation**: labels and comments from human reviewers.
- **Filter / search**: isolate only the failing, slow, or expensive traces.

### Data shape

- **Trace**: `{ id, name, input, output, children: Span[], metadata }`.
- **Score**: `{ name, value, type: "numeric" | "categorical", confidence }`.
- **Annotation**: `{ author, label, comment, ts }`.

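
As a rough illustration, the shapes listed above might be typed as follows. The note gives only field names, so every field type here is an assumption, as are the discriminated `Score` union, the ISO-8601 string for `ts`, and the `countSpans` helper, which exists only to exercise the types.

```typescript
// All field types below are assumptions; the note specifies names only.
type Span = {
  id: string;
  name: string;
  input: unknown;
  output: unknown;
  children: Span[];
};

type Trace = Span & { metadata: Record<string, unknown> };

// Assumed: `value` is a number for numeric scores and a label for categorical ones.
type Score =
  | { name: string; type: "numeric"; value: number; confidence?: number }
  | { name: string; type: "categorical"; value: string; confidence?: number };

// Assumed: `ts` is an ISO-8601 timestamp string.
type Annotation = { author: string; label: string; comment: string; ts: string };

// Hypothetical helper: count all spans in a tree, root included.
function countSpans(span: Span): number {
  return 1 + span.children.reduce((n, child) => n + countSpans(child), 0);
}

const exampleTrace: Trace = {
  id: "t1",
  name: "answer-question",
  input: { question: "What is a V-component?" },
  output: { answer: "A reusable eval UI primitive." },
  metadata: { model: "example-model" },
  children: [
    { id: "s1", name: "retrieve", input: "query", output: ["doc"], children: [] },
    { id: "s2", name: "generate", input: "prompt", output: "text", children: [] },
  ],
};

const exampleScore: Score = { name: "faithfulness", type: "numeric", value: 0.82, confidence: 0.9 };

const exampleAnnotation: Annotation = {
  author: "reviewer-1",
  label: "good",
  comment: "Grounded in the retrieved doc.",
  ts: "2026-05-10T00:00:00Z",
};

console.log(countSpans(exampleTrace), exampleScore.name, exampleAnnotation.label);
```

Splitting `Score` into two variants lets the score panel narrow on `type` before calling number-only methods such as `toFixed`.
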
### Design decisions

- **Virtualization**: rendering 1000+ traces calls for windowing, e.g. react-virtuoso or TanStack Virtual.
- **Streaming**: real-time updates for in-progress traces over SSE or WebSocket.
- **Diff algorithm**: string-level (diff-match-patch) plus structural (json-diff).

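
To make the "structural" half of the diff decision concrete, here is a minimal hand-rolled sketch that walks two JSON values and reports the paths that differ. It is a stand-in written for this note, not the API of json-diff or diff-match-patch; the `Change` type, the `structuralDiff` name, and the `$.a.b` path format are all assumptions.

```typescript
// Hypothetical result shape: one entry per leaf path whose value differs.
type Change = { path: string; before: unknown; after: unknown };

function isObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function structuralDiff(a: unknown, b: unknown, path: string = "$"): Change[] {
  // Recurse only when both sides are containers of the same kind.
  if (isObject(a) && isObject(b) && Array.isArray(a) === Array.isArray(b)) {
    const keys = Object.keys(a).concat(Object.keys(b).filter(k => !hasOwn(a, k)));
    const changes: Change[] = [];
    for (const k of keys) {
      // A key missing on one side shows up as `undefined` on that side.
      const nested = structuralDiff(a[k], b[k], path + "." + k);
      for (const c of nested) changes.push(c);
    }
    return changes;
  }
  // Leaves (or mismatched container kinds) are compared by strict equality.
  return a === b ? [] : [{ path, before: a, after: b }];
}

const runA = { model: "m1", output: { answer: "yes", tokens: 12 } };
const runB = { model: "m1", output: { answer: "no", tokens: 12 } };
console.log(structuralDiff(runA, runB));
```

A structural pass like this can locate which fields changed, after which a string-level diff can be applied only to the changed text fields.
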
### Applications

1. **Internal eval dashboard**: the ML team detects model regressions.
2. **PR review**: before/after diff of a prompt change.
3. **Production monitoring**: anomaly detection on live traces.

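
The regression-detection use case can be sketched as a comparison of mean scores between a baseline run and a candidate run. This is only one possible approach; the `RunScores` shape, the `findRegressions` name, the default tolerance of 0.02, and the assumption that higher scores are better are all illustrative choices, not something the note prescribes.

```typescript
// Assumed shape: metric name -> per-trace score values for one run.
type RunScores = Record<string, number[]>;

type Regression = { metric: string; baseline: number; candidate: number; delta: number };

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Flags metrics whose mean dropped by more than `tolerance`.
// Assumes higher is better; invert the check for latency or cost.
function findRegressions(
  baseline: RunScores,
  candidate: RunScores,
  tolerance: number = 0.02,
): Regression[] {
  const found: Regression[] = [];
  for (const metric of Object.keys(baseline)) {
    const b = baseline[metric];
    const c = candidate[metric];
    // Skip metrics that are missing or empty on either side.
    if (!c || b.length === 0 || c.length === 0) continue;
    const delta = mean(c) - mean(b);
    if (delta < -tolerance) {
      found.push({ metric, baseline: mean(b), candidate: mean(c), delta });
    }
  }
  return found;
}

const baselineRun: RunScores = { faithfulness: [0.9, 0.8], accuracy: [0.5, 0.5] };
const candidateRun: RunScores = { faithfulness: [0.7, 0.6], accuracy: [0.5, 0.6] };
console.log(findRegressions(baselineRun, candidateRun));
```

A plain mean comparison ignores variance, so for small sample sizes a statistical test would be a safer gate than a fixed tolerance.
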
## 💻 Patterns

### Trace tree component (React)

```tsx
import { useState } from "react";

type Span = {
  id: string;
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number;
  children: Span[];
};

function TraceTree({ root }: { root: Span }) {
  return (
    <ul className="font-mono text-sm">
      <SpanNode span={root} depth={0} />
    </ul>
  );
}

function SpanNode({ span, depth }: { span: Span; depth: number }) {
  // Expand the first two levels by default; deeper spans start collapsed.
  const [open, setOpen] = useState(depth < 2);
  return (
    <li>
      <button onClick={() => setOpen(o => !o)}>
        {open ? "▼" : "▶"} {span.name}{" "}
        <span className="text-gray-500">{span.durationMs}ms</span>
      </button>
      {open && (
        <>
          <pre className="text-xs">{JSON.stringify(span.input, null, 2)}</pre>
          {/* Children get their own <ul>: valid nesting, and the indent applies once per level. */}
          <ul style={{ paddingLeft: 16 }}>
            {span.children.map(c => (
              <SpanNode key={c.id} span={c} depth={depth + 1} />
            ))}
          </ul>
        </>
      )}
    </li>
  );
}
```

### Score panel

```tsx
type Score = { name: string; value: number; series?: number[] };

// `Sparkline` is assumed to be defined elsewhere; it is not part of this snippet.
function ScorePanel({ scores }: { scores: Score[] }) {
  return (
    <div className="grid grid-cols-3 gap-2">
      {scores.map(s => (
        <div key={s.name} className="rounded border p-2">
          <div className="text-xs text-gray-500">{s.name}</div>
          <div className="text-2xl font-bold">{s.value.toFixed(3)}</div>
          {/* Render the trend only when a history series is available. */}
          {s.series && <Sparkline data={s.series} />}
        </div>
      ))}
    </div>
  );
}
```

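
The score panel relies on a `Sparkline` component that the note does not define. One way to back it, sketched here under assumptions, is a pure function that maps a series to the `points` string of an SVG `<polyline>`. The `sparklinePoints` name and the default 80×20 size are hypothetical.

```typescript
// Hypothetical helper: map a numeric series to SVG polyline points.
function sparklinePoints(data: number[], width: number = 80, height: number = 20): string {
  if (data.length === 0) return "";
  const min = Math.min(...data);
  const max = Math.max(...data);
  // A flat series has zero range; fall back to 1 to avoid dividing by zero.
  const range = max - min || 1;
  // A single point has no horizontal spacing.
  const step = data.length > 1 ? width / (data.length - 1) : 0;
  return data
    .map((v, i) => {
      const x = i * step;
      // SVG y grows downward, so the largest value maps to y = 0.
      const y = height - ((v - min) / range) * height;
      return x.toFixed(1) + "," + y.toFixed(1);
    })
    .join(" ");
}

console.log(sparklinePoints([0, 1, 2], 20, 10));
```

With these inputs the call yields `0.0,10.0 10.0,5.0 20.0,0.0`. A `Sparkline` component could then pass the string to the `points` attribute of a `<polyline>` inside an `<svg>` of the same width and height.
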
### Diff view (two runs)

```tsx
import { diffLines } from "diff";

function DiffView({ a, b }: { a: string; b: string }) {
  // diffLines returns chunks flagged as added, removed, or unchanged.
  const parts = diffLines(a, b);
  return (
    <pre className="text-xs">
      {parts.map((p, i) => (
        <span
          key={i}
          className={p.added ? "bg-green-100" : p.removed ? "bg-red-100" : ""}
        >
          {p.value}
        </span>
      ))}
    </pre>
  );
}
```

### Virtualized trace list (1000+ items)

```tsx
import { Virtuoso } from "react-virtuoso";

// `Trace` (with a `scores` map) and `TraceRow` are assumed to be defined elsewhere.
// Virtuoso mounts only the rows in or near the viewport.
function TraceList({ traces }: { traces: Trace[] }) {
  return (
    <Virtuoso
      data={traces}
      itemContent={(_, t) => (
        <TraceRow trace={t} status={t.scores.faithfulness < 0.7 ? "fail" : "ok"} />
      )}
      style={{ height: "100vh" }}
    />
  );
}
```

### Streaming trace (SSE)

```tsx
import { useEffect, useState } from "react";

// `Trace`, `Span`, and `mergeSpan` are assumed to be defined elsewhere.
function useLiveTraces(runId: string) {
  const [traces, setTraces] = useState<Trace[]>([]);
  useEffect(() => {
    // Drop traces from a previous run before subscribing to the new one.
    setTraces([]);
    const es = new EventSource(`/api/runs/${runId}/stream`);
    es.onmessage = e => {
      const span: Span = JSON.parse(e.data);
      setTraces(prev => mergeSpan(prev, span));
    };
    // Close the connection on unmount or when runId changes.
    return () => es.close();
  }, [runId]);
  return traces;
}
```

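
The hook above calls `mergeSpan` without defining it. A minimal sketch of what such a helper might look like follows. It assumes that each streamed span carries a `parentId` (null for a root), that a trace is represented by its root span, and that parents arrive before their children; none of this is stated in the note.

```typescript
// Assumed wire shape: each streamed span names its parent; roots use null.
type Span = { id: string; parentId: string | null; name: string; children: Span[] };

// Replace the span with the same id (keeping its known children) or append it.
function upsertById(list: Span[], incoming: Span): Span[] {
  const exists = list.some(s => s.id === incoming.id);
  return exists
    ? list.map(s => (s.id === incoming.id ? { ...incoming, children: s.children } : s))
    : [...list, incoming];
}

// Rebuild the subtree, attaching `incoming` under its parent when found.
// For simplicity every node is copied; a production version would reuse untouched subtrees.
function attach(node: Span, incoming: Span): Span {
  if (node.id === incoming.parentId) {
    return { ...node, children: upsertById(node.children, incoming) };
  }
  return { ...node, children: node.children.map(c => attach(c, incoming)) };
}

// Returns new arrays and objects so React state updates are detected.
// A span whose parent has not arrived yet is silently dropped in this sketch.
function mergeSpan(roots: Span[], incoming: Span): Span[] {
  if (incoming.parentId === null) return upsertById(roots, incoming);
  return roots.map(root => attach(root, incoming));
}

let live: Span[] = [];
live = mergeSpan(live, { id: "r", parentId: null, name: "root", children: [] });
live = mergeSpan(live, { id: "c", parentId: "r", name: "llm-call", children: [] });
live = mergeSpan(live, { id: "c", parentId: "r", name: "llm-call (done)", children: [] });
console.log(JSON.stringify(live, null, 2));
```

Because the update is immutable, it can be passed directly to a state setter as in `setTraces(prev => mergeSpan(prev, span))`.
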
### Annotation (human-in-the-loop)

```tsx
import { useState } from "react";

// `RadioGroup` and `currentUser` are assumed to be provided by the host app.
function AnnotationPanel({ traceId }: { traceId: string }) {
  const [label, setLabel] = useState<"good" | "bad" | "unsure">();
  const [comment, setComment] = useState("");

  const submit = async () => {
    await fetch(`/api/traces/${traceId}/annotations`, {
      method: "POST",
      // Declare the JSON body so the server parses it correctly.
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ label, comment, author: currentUser.id }),
    });
  };

  return (
    <div className="space-y-2">
      <RadioGroup value={label} onChange={setLabel} options={["good", "bad", "unsure"]} />
      <textarea value={comment} onChange={e => setComment(e.target.value)} />
      {/* Require a label before saving. */}
      <button onClick={submit} disabled={!label}>Save</button>
    </div>
  );
}
```

### Filter / query (Braintrust-style)

```typescript
type FilterExpr = {
  field: "score.faithfulness" | "duration_ms" | "model";
  op: "<" | ">" | "==" | "contains";
  value: number | string;
};

// `Trace` and `evalExpr` are assumed to be defined elsewhere.
function applyFilters(traces: Trace[], filters: FilterExpr[]) {
  // A trace is kept only if it satisfies every filter (logical AND).
  return traces.filter(t => filters.every(f => evalExpr(t, f)));
}

// UI: query builder + saved filters
```

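
`evalExpr` is left undefined in the snippet above. The following sketch shows one way it might be written. The flat `Trace` shape used here (`model`, `duration_ms`, `scores.faithfulness`) is an assumption inferred from the `field` names, and `readField` is a hypothetical helper.

```typescript
type FilterExpr = {
  field: "score.faithfulness" | "duration_ms" | "model";
  op: "<" | ">" | "==" | "contains";
  value: number | string;
};

// Assumed trace shape, inferred from the filterable field names.
type Trace = {
  id: string;
  model: string;
  duration_ms: number;
  scores: { faithfulness: number };
};

// Hypothetical helper: resolve a filter field to the trace value it names.
function readField(t: Trace, field: FilterExpr["field"]): number | string {
  switch (field) {
    case "score.faithfulness":
      return t.scores.faithfulness;
    case "duration_ms":
      return t.duration_ms;
    case "model":
      return t.model;
  }
}

// Type-mismatched comparisons (e.g. "<" on a string) evaluate to false.
function evalExpr(t: Trace, f: FilterExpr): boolean {
  const actual = readField(t, f.field);
  const expected = f.value;
  switch (f.op) {
    case "==":
      return actual === expected;
    case "contains":
      return typeof actual === "string" && typeof expected === "string"
        && actual.indexOf(expected) !== -1;
    case "<":
      return typeof actual === "number" && typeof expected === "number" && actual < expected;
    case ">":
      return typeof actual === "number" && typeof expected === "number" && actual > expected;
  }
}

function applyFilters(traces: Trace[], filters: FilterExpr[]): Trace[] {
  return traces.filter(t => filters.every(f => evalExpr(t, f)));
}

const sampleTraces: Trace[] = [
  { id: "a", model: "model-a", duration_ms: 1200, scores: { faithfulness: 0.55 } },
  { id: "b", model: "model-b", duration_ms: 300, scores: { faithfulness: 0.9 } },
];

const failing = applyFilters(sampleTraces, [
  { field: "score.faithfulness", op: "<", value: 0.7 },
]);
console.log(failing.map(t => t.id));
```

Returning false on a type mismatch, instead of throwing, keeps a malformed saved filter from breaking the whole list view.
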
## Decision criteria

| Situation | Approach |
|---|---|
| A hosted eval platform is an option | Braintrust / Langfuse / Phoenix (do not build) |
| Internal-only, specific domain | Custom V-component (Tailwind + TanStack) |
| Small team | Hosted; the build cost is prohibitive |
| 1000+ traces / day | Virtualization is required |

**Default**: startups self-host Langfuse; enterprises use Braintrust / Arize.

## 🔗 Graph

- Parent: [[LLM Eval]] · [[Observability]]
- Variant: [[Trace Viewer]]
- Applications: [[Braintrust]] · [[Langfuse]]
- Adjacent: [[OpenTelemetry]]

## 🤖 LLM usage

**When**: generating V-component boilerplate (trace tree, virtualized list), which follows well-typed React + TanStack patterns.
**When not**: domain-specific scoring logic; write that by hand.

## ❌ Anti-patterns

- **No virtualization**: rendering 5000 traces in a single pass freezes the browser.
- **Raw score numbers only**: without a sparkline or histogram, trends are invisible.
- **Mixed run units**: averaging scores across different prompt versions is misleading.
- **No annotation persistence**: reviewer labels are lost, and with them a source of future training data.

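
For the mixed-run-units anti-pattern, a small sketch of the alternative: group scores by prompt version first, then average within each group. The `ScoredRun` shape and the `meanByPromptVersion` name are hypothetical.

```typescript
// Assumed shape: one score per trace, tagged with the prompt version that produced it.
type ScoredRun = { promptVersion: string; score: number };

// Average within each prompt version instead of across all of them.
function meanByPromptVersion(runs: ScoredRun[]): Record<string, number> {
  // Null-prototype object so version names never collide with inherited keys.
  const sums: Record<string, { total: number; count: number }> = Object.create(null);
  for (const r of runs) {
    const acc = sums[r.promptVersion] || { total: 0, count: 0 };
    sums[r.promptVersion] = { total: acc.total + r.score, count: acc.count + 1 };
  }
  const means: Record<string, number> = {};
  for (const version of Object.keys(sums)) {
    means[version] = sums[version].total / sums[version].count;
  }
  return means;
}

const scoredRuns: ScoredRun[] = [
  { promptVersion: "v1", score: 0.5 },
  { promptVersion: "v1", score: 1.0 },
  { promptVersion: "v2", score: 0.25 },
  { promptVersion: "v2", score: 0.75 },
];
console.log(meanByPromptVersion(scoredRuns));
```

Here a pooled average would report 0.625 and hide that `v1` (0.75) and `v2` (0.5) behave quite differently.
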
## 🧪 Verification / duplicates

- Verified (Braintrust docs 2025; Langfuse v3 docs; Arize Phoenix 2024).
- Trust level B (a design pattern; no standardized spec exists).

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: added trace tree, score panel, diff, and streaming patterns |