---
id: wiki-2026-0508-human-in-the-loop-hitl
title: Human-in-the-Loop (HITL)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [HITL, human-in-the-loop, active learning, human review, RLHF, AI-assisted]
duplicate_of: none
source_trust_level: A
confidence_score: 0.96
verification_status: applied
tags: [ai, hitl, human-in-loop, active-learning, rlhf, oversight, ml-ops]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Label Studio / Argilla / Prodigy
---

# Human-in-the-Loop (HITL)

## In one line

> **"A human verifies, corrects, or approves an AI system's output at its critical steps."** It underpins active learning, RLHF, content moderation, and production safety. Modern uses: approval of an LLM agent's destructive actions, and clinician confirmation of medical AI suggestions.

## Key points

### Types

- **Approval gate**: a human confirms a destructive action before it runs.
- **Annotation**: humans label training data.
- **Active learning**: the model queries a human on its most uncertain examples.
- **Audit / review**: sample-based checks of outputs.
- **Correction**: a human fixes a wrong output.
- **RLHF**: humans supply a preference signal.

### Applications

1. **Medical AI**: the model suggests a diagnosis → a clinician confirms.
2. **Content moderation**: borderline cases → human reviewer.
3. **LLM agent**: approval of destructive operations.
4. **Self-driving**: safety driver.
5. **Customer service**: escalation to a human agent.
6. **Code review**: AI suggests → human merges.

### Trade-offs

- ✅ Safety, accuracy, accountability.
- ❌ Latency, cost, operator fatigue.

## 💻 Patterns

### Approval gate (LLM agent)

```python
def execute_with_approval(tool, args):
    # `ask_human` is a placeholder for whatever prompt/UI returns True or False.
    if tool.is_destructive:
        approval = ask_human(f"Approve {tool.name}({args})?")
        if not approval:
            return {'status': 'denied'}
    return tool.run(**args)
```
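
### Active learning (uncertainty sampling)

```python
import numpy as np

def active_learning_loop(unlabeled, model, X_train, y_train, n_query=100):
    # Assumes `model` is already fitted on (X_train, y_train);
    # `human_label` is a placeholder that returns one label per queried row.
    while len(unlabeled) > 0:
        # 1. predict + uncertainty (per-row entropy of the class probabilities)
        preds = model.predict_proba(unlabeled)
        uncertainties = -(preds * np.log(preds + 1e-12)).sum(axis=1)

        # 2. query the top-k most uncertain examples
        idx = uncertainties.argsort()[-n_query:]
        to_label = unlabeled[idx]
        labels = human_label(to_label)

        # 3. grow the training set, retrain, and drop the queried rows
        X_train = np.append(X_train, to_label, axis=0)
        y_train = np.append(y_train, labels)
        model.fit(X_train, y_train)
        unlabeled = np.delete(unlabeled, idx, axis=0)
    return model
```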

### Confidence-based routing

```python
def hitl_route(model_output, threshold=0.85):
    if model_output.confidence > threshold:
        return {'auto_action': model_output.prediction}
    # Confidence doubles as priority: with a queue where a lower value means
    # higher urgency, the least certain items are reviewed first.
    return {'queue_for_human': model_output, 'priority': model_output.confidence}
```
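
The `threshold=0.85` default above is a placeholder. One way to choose it is from a labeled validation set: the sketch below returns the lowest threshold at which the auto-handled items still reach a target accuracy. The function name `pick_threshold`, the 0.95 target, and the toy data are illustrative assumptions.

```python
def pick_threshold(confidences, correct, target_accuracy=0.95):
    """Return the lowest confidence threshold whose auto-handled items
    (confidence > threshold) reach the target accuracy, or None."""
    for t in sorted(set(confidences)):
        auto = [ok for c, ok in zip(confidences, correct) if c > t]
        if auto and sum(auto) / len(auto) >= target_accuracy:
            return t
    return None

# Toy validation set: model confidence and whether the prediction was right.
confidences = [0.55, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
correct     = [False, True, False, True, True, True, True]
threshold = pick_threshold(confidences, correct)
```

Accuracy is not guaranteed to rise monotonically with the threshold, so a production version would probably also check the thresholds above the one returned.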
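
### Label Studio integration

```python
import os
import requests

# Assumption: the API token comes from an env var; the variable name is illustrative.
HEADERS = {'Authorization': f"Token {os.environ['LABEL_STUDIO_TOKEN']}"}

def push_to_labelers(tasks):
    requests.post('http://label-studio/api/projects/1/import',
                  json=tasks, headers=HEADERS).raise_for_status()

def fetch_completed_labels():
    # The export call needs the same auth header as the import call.
    resp = requests.get('http://label-studio/api/projects/1/export?exportType=JSON',
                        headers=HEADERS)
    return resp.json()
```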

### RLHF (preference)

```python
def collect_preferences(prompts, model, n=1000):
    # `human_compare` is a placeholder that returns (chosen, rejected).
    pairs = []
    for prompt in prompts[:n]:
        # generate 2 candidates
        a = model.generate(prompt, temperature=0.7)
        b = model.generate(prompt, temperature=0.7)
        # a human picks the better one
        chosen, rejected = human_compare(prompt, a, b)
        pairs.append({'prompt': prompt, 'chosen': chosen, 'rejected': rejected})
    return pairs

# → feed the pairs into DPO / RLHF training
```
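
For context on what the collected pairs feed into, here is a sketch of the per-pair DPO loss. It assumes the log-probabilities of each response under the policy being trained and under a frozen reference model have already been computed; the function name, `beta=0.1`, and the numbers are illustrative.

```python
import math

def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin)."""
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-margin))

# Policy identical to the reference: the margin is 0 and the loss is log(2).
baseline = dpo_pair_loss(-12.0, -15.0, -12.0, -15.0)
# Policy that has shifted probability toward the chosen response: lower loss.
improved = dpo_pair_loss(-10.0, -17.0, -12.0, -15.0)
```

The loss falls as the policy prefers the human-chosen response more strongly than the reference model does, which is how the preference signal enters training.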

### Sample-based audit

```python
import random

def audit_sample(predictions, sample_rate=0.05):
    # `predictions` is a list; `human_review` is a placeholder returning the correct label.
    if not predictions:
        return {'sample_size': 0, 'accuracy': None}
    # Sample without replacement so no prediction is audited twice.
    k = max(1, int(len(predictions) * sample_rate))
    sample = random.sample(predictions, k)
    audit_results = []
    for p in sample:
        verdict = human_review(p)
        audit_results.append({'prediction': p, 'human_verdict': verdict})

    accuracy = sum(1 for r in audit_results if r['prediction'] == r['human_verdict']) / len(audit_results)
    return {'sample_size': len(sample), 'accuracy': accuracy}
```

### Escalation queue (priority)

```python
import heapq
import itertools

class EscalationQueue:
    def __init__(self):
        self.queue = []
        # Tie-breaker so equal priorities never fall back to comparing items.
        self._counter = itertools.count()

    def add(self, item, priority):
        # lower priority value = higher urgency
        heapq.heappush(self.queue, (priority, next(self._counter), item))

    def next_for_review(self):
        if not self.queue:
            return None
        _, _, item = heapq.heappop(self.queue)
        return item
```

### Correction feedback loop

```python
# Assumes a module-level `retrain_set` list plus `log` and `schedule_retrain` helpers.
def correction_pipeline(model_output, human_correction):
    if model_output != human_correction:
        # 1. log the discrepancy
        log({'model': model_output, 'human': human_correction, 'context': ...})
        # 2. add to the retraining set
        retrain_set.append({'input': ..., 'label': human_correction})
        # 3. trigger a retrain once enough corrections have accumulated
        if len(retrain_set) > 1000:
            schedule_retrain()
```

### Reviewer fatigue (UX)

```python
from datetime import datetime, timedelta

class ReviewerLoad:
    def __init__(self, max_per_hour=200, break_every_min=45):
        self.reviews = []  # records with .reviewer, .time, .was_break
        self.max_per_hour = max_per_hour
        self.break_every = break_every_min

    def can_review(self, reviewer_id):
        now = datetime.now()
        recent = [r for r in self.reviews
                  if r.reviewer == reviewer_id and r.time > now - timedelta(hours=1)]
        if len(recent) >= self.max_per_hour:
            return 'rate_limit'
        breaks = [r.time for r in recent if r.was_break]
        # Measure from the last break, or from the oldest recent review if none is recorded.
        since = max(breaks) if breaks else min((r.time for r in recent), default=now)
        # total_seconds() rather than .seconds, which ignores the days component
        if (now - since).total_seconds() > self.break_every * 60:
            return 'break_due'
        return 'ok'
```

### Inter-rater agreement (Cohen's κ)

```python
from sklearn.metrics import cohen_kappa_score

def measure_agreement(rater_a_labels, rater_b_labels):
    return cohen_kappa_score(rater_a_labels, rater_b_labels)

# Rule of thumb: > 0.8 excellent, 0.6-0.8 good, < 0.4 poor
```
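
To make the score less of a black box: κ compares the observed agreement p_o with the agreement p_e expected by chance, κ = (p_o − p_e) / (1 − p_e). The sketch below computes it by hand for two raters. The function name and the toy labels are illustrative, and it does not handle the degenerate case p_e = 1 (both raters using a single label).

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' label counts, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = ['spam', 'spam', 'ham', 'ham', 'spam', 'ham', 'ham', 'ham', 'spam', 'ham']
b = ['spam', 'ham',  'ham', 'ham', 'spam', 'ham', 'spam', 'ham', 'spam', 'ham']
kappa = cohen_kappa(a, b)
```

Here the raters agree on 8 of 10 items, yet κ is only about 0.58, because chance alone would already produce 52% agreement with these label frequencies.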

### Disagreement resolution

```python
from collections import Counter

def resolve_disagreement(labels):
    """Multiple raters → one final label."""
    if len(set(labels)) == 1:
        return labels[0]
    # majority vote, accepted only with at least 60% agreement
    most_common, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= 0.6:
        return most_common
    # otherwise escalate to a senior reviewer (`senior_review` is a placeholder)
    return senior_review(labels)
```

### Time-bound HITL (LLM agent)

```python
import asyncio

async def hitl_with_timeout(action, timeout_sec=30):
    """Wait for a human decision; if none arrives before the timeout, deny by default."""
    try:
        approval = await asyncio.wait_for(ask_human_approval(action), timeout=timeout_sec)
        return approval
    except asyncio.TimeoutError:
        return {'denied': True, 'reason': 'timeout: human reviewer not available'}
```

### LLM-judge as cheap proxy

```python
def cascading_review(item, llm_judge, human):
    """LLM judge first; if it is unsure, fall back to a human."""
    judge_verdict = llm_judge.review(item)
    if judge_verdict.confidence > 0.9:
        return judge_verdict.decision
    return human.review(item)
```

### Cost analysis

```python
def hitl_cost(items, auto_threshold, human_cost=2.0, model_cost=0.01):
    # Every item costs one model call; items at or below the threshold also cost a human review.
    n_items = len(items)
    auto_handled = sum(1 for i in items if i.confidence > auto_threshold)
    human_reviewed = n_items - auto_handled
    return n_items * model_cost + human_reviewed * human_cost
```
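
A worked example of the same arithmetic, sweeping the threshold to show how the cost moves. It re-implements the formula on plain confidence scores so that it runs standalone; the function name `review_cost`, the scores, and the per-item costs are made up for illustration.

```python
def review_cost(confidences, auto_threshold, human_cost=2.0, model_cost=0.01):
    """Total cost: one model call per item, plus a human review for each
    item whose confidence is not above the threshold."""
    human_reviewed = sum(1 for c in confidences if c <= auto_threshold)
    return len(confidences) * model_cost + human_reviewed * human_cost

confidences = [0.99, 0.97, 0.92, 0.88, 0.81, 0.74, 0.66, 0.52, 0.95, 0.90]

for threshold in (0.70, 0.85, 0.95):
    print(f"threshold={threshold:.2f} cost={review_cost(confidences, threshold):.2f}")
```

With these ten items the cost rises from 4.10 to 8.10 to 16.10 as the threshold tightens: human review dominates, so each step up in the threshold should buy a measurable accuracy gain.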

## Decision criteria

| Situation | Approach |
|---|---|
| Destructive action | Approval gate |
| Borderline classification | Confidence routing |
| Low-data | Active learning |
| Preference alignment | RLHF |
| Production audit | Sample-based |
| LLM agent | HITL + sandbox |
| Medical / legal | Always HITL |

**Default**: HITL approval for high-stakes / destructive actions + confidence routing for scale + active learning for label efficiency + sample-based audits.
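
The default above can be sketched as a single dispatch function. Everything here is an illustrative assumption: the `Decision` dataclass, the route names, the 0.85 threshold, and the 5% audit rate.

```python
import random
from dataclasses import dataclass

@dataclass
class Decision:
    confidence: float
    destructive: bool = False

def choose_route(decision, threshold=0.85, audit_rate=0.05, rng=random):
    """Pick a HITL route for one model decision."""
    # 1. Destructive or high-stakes actions always need explicit approval.
    if decision.destructive:
        return 'approval_gate'
    # 2. Low-confidence outputs go to a human review queue.
    if decision.confidence <= threshold:
        return 'human_queue'
    # 3. A random sample of the auto-handled rest is audited after the fact.
    if rng.random() < audit_rate:
        return 'auto_with_audit'
    return 'auto'
```

Labels produced in the `human_queue` branch are also natural input for an active learning loop, since they cover the examples the model was least sure about.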

## 🔗 Graph

- Parent: [[AI-Safety]] · [[ML-Ops]]
- Variants: [[Active-Learning]] · [[RLHF]]
- Applications: [[Excessive Agency]] · [[Content-Moderation]] · [[AI_Safety_and_Alignment|Constitutional-AI]]

## 🤖 LLM usage

**When**: destructive actions, borderline cases, RLHF, accuracy-critical tasks.
**When not**: high-volume, low-stakes work (fully automate instead).

## ❌ Anti-patterns

- **HITL theater**: approvals are rubber-stamped.
- **Ignoring reviewer fatigue**: review quality drops.
- **No κ measurement**: rater disagreement stays invisible.
- **No SLA on review**: items get stuck in the queue.
- **No cost analysis**: the process becomes unsustainable.
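
Two of these anti-patterns can be watched for with simple metrics. The sketch below flags likely rubber-stamping (near-total approval combined with very short review times) and lists queue items that have waited past a review SLA. The record fields, thresholds, and function names are illustrative assumptions, not established cut-offs.

```python
from datetime import datetime, timedelta
from statistics import median

def looks_rubber_stamped(reviews, max_approval_rate=0.98, min_median_sec=3.0):
    """reviews: list of dicts with 'approved' (bool) and 'seconds_spent' (float)."""
    if not reviews:
        return False
    approval_rate = sum(r['approved'] for r in reviews) / len(reviews)
    median_sec = median(r['seconds_spent'] for r in reviews)
    return approval_rate > max_approval_rate and median_sec < min_median_sec

def overdue_items(queue, now, sla=timedelta(minutes=15)):
    """queue: list of dicts with 'id' and 'enqueued_at' (datetime)."""
    return [item['id'] for item in queue if now - item['enqueued_at'] > sla]

now = datetime(2026, 1, 1, 12, 0)
queue = [
    {'id': 'a', 'enqueued_at': now - timedelta(minutes=20)},
    {'id': 'b', 'enqueued_at': now - timedelta(minutes=5)},
]
stuck = overdue_items(queue, now)
```

A flagged reviewer is a prompt to look closer, not proof of a problem: a well-calibrated model can legitimately earn a very high approval rate.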

## 🧪 Verification / duplicates

- Verified (Active learning literature, RLHF papers, ML-Ops).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types + active learning / RLHF / audit / cascade code |