2nd/10_Wiki/Topics/AI_and_ML/Human-in-the-loop (HITL).md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
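10_Wiki/Topics large-scale cleanup:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 clusters of cross-folder duplicates (63 files → redirect)
- Link-name normalization: fixed broken links, pointed links directly at redirect targets, mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections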

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


---
id: wiki-2026-0508-human-in-the-loop-hitl
title: Human-in-the-Loop (HITL)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [HITL, human-in-the-loop, active learning, human review, RLHF, AI-assisted]
duplicate_of: none
source_trust_level: A
confidence_score: 0.96
verification_status: applied
tags: [ai, hitl, human-in-loop, active-learning, rlhf, oversight, ml-ops]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
language: Python
framework: Label Studio / Argilla / Prodigy
---
# Human-in-the-Loop (HITL)
## One-line summary
> **"A human verifies, corrects, or approves the AI's output at critical steps."** HITL underlies active learning, RLHF, content moderation, and production safety. Modern uses include approving destructive actions proposed by LLM agents and clinician confirmation of medical AI suggestions.
## Key points
### Types
- **Approval gate**: a human confirms a destructive action before it runs.
- **Annotation**: humans label training data.
- **Active learning**: the model queries humans about the examples it is least certain of.
- **Audit / review**: sample-based checks of model output.
- **Correction**: a human fixes a wrong output.
- **RLHF**: humans supply the preference signal.
### Applications
1. **Medical AI**: the model suggests a diagnosis → a clinician confirms.
2. **Content moderation**: borderline cases → human reviewer.
3. **LLM agent**: approval of destructive operations.
4. **Self-driving**: safety driver.
5. **Customer service**: escalation to a human agent.
6. **Code review**: AI suggests → human merges.
### Trade-offs
- ✅ Safety, accuracy, accountability.
- ❌ Latency, cost, operator fatigue.
## 💻 Patterns
### Approval gate (LLM agent)
```python
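def execute_with_approval(tool, args):
    # Destructive tools need explicit human approval before they run.
    # ask_human is a placeholder for the approval UI / channel.
    if tool.is_destructive:
        approval = ask_human(f"Approve {tool.name}({args})?")
        if not approval:
            return {'status': 'denied'}
    return tool.run(**args)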
```
### Active learning (uncertainty sampling)
```python
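import numpy as np

def active_learning_loop(unlabeled, model, X_train, y_train, n_query=100):
    while len(unlabeled) > 0:
        # 1. Predict, then score uncertainty as the entropy of each row's class distribution
        preds = model.predict_proba(unlabeled)
        uncertainties = -(preds * np.log(preds + 1e-12)).sum(axis=1)
        # 2. Query the n_query most uncertain examples
        idx = uncertainties.argsort()[-n_query:]
        to_label = unlabeled[idx]
        labels = human_label(to_label)  # placeholder for the human annotation step
        # 3. Grow the training set and retrain
        X_train = np.append(X_train, to_label, axis=0)
        y_train = np.append(y_train, labels)
        model.fit(X_train, y_train)
        unlabeled = np.delete(unlabeled, idx, axis=0)
    return model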
```
### Confidence-based routing
```python
def hitl_route(model_output, threshold=0.85):
    if model_output.confidence > threshold:
        return {'auto_action': model_output.prediction}
    return {'queue_for_human': model_output, 'priority': model_output.confidence}
```
### Label Studio integration
```python
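import os
import requests

TOKEN = os.environ['LABEL_STUDIO_TOKEN']  # assumption: the token comes from this env var
HEADERS = {'Authorization': f'Token {TOKEN}'}

def push_to_labelers(tasks):
    requests.post('http://label-studio/api/projects/1/import',
                  json=tasks, headers=HEADERS).raise_for_status()

def fetch_completed_labels():
    # The export endpoint needs the same auth header as the import endpoint
    return requests.get('http://label-studio/api/projects/1/export?exportType=JSON',
                        headers=HEADERS).json()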
```
### RLHF (preference)
```python
def collect_preferences(prompts, model, n=1000):
    pairs = []
    for prompt in prompts[:n]:
        # Generate 2 candidates for the same prompt
        a = model.generate(prompt, temperature=0.7)
        b = model.generate(prompt, temperature=0.7)
        # A human picks the better one (human_compare is a placeholder)
        chosen, rejected = human_compare(prompt, a, b)
        pairs.append({'prompt': prompt, 'chosen': chosen, 'rejected': rejected})
    return pairs
# The resulting pairs feed DPO / RLHF training
```
### Sample-based audit
```python
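import random

def audit_sample(predictions, sample_rate=0.05):
    if len(predictions) == 0:
        return {'sample_size': 0, 'accuracy': None}
    # Sample without replacement so no prediction is audited twice
    k = max(1, int(len(predictions) * sample_rate))
    sample = random.sample(list(predictions), k)
    audit_results = []
    for p in sample:
        verdict = human_review(p)  # placeholder: the label the reviewer considers correct
        audit_results.append({'prediction': p, 'human_verdict': verdict})
    agree = sum(1 for r in audit_results if r['prediction'] == r['human_verdict'])
    return {'sample_size': len(sample), 'accuracy': agree / len(audit_results)}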
```
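A 5% sample yields only an estimate of accuracy, so it can help to report an interval alongside the point value. The sketch below computes a Wilson score interval for the audited accuracy. The function name `wilson_interval` and the default `z=1.96` (roughly a 95% interval) are assumptions added for illustration and are not part of the audit code above.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (correct out of n audited items)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes
    return max(0.0, center - half), min(1.0, center + half)

# Example: 90 agreements in a sample of 100 audited predictions
low, high = wilson_interval(90, 100)
print(f"audited accuracy 0.90, interval [{low:.3f}, {high:.3f}]")
```

A wide interval suggests the sample rate is too low to support decisions about the model.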
### Escalation queue (priority)
```python
import heapq
import itertools

class EscalationQueue:
    def __init__(self):
        self.queue = []
        self._counter = itertools.count()  # tie-breaker so items themselves are never compared

    def add(self, item, priority):
        # Lower priority value = higher urgency
        heapq.heappush(self.queue, (priority, next(self._counter), item))

    def next_for_review(self):
        if not self.queue:
            return None
        _, _, item = heapq.heappop(self.queue)
        return item
```
### Correction feedback loop
```python
def correction_pipeline(model_output, human_correction):
    # log, retrain_set, and schedule_retrain are placeholders defined elsewhere
    if model_output != human_correction:
        # 1. Log the discrepancy
        log({'model': model_output, 'human': human_correction, 'context': ...})
        # 2. Add the corrected example to the retraining set
        retrain_set.append({'input': ..., 'label': human_correction})
        # 3. Trigger a retrain once enough corrections accumulate
        if len(retrain_set) > 1000:
            schedule_retrain()
```
### Reviewer fatigue (UX)
```python
from datetime import datetime, timedelta

class ReviewerLoad:
    def __init__(self, max_per_hour=200, break_every_min=45):
        self.reviews = []  # records with .reviewer, .time, .was_break (appended elsewhere)
        self.max_per_hour = max_per_hour
        self.break_every = break_every_min

    def can_review(self, reviewer_id):
        now = datetime.now()
        recent = [r for r in self.reviews
                  if r.reviewer == reviewer_id and r.time > now - timedelta(hours=1)]
        if len(recent) >= self.max_per_hour:
            return 'rate_limit'
        # Measure from the last break, or from the first recent review if no break is recorded
        since = max((r.time for r in recent if r.was_break),
                    default=min((r.time for r in recent), default=now))
        if (now - since).total_seconds() > self.break_every * 60:
            return 'break_due'
        return 'ok'
```
### Inter-rater agreement (Cohen κ)
```python
from sklearn.metrics import cohen_kappa_score

def measure_agreement(rater_a_labels, rater_b_labels):
    return cohen_kappa_score(rater_a_labels, rater_b_labels)
# Rule of thumb: > 0.8 excellent, 0.6-0.8 good, 0.4-0.6 moderate, < 0.4 poor
```
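To show what the score measures, here is a minimal hand-rolled sketch of the same statistic, $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance from each rater's label frequencies. The function name `cohen_kappa` and the sample labels are illustrative assumptions. Returning 1.0 when both raters use one identical label throughout is a convention chosen for this sketch.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the two marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # sketch convention: a single shared label counts as full agreement
    return (p_o - p_e) / (1 - p_e)

a = ['spam', 'spam', 'ham', 'ham']
b = ['spam', 'ham', 'ham', 'ham']
# p_o = 3/4, p_e = (2*1 + 2*3) / 16 = 0.5, so kappa = 0.25 / 0.5 = 0.5
print(cohen_kappa(a, b))
```

The raters agree on 75% of items here, yet κ is only 0.5, because half of that agreement would be expected by chance.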
### Disagreement resolution
```python
from collections import Counter

def resolve_disagreement(labels):
    """Multiple raters → one final label."""
    if len(set(labels)) == 1:
        return labels[0]
    # Majority vote, accepted only with at least 60% support
    most_common, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= 0.6:
        return most_common
    # Otherwise escalate to a senior reviewer (placeholder)
    return senior_review(labels)
```
### Time-bound HITL (LLM agent)
```python
import asyncio

async def hitl_with_timeout(action, timeout_sec=30):
    """Wait for human approval until the timeout, then default to deny."""
    try:
        approval = await asyncio.wait_for(ask_human_approval(action), timeout=timeout_sec)
        return approval
    except asyncio.TimeoutError:
        return {'denied': True, 'reason': 'timeout: human reviewer not available'}
```
### LLM-judge as cheap proxy
```python
def cascading_review(item, llm_judge, human):
    """LLM judge first; escalate to a human when the judge is unsure."""
    judge_verdict = llm_judge.review(item)
    if judge_verdict.confidence > 0.9:
        return judge_verdict.decision
    return human.review(item)
```
### Cost analysis
```python
def hitl_cost(items, auto_threshold, human_cost=2.0, model_cost=0.01):
    # Every item pays the model cost; only items at or below the threshold pay for human review
    n_items = len(items)
    auto_handled = sum(1 for i in items if i.confidence > auto_threshold)
    human_reviewed = n_items - auto_handled
    return n_items * model_cost + human_reviewed * human_cost
```
## Decision criteria
| Situation | Approach |
|---|---|
| Destructive action | Approval gate |
| Borderline classification | Confidence routing |
| Low-data | Active learning |
| Preference align | RLHF |
| Production audit | Sample-based |
| LLM agent | HITL + sandbox |
| Medical / legal | Always HITL |
**Default**: HITL approval for high-stakes / destructive actions, confidence routing for scale, active learning for label efficiency, and sample-based audits.
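The default above can be sketched as a single routing function that combines the approval gate, confidence routing, and audit sampling (active learning is left out because it belongs to training, not to serving). The names `Decision` and `route_item`, the 0.85 threshold, and the 5% audit rate are illustrative assumptions, not a prescribed policy.

```python
import random
from dataclasses import dataclass

@dataclass
class Decision:
    route: str   # 'human_approval', 'human_review', or 'auto'
    audit: bool  # True if an automatic decision is also sampled for audit

def route_item(is_destructive, confidence, threshold=0.85, audit_rate=0.05, rng=random):
    # High-stakes / destructive actions always pass through the approval gate
    if is_destructive:
        return Decision('human_approval', audit=False)
    # Outputs at or below the confidence threshold are queued for human review
    if confidence <= threshold:
        return Decision('human_review', audit=False)
    # Confident outputs run automatically; a random fraction is audited afterwards
    return Decision('auto', audit=rng.random() < audit_rate)

print(route_item(is_destructive=True, confidence=0.99).route)   # human_approval
print(route_item(is_destructive=False, confidence=0.40).route)  # human_review
```

Checking `is_destructive` first means that a high confidence score can never bypass the approval gate.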
## 🔗 Graph
- Parents: [[AI-Safety]] · [[ML-Ops]]
- Variants: [[Active-Learning]] · [[RLHF]]
- Applications: [[Excessive Agency]] · [[Content-Moderation]] · [[AI_Safety_and_Alignment|Constitutional-AI]]
## 🤖 LLM usage
**When**: destructive actions, borderline cases, RLHF, accuracy-critical tasks.
**When not**: high-volume, low-stakes work (fully automate).
## ❌ Anti-patterns
- **HITL theater**: approvals become a rubber stamp.
- **Ignoring reviewer fatigue**: review quality drops.
- **No κ measurement**: rater disagreement stays invisible.
- **No SLA on review**: items get stuck in the queue.
- **No cost analysis**: the process becomes unsustainable.
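Two of these anti-patterns can be watched with simple metrics. The sketch below flags likely rubber-stamping (near-total approval combined with very fast decisions) and lists queue items that have exceeded a review SLA. All function names, thresholds, and the 4-hour SLA are illustrative assumptions.

```python
from datetime import datetime, timedelta
from statistics import median

def rubber_stamp_signal(decisions, max_approval_rate=0.98, min_median_seconds=3.0):
    """decisions: list of (approved: bool, seconds_spent: float) per review."""
    if not decisions:
        return False
    approval_rate = sum(1 for approved, _ in decisions if approved) / len(decisions)
    median_seconds = median(seconds for _, seconds in decisions)
    # Flag only when reviewers approve almost everything AND decide very quickly
    return approval_rate > max_approval_rate and median_seconds < min_median_seconds

def sla_breaches(queued_at, now, sla=timedelta(hours=4)):
    """queued_at: dict mapping item id -> datetime the item entered the review queue."""
    return sorted(item for item, t in queued_at.items() if now - t > sla)

now = datetime(2026, 1, 1, 12, 0)
queue = {'a': datetime(2026, 1, 1, 6, 0), 'b': datetime(2026, 1, 1, 11, 0)}
print(sla_breaches(queue, now))  # 'a' has waited 6 hours, 'b' only 1
```

A high approval rate alone is not evidence of theater, since the model may simply be good, which is why the sketch requires both conditions.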
## 🧪 Verification / duplicates
- Verified (Active learning literature, RLHF papers, ML-Ops).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types, plus active learning / RLHF / audit / cascade code |