---
id: wiki-2026-0508-human-in-the-loop-hitl
title: Human-in-the-Loop (HITL)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [HITL, human-in-the-loop, active learning, human review, RLHF, AI-assisted]
duplicate_of: none
source_trust_level: A
confidence_score: 0.96
verification_status: applied
tags: [ai, hitl, human-in-loop, active-learning, rlhf, oversight, ml-ops]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Label Studio / Argilla / Prodigy
---

# Human-in-the-Loop (HITL)

## One-line summary
> **"A human verifies, corrects, or approves the AI's output at critical steps."** Core uses: active learning, RLHF, content moderation, production safety. Modern examples: approving an LLM agent's destructive actions, and clinician confirmation of medical AI suggestions.

## Key points

### Types
- **Approval gate**: a human confirms a destructive action before it runs.
- **Annotation**: humans label training data.
- **Active learning**: the model queries humans on its most uncertain examples.
- **Audit / review**: sample-based checks of model output.
- **Correction**: humans fix wrong outputs.
- **RLHF**: humans supply the preference signal.

### Applications
1. **Medical AI**: the model suggests a diagnosis → a clinician confirms.
2. **Content moderation**: borderline cases → human reviewer.
3. **LLM agent**: approval for destructive operations.
4. **Self-driving**: safety driver.
5. **Customer service**: escalation to a human agent.
6. **Code review**: the AI suggests a change → a human merges.

### Trade-offs
- ✅ Safety, accuracy, accountability.
- ❌ Latency, cost, operator fatigue.

## 💻 Patterns

### Approval gate (LLM agent)
```python
def execute_with_approval(tool, args):
    # ask_human is a placeholder for whatever approval channel is in use.
    if tool.is_destructive:
        approval = ask_human(f"Approve {tool.name}({args})?")
        if not approval:
            return {'status': 'denied'}
    return tool.run(**args)
```

### Active learning (uncertainty sampling)
```python
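import numpy as np

def active_learning_loop(unlabeled, model, X_train, y_train, n_query=100):
    # Assumes `model` is already fitted; `entropy` and `human_label` are defined elsewhere.
    while len(unlabeled) > 0:
        # 1. Predict and score uncertainty
        preds = model.predict_proba(unlabeled)
        uncertainties = entropy(preds)

        # 2. Query the n_query most uncertain examples
        idx = uncertainties.argsort()[-n_query:]
        to_label = unlabeled[idx]
        labels = human_label(to_label)

        # 3. Grow the training set and retrain
        X_train = np.append(X_train, to_label, axis=0)
        y_train = np.append(y_train, labels)
        model.fit(X_train, y_train)
        unlabeled = np.delete(unlabeled, idx, axis=0)
    return model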
```
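
The loop above calls an `entropy` helper without defining it. A minimal sketch of one, assuming `predict_proba` returns a 2-D array with one row of class probabilities per example (the `eps` clipping value is an arbitrary choice to avoid `log(0)`):

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of a class-probability matrix."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return -np.sum(probs * np.log(probs), axis=1)

# Illustrative probabilities, not real model output.
preds = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, so most uncertain
    [0.70, 0.20, 0.10],
])
scores = entropy(preds)
most_uncertain = int(np.argmax(scores))  # index of the row to send to a human first
```

Higher entropy means the probability mass is spread more evenly across classes, which is why the loop takes the top of the sorted scores.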

### Confidence-based routing
```python
def hitl_route(model_output, threshold=0.85):
    if model_output.confidence > threshold:
        return {'auto_action': model_output.prediction}
    return {'queue_for_human': model_output, 'priority': model_output.confidence}
```

### Label Studio integration
```python
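import os
import requests

# Assumption: the API token comes from an environment variable of this name.
HEADERS = {'Authorization': f"Token {os.environ['LABEL_STUDIO_TOKEN']}"}

def push_to_labelers(tasks):
    requests.post('http://label-studio/api/projects/1/import',
                  json=tasks, headers=HEADERS).raise_for_status()

def fetch_completed_labels():
    # The export call needs the same auth header as the import call.
    resp = requests.get('http://label-studio/api/projects/1/export?exportType=JSON',
                        headers=HEADERS)
    return resp.json()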
```

### RLHF (preference)
```python
def collect_preferences(prompts, model, n=1000):
    pairs = []
    for prompt in prompts[:n]:
        # Generate two candidate responses
        a = model.generate(prompt, temperature=0.7)
        b = model.generate(prompt, temperature=0.7)
        # A human picks the better one
        chosen, rejected = human_compare(prompt, a, b)
        pairs.append({'prompt': prompt, 'chosen': chosen, 'rejected': rejected})
    return pairs

# The pairs then feed DPO / RLHF training
```
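
To show how a collected pair turns into a training signal, here is a rough sketch of the per-pair DPO loss. It assumes you already have the summed log-probability of each response under the policy being trained and under a frozen reference model; the function name, argument names, and `beta=0.1` are illustrative choices, not taken from any particular library.

```python
import math

def dpo_pair_loss(policy_chosen_lp, policy_rejected_lp,
                  ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """-log(sigmoid(margin)) for one (chosen, rejected) preference pair."""
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Policy identical to the reference: no preference learned yet.
baseline = dpo_pair_loss(-6.0, -6.0, -6.0, -6.0)
# Policy has shifted probability toward the human-chosen response.
improved = dpo_pair_loss(-5.0, -9.0, -6.0, -6.0)
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference and lowers the rejected one's, which is the direction the human preference points.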

### Sample-based audit
```python
import random

def audit_sample(predictions, sample_rate=0.05):
    # Sample without replacement so no prediction is reviewed twice.
    k = max(1, int(len(predictions) * sample_rate))
    sample = random.sample(list(predictions), min(k, len(predictions)))
    audit_results = []
    for p in sample:
        verdict = human_review(p)
        audit_results.append({'prediction': p, 'human_verdict': verdict})

    agreed = sum(1 for r in audit_results if r['prediction'] == r['human_verdict'])
    accuracy = agreed / len(audit_results) if audit_results else None
    return {'sample_size': len(sample), 'accuracy': accuracy}
```
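
The accuracy returned by `audit_sample` is a point estimate from a small sample, so it helps to report an interval alongside it. A minimal sketch using the Wilson score interval; the function name and the example counts (47 correct out of 50 audited) are made up for illustration, and `z=1.96` corresponds to roughly 95% confidence.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion observed as correct/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# Hypothetical audit: 47 of 50 sampled predictions matched the human verdict.
low, high = wilson_interval(correct=47, n=50)
```

With only 50 audited items the interval is wide, which is a useful check before treating a sampled accuracy as a production guarantee.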

### Escalation queue (priority)
```python
import heapq
import itertools

class EscalationQueue:
    def __init__(self):
        self.queue = []
        self._counter = itertools.count()  # tie-breaker so items are never compared

    def add(self, item, priority):
        # Lower priority value = higher urgency
        heapq.heappush(self.queue, (priority, next(self._counter), item))

    def next_for_review(self):
        if not self.queue:
            return None
        _, _, item = heapq.heappop(self.queue)
        return item
```

### Correction feedback loop
```python
def correction_pipeline(model_output, human_correction):
    if model_output != human_correction:
        # 1. Log the discrepancy
        log({'model': model_output, 'human': human_correction, 'context': ...})
        # 2. Add the corrected example to the retraining set
        retrain_set.append({'input': ..., 'label': human_correction})
        # 3. Trigger a retrain once enough corrections accumulate
        if len(retrain_set) > 1000:
            schedule_retrain()
```

### Reviewer fatigue (UX)
```python
from datetime import datetime, timedelta

class ReviewerLoad:
    def __init__(self, max_per_hour=200, break_every_min=45):
        self.reviews = []  # records with .reviewer, .time, .was_break
        self.max_per_hour = max_per_hour
        self.break_every = timedelta(minutes=break_every_min)

    def can_review(self, reviewer_id):
        now = datetime.now()
        recent = [r for r in self.reviews
                  if r.reviewer == reviewer_id and r.time > now - timedelta(hours=1)]
        if not recent:
            return 'ok'
        if len(recent) >= self.max_per_hour:
            return 'rate_limit'
        # The current work stretch starts at the latest break in the window,
        # or at the first review in the window if no break was recorded.
        start = max((r.time for r in recent if r.was_break),
                    default=min(r.time for r in recent))
        if now - start > self.break_every:
            return 'break_due'
        return 'ok'
```

### Inter-rater agreement (Cohen κ)
```python
from sklearn.metrics import cohen_kappa_score
def measure_agreement(rater_a_labels, rater_b_labels):
    return cohen_kappa_score(rater_a_labels, rater_b_labels)
# Rule of thumb: > 0.8 excellent, 0.6-0.8 good, 0.4-0.6 moderate, < 0.4 poor
```
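
For intuition about what the score measures, κ compares observed agreement p_o with the agreement p_e expected by chance from each rater's label frequencies: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it by hand on made-up labels; the function name is hypothetical, and returning 1.0 when both raters use a single identical label is a choice made for this sketch.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[k] * count_b[k] for k in labels) / n ** 2  # chance agreement
    if p_e == 1:
        return 1.0  # sketch-level choice: both raters used one identical label
    return (p_o - p_e) / (1 - p_e)

# Illustrative labels only.
rater_a = ["spam", "spam", "ok", "ok", "spam", "ok", "ok",   "ok"]
rater_b = ["spam", "ok",   "ok", "ok", "spam", "ok", "spam", "ok"]
kappa = cohen_kappa(rater_a, rater_b)
```

Here the raters agree on 6 of 8 items (p_o = 0.75), but chance alone would give p_e ≈ 0.53, so κ lands in the moderate range despite the high raw agreement.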

### Disagreement resolution
```python
from collections import Counter

def resolve_disagreement(labels):
    """Reduce labels from multiple raters to one final label."""
    if len(set(labels)) == 1:
        return labels[0]
    # Majority vote, accepted only with at least 60% agreement
    most_common, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= 0.6:
        return most_common
    # Otherwise escalate to a senior reviewer
    return senior_review(labels)
```

### Time-bound HITL (LLM agent)
```python
import asyncio

async def hitl_with_timeout(action, timeout_sec=30):
    """Wait for a human decision; if none arrives before the timeout, deny by default."""
    try:
        approval = await asyncio.wait_for(ask_human_approval(action), timeout=timeout_sec)
        return approval
    except asyncio.TimeoutError:
        return {'denied': True, 'reason': 'timeout: human reviewer not available'}
```

### LLM-judge as cheap proxy
```python
def cascading_review(item, llm_judge, human):
    """Try the LLM judge first; route to a human when the judge is unsure."""
    judge_verdict = llm_judge.review(item)
    if judge_verdict.confidence > 0.9:
        return judge_verdict.decision
    return human.review(item)
```

### Cost analysis
```python
def hitl_cost(items, auto_threshold, human_cost=2.0, model_cost=0.01):
    # Every item is scored by the model; only low-confidence items go to a human.
    n_items = len(items)
    auto_handled = sum(1 for i in items if i.confidence > auto_threshold)
    human_reviewed = n_items - auto_handled
    return n_items * model_cost + human_reviewed * human_cost
```
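
To see how the auto-approval threshold drives cost, the sketch below sweeps a few thresholds over a list of confidence scores. It uses the same default unit costs as the function above; the helper name and the confidence values are made up for illustration.

```python
def hitl_cost_for_threshold(confidences, auto_threshold,
                            human_cost=2.0, model_cost=0.01):
    """Total cost when items at or below the threshold go to a human."""
    human_reviewed = sum(1 for c in confidences if c <= auto_threshold)
    return len(confidences) * model_cost + human_reviewed * human_cost

# Illustrative confidence scores for ten items.
confidences = [0.99, 0.97, 0.91, 0.88, 0.72, 0.60, 0.95, 0.83, 0.99, 0.55]
costs = {t: hitl_cost_for_threshold(confidences, t) for t in (0.7, 0.85, 0.95)}
```

Raising the threshold from 0.7 to 0.95 moves five more of the ten items into human review, so cost grows with the safety margin; the sweep makes that trade-off explicit before a threshold is fixed.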

## Decision criteria
| Situation | Approach |
|---|---|
| Destructive action | Approval gate |
| Borderline classification | Confidence routing |
| Low-data | Active learning |
| Preference alignment | RLHF |
| Production audit | Sample-based |
| LLM agent | HITL + sandbox |
| Medical / legal | Always HITL |

**Default**: use HITL approval for high-stakes or destructive actions, confidence routing for scale, active learning for label efficiency, and sample-based audits.

## 🔗 Graph
- Parent: [[AI-Safety]] · [[ML-Ops]]
- Variants: [[Active-Learning]] · [[RLHF]]
- Applications: [[Excessive Agency]] · [[Content-Moderation]] · [[Constitutional-AI]]
- Adjacent: [[Label-Studio]] · [[Argilla]] · [[Prodigy]] · [[Inter-Rater-Agreement]]

## 🤖 LLM usage
**When**: destructive actions, borderline cases, RLHF, accuracy-critical work.
**When not**: high-volume, low-stakes work (fully automate).

## ❌ Anti-patterns
- **HITL theater**: approvals are rubber-stamped.
- **Ignoring reviewer fatigue**: review quality drops.
- **No κ measurement**: rater disagreement stays invisible.
- **No SLA on review**: items get stuck in the queue.
- **No cost analysis**: the process becomes unsustainable.
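
HITL theater can often be spotted from the approval log. A minimal sketch, assuming each decision is recorded as an `(approved, seconds_spent)` pair; the function name and all three thresholds are hypothetical and should be tuned per task.

```python
from statistics import median

def flag_rubber_stamping(decisions, max_approval_rate=0.98,
                         min_median_seconds=3.0, min_decisions=20):
    """Flag a reviewer who approves nearly everything, nearly instantly."""
    if len(decisions) < min_decisions:
        return False  # too little data to judge
    approval_rate = sum(1 for approved, _ in decisions if approved) / len(decisions)
    median_seconds = median(seconds for _, seconds in decisions)
    return approval_rate > max_approval_rate and median_seconds < min_median_seconds

# Illustrative logs: one reviewer clicks through, one actually reads.
fast_approver = [(True, 1.0)] * 50
careful_reviewer = [(True, 12.0)] * 40 + [(False, 30.0)] * 10
```

A flag here is a prompt to look closer, not proof of negligence: some queues legitimately consist almost entirely of safe actions.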

## 🧪 Verification / Duplicates
- Verified (Active learning literature, RLHF papers, ML-Ops).
- Trust level A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types + active learning / RLHF / audit / cascade code |