---
id: wiki-2026-0508-human-in-the-loop-hitl
title: Human-in-the-Loop (HITL)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [HITL, human-in-the-loop, active learning, human review, RLHF, AI-assisted]
duplicate_of: none
source_trust_level: A
confidence_score: 0.96
verification_status: applied
tags: [ai, hitl, human-in-loop, active-learning, rlhf, oversight, ml-ops]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Label Studio / Argilla / Prodigy
---

# Human-in-the-Loop (HITL)

## One-liner

> **"A human verifies, corrects, or approves the AI's output at critical steps."** Covers active learning, RLHF, content moderation, and production safety. Modern uses: approval of destructive actions taken by LLM agents, and clinician confirmation in medical AI.

## Core

### Types

- **Approval gate**: a human confirms a destructive action before it runs.
- **Annotation**: humans label training data.
- **Active learning**: the model queries humans on its most uncertain examples.
- **Audit / review**: sample-based checks of model output.
- **Correction**: a human fixes a wrong output.
- **RLHF**: humans provide the preference signal.

### Applications

1. **Medical AI**: the model suggests a diagnosis → a clinician confirms.
2. **Content moderation**: borderline cases → human reviewer.
3. **LLM agent**: approval of destructive operations.
4. **Self-driving**: safety driver.
5. **Customer service**: escalation to a human agent.
6. **Code review**: AI suggests → human merges.

### Trade-offs

- ✅ Safety, accuracy, accountability.
- ❌ Latency, cost, operator fatigue.

## 💻 Patterns

### Approval gate (LLM agent)

```python
def execute_with_approval(tool, args):
    # ask_human is a placeholder for whatever approval channel is in use.
    if tool.is_destructive:
        approval = ask_human(f"Approve {tool.name}({args})?")
        if not approval:
            return {'status': 'denied'}
    return tool.run(**args)
```

### Active learning (uncertainty sampling)

```python
import numpy as np

def active_learning_loop(unlabeled, model, X_train, y_train, n_query=100):
    while len(unlabeled) > 0:
        # 1. predict + uncertainty (entropy of each row's class distribution)
        preds = model.predict_proba(unlabeled)
        uncertainties = -(preds * np.log(preds + 1e-12)).sum(axis=1)
        # 2. query top-k uncertain
        idx = uncertainties.argsort()[-n_query:]
        to_label = unlabeled[idx]
        labels = human_label(to_label)  # placeholder for the annotation step
        # 3. retrain on the accumulated labeled set
        X_train = np.append(X_train, to_label, axis=0)
        y_train = np.append(y_train, labels)
        model.fit(X_train, y_train)
        unlabeled = np.delete(unlabeled, idx, axis=0)
    return model
```

### Confidence-based routing

```python
def hitl_route(model_output, threshold=0.85):
    if model_output.confidence > threshold:
        return {'auto_action': model_output.prediction}
    # Lower confidence = lower priority value = reviewed sooner.
    return {'queue_for_human': model_output, 'priority': model_output.confidence}
```

### Label Studio integration

```python
import requests

# TOKEN is assumed to be defined elsewhere (a Label Studio API token).

def push_to_labelers(tasks):
    requests.post('http://label-studio/api/projects/1/import',
                  json=tasks,
                  headers={'Authorization': f'Token {TOKEN}'})

def fetch_completed_labels():
    # The export endpoint needs the same auth header as the import.
    return requests.get('http://label-studio/api/projects/1/export?exportType=JSON',
                        headers={'Authorization': f'Token {TOKEN}'}).json()
```

### RLHF (preference)

```python
def collect_preferences(prompts, model, n=1000):
    pairs = []
    for prompt in prompts[:n]:
        # generate 2 candidates
        a = model.generate(prompt, temperature=0.7)
        b = model.generate(prompt, temperature=0.7)
        # human picks the better one (human_compare is a placeholder)
        chosen, rejected = human_compare(prompt, a, b)
        pairs.append({'prompt': prompt, 'chosen': chosen, 'rejected': rejected})
    # the pairs feed DPO / RLHF training
    return pairs
```

### Sample-based audit

```python
import random

def audit_sample(predictions, sample_rate=0.05):
    # Sample without replacement so no prediction is audited twice.
    k = min(len(predictions), max(1, int(len(predictions) * sample_rate)))
    sample = random.sample(list(predictions), k)
    audit_results = []
    for p in sample:
        verdict = human_review(p)  # placeholder for the human check
        audit_results.append({'prediction': p, 'human_verdict': verdict})
    if not audit_results:
        return {'sample_size': 0, 'accuracy': None}
    accuracy = sum(1 for r in audit_results
                   if r['prediction'] == r['human_verdict']) / len(audit_results)
    return {'sample_size': len(sample), 'accuracy': accuracy}
```

### Escalation queue (priority)

```python
import heapq
import itertools

class EscalationQueue:
    def __init__(self):
        self.queue = []
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def add(self, item, priority):
        # lower priority value = higher urgency
        heapq.heappush(self.queue, (priority, next(self._counter), item))

    def next_for_review(self):
        if not self.queue:
            return None
        _, _, item = heapq.heappop(self.queue)
        return item
```

### Correction feedback loop

```python
# log, retrain_set, and schedule_retrain are placeholders defined elsewhere.
def correction_pipeline(model_output, human_correction):
    if model_output != human_correction:
        # 1. log discrepancy
        log({'model': model_output, 'human': human_correction, 'context': ...})
        # 2. add to retraining set
        retrain_set.append({'input': ..., 'label': human_correction})
        # 3. trigger retrain once the threshold is crossed
        if len(retrain_set) > 1000:
            schedule_retrain()
```

### Reviewer fatigue (UX)

```python
from datetime import datetime, timedelta

class ReviewerLoad:
    def __init__(self, max_per_hour=200, break_every_min=45):
        self.reviews = []  # records with .reviewer, .time, .was_break
        self.max_per_hour = max_per_hour
        self.break_every = timedelta(minutes=break_every_min)

    def can_review(self, reviewer_id):
        now = datetime.now()
        recent = [r for r in self.reviews
                  if r.reviewer == reviewer_id and r.time > now - timedelta(hours=1)]
        if len(recent) >= self.max_per_hour:
            return 'rate_limit'
        # Measure from the last break, or from the first recent review if none is recorded.
        last_break = max((r.time for r in recent if r.was_break), default=None)
        since = last_break or min((r.time for r in recent), default=now)
        if now - since > self.break_every:
            return 'break_due'
        return 'ok'
```

### Inter-rater agreement (Cohen κ)

```python
from sklearn.metrics import cohen_kappa_score

def measure_agreement(rater_a_labels, rater_b_labels):
    # Rule of thumb: > 0.8 excellent, 0.6-0.8 good, 0.4-0.6 moderate, < 0.4 poor
    return cohen_kappa_score(rater_a_labels, rater_b_labels)
```

### Disagreement resolution

```python
from collections import Counter

def resolve_disagreement(labels):
    """Reduce labels from multiple raters to one final label."""
    if len(set(labels)) == 1:
        return labels[0]
    # majority vote
    most_common, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= 0.6:
        return most_common
    # no clear majority: escalate to a senior reviewer (placeholder)
    return senior_review(labels)
```

### Time-bound HITL (LLM agent)

```python
import asyncio

async def hitl_with_timeout(action, timeout_sec=30):
    """Wait for a human decision; deny by default if the timeout expires."""
    try:
        approval = await asyncio.wait_for(ask_human_approval(action), timeout=timeout_sec)
        return approval
    except asyncio.TimeoutError:
        return {'denied': True, 'reason': 'timeout: human reviewer not available'}
```

### LLM-judge as cheap proxy

```python
def cascading_review(item, llm_judge, human):
    """LLM judge goes first; anything it is unsure about goes to a human."""
    judge_verdict = llm_judge.review(item)
    if judge_verdict.confidence > 0.9:
        return judge_verdict.decision
    return human.review(item)
```

### Cost analysis

```python
def hitl_cost(items, auto_threshold, human_cost=2.0, model_cost=0.01):
    # Every item pays the model cost; only low-confidence items pay for a human.
    n_items = len(items)
    auto_handled = sum(1 for i in items if i.confidence > auto_threshold)
    human_reviewed = n_items - auto_handled
    return n_items * model_cost + human_reviewed * human_cost
```

## Decision criteria

| Situation | Approach |
|---|---|
| Destructive action | Approval gate |
| Borderline classification | Confidence routing |
| Low-data | Active learning |
| Preference align | RLHF |
| Production audit | Sample-based |
| LLM agent | HITL + sandbox |
| Medical / legal | Always HITL |

**Default**: HITL approval for high-stakes or destructive actions, confidence routing for scale, active learning for label efficiency, and sample-based audits.

## 🔗 Graph

- Parent: [[AI-Safety]] · [[ML-Ops]]
- Variants: [[Active-Learning]] · [[RLHF]]
- Applications: [[Excessive Agency]] · [[Content-Moderation]] · [[Constitutional-AI]]
- Adjacent: [[Label-Studio]] · [[Argilla]] · [[Prodigy]] · [[Inter-Rater-Agreement]]

## 🤖 LLM usage

**When**: destructive actions, borderline cases, RLHF, accuracy-critical tasks.

**When not**: high-volume, low-stakes work (fully automate).

## ❌ Anti-patterns

- **HITL theater**: approvals are rubber-stamped.
- **Ignoring reviewer fatigue**: review quality drops.
- **No κ measurement**: rater disagreement stays invisible.
- **No SLA on review**: items get stuck in the queue.
- **No cost analysis**: the review load becomes unsustainable.

## 🧪 Verification / duplicates

- Verified (Active learning literature, RLHF papers, ML-Ops).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types + active learning / RLHF / audit / cascade code |
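
Supplementary sketch for the "HITL theater" anti-pattern listed above, which has no pattern of its own in this note. One rough way to spot rubber-stamping is to look for reviewers who approve nearly everything and spend almost no time per decision. The `Decision` record, the function name `flag_rubber_stampers`, and every threshold below are hypothetical and chosen only for illustration; they would need tuning against real review logs.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Decision:
    """One review decision (hypothetical record shape)."""
    reviewer: str
    approved: bool
    seconds_spent: float


def flag_rubber_stampers(decisions, min_decisions=20,
                         max_approval_rate=0.98, min_median_seconds=3.0):
    """Return sorted reviewer ids whose approvals look automatic.

    All thresholds are illustrative assumptions, not recommended values.
    """
    # Group decisions by reviewer.
    by_reviewer = {}
    for d in decisions:
        by_reviewer.setdefault(d.reviewer, []).append(d)

    flagged = []
    for reviewer, items in by_reviewer.items():
        if len(items) < min_decisions:
            continue  # too few decisions to judge
        approval_rate = sum(d.approved for d in items) / len(items)
        typical_time = median(d.seconds_spent for d in items)
        # Flag only when both signals agree: near-total approval and very fast decisions.
        if approval_rate > max_approval_rate and typical_time < min_median_seconds:
            flagged.append(reviewer)
    return sorted(flagged)
```

A flagged reviewer is a candidate for a targeted audit sample, not proof of carelessness: a queue that really does contain only safe items would produce the same signal.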