[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -1,108 +1,259 @@
 ---
 id: wiki-2026-0508-human-in-the-loop-hitl
-title: Human in the loop (HITL)
+title: Human-in-the-Loop (HITL)
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: []
+aliases: [HITL, human-in-the-loop, active learning, human review, RLHF, AI-assisted]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 0.92
-tags: [uncategorized]
+confidence_score: 0.96
+verification_status: applied
+tags: [ai, hitl, human-in-loop, active-learning, rlhf, oversight, ml-ops]
 raw_sources: []
-last_reinforced: 2026-05-08
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
 tech_stack:
-  language: unspecified
-  framework: unspecified
+  language: Python
+  framework: Label Studio / Argilla / Prodigy
 ---

-# [[Human-in-the-Loop (HITL)|Human-in-the-Loop (HITL)]]
+# Human-in-the-Loop (HITL)

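-## 📌 One-line insight (The Karpathy Summary)
-Human-in-the-Loop (HITL) is an operating model that requires human involvement (intervention, approval, feedback, or halting) at specific points in an AI agent's autonomous execution, in order to ensure the system's safety, accuracy, and ethical suitability. It is a core governance mechanism: it compensates for the limits of the agent's intelligence with human judgment and makes responsibility for major decisions explicit.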
+## One-line summary
+> **"A human verifies, corrects, or approves the AI's output at each critical step."** Classic uses: active learning, RLHF, content moderation, and production safety. Modern uses: approving an LLM agent's destructive actions, and clinician confirmation of medical AI suggestions.

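-## 📖 Structured knowledge (Synthesized Content)
-*   **Interaction modes**:
-    *   **Human-in-the-Loop**: every critical step requires explicit human approval before the next step proceeds.
-    *   **Human-on-the-Loop (HOTL)**: the agent runs autonomously while a human monitors in real time and steps in only when needed, to override it or stop it (kill switch).
-    *   **Human-out-of-the-Loop**: fully autonomous execution with no human involvement (applied to low-risk, repetitive tasks).
-*   **Approval gates**: the harness layer enforces human approval before any tool call with a permanent effect on the outside world, such as deleting files, making payments, or sending email.
-*   **Feedback loops**: when a human gives feedback on intermediate results, such as "not this direction" or "fix this", the agent folds it into its context and revises the plan.
-*   **Approval fatigue**: overly frequent approval requests wear down the human supervisor's attention, so dangerous commands may be approved uncritically. **Progressive Disclosure** (exposing information only when needed) is used to prevent this.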
+## Core concepts

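-## ⚠️ Contradictions & Updates
-*   **Autonomy versus control**: the more humans intervene, the more sharply automation efficiency (speed and scalability) drops.
-*   **Bottleneck**: a "human bottleneck" arises, where the agent's overall throughput is set by the availability of its human supervisor.
-*   **Responsibility shifting**: when a human approves an agent's proposal, it is legally and ethically ambiguous who is accountable for the outcome.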
+### Types
+- **Approval gate**: a human confirms a destructive action before it runs.
+- **Annotation**: humans label training data.
+- **Active learning**: the model queries a human for labels on the examples it is least certain about.
+- **Audit / review**: sample-based checks of model output.
+- **Correction**: a human fixes wrong output.
+- **RLHF**: humans supply the preference signal.

-## 🔗 Knowledge links (Graph)
-### Related Concepts
-*   [[Agentic Governance|Agentic Governance]]
-    *   Why linked: HITL is the most direct technical means of realizing governance.
-*   [[L-component (Lifecycle Hooks)|L-component (Lifecycle Hooks)]]
-    *   Why linked: this is the harness layer that implements approval gates and feedback interfaces.
-*   Approval Fatigue
-    *   Why linked: a user-experience (UX) risk that must be considered when operating HITL.
+### Applications
+1. **Medical AI**: the model suggests a diagnosis → a clinician confirms it.
+2. **Content moderation**: borderline cases → human reviewer.
+3. **LLM agent**: human approval for destructive operations.
+4. **Self-driving**: a safety driver who can take over.
+5. **Customer service**: escalation to a human agent.
+6. **Code review**: the AI suggests a change → a human merges it.

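-### Deeper Research Questions
-*   How should "confidence-based HITL" be designed, where the agent itself assesses a task's risk and dynamically decides when human intervention is needed?
-*   How can the process of turning human feedback into training data be automated, so that the feedback permanently shapes the agent's future behavior?
-*   How can an agent's reasoning be visualized intuitively in virtual reality (VR) or augmented reality (AR) so that humans can intervene faster and more accurately?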
+### Trade-offs
+- ✅ Safety, accuracy, accountability.
+- ❌ Latency, cost, operator fatigue.

-### Practical Application Contexts
-*   **Implementation:** in a VS Code extension, before the agent runs a terminal command, show the user a popup where they can review and edit the command.
-*   **System Design:** in an agentic customer-support system, the AI drafts the reply, and a human agent reviews and edits it before it is sent.
+## 💻 Patterns

---
-*Last updated: 2026-05-01*
-
-## 🤖 LLM usage hints (How to Use This Knowledge)
-
-**When to use this knowledge:**
- *(TODO)*
-
-**When not to use it:**
- *(TODO)*
-
-## 🧪 Validation status (Validation)
-
- **Information status:** needs_review
- **Source trust level:** A
- **Reason for review:** *(P-Reinforce Phase 1 automatic normalization. Body needs verification.)*
-
-## 🧬 Duplicate check (Duplicate Check)
-
- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (automatic normalization)
- **Reason:** Phase 1 normalization: filled in old-template and missing fields.
-
-## 🕓 Changelog
-
-| Date | Change | Handling | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
-
-## 💻 Code patterns (Code Patterns)
-
-**Pattern 1:** *(TODO: structural skeleton reflecting this project's conventions)*
-
-```text
-# TODO
+### Approval gate (LLM agent)
+```python
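+def execute_with_approval(tool, args):
+    # ask_human is assumed to block until a reviewer answers, returning True or False
+    if tool.is_destructive:
+        approval = ask_human(f"Approve {tool.name}({args})?")
+        if not approval:
+            return {'status': 'denied'}
+    # Non-destructive or approved: run the tool
+    return tool.run(**args)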
 ```

-## 🤔 Decision Criteria
+### Active learning (uncertainty sampling)
+```python
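+import numpy as np
+from scipy.stats import entropy
+
+def active_learning_loop(unlabeled, model, X_train, y_train, n_query=100):
+    # Assumes model is already fitted on the seed set (X_train, y_train)
+    while len(unlabeled) > 0:
+        # 1. Predict, then score uncertainty as per-row entropy
+        preds = model.predict_proba(unlabeled)
+        uncertainties = entropy(preds, axis=1)
+
+        # 2. Query the n_query most uncertain examples
+        idx = uncertainties.argsort()[-n_query:]
+        to_label = unlabeled[idx]
+        labels = human_label(to_label)  # hypothetical annotation call
+
+        # 3. Grow the training set, retrain, and drop the newly labeled rows
+        X_train = np.append(X_train, to_label, axis=0)
+        y_train = np.append(y_train, labels)
+        model.fit(X_train, y_train)
+        unlabeled = np.delete(unlabeled, idx, axis=0)
+    return model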
+```

-**When to choose option A:**
- *(TODO)*
+### Confidence-based routing
+```python
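+def hitl_route(model_output, threshold=0.85):
+    # Confident predictions are applied automatically
+    if model_output.confidence > threshold:
+        return {'auto_action': model_output.prediction}
+    # Everything else is queued; lower confidence sorts first in a min-heap queue
+    return {'queue_for_human': model_output, 'priority': model_output.confidence}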
+```

-**When to choose option B:**
- *(TODO)*
+### Label Studio integration
+```python
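+import os
+import requests
+
+# Assumes the API token is supplied via a LABEL_STUDIO_TOKEN environment variable
+TOKEN = os.environ['LABEL_STUDIO_TOKEN']
+
+def push_to_labelers(tasks):
+    resp = requests.post('http://label-studio/api/projects/1/import',
+                         json=tasks, headers={'Authorization': f'Token {TOKEN}'})
+    resp.raise_for_status()

-**Default:**
-> *(TODO)*
+def fetch_completed_labels():
+    # The export endpoint needs the same auth header as the import endpoint
+    resp = requests.get('http://label-studio/api/projects/1/export?exportType=JSON',
+                        headers={'Authorization': f'Token {TOKEN}'})
+    resp.raise_for_status()
+    return resp.json()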
+```

-## ❌ Anti-Patterns
+### RLHF (preference)
+```python
+def collect_preferences(prompts, model, n=1000):
+    pairs = []
+    for prompt in prompts[:n]:
+        # Generate two candidates; sampling at temperature > 0 makes them differ
+        a = model.generate(prompt, temperature=0.7)
+        b = model.generate(prompt, temperature=0.7)
+        # A human picks the better one (human_compare is a hypothetical UI call)
+        chosen, rejected = human_compare(prompt, a, b)
+        pairs.append({'prompt': prompt, 'chosen': chosen, 'rejected': rejected})
+    return pairs

- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
+# The pairs then feed DPO or RLHF training
+```
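
The preference pairs collected above are the input to DPO-style training. As a rough sketch of how one pair is scored, the function below computes the DPO loss for a single (chosen, rejected) pair from four log-probabilities. The function name, the argument names, and the default `beta` are illustrative assumptions, not taken from any particular library.

```python
import math

def dpo_pair_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of a full response
    under the policy being trained (logp_*) or the frozen reference
    model (ref_logp_*).
    """
    # How much more the policy prefers the chosen response over the
    # rejected one, measured relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = beta * margin
    # loss = -log(sigmoid(x)) = log(1 + exp(-x)), computed without overflow
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

When the policy and the reference agree exactly, the margin is zero and the loss is `log(2)`; the loss falls as the policy shifts probability toward the response the human chose.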
+
+### Sample-based audit
+```python
+import random
+
+def audit_sample(predictions, sample_rate=0.05):
+    # Sample without replacement; assumes a non-empty list and reviews at least one item
+    k = max(1, int(len(predictions) * sample_rate))
+    sample = random.sample(predictions, k)
+    audit_results = []
+    for p in sample:
+        verdict = human_review(p)  # hypothetical: returns the label the reviewer considers correct
+        audit_results.append({'prediction': p, 'human_verdict': verdict})
+
+    accuracy = sum(1 for r in audit_results if r['prediction'] == r['human_verdict']) / len(audit_results)
+    return {'sample_size': len(sample), 'accuracy': accuracy}
+```
+
+### Escalation queue (priority)
+```python
+import heapq
+import itertools
+
+class EscalationQueue:
+    def __init__(self):
+        self.queue = []
+        self._counter = itertools.count()  # tie-breaker so items themselves are never compared
+
+    def add(self, item, priority):
+        # Lower priority value = higher urgency
+        heapq.heappush(self.queue, (priority, next(self._counter), item))
+
+    def next_for_review(self):
+        if not self.queue:
+            return None
+        _, _, item = heapq.heappop(self.queue)
+        return item
+```
+
+### Correction feedback loop
+```python
+def correction_pipeline(model_output, human_correction):
+    # log, retrain_set, and schedule_retrain are assumed to be defined elsewhere
+    if model_output != human_correction:
+        # 1. Log the discrepancy
+        log({'model': model_output, 'human': human_correction, 'context': ...})
+        # 2. Add the corrected example to the retraining set
+        retrain_set.append({'input': ..., 'label': human_correction})
+        # 3. Trigger a retrain once enough corrections have accumulated
+        if len(retrain_set) > 1000:
+            schedule_retrain()
+```
+
+### Reviewer fatigue (UX)
+```python
+from datetime import datetime, timedelta
+
+class ReviewerLoad:
+    def __init__(self, max_per_hour=200, break_every_min=45):
+        self.reviews = []  # records with .reviewer, .time, and .was_break
+        self.max_per_hour = max_per_hour
+        self.break_every = timedelta(minutes=break_every_min)
+
+    def can_review(self, reviewer_id):
+        now = datetime.now()
+        recent = [r for r in self.reviews
+                  if r.reviewer == reviewer_id and r.time > now - timedelta(hours=1)]
+        if len(recent) >= self.max_per_hour:
+            return 'rate_limit'
+        # Time since the last break, or since the first recent review if no break was logged
+        breaks = [r.time for r in recent if r.was_break]
+        since = max(breaks) if breaks else min((r.time for r in recent), default=now)
+        if now - since > self.break_every:
+            return 'break_due'
+        return 'ok'
+```
+
+### Inter-rater agreement (Cohen κ)
+```python
+from sklearn.metrics import cohen_kappa_score
+
+def measure_agreement(rater_a_labels, rater_b_labels):
+    return cohen_kappa_score(rater_a_labels, rater_b_labels)
+# Rule of thumb (Landis & Koch): > 0.8 almost perfect, 0.6-0.8 substantial, 0.4-0.6 moderate, < 0.4 fair or worse
+```
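
To make the κ statistic less of a black box, here is a dependency-free sketch of the same calculation: observed agreement `p_o` corrected by the agreement `p_e` expected by chance from each rater's label frequencies, giving κ = (p_o − p_e) / (1 − p_e). The function name and the handling of the degenerate case (both raters using one identical label) are choices made for this illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where the raters match
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere; kappa is
        # undefined here, and this sketch reports full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Raters agree on 3 of 4 items (p_o = 0.75); chance agreement is 0.5
print(cohen_kappa(['y', 'y', 'n', 'n'], ['y', 'n', 'n', 'n']))  # 0.5
```

The example shows why raw agreement overstates quality: 75% agreement corresponds to κ = 0.5 once chance is removed.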
+
+### Disagreement resolution
+```python
+from collections import Counter
+
+def resolve_disagreement(labels):
+    """Reduce several raters' labels to one final label."""
+    if len(set(labels)) == 1:
+        return labels[0]
+    # Majority vote, accepted only when at least 60% of raters agree
+    most_common, count = Counter(labels).most_common(1)[0]
+    if count / len(labels) >= 0.6:
+        return most_common
+    # Otherwise escalate to a senior reviewer (senior_review is a hypothetical hook)
+    return senior_review(labels)
+```
+
+### Time-bound HITL (LLM agent)
+```python
+import asyncio
+
+async def hitl_with_timeout(action, timeout_sec=30):
+    """Wait for a human decision; on timeout, fall back to a default (deny)."""
+    try:
+        approval = await asyncio.wait_for(ask_human_approval(action), timeout=timeout_sec)
+        return approval
+    except asyncio.TimeoutError:
+        # Fail closed: no reviewer answered in time, so the action is denied
+        return {'denied': True, 'reason': 'timeout: human reviewer not available'}
+```
+
+### LLM-judge as cheap proxy
+```python
+def cascading_review(item, llm_judge, human):
+    """Ask the LLM judge first; escalate to a human when the judge is unsure."""
+    judge_verdict = llm_judge.review(item)
+    if judge_verdict.confidence > 0.9:
+        return judge_verdict.decision
+    return human.review(item)
+```
+
+### Cost analysis
+```python
+def hitl_cost(items, auto_threshold, human_cost=2.0, model_cost=0.01):
+    # Every item pays the model cost; items at or below the threshold also pay for human review
+    n_items = len(items)
+    auto_handled = sum(1 for i in items if i.confidence > auto_threshold)
+    human_reviewed = n_items - auto_handled
+    return n_items * model_cost + human_reviewed * human_cost
+```
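
The cost function above prices a single threshold. To see the latency and cost trade-off, it can help to sweep several thresholds over the same batch. The sketch below does that on plain confidence scores; the function name and the tuple layout are assumptions for illustration, and the default costs simply reuse the figures from the cost function above.

```python
def cost_by_threshold(confidences, thresholds, human_cost=2.0, model_cost=0.01):
    """Return (threshold, items sent to humans, total cost) per threshold."""
    rows = []
    for t in thresholds:
        # Items at or below the threshold are not confident enough to automate
        human_reviewed = sum(1 for c in confidences if c <= t)
        total = len(confidences) * model_cost + human_reviewed * human_cost
        rows.append((t, human_reviewed, total))
    return rows

for threshold, n_human, total in cost_by_threshold([0.5, 0.8, 0.9, 0.99], [0.7, 0.95]):
    print(f"threshold={threshold}: {n_human} human reviews, cost={total:.2f}")
```

Raising the threshold sends more items to humans, so cost rises with it; the useful threshold is the lowest one whose audited accuracy is still acceptable.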
+
+## Decision criteria
+| Situation | Approach |
+|---|---|
+| Destructive action | Approval gate |
+| Borderline classification | Confidence routing |
+| Little labeled data | Active learning |
+| Preference alignment | RLHF |
+| Production audit | Sample-based audit |
+| LLM agent | HITL + sandbox |
+| Medical / legal | Always HITL |
+
+**Default**: HITL approval for high-stakes or destructive actions, confidence routing for scale, active learning for label efficiency, plus a sampled audit.
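
The default policy above can be read as a small dispatch rule. The sketch below is one way to encode it; the function name, the mode strings, and the injectable `rng` are assumptions for illustration, while the 0.85 threshold and 5% audit rate mirror the defaults used in the patterns above.

```python
import random

def choose_review_mode(destructive, high_stakes, confidence,
                       auto_threshold=0.85, audit_rate=0.05, rng=random.random):
    """Pick how much human involvement one model output needs."""
    # High-stakes or destructive work always goes through an approval gate
    if destructive or high_stakes:
        return 'approval_gate'
    # Low-confidence output is queued for a human reviewer
    if confidence <= auto_threshold:
        return 'human_queue'
    # Confident output runs automatically, with a random slice audited afterwards
    return 'audit_sample' if rng() < audit_rate else 'auto'
```

Passing `rng` as a parameter keeps the rule deterministic under test, for example `choose_review_mode(False, False, 0.95, rng=lambda: 0.5)` always takes the fully automatic path.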
+
+## 🔗 Graph
+- Parent: [[AI-Safety]] · [[ML-Ops]]
+- Variants: [[Active-Learning]] · [[RLHF]]
+- Applications: [[Excessive Agency]] · [[Content-Moderation]] · [[Constitutional-AI]]
+- Adjacent: [[Label-Studio]] · [[Argilla]] · [[Prodigy]] · [[Inter-Rater-Agreement]]
+
+## 🤖 LLM usage
+**When to use**: destructive actions, borderline cases, RLHF data collection, and tasks where accuracy is critical.
+**When not to use**: high-volume, low-stakes work (run it fully automatically).
+
+## ❌ Anti-patterns
+- **HITL theater**: approvals are rubber-stamped without real review.
+- **Ignoring reviewer fatigue**: review quality drops.
+- **No κ measurement**: rater disagreement stays invisible.
+- **No SLA on review**: items get stuck waiting for a human.
+- **No cost analysis**: the review budget becomes unsustainable.
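
Two of the anti-patterns above, "No SLA on review" and "HITL theater", can be monitored from review timestamps alone. The sketch below is a hedged illustration: the `Review` record, both function names, and every threshold (30-minute SLA, 99% approval rate, 2-second median decision time) are assumptions to tune, not established values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

@dataclass
class Review:
    requested_at: datetime
    decided_at: Optional[datetime] = None  # None while still pending
    approved: Optional[bool] = None

def overdue(reviews, now, sla=timedelta(minutes=30)):
    """Pending reviews that have waited longer than the SLA."""
    return [r for r in reviews
            if r.decided_at is None and now - r.requested_at > sla]

def looks_rubber_stamped(reviews, min_approval_rate=0.99, max_median_seconds=2.0):
    """Heuristic: nearly everything is approved, and almost instantly."""
    decided = [r for r in reviews if r.decided_at is not None]
    if not decided:
        return False
    approval_rate = sum(bool(r.approved) for r in decided) / len(decided)
    median_secs = median((r.decided_at - r.requested_at).total_seconds()
                         for r in decided)
    return approval_rate >= min_approval_rate and median_secs <= max_median_seconds
```

A flag from `looks_rubber_stamped` is a prompt to investigate rather than proof of a problem, since a well-tuned system may legitimately produce mostly approvable requests.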
+
+## 🧪 Validation / duplicates
+- Verified (Active learning literature, RLHF papers, ML-Ops).
+- Trust level A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup: types + active learning / RLHF / audit / cascade code |