| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-human-in-the-loop-hitl | Human-in-the-Loop (HITL) | 10_Wiki/Topics | verified | self | | none | A | 0.96 | applied | | | 2026-05-10 | pending | |
Human-in-the-Loop (HITL)
One-line summary
"A human verifies, corrects, or approves the critical steps of an AI system." HITL underpins active learning, RLHF, content moderation, and production safety. Modern uses include approval of destructive actions taken by LLM agents and clinician confirmation of medical AI output.
Core
Types
- Approval gate: a human confirms a destructive action before it runs.
- Annotation: humans label training data.
- Active learning: the model queries humans about its most uncertain examples.
- Audit / review: sample-based checks of model output.
- Correction: a human fixes a wrong output.
- RLHF: humans supply the preference signal.
Applications
- Medical AI: diagnostic suggestion → clinician confirmation.
- Content moderation: borderline cases → human reviewer.
- LLM agent: approval of destructive operations.
- Self-driving: safety driver.
- Customer service: escalation to a human agent.
- Code review: AI suggestion → human merge.
Trade-offs
- ✅ Safety, accuracy, accountability.
- ❌ Latency, cost, operator fatigue.
💻 Patterns
Approval gate (LLM agent)
def execute_with_approval(tool, args):
    # ask_human is a placeholder for whatever prompt or UI collects the decision
    if tool.is_destructive:
        approval = ask_human(f"Approve {tool.name}({args})?")
        if not approval:
            return {'status': 'denied'}
    return tool.run(**args)
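The gate above can be exercised end to end with a small sketch. The `Tool` dataclass and the injected `ask_human` callable are hypothetical stand-ins, since the original does not define either; injecting the approver simply makes the gate easy to test without a real reviewer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Hypothetical tool description; not defined in the original note."""
    name: str
    is_destructive: bool
    run: Callable[..., dict]


def execute_with_approval(tool, args, ask_human):
    # Destructive tools run only if the human approver returns a truthy answer.
    if tool.is_destructive and not ask_human(f"Approve {tool.name}({args})?"):
        return {'status': 'denied'}
    return tool.run(**args)


# Illustrative tools: one destructive, one read-only.
delete_file = Tool('delete_file', True, lambda path: {'status': 'ok', 'deleted': path})
read_file = Tool('read_file', False, lambda path: {'status': 'ok', 'read': path})

denied = execute_with_approval(delete_file, {'path': '/tmp/x'}, lambda question: False)
approved = execute_with_approval(delete_file, {'path': '/tmp/x'}, lambda question: True)
ungated = execute_with_approval(read_file, {'path': '/tmp/x'}, lambda question: False)
```

Note that the non-destructive tool runs even when the approver would have said no, because the gate never asks.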
Active learning (uncertainty sampling)
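import numpy as np
from scipy.stats import entropy

def active_learning_loop(unlabeled, model, X_train, y_train, n_query=100):
    # human_label is a placeholder for the annotation step
    while len(unlabeled):
        # 1. predict and score per-row uncertainty
        preds = model.predict_proba(unlabeled)
        uncertainties = entropy(preds, axis=1)
        # 2. query the n_query most uncertain examples
        idx = uncertainties.argsort()[-n_query:]
        to_label = unlabeled[idx]
        labels = human_label(to_label)
        # 3. grow the training set and retrain
        X_train = np.append(X_train, to_label, axis=0)
        y_train = np.append(y_train, labels)
        model.fit(X_train, y_train)
        unlabeled = np.delete(unlabeled, idx, axis=0)
    return model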
Confidence-based routing
def hitl_route(model_output, threshold=0.85):
    if model_output.confidence > threshold:
        return {'auto_action': model_output.prediction}
    # lower confidence sorts first in a min-priority queue
    return {'queue_for_human': model_output, 'priority': model_output.confidence}
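As a rough illustration of how this routing behaves on a batch, the sketch below splits outputs into automatic actions and a human queue ordered by ascending confidence. `ModelOutput` and `split_batch` are hypothetical names introduced here; the original does not specify the output type.

```python
from dataclasses import dataclass


@dataclass
class ModelOutput:
    """Hypothetical container for a scored prediction."""
    prediction: str
    confidence: float


def hitl_route(model_output, threshold=0.85):
    if model_output.confidence > threshold:
        return {'auto_action': model_output.prediction}
    return {'queue_for_human': model_output, 'priority': model_output.confidence}


def split_batch(outputs, threshold=0.85):
    """Return (auto_actions, human_queue); least confident items come first."""
    auto, queued = [], []
    for out in outputs:
        routed = hitl_route(out, threshold)
        if 'auto_action' in routed:
            auto.append(routed['auto_action'])
        else:
            queued.append(routed)
    queued.sort(key=lambda r: r['priority'])
    return auto, queued


outputs = [ModelOutput('spam', 0.99), ModelOutput('ham', 0.60), ModelOutput('spam', 0.30)]
auto, queued = split_batch(outputs)
```

Here only the 0.99 item is handled automatically; the other two wait for a human, with the 0.30 item reviewed first.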
Label Studio integration
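import requests

# TOKEN is assumed to hold a Label Studio API token defined elsewhere
HEADERS = {'Authorization': f'Token {TOKEN}'}

def push_to_labelers(tasks):
    requests.post('http://label-studio/api/projects/1/import',
                  json=tasks, headers=HEADERS)

def fetch_completed_labels():
    # the export call needs the same auth header as the import
    return requests.get('http://label-studio/api/projects/1/export?exportType=JSON',
                        headers=HEADERS).json()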
RLHF (preference)
def collect_preferences(prompts, model, n=1000):
    pairs = []
    for prompt in prompts[:n]:
        # generate 2 candidates
        a = model.generate(prompt, temperature=0.7)
        b = model.generate(prompt, temperature=0.7)
        # a human picks the better one
        chosen, rejected = human_compare(prompt, a, b)
        pairs.append({'prompt': prompt, 'chosen': chosen, 'rejected': rejected})
    return pairs
# → feed the pairs into DPO / RLHF training
Sample-based audit
import numpy as np

def audit_sample(predictions, sample_rate=0.05):
    # sample without replacement so no prediction is audited twice
    size = max(1, int(len(predictions) * sample_rate))
    sample = np.random.choice(predictions, size=size, replace=False)
    audit_results = []
    for p in sample:
        verdict = human_review(p)  # placeholder: returns the reviewer's label
        audit_results.append({'prediction': p, 'human_verdict': verdict})
    accuracy = sum(1 for r in audit_results if r['prediction'] == r['human_verdict']) / len(audit_results)
    return {'sample_size': len(sample), 'accuracy': accuracy}
Escalation queue (priority)
import heapq
import itertools

class EscalationQueue:
    def __init__(self):
        self.queue = []
        self._counter = itertools.count()  # tie-breaker so equal priorities never compare items

    def add(self, item, priority):
        # lower priority value = higher urgency
        heapq.heappush(self.queue, (priority, next(self._counter), item))

    def next_for_review(self):
        if not self.queue:
            return None
        _, _, item = heapq.heappop(self.queue)
        return item
Correction feedback loop
def correction_pipeline(model_output, human_correction):
    if model_output != human_correction:
        # 1. log the discrepancy
        log({'model': model_output, 'human': human_correction, 'context': ...})
        # 2. add to the retraining set
        retrain_set.append({'input': ..., 'label': human_correction})
        # 3. trigger a retrain once the threshold is reached
        if len(retrain_set) > 1000:
            schedule_retrain()
Reviewer fatigue (UX)
from datetime import datetime, timedelta

class ReviewerLoad:
    def __init__(self, max_per_hour=200, break_every_min=45):
        self.reviews = []  # records with .reviewer, .time, .was_break
        self.max_per_hour = max_per_hour
        self.break_every = break_every_min

    def can_review(self, reviewer_id):
        now = datetime.now()
        recent = [r for r in self.reviews
                  if r.reviewer == reviewer_id and r.time > now - timedelta(hours=1)]
        if len(recent) >= self.max_per_hour:
            return 'rate_limit'
        last_break = max((r.time for r in recent if r.was_break), default=None)
        # total_seconds() rather than .seconds, which drops the days component
        if last_break and (now - last_break).total_seconds() > self.break_every * 60:
            return 'break_due'
        return 'ok'
Inter-rater agreement (Cohen κ)
from sklearn.metrics import cohen_kappa_score

def measure_agreement(rater_a_labels, rater_b_labels):
    return cohen_kappa_score(rater_a_labels, rater_b_labels)
# rule of thumb: > 0.8 excellent, 0.6-0.8 good, 0.4-0.6 moderate, < 0.4 poor
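For intuition about what the score measures, Cohen's κ is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e is the agreement expected by chance from each rater's label frequencies. The hand-rolled version below is only a sketch for two raters with categorical labels; the function name `cohen_kappa` and the sample labels are illustrative, and returning 1.0 when chance agreement is total is a convention chosen here, not something the original states.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies per label.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if p_e == 1:
        # Both raters used one identical label throughout; kappa is undefined,
        # and this sketch reports full agreement by convention (an assumption).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


rater_a = ['yes', 'yes', 'no', 'no']
rater_b = ['yes', 'no', 'no', 'no']
kappa = cohen_kappa(rater_a, rater_b)
```

In this example the raters agree on 3 of 4 items (p_o = 0.75) while chance agreement is 0.5, so κ works out to 0.5, noticeably lower than the raw agreement rate suggests.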
Disagreement resolution
from collections import Counter

def resolve_disagreement(labels):
    """Multiple raters → one final label."""
    if len(set(labels)) == 1:
        return labels[0]
    # majority vote
    most_common, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= 0.6:
        return most_common
    # no clear majority: escalate to a senior reviewer
    return senior_review(labels)
Time-bound HITL (LLM agent)
import asyncio

async def hitl_with_timeout(action, timeout_sec=30):
    """Wait for a human decision; if none arrives before the timeout, deny by default."""
    try:
        approval = await asyncio.wait_for(ask_human_approval(action), timeout=timeout_sec)
        return approval
    except asyncio.TimeoutError:
        return {'denied': True, 'reason': 'timeout: human reviewer not available'}
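The timeout behaviour can be tried without a real reviewer by simulating one. In this sketch `ask_human_approval` is a hypothetical stand-in that answers after `delay_sec`, and the extra `delay_sec` parameter exists only to drive the simulation.

```python
import asyncio


async def ask_human_approval(action, delay_sec):
    """Hypothetical reviewer: approves after delay_sec seconds."""
    await asyncio.sleep(delay_sec)
    return {'approved': True, 'action': action}


async def hitl_with_timeout(action, delay_sec, timeout_sec=30):
    # Deny by default if the reviewer does not answer in time.
    try:
        return await asyncio.wait_for(
            ask_human_approval(action, delay_sec), timeout=timeout_sec
        )
    except asyncio.TimeoutError:
        return {'denied': True, 'reason': 'timeout'}


# A fast reviewer answers within the window; a slow one triggers the default.
fast = asyncio.run(hitl_with_timeout('drop_table', delay_sec=0.0, timeout_sec=0.5))
slow = asyncio.run(hitl_with_timeout('drop_table', delay_sec=0.5, timeout_sec=0.01))
```

Denying on timeout is the conservative choice for destructive actions; a low-stakes workflow might reasonably default the other way.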
LLM-judge as cheap proxy
def cascading_review(item, llm_judge, human):
    """LLM judge first; if it is unsure, hand off to a human."""
    judge_verdict = llm_judge.review(item)
    if judge_verdict.confidence > 0.9:
        return judge_verdict.decision
    return human.review(item)
Cost analysis
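def hitl_cost(items, auto_threshold, human_cost=2.0, model_cost=0.01):
    # items: scored outputs, each with a .confidence attribute
    n_items = len(items)
    auto_handled = sum(1 for i in items if i.confidence > auto_threshold)
    human_reviewed = n_items - auto_handled
    # every item pays the model cost; only non-automated items pay the human cost
    return n_items * model_cost + human_reviewed * human_cost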
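To see how the threshold drives cost, the sketch below evaluates the same formula at a few thresholds. It is simplified to take a plain list of confidence scores, and the scores and thresholds are made-up illustration values, not data from the original.

```python
def hitl_cost(confidences, auto_threshold, human_cost=2.0, model_cost=0.01):
    """Total cost: model cost on every item plus human cost on non-automated items."""
    auto_handled = sum(1 for c in confidences if c > auto_threshold)
    human_reviewed = len(confidences) - auto_handled
    return len(confidences) * model_cost + human_reviewed * human_cost


# Illustrative confidence scores for five items.
confidences = [0.99, 0.95, 0.90, 0.80, 0.60]

# Cost at a permissive, a moderate, and a strict auto-approval threshold.
costs = {t: hitl_cost(confidences, t) for t in (0.5, 0.85, 0.97)}
```

With these numbers the total rises from about 0.05 (nothing reviewed) to about 4.05 (two items reviewed) to about 8.05 (four items reviewed), which makes the safety-versus-cost trade-off of a stricter threshold concrete.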
Decision criteria
| Situation | Approach |
|---|---|
| Destructive action | Approval gate |
| Borderline classification | Confidence routing |
| Low-data | Active learning |
| Preference alignment | RLHF |
| Production audit | Sample-based |
| LLM agent | HITL + sandbox |
| Medical / legal | Always HITL |
Default: HITL approval for high-stakes or destructive actions, confidence routing for scale, active learning for label efficiency, and sample-based audits.
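One way to read the default is as a single decision function. The sketch below is an assumption about how the pieces might compose, not a prescribed design: the name `decide`, the return labels, and the 0.85 and 0.05 defaults are illustrative, and active learning is left out because it concerns training rather than per-item routing.

```python
import random


def decide(is_destructive, confidence, threshold=0.85, audit_rate=0.05,
           rng=random.random):
    """Pick a handling path for one model output."""
    # High-stakes or destructive actions always need explicit approval.
    if is_destructive:
        return 'human_approval'
    # Low-confidence outputs are routed to a human reviewer.
    if confidence <= threshold:
        return 'human_review'
    # Confident outputs run automatically; a random sample is audited afterwards.
    if rng() < audit_rate:
        return 'auto_then_audit'
    return 'auto'


# rng is injectable so the audit branch can be exercised deterministically.
paths = [
    decide(True, 0.99),
    decide(False, 0.50),
    decide(False, 0.95, rng=lambda: 0.0),
    decide(False, 0.95, rng=lambda: 0.99),
]
```

The ordering matters: the destructive check comes first, so a confident model can never bypass the approval gate.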
🔗 Graph
- Parent: AI Safety · ML-Ops
- Variants: Active Learning · RLHF
- Applications: Excessive Agency · Content-Moderation · AI_Safety_and_Alignment
🤖 LLM usage
When: destructive actions, borderline cases, RLHF, accuracy-critical work. When not: high-volume, low-stakes work (fully automate).
❌ Anti-patterns
- HITL theater: approvals are rubber-stamped.
- Ignoring reviewer fatigue: review quality drops.
- No κ measurement: rater disagreement stays invisible.
- No SLA on review: items get stuck in the queue.
- No cost analysis: the process becomes unsustainable.
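The first anti-pattern can be monitored with simple review logs. The heuristic below is only a sketch: it flags reviewers who approve nearly everything and decide very quickly, and the record fields (`reviewer`, `approved`, `seconds`) and every threshold are hypothetical values that would need tuning against real data.

```python
from statistics import median


def rubber_stamp_suspects(reviews, min_reviews=20, max_approval_rate=0.98,
                          min_median_sec=3.0):
    """Return reviewers whose approvals look automatic (heuristic only)."""
    by_reviewer = {}
    for r in reviews:
        by_reviewer.setdefault(r['reviewer'], []).append(r)

    suspects = []
    for reviewer, records in by_reviewer.items():
        if len(records) < min_reviews:
            continue  # too few reviews to judge
        approval_rate = sum(r['approved'] for r in records) / len(records)
        median_sec = median(r['seconds'] for r in records)
        # Near-total approval combined with very fast decisions is the signal.
        if approval_rate >= max_approval_rate and median_sec < min_median_sec:
            suspects.append(reviewer)
    return sorted(suspects)


# Illustrative log: alice approves everything in 1 s, bob deliberates.
reviews = (
    [{'reviewer': 'alice', 'approved': True, 'seconds': 1.0} for _ in range(20)]
    + [{'reviewer': 'bob', 'approved': i % 2 == 0, 'seconds': 10.0} for i in range(20)]
)
suspects = rubber_stamp_suspects(reviews)
```

A flag here is a prompt for a closer look, not proof of rubber-stamping; a reviewer handling a stream of genuinely easy cases would trip the same heuristic.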
🧪 Verification / duplicates
- Verified (Active learning literature, RLHF papers, ML-Ops).
- Trust level: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types + active learning / RLHF / audit / cascade code |