id: wiki-2026-0508-human-in-the-loop-hitl
title: Human-in-the-Loop (HITL)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: HITL, human-in-the-loop, active learning, human review, RLHF, AI-assisted
duplicate_of: none
source_trust_level: A
confidence_score: 0.96
verification_status: applied
tags: ai, hitl, human-in-loop, active-learning, rlhf, oversight, ml-ops
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: Python; framework: Label Studio / Argilla / Prodigy

Human-in-the-Loop (HITL)

In one line

"A human verifies, corrects, or approves an AI system's output at critical steps." HITL underpins active learning, RLHF, content moderation, and production safety. Modern examples: human approval of an LLM agent's destructive actions, and clinician confirmation of medical AI suggestions.

Core concepts

Types

  • Approval gate: a human confirms a destructive action before it runs.
  • Annotation: labeling training data.
  • Active learning: the model queries humans on its most uncertain examples.
  • Audit / review: sample-based checks of outputs.
  • Correction: fixing wrong outputs.
  • RLHF: human preference signals used for training.

Applications

  1. Medical AI: the model suggests a diagnosis, a clinician confirms.
  2. Content moderation: borderline cases go to a human reviewer.
  3. LLM agent: approval of destructive operations.
  4. Self-driving: a safety driver.
  5. Customer service: escalation to a human agent.
  6. Code review: the AI suggests, a human merges.

Trade-offs

  • Benefits: safety, accuracy, accountability.
  • Drawbacks: latency, cost, operator fatigue.

💻 Patterns

Approval gate (LLM agent)

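def execute_with_approval(tool, args):
    # Only destructive tools need sign-off; everything else runs directly.
    if tool.is_destructive:
        approval = ask_human(f"Approve {tool.name}({args})?")  # ask_human: caller-supplied prompt hook
        if not approval:
            return {'status': 'denied'}
    return tool.run(**args)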
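
To make the gate concrete, here is a minimal runnable sketch. The `Tool` dataclass and the injected `ask_human` callable are assumptions made for illustration (the pattern above only requires `.name`, `.is_destructive`, and `.run`); in a real agent the callable would block on a chat prompt or ticket.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:  # hypothetical tool wrapper, not from any specific agent framework
    name: str
    is_destructive: bool
    run: Callable[..., dict]

def execute_with_approval(tool, args, ask_human):
    # Destructive tools run only if the human says yes; others run directly.
    if tool.is_destructive and not ask_human(f"Approve {tool.name}({args})?"):
        return {'status': 'denied'}
    return tool.run(**args)

deleted = []  # records side effects so we can see the denied call never ran
rm = Tool('delete_file', True, lambda path: deleted.append(path) or {'status': 'ok'})
ls = Tool('list_files', False, lambda path: {'status': 'ok', 'files': []})

always_no = lambda prompt: False  # simulated reviewer who rejects everything
denied = execute_with_approval(rm, {'path': '/tmp/x'}, ask_human=always_no)
listed = execute_with_approval(ls, {'path': '/tmp'}, ask_human=always_no)
```

Passing the approval function in as a parameter keeps the gate testable: the same code path can be exercised with a simulated reviewer before a real one is wired in.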

Active learning (uncertainty sampling)

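import numpy as np
from scipy.stats import entropy

def active_learning_loop(unlabeled, model, X_train, y_train, n_query=100):
    # model must already be fitted on (X_train, y_train); unlabeled is a 2-D array
    while len(unlabeled) > 0:
        # 1. Predict and score uncertainty as the entropy of each row
        preds = model.predict_proba(unlabeled)
        uncertainties = entropy(preds, axis=1)

        # 2. Query the n_query most uncertain examples
        idx = uncertainties.argsort()[-n_query:]
        to_label = unlabeled[idx]
        labels = human_label(to_label)  # human_label: caller-supplied labeling hook

        # 3. Grow the training set, retrain, and drop the labeled rows from the pool
        X_train = np.append(X_train, to_label, axis=0)
        y_train = np.append(y_train, labels)
        model.fit(X_train, y_train)
        unlabeled = np.delete(unlabeled, idx, axis=0)
    return model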
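
The selection step is the heart of uncertainty sampling, so a small worked example may help. This is a dependency-free sketch; `entropy` and `most_uncertain` are illustrative helpers defined here, and the probabilities are made up.

```python
import math

def entropy(probs):
    # Shannon entropy in nats; zero-probability classes contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(pred_probs, k):
    # Indices of the k predictions with the highest entropy, most uncertain first.
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]), reverse=True)
    return ranked[:k]

pred_probs = [
    [0.98, 0.02],  # confident
    [0.55, 0.45],  # nearly a coin flip
    [0.70, 0.30],
    [0.50, 0.50],  # maximally uncertain for two classes
]
picked = most_uncertain(pred_probs, k=2)  # [3, 1]
```

The two near-50/50 rows are sent to the labeler, while the confident row is left alone; that is where a human label is expected to change the model most.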

Confidence-based routing

def hitl_route(model_output, threshold=0.85):
    if model_output.confidence > threshold:
        return {'auto_action': model_output.prediction}
    return {'queue_for_human': model_output, 'priority': model_output.confidence}
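
A short usage sketch of the router, assuming a hypothetical `ModelOutput` dataclass (the function only needs `.prediction` and `.confidence`). Note the boundary case: a confidence exactly equal to the threshold is not auto-handled.

```python
from dataclasses import dataclass

@dataclass
class ModelOutput:  # hypothetical container for illustration
    prediction: str
    confidence: float

def hitl_route(model_output, threshold=0.85):
    if model_output.confidence > threshold:
        return {'auto_action': model_output.prediction}
    return {'queue_for_human': model_output, 'priority': model_output.confidence}

outputs = [ModelOutput('spam', 0.97), ModelOutput('ham', 0.85), ModelOutput('spam', 0.40)]
routed = [hitl_route(o) for o in outputs]

auto = [r['auto_action'] for r in routed if 'auto_action' in r]
# Sorting by priority puts the least confident item first in the human queue.
queued = sorted((r for r in routed if 'queue_for_human' in r), key=lambda r: r['priority'])
```

Using the confidence as the priority value means the items the model is least sure about reach a reviewer first when the queue is served in ascending order.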

Label Studio integration

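import os
import requests

BASE = 'http://label-studio/api/projects/1'  # placeholder host and project id
TOKEN = os.environ['LABEL_STUDIO_TOKEN']     # assumption: token supplied via this env variable
HEADERS = {'Authorization': f'Token {TOKEN}'}

def push_to_labelers(tasks):
    requests.post(f'{BASE}/import', json=tasks, headers=HEADERS, timeout=30).raise_for_status()

def fetch_completed_labels():
    # The export call needs the same auth header as the import call.
    resp = requests.get(f'{BASE}/export', params={'exportType': 'JSON'}, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()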

RLHF (preference)

def collect_preferences(prompts, model, n=1000):
    pairs = []
    for prompt in prompts[:n]:
        # Generate two candidate responses for the same prompt
        a = model.generate(prompt, temperature=0.7)
        b = model.generate(prompt, temperature=0.7)
        # A human picks the better one
        chosen, rejected = human_compare(prompt, a, b)
        pairs.append({'prompt': prompt, 'chosen': chosen, 'rejected': rejected})
    return pairs

# The collected pairs feed DPO / RLHF training.

Sample-based audit

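import random

def audit_sample(predictions, sample_rate=0.05):
    # Sample without replacement so no prediction is audited twice; audit at least one item.
    k = min(len(predictions), max(1, int(len(predictions) * sample_rate)))
    sample = random.sample(list(predictions), k)
    audit_results = []
    for p in sample:
        verdict = human_review(p)  # human_review returns the label the reviewer considers correct
        audit_results.append({'prediction': p, 'human_verdict': verdict})

    if not audit_results:
        return {'sample_size': 0, 'accuracy': None}
    accuracy = sum(1 for r in audit_results if r['prediction'] == r['human_verdict']) / len(audit_results)
    return {'sample_size': len(sample), 'accuracy': accuracy}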
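
An audited accuracy is only an estimate, and its reliability depends on the sample size. One common way to express that is a Wilson score interval; the sketch below is an addition for illustration, not part of the audit function above, and `wilson_interval` is a name chosen here.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for an audited accuracy of correct/n."""
    if n == 0:
        raise ValueError("empty audit sample")
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Same observed accuracy (95%), very different certainty.
lo_small, hi_small = wilson_interval(19, 20)
lo_large, hi_large = wilson_interval(950, 1000)
```

With 20 audited items the interval is wide enough that the true accuracy could be well below 95%; with 1,000 items it tightens considerably. This can guide the choice of `sample_rate`.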

Escalation queue (priority)

import heapq
import itertools

class EscalationQueue:
    def __init__(self):
        self.queue = []
        self._counter = itertools.count()  # tie-breaker so equal priorities never compare items

    def add(self, item, priority):
        # Lower priority value = higher urgency
        heapq.heappush(self.queue, (priority, next(self._counter), item))

    def next_for_review(self):
        if not self.queue:
            return None
        _, _, item = heapq.heappop(self.queue)
        return item

Correction feedback loop

def correction_pipeline(model_output, human_correction):
    if model_output != human_correction:
        # 1. Log the discrepancy
        log({'model': model_output, 'human': human_correction, 'context': ...})
        # 2. Add the corrected example to the retraining set
        retrain_set.append({'input': ..., 'label': human_correction})
        # 3. Trigger a retrain once enough corrections have accumulated
        if len(retrain_set) > 1000:
            schedule_retrain()

Reviewer fatigue (UX)

from datetime import datetime, timedelta

class ReviewerLoad:
    def __init__(self, max_per_hour=200, break_every_min=45):
        self.reviews = []  # records with .reviewer, .time, .was_break
        self.max_per_hour = max_per_hour
        self.break_every = timedelta(minutes=break_every_min)

    def can_review(self, reviewer_id):
        now = datetime.now()
        recent = [r for r in self.reviews
                  if r.reviewer == reviewer_id and r.time > now - timedelta(hours=1)]
        if len(recent) >= self.max_per_hour:
            return 'rate_limit'
        # Measure from the last break, or from the first recent review if no break was taken.
        last_rest = max((r.time for r in recent if r.was_break),
                        default=min((r.time for r in recent), default=now))
        if now - last_rest > self.break_every:
            return 'break_due'
        return 'ok'

Inter-rater agreement (Cohen κ)

from sklearn.metrics import cohen_kappa_score
def measure_agreement(rater_a_labels, rater_b_labels):
    return cohen_kappa_score(rater_a_labels, rater_b_labels)
# Rule of thumb: > 0.8 excellent, 0.6-0.8 good, 0.4-0.6 moderate, < 0.4 poor
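
For intuition about what the score measures, κ can be computed by hand as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The sketch below is a plain-Python illustration with made-up labels, not a replacement for the scikit-learn call.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return float('nan')  # undefined when both raters always use one identical label
    return (p_o - p_e) / (1 - p_e)

rater_a = [1, 1, 1, 0, 0, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0, 1, 1, 0]
kappa = cohen_kappa(rater_a, rater_b)  # p_o = 0.75, p_e = 0.5, so kappa = 0.5
```

The raters agree on 75% of items, but half of that would be expected by chance, so κ lands at 0.5 rather than 0.75.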

Disagreement resolution

from collections import Counter

def resolve_disagreement(labels):
    """Reduce labels from multiple raters to one final label."""
    if len(set(labels)) == 1:
        return labels[0]
    # Majority vote: accept the top label if at least 60% of raters chose it
    most_common, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= 0.6:
        return most_common
    # No clear majority: escalate to a senior reviewer
    return senior_review(labels)
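
A quick usage sketch showing both outcomes. Here `senior_review` is a hypothetical stand-in passed in as a parameter, and `min_share` exposes the 60% cut-off used above.

```python
from collections import Counter

def resolve_disagreement(labels, senior_review, min_share=0.6):
    most_common, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= min_share:
        return most_common
    return senior_review(labels)

escalated = []

def senior_review(labels):
    # Hypothetical stand-in: record the escalation instead of paging a person.
    escalated.append(list(labels))
    return 'needs_senior'

clear = resolve_disagreement(['spam', 'spam', 'spam', 'ham', 'ham'], senior_review)
split = resolve_disagreement(['spam', 'spam', 'ham', 'ham', 'other'], senior_review)
```

Three of five votes meets the 60% bar, so the first case resolves to 'spam'; in the second case the top label has only 40% and the item is escalated.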

Time-bound HITL (LLM agent)

import asyncio

async def hitl_with_timeout(action, timeout_sec=30):
    """Wait for human approval up to a timeout; on timeout, fall back to the default (deny)."""
    try:
        approval = await asyncio.wait_for(ask_human_approval(action), timeout=timeout_sec)
        return approval
    except asyncio.TimeoutError:
        return {'denied': True, 'reason': 'timeout: human reviewer not available'}
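
The fail-closed behaviour can be demonstrated with a simulated reviewer. In this sketch `slow_reviewer` and `approve_or_deny` are hypothetical names, and the delays are shortened so the example runs quickly.

```python
import asyncio

async def slow_reviewer(action, delay_sec):
    # Hypothetical stand-in for a human approval channel (chat prompt, ticket, pager).
    await asyncio.sleep(delay_sec)
    return {'approved': True, 'action': action}

async def approve_or_deny(action, delay_sec, timeout_sec):
    try:
        return await asyncio.wait_for(slow_reviewer(action, delay_sec), timeout=timeout_sec)
    except asyncio.TimeoutError:
        # Fail closed: no answer in time means the action does not run.
        return {'denied': True, 'reason': 'timeout'}

fast = asyncio.run(approve_or_deny('delete_branch', delay_sec=0.01, timeout_sec=0.5))
slow = asyncio.run(approve_or_deny('drop_table', delay_sec=0.5, timeout_sec=0.05))
```

Denying on timeout is the conservative default for destructive actions; approving on timeout would turn an unavailable reviewer into an automatic yes.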

LLM-judge as cheap proxy

def cascading_review(item, llm_judge, human):
    """Try the LLM judge first; hand off to a human when the judge is unsure."""
    judge_verdict = llm_judge.review(item)
    if judge_verdict.confidence > 0.9:
        return judge_verdict.decision
    return human.review(item)

Cost analysis

def hitl_cost(items, auto_threshold, human_cost=2.0, model_cost=0.01):
    # Every item incurs the model cost; items at or below the threshold also incur human review.
    n_items = len(items)
    auto_handled = sum(1 for i in items if i.confidence > auto_threshold)
    human_reviewed = n_items - auto_handled
    return n_items * model_cost + human_reviewed * human_cost
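
A worked example of the same arithmetic, using a plain list of confidence scores. `estimate_hitl_cost` is an illustrative name, the confidences are made up, and the prices are the defaults from the function above. At a threshold of 0.85, three of five items are auto-handled, so the total is 5 × 0.01 + 2 × 2.00 = 4.05.

```python
def estimate_hitl_cost(confidences, auto_threshold, human_cost=2.0, model_cost=0.01):
    auto_handled = sum(1 for c in confidences if c > auto_threshold)
    human_reviewed = len(confidences) - auto_handled
    return len(confidences) * model_cost + human_reviewed * human_cost

confidences = [0.95, 0.90, 0.60, 0.99, 0.70]

# Sweep the threshold to see how cost grows as more items go to humans.
costs = {t: estimate_hitl_cost(confidences, t) for t in (0.5, 0.85, 0.97)}
```

Raising the threshold sends more items to reviewers and the cost climbs accordingly. Note that this estimate counts only review and inference spend; it does not price in the errors made on auto-handled items.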

Decision criteria

| Situation | Approach |
|---|---|
| Destructive action | Approval gate |
| Borderline classification | Confidence routing |
| Low-data | Active learning |
| Preference alignment | RLHF |
| Production audit | Sample-based |
| LLM agent | HITL + sandbox |
| Medical / legal | Always HITL |

Default: HITL approval for high-stakes or destructive actions, confidence routing for scale, active learning for label efficiency, and sample-based audits.

🔗 Graph

🤖 LLM usage

When to use: destructive actions, borderline cases, RLHF, accuracy-critical tasks. When not to use: high-volume, low-stakes tasks (fully automate these).

Anti-patterns

  • HITL theater: approvals are rubber-stamped.
  • Ignoring reviewer fatigue: review quality drops.
  • No κ measurement: rater disagreement stays invisible.
  • No SLA on review: items get stuck in the queue.
  • No cost analysis: the process becomes unsustainable.
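
Two of these anti-patterns lend themselves to simple monitoring. The sketch below is one possible approach under stated assumptions: the record fields (`reviewer`, `approved`, `seconds`, `enqueued_at`) and all thresholds are hypothetical and would need tuning against real review data.

```python
from collections import defaultdict
from statistics import median

def flag_rubber_stampers(reviews, min_reviews=20, max_approval_rate=0.98, min_median_sec=3.0):
    """Return reviewers who approve almost everything almost instantly."""
    by_reviewer = defaultdict(list)
    for r in reviews:
        by_reviewer[r['reviewer']].append(r)
    flagged = []
    for reviewer, rs in by_reviewer.items():
        if len(rs) < min_reviews:
            continue  # too few decisions to judge
        approval_rate = sum(1 for r in rs if r['approved']) / len(rs)
        if approval_rate >= max_approval_rate and median(r['seconds'] for r in rs) < min_median_sec:
            flagged.append(reviewer)
    return sorted(flagged)

def overdue_items(queue, now_sec, sla_sec=3600):
    """Return ids of queued items that have waited longer than the review SLA."""
    return [item['id'] for item in queue if now_sec - item['enqueued_at'] > sla_sec]

reviews = (
    [{'reviewer': 'alice', 'approved': True, 'seconds': 1.0} for _ in range(20)]
    + [{'reviewer': 'bob', 'approved': i % 4 != 0, 'seconds': 12.0} for i in range(20)]
)
flagged = flag_rubber_stampers(reviews)

queue = [{'id': 'a', 'enqueued_at': 0}, {'id': 'b', 'enqueued_at': 3000}]
late = overdue_items(queue, now_sec=4000)
```

A flag is a prompt for a conversation, not proof of rubber-stamping: a reviewer working a queue of easy, obviously safe items could look the same in these numbers.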

🧪 Verification / Duplicates

  • Verified (Active learning literature, RLHF papers, ML-Ops).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types, plus active learning / RLHF / audit / cascade code |