Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization

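Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed links directly at redirect targets, and mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections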

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 23:52:15 +09:00

Kullback-Leibler Divergence

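One-line summary

"The directed information loss between distributions." The KL divergence $D_{\text{KL}}(P \| Q) = \mathbb{E}_P[\log (P/Q)]$ is the expected number of extra bits (with a base-2 logarithm) needed to encode samples from $P$ using a code optimized for the reference distribution $Q$. It was defined by Kullback & Leibler (1951), and in ML as of 2026 it is a core loss term in the VAE ELBO, RLHF (PPO/DPO), variational inference, and distillation.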

Key points

Definition

discrete: $D_{\text{KL}}(P\|Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$
continuous: $D_{\text{KL}}(P\|Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$
always $\ge 0$ (Gibbs' inequality), and $= 0$ iff $P = Q$
NOT symmetric, NOT a metric (no triangle inequality)
$D_{\text{KL}}(P\|Q) = H(P, Q) - H(P)$: cross-entropy minus entropy
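The properties listed above can be checked numerically on a small example. The following is a minimal sketch using only the standard library; the helper names `entropy`, `cross_entropy`, and `kl` are illustrative, and the two distributions are the same toy values used in the Patterns section.

```python
import math

def entropy(p):
    """Shannon entropy H(P) in nats; terms with p(x) = 0 contribute 0."""
    return -sum(px * math.log(px) for px in p if px > 0)

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

def kl(p, q):
    """D_KL(P || Q) in nats, straight from the discrete definition."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

forward = kl(p, q)   # D_KL(P || Q)
reverse = kl(q, p)   # D_KL(Q || P), a different number in general
identity_gap = abs(forward - (cross_entropy(p, q) - entropy(p)))

print(f"D_KL(P||Q) = {forward:.6f}")
print(f"D_KL(Q||P) = {reverse:.6f}")
print(f"|KL - (H(P,Q) - H(P))| = {identity_gap:.2e}")
```

Both directions come out non-negative but unequal, and the gap between the direct definition and cross-entropy minus entropy is at floating-point noise level.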

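Mode behavior

Forward $D_{\text{KL}}(P\|Q)$: $Q$ must cover all the mass of $P$ → "mode-covering"
Reverse $D_{\text{KL}}(Q\|P)$: $Q$ avoids regions where $P$ has no mass, so it tends to lock onto a single mode → "mode-seeking"
VAEs use the reverse form; EP (expectation propagation) uses the forward form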

Applications

VAE ELBO: $\mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{\text{KL}}(q(z|x) \| p(z))$.
RLHF PPO: $\beta \cdot D_{\text{KL}}(\pi \| \pi_{\text{ref}})$ penalty.
Knowledge distillation: $D_{\text{KL}}(p_T \| p_S)$ with temperature.
Variational inference: $\arg\min_q D_{\text{KL}}(q \| p)$.
Mutual information: $I(X;Y) = D_{\text{KL}}(p(x,y) \| p(x)p(y))$.
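The mutual-information identity in the list above is the easiest of these to verify by hand. Below is a small sketch that computes $I(X;Y)$ as the KL divergence between a joint table and the product of its marginals; the function name `mutual_information` and both example tables are made up for illustration.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats from a joint table, joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]          # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    return sum(
        pxy * math.log(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

independent = [[0.25, 0.25], [0.25, 0.25]]   # p(x, y) = p(x) p(y)
correlated = [[0.40, 0.10], [0.10, 0.40]]    # X and Y usually agree

mi_independent = mutual_information(independent)
mi_correlated = mutual_information(correlated)
print(mi_independent, mi_correlated)
```

The independent table gives zero because the joint already equals the product of marginals; the correlated table gives a strictly positive value.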

💻 Patterns

Discrete KL

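import numpy as np

def kl_div(p, q, eps=1e-12):
    """Discrete D_KL(P || Q) in nats; eps guards against log(0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_div(p, q))  # approx. 0.0253 nats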

PyTorch KL (numerically stable)

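import torch
import torch.nn.functional as F

# Placeholder logits for illustration only (batch of 8, 10 classes).
model_logits = torch.randn(8, 10)
target_logits = torch.randn(8, 10)

# F.kl_div(input, target) computes D_KL(target || input).
# The first argument MUST be log-probs; the target is plain probs by default.
log_p = F.log_softmax(model_logits, dim=-1)
q = F.softmax(target_logits, dim=-1)
loss = F.kl_div(log_p, q, reduction="batchmean")  # D_KL(q || p), averaged over the batch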

KL between Gaussians (closed form)

def kl_gaussian(mu1, var1, mu2, var2):
    # D_KL(N(mu1, var1) || N(mu2, var2)) for diagonal Gaussians, summed over all elements
    return 0.5 * (
        torch.log(var2 / var1) + (var1 + (mu1 - mu2)**2) / var2 - 1
    ).sum()

# VAE: q ~ N(mu, sigma^2) with log_var = log(sigma^2), prior N(0, 1)
def kl_to_standard_normal(mu, log_var):
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
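As a sanity check on the closed form above, the same expression can be compared against a direct numerical integration of $\int p \log(p/q)\,dx$. This is a scalar, standard-library sketch rather than the PyTorch version; the function names, integration range, and test parameters are assumptions chosen for illustration.

```python
import math

def kl_gaussian_closed_form(mu1, var1, mu2, var2):
    """Scalar version of the closed form: D_KL(N(mu1, var1) || N(mu2, var2))."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def kl_gaussian_numeric(mu1, var1, mu2, var2, lo=-20.0, hi=20.0, n=20001):
    """Riemann-sum approximation of the integral of p(x) * (log p(x) - log q(x))."""
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * dx
        log_p = normal_logpdf(x, mu1, var1)
        log_q = normal_logpdf(x, mu2, var2)
        total += math.exp(log_p) * (log_p - log_q)
    return total * dx

closed = kl_gaussian_closed_form(1.0, 0.5, 0.0, 2.0)
numeric = kl_gaussian_numeric(1.0, 0.5, 0.0, 2.0)

# VAE special case: prior N(0, 1), posterior parameterised by log-variance.
mu, log_var = 0.3, -0.5
vae_form = -0.5 * (1.0 + log_var - mu ** 2 - math.exp(log_var))
general_form = kl_gaussian_closed_form(mu, math.exp(log_var), 0.0, 1.0)
print(closed, numeric, vae_form, general_form)
```

The VAE expression is the general formula with $\mu_2 = 0$ and $\sigma_2^2 = 1$, which is why the last two values agree.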

Distillation loss with temperature

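def distill_kl(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with temperature T, then compute D_KL(p_T || p_S).
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T*T)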

RLHF PPO KL penalty (per-token)

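def ppo_kl_penalty(logp_new, logp_ref, beta=0.05):
    # logp_new / logp_ref: log-probs of the sampled tokens under the policy and the reference model.
    # Their difference is a single-sample estimate of the per-token reverse KL D_KL(pi || pi_ref).
    return beta * (logp_new - logp_ref)  # subtracted from the reward (reward shaping)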
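The per-token log-prob difference works as a KL penalty because its expectation under the policy is exactly $D_{\text{KL}}(\pi \| \pi_{\text{ref}})$. The sketch below illustrates this on a toy three-action categorical distribution; the distributions, the sample count, and the function names are assumptions for illustration, not part of any RLHF library.

```python
import math
import random

def exact_kl(p, q):
    """D_KL(P || Q) in nats for two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sampled_kl_estimate(policy, ref, num_samples, seed=0):
    """Mean of log pi(a) - log pi_ref(a) over actions a sampled from the policy."""
    rng = random.Random(seed)
    actions = rng.choices(range(len(policy)), weights=policy, k=num_samples)
    return sum(math.log(policy[a]) - math.log(ref[a]) for a in actions) / num_samples

policy = [0.7, 0.2, 0.1]   # toy current policy over three tokens
ref = [0.5, 0.3, 0.2]      # toy reference policy

exact = exact_kl(policy, ref)
estimate = sampled_kl_estimate(policy, ref, num_samples=100_000)
print(f"exact = {exact:.4f}, sampled estimate = {estimate:.4f}")
```

Individual samples of the log-ratio can be negative (here, whenever the second or third token is drawn), even though the expectation is non-negative.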

Forward vs reverse comparison

# Fit a single Gaussian q to a bimodal target p
# - reverse KL D(q||p): q picks one mode (mode-seeking)
# - forward KL D(p||q): q spans both modes (mode-covering, broader)
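The contrast described in the comments above can be reproduced with a brute-force grid search. The following is a rough sketch under stated assumptions: the target is a hypothetical two-component mixture (weights 0.6 / 0.4, means -4 / +4, unit variance), integrals are approximated by Riemann sums on a fixed grid, and the candidate set for $(\mu, \sigma)$ is a coarse grid rather than a real optimizer.

```python
import math

def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - 0.5 * ((x - mu) / sigma) ** 2

# Hypothetical bimodal target p, chosen only for illustration.
WEIGHTS, MEANS = (0.6, 0.4), (-4.0, 4.0)

def target_pdf(x):
    return sum(w * math.exp(normal_logpdf(x, m, 1.0)) for w, m in zip(WEIGHTS, MEANS))

DX = 0.05
XS = [-12.0 + DX * i for i in range(481)]        # integration grid on [-12, 12]
LOG_P = [math.log(target_pdf(x)) for x in XS]
P = [math.exp(lp) for lp in LOG_P]

def forward_kl(mu, sigma):
    """D_KL(p || q): the expectation is taken under the target p."""
    return DX * sum(p * (lp - normal_logpdf(x, mu, sigma))
                    for x, p, lp in zip(XS, P, LOG_P))

def reverse_kl(mu, sigma):
    """D_KL(q || p): the expectation is taken under the approximation q."""
    total = 0.0
    for x, lp in zip(XS, LOG_P):
        lq = normal_logpdf(x, mu, sigma)
        total += math.exp(lq) * (lq - lp)
    return DX * total

# Coarse candidate grid: mu in [-6, 6], sigma in [0.5, 5.0], both in steps of 0.25.
candidates = [(-6.0 + 0.25 * i, 0.5 + 0.25 * j) for i in range(49) for j in range(19)]

fwd_mu, fwd_sigma = min(candidates, key=lambda c: forward_kl(*c))
rev_mu, rev_sigma = min(candidates, key=lambda c: reverse_kl(*c))
print(f"forward KL fit: mu={fwd_mu}, sigma={fwd_sigma}")
print(f"reverse KL fit: mu={rev_mu}, sigma={rev_sigma}")
```

With this target, the forward fit lands near the mixture mean with a wide $\sigma$ (moment matching), while the reverse fit sits on the heavier mode with $\sigma$ close to 1.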

Decision criteria

Need	Form
variational posterior fit	reverse `D_{\text{KL}}(q\|p)`
spread (cover all modes)	forward `D_{\text{KL}}(p\|q)`
symmetric	JS divergence
true metric	Wasserstein, Hellinger (Hellinger is also bounded)
RLHF stability	per-token reverse KL with a `\beta` schedule

Default: depends on the problem. Use reverse for a VAE, forward for EP.

🔗 Graph

Parent: Information_Theory
Applications: VAE · RLHF · LLM_Optimization_and_Deployment_Strategies · Variational-Inference
Adjacent: Cross-Entropy · Mutual-Information

🤖 LLM usage

When to use: defining a distribution-level loss, anchoring to the reference model in RLHF, distillation. When not to use: when a true distance metric is required. KL is not a metric, so use Wasserstein instead.

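❌ Anti-patterns

Assuming symmetry: in general $D_{\text{KL}}(P\|Q) \ne D_{\text{KL}}(Q\|P)$.
Disjoint support: if $Q(x)=0$ where $P(x)>0$, the divergence is $\infty$; smooth the distributions or use JS.
Confusing the argument order of F.kl_div: the first argument is the log-prob input, and the function computes $D_{\text{KL}}(\text{target} \| \text{input})$.
Ignoring the distillation temperature: using sharp distributions without a temperature $T$ gives a poor training signal.
KL collapse in RLHF: if $\beta$ is too small, the policy drifts away from the reference model and reward hacking follows.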
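The disjoint-support pitfall in the list above is easy to reproduce. Here is a minimal sketch showing the infinite KL value alongside the two remedies mentioned there, JS divergence and smoothing; the helper names and the smoothing constant `eps=1e-6` are illustrative choices, not recommendations.

```python
import math

def kl(p, q):
    """D_KL(P || Q) in nats; returns inf when Q(x) = 0 but P(x) > 0."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue              # 0 * log(0 / q) is taken as 0
        if qx == 0.0:
            return math.inf       # P has mass where Q has none
        total += px * math.log(px / qx)
    return total

def js(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log 2."""
    m = [0.5 * (px + qx) for px, qx in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def smooth(p, eps=1e-6):
    """Additive smoothing followed by renormalisation."""
    total = sum(p) + eps * len(p)
    return [(px + eps) / total for px in p]

p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]   # q is zero on an outcome where p has mass

kl_raw = kl(p, q)
js_value = js(p, q)
kl_smoothed = kl(smooth(p), smooth(q))
print(kl_raw, js_value, kl_smoothed)
```

Note that the smoothed KL is finite but still large and depends strongly on `eps`, whereas the JS value is stable.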

🧪 Verification / duplicates

Verified (Cover & Thomas 2006 textbook ch. 2, MacKay 2003 ch. 2, Kingma VAE 2013).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — KL definition, mode behavior, VAE/RLHF/distillation patterns
