| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-parameter-control | Parameter Control | 10_Wiki/Topics | verified | self | | none | A | 0.85 | applied | | | 2026-05-10 | pending | |
Parameter Control
One-line summary
"Fixed config → dynamic schedule." Parameter control is the strategy of adaptively changing the knobs of a training, inference, or game system (learning rate, temperature, difficulty) as a function of time or state. Sampler parameters in generative AI (temperature/top-p/top-k), the exploration ε in RL, and dynamic balancing in games all fit the same framework.
Key points
Three domains
- ML training: adaptive control of LR schedules (cosine, warmup), weight decay, and dropout.
- Generative inference: per-token adjustment of temperature, top-p, top-k, and repetition penalty.
- Game balance: dynamic difficulty adjustment (DDA) and procedural-generation parameters.
Three control modes (Eiben & Smit)
- Deterministic: fixed schedule (cosine LR, ε-decay).
- Adaptive: feedback-based (loss plateau → reduce lr).
- Self-adaptive: the parameter is learned as part of the model state (CMA-ES σ, learnable temperature).
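The three modes can be put side by side on toy knobs. This is a minimal stdlib-only sketch, not a reference implementation: the function and class names, the linear ε-decay, and the log-normal σ update are assumptions chosen for illustration.

```python
import math
import random


def deterministic_eps(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Deterministic: epsilon depends only on the step counter."""
    frac = min(1.0, step / decay_steps)
    return eps_start + frac * (eps_end - eps_start)


class PlateauLR:
    """Adaptive: shrink the learning rate when the loss stops improving."""

    def __init__(self, lr=1e-3, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = math.inf
        self.bad_steps = 0

    def step(self, loss):
        if loss < self.best:
            self.best, self.bad_steps = loss, 0
        else:
            self.bad_steps += 1
            if self.bad_steps > self.patience:
                self.lr = max(self.min_lr, self.lr * self.factor)
                self.bad_steps = 0
        return self.lr


def self_adaptive_mutation(x, sigma, rng, tau=0.3):
    """Self-adaptive: sigma is perturbed and inherited together with x,
    so selection acting on x also selects useful step sizes."""
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
    new_x = x + new_sigma * rng.gauss(0.0, 1.0)
    return new_x, new_sigma
```

The difference is where the signal comes from: the first reads only a clock, the second reads a feedback metric, and the third carries the parameter inside the state that is being optimized.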
Applications
- Per-request sampler tuning in LLM serving.
- Auto-tuning of the entropy coefficient in RL (SAC).
- CFG scale schedules in diffusion.
💻 Patterns
LR scheduler (PyTorch)
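import math

import torch
from torch.optim.lr_scheduler import LambdaLR

TOTAL = 100_000  # total optimizer steps (placeholder; set to your run length)
WARMUP = 1000    # linear warmup steps

# `model` is assumed to be an existing torch.nn.Module
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Warmup + cosine: the returned factor multiplies the base lr
def lr_lambda(step):
    if step < WARMUP:
        return step / WARMUP
    progress = min(1.0, (step - WARMUP) / (TOTAL - WARMUP))
    return 0.5 * (1 + math.cos(math.pi * progress))

sched = LambdaLR(opt, lr_lambda)  # call sched.step() after each opt.step()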
vLLM sampler params (per-request)
from vllm import LLM, SamplingParams

llm = LLM("meta-llama/Llama-3.3-70B-Instruct")

creative = SamplingParams(temperature=0.9, top_p=0.95, top_k=50,
                          repetition_penalty=1.1, max_tokens=512)
factual = SamplingParams(temperature=0.2, top_p=0.9, top_k=20,
                         max_tokens=256)

llm.generate("Write a poem.", creative)
llm.generate("Capital of FR?", factual)
Adaptive temperature (entropy targeting)
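import numpy as np

def adaptive_temp(logits, target_entropy=2.5, iters=10):
    """Find a temperature T whose softmax entropy (in nats) is near the target."""
    logits = np.asarray(logits, dtype=np.float64)
    T = 1.0
    for _ in range(iters):
        z = logits / T
        z -= z.max()  # stabilize the softmax
        p = np.exp(z) / np.exp(z).sum()
        H = -(p * np.log(p + 1e-12)).sum()
        T *= (target_entropy / max(H, 1e-6)) ** 0.5  # damped multiplicative update
    return T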
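Because softmax entropy increases monotonically with temperature, the same target can also be hit by bisection instead of the damped multiplicative update. The sketch below is one possible variant using only the standard library. The names `softmax_entropy` and `temp_for_entropy` and the search bounds are assumptions, and the target must stay below `log(len(logits))` to be reachable.

```python
import math


def softmax_entropy(logits, T):
    """Entropy (in nats) of softmax(logits / T)."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def temp_for_entropy(logits, target_entropy, lo=1e-2, hi=100.0, iters=50):
    """Bisect on a log scale for the temperature that gives the target entropy."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if softmax_entropy(logits, mid) < target_entropy:
            lo = mid  # distribution too peaked: raise the temperature
        else:
            hi = mid  # distribution too flat: lower the temperature
    return math.sqrt(lo * hi)


logits = [2.0, 1.0, 0.0, -1.0]
T = temp_for_entropy(logits, target_entropy=1.0)
print(f"T = {T:.4f}, entropy = {softmax_entropy(logits, T):.4f}")
```

Bisection trades a few more entropy evaluations for a guaranteed bracket, which may matter when the update is run per token.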
SAC entropy coefficient (self-adaptive)
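import torch

# Learnable log_alpha drives the policy entropy toward a fixed target.
# Assumes `dev`, `action_dim`, and `logp` (log-probs of sampled actions) are defined elsewhere.
log_alpha = torch.zeros(1, requires_grad=True, device=dev)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_H = -action_dim  # heuristic

# per gradient step
alpha = log_alpha.exp()  # coefficient used in the actor and critic losses
loss_alpha = -(log_alpha * (logp.detach() + target_H)).mean()
alpha_opt.zero_grad(); loss_alpha.backward(); alpha_opt.step()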
Dynamic difficulty (game)
class DDA:
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5
        self.target = target_winrate
        self.lr = lr

    def update(self, won: bool):
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))
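To see where such a controller settles, it can be run against a simulated player. The sketch below repeats the controller so it runs standalone. The player model (win probability equal to `1 - difficulty`) and the `simulate` helper are hypothetical, chosen only so that the equilibrium is easy to predict: with a 0.55 target win rate the difficulty should hover near 0.45.

```python
import random


class DDA:
    """Nudges difficulty so the observed win rate tracks a target."""

    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5
        self.target = target_winrate
        self.lr = lr

    def update(self, won: bool):
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))  # floor/ceiling clamp


def simulate(rounds=20_000, seed=0):
    """Hypothetical player model: win probability = 1 - difficulty."""
    rng = random.Random(seed)
    dda = DDA(target_winrate=0.55, lr=0.01)
    history = []
    for _ in range(rounds):
        won = rng.random() < (1.0 - dda.diff)
        dda.update(won)
        history.append(dda.diff)
    tail = history[-5000:]  # average after the controller has settled
    return sum(tail) / len(tail)


mean_diff = simulate()
print(f"mean difficulty over the last 5000 rounds: {mean_diff:.3f}")
```

A smaller `lr` gives a steadier difficulty at the cost of slower reaction to a change in player skill.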
CFG scale schedule (diffusion)
def cfg_schedule(t, total, base=7.5):
    # t is the sampling step index (0 = first step, total = last step);
    # higher CFG early, linear taper to 0.7 * base at the end
    progress = t / total
    return base * (1 - 0.3 * progress)
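The scheduled scale enters the sampler through the usual classifier-free guidance combination, `eps = eps_uncond + scale * (eps_cond - eps_uncond)`. The sketch below shows that wiring on plain Python lists. `guided_noise` is a hypothetical helper, and the two noise predictions are stand-in values rather than outputs of a real model.

```python
def cfg_schedule(step, total, base=7.5):
    """Linear taper from `base` at step 0 to 0.7 * base at the final step."""
    progress = step / total
    return base * (1 - 0.3 * progress)


def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


total_steps = 50
eps_uncond = [0.10, -0.20]  # stand-in for the unconditional prediction
eps_cond = [0.30, 0.10]     # stand-in for the conditional prediction

for step in (0, 25, 50):
    scale = cfg_schedule(step, total_steps)
    print(step, round(scale, 3), guided_noise(eps_uncond, eps_cond, scale))
```

With `scale = 1.0` the combination reduces to the conditional prediction, so the schedule only controls how far past it the sampler extrapolates.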
Plateau-based lr reduction
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="min", factor=0.5, patience=3, min_lr=1e-6)

for epoch in range(EPOCHS):
    val_loss = evaluate()
    sched.step(val_loss)
Decision criteria
| Situation | Approach |
|---|---|
| Known schedule | deterministic (cosine, exponential) |
| Unknown convergence | adaptive (ReduceLROnPlateau) |
| RL entropy / GAN balance | self-adaptive (learnable param) |
| LLM serving | per-request sampler config |
| Game | DDA with target metric |
Defaults: warmup + cosine for training; temp=0.7 / top-p=0.95 for chat LLMs.
🔗 Graph
- Parent: Hyperparameters · Optimization
- Variants: Dynamic-Difficulty-Adjustment
- Applications: LLM-Sampling · Reinforcement-Learning
- Adjacent: CMA-ES · Bayesian-Optimization
🤖 LLM usage
When: branching inference profiles (creative vs. factual); recommending per-stage training schedules. When not: a first prototype. Start with fixed defaults and add control later.
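Profile branching can be as small as a lookup table keyed by task type. The sketch below is framework-agnostic and illustrative only. `SamplerProfile`, `PROFILES`, and `pick_profile` are hypothetical names. The creative and factual values mirror the vLLM example on this page, the fallback uses the temp=0.7 / top-p=0.95 default, and its `top_k` and `max_tokens` are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerProfile:
    temperature: float
    top_p: float
    top_k: int
    max_tokens: int


PROFILES = {
    "creative": SamplerProfile(temperature=0.9, top_p=0.95, top_k=50, max_tokens=512),
    "factual": SamplerProfile(temperature=0.2, top_p=0.9, top_k=20, max_tokens=256),
}

# Fallback for unclassified requests; top_k and max_tokens are assumed values.
DEFAULT_PROFILE = SamplerProfile(temperature=0.7, top_p=0.95, top_k=50, max_tokens=512)


def pick_profile(task_type: str) -> SamplerProfile:
    """Return the sampler profile for a task type, or the chat default."""
    return PROFILES.get(task_type, DEFAULT_PROFILE)


print(pick_profile("factual"))
print(pick_profile("smalltalk"))
```

Keeping the profiles as frozen data makes them easy to log per request and to map onto whatever sampling-parameter object the serving stack expects.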
❌ Anti-patterns
- No warmup: starting at a large LR → loss spike.
- Fixed temp for all tasks: mismatches such as 0.9 for factual tasks or 0.1 for creative ones.
- DDA without floor/ceiling: difficulty drifts without bound.
- Self-adaptive without target: the learnable parameter collapses (alpha → 0).
🧪 Verification / duplicates
- Verified (PyTorch 2.5, vLLM 0.6, SAC paper Haarnoja 2018).
- Trust level: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — unified ML/inference/game parameter control |