---
id: wiki-2026-0508-parameter-control
title: Parameter Control
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Hyperparameter-Control, Sampler-Control, Adaptive-Parameters]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [hyperparameter, sampling, control, training, generative-ai]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pytorch-vllm
---

# Parameter Control

## One-line summary

> **"Fixed config → dynamic schedule"**. Parameter control is the strategy of adaptively changing the knobs of a training, inference, or game system (learning rate, temperature, difficulty) over time or in response to system state. Sampler parameters in generative AI (temperature/top-p/top-k), the exploration ε in RL, and dynamic balancing in games all fit the same framework.

## Core

### Three domains

1. **ML training**: adaptive control of the LR schedule (cosine, warmup), weight decay, and dropout.
2. **Generative inference**: per-request or per-token adjustment of temperature, top-p, top-k, and repetition penalty.
3. **Game balance**: dynamic difficulty adjustment (DDA) and procedural-generation parameters.

### Three control modes (Eiben & Smit)

- **Deterministic**: a fixed schedule (cosine LR, ε-decay).
- **Adaptive**: feedback-based (loss plateau → reduce LR).
- **Self-adaptive**: the parameter is learned as part of the model state (CMA-ES σ, learnable temperature).

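The following is a minimal sketch of the self-adaptive mode, written as a (1, λ) evolution strategy. Each offspring carries its own mutation step size σ. That σ is mutated log-normally before it is used, so selection tunes σ together with the solution. Several details are illustrative assumptions: the sphere objective, λ = 10, τ = 1/√n, and the generation count. CMA-ES replaces this log-normal rule with cumulative step-size adaptation.

```python
import math
import random

def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def self_adaptive_es(n=5, lam=10, generations=300, seed=0):
    """(1, lambda)-ES where each offspring carries its own step size sigma."""
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(n)            # learning rate of the log-normal sigma mutation
    parent = [rng.uniform(-5, 5) for _ in range(n)]
    sigma = 1.0
    start = sphere(parent)
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            s = sigma * math.exp(tau * rng.gauss(0, 1))    # mutate sigma first...
            x = [v + s * rng.gauss(0, 1) for v in parent]  # ...then x with the new sigma
            offspring.append((sphere(x), x, s))
        # comma selection: the best child replaces the parent, so a sigma
        # survives only if the step it produced was good
        _, parent, sigma = min(offspring, key=lambda c: c[0])
    return start, sphere(parent), sigma

start, final, sigma = self_adaptive_es()
print(f"f: {start:.3g} -> {final:.3g}, sigma: {sigma:.3g}")
```

No schedule is written for σ. It shrinks on its own as the parent approaches the optimum.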
### Applications

1. Per-request sampler tuning in LLM serving.
2. Automatic tuning of the entropy coefficient in RL (SAC).
3. CFG scale schedules in diffusion.

## 💻 Patterns

### LR scheduler (PyTorch)

```python
import math

import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(16, 16)  # placeholder model
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

WARMUP = 1_000
TOTAL = 100_000  # total optimizer steps (example value)

# Warmup + cosine: LambdaLR multiplies the base lr (3e-4) by lr_lambda(step)
def lr_lambda(step):
    if step < WARMUP:
        return step / WARMUP  # linear ramp 0 → 1
    progress = min(1.0, (step - WARMUP) / max(1, TOTAL - WARMUP))
    return 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay 1 → 0

sched = LambdaLR(opt, lr_lambda)
# call sched.step() once after every opt.step()
```

### vLLM sampler params (per-request)

```python
from vllm import LLM, SamplingParams

# A 70B model usually needs several GPUs, e.g. LLM(..., tensor_parallel_size=4)
llm = LLM("meta-llama/Llama-3.3-70B-Instruct")

creative = SamplingParams(temperature=0.9, top_p=0.95, top_k=50,
                          repetition_penalty=1.1, max_tokens=512)
factual = SamplingParams(temperature=0.2, top_p=0.9, top_k=20,
                         max_tokens=256)

# One sampling profile per request; generate() returns a list of RequestOutput
out = llm.generate("Write a poem.", creative)
print(out[0].outputs[0].text)
llm.generate("Capital of FR?", factual)
```

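The following pure-Python sketch makes the three knobs concrete on a single next-token distribution. It applies them in one common order:

1. Scale the logits by temperature.
2. Keep the top-k tokens and renormalize.
3. Keep the smallest prefix whose cumulative probability reaches top-p.

The function name and this exact ordering are illustrative assumptions. This is not vLLM's internal implementation, which also handles penalties and batching.

```python
import math

def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k, and top-p.

    temperature must be > 0; top_k=0 and top_p=1.0 mean "disabled" (hypothetical
    defaults chosen to mirror the usual off values).
    """
    scaled = [l / temperature for l in logits]    # T < 1 sharpens, T > 1 flattens
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]      # subtract the max for stability
    z = sum(exps)
    ranked = sorted(((e / z, i) for i, e in enumerate(exps)), reverse=True)

    if top_k > 0:                                 # keep the k most likely tokens
        ranked = ranked[:top_k]
        zk = sum(p for p, _ in ranked)
        ranked = [(p / zk, i) for p, i in ranked]

    kept, cum = [], 0.0
    for p, i in ranked:                           # nucleus: smallest prefix with mass >= top_p
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break

    total = sum(p for p, _ in kept)
    return {i: p / total for p, i in kept}        # renormalize the survivors

logits = [2.0, 1.0, 0.0, -1.0]
print(filter_probs(logits, top_k=2))    # only tokens 0 and 1 remain
print(filter_probs(logits, top_p=0.5))  # → {0: 1.0}: token 0 alone holds ~64% of the mass
```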
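### Adaptive temperature (entropy targeting)

```python
import torch

def adaptive_temp(logits: torch.Tensor, target_entropy: float = 2.5, iters: int = 10) -> float:
    """Iteratively rescale T so softmax(logits / T) has about target_entropy nats.

    target_entropy must be below log(vocab_size), the entropy of a uniform distribution.
    """
    T = 1.0
    for _ in range(iters):
        logp = torch.log_softmax(logits / T, dim=-1)  # numerically stable log p
        H = -(logp.exp() * logp).sum().item()          # Shannon entropy (nats)
        # H below target → raise T (flatter); H above target → lower T (sharper)
        T *= (target_entropy / max(H, 1e-8)) ** 0.5
    return T
```
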
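The multiplicative update above is a heuristic with no convergence guarantee within a fixed number of iterations. The entropy of softmax(logits / T) never decreases as T grows, so bisection on log T is a more predictable alternative. The sketch below relies on that monotonicity. The search bracket [1e-3, 1e3] and the helper names are illustrative assumptions, and it uses plain Python lists instead of tensors.

```python
import math

def entropy_at(logits, T):
    """Shannon entropy (nats) of softmax(logits / T)."""
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0)

def temp_for_entropy(logits, target, lo=1e-3, hi=1e3, iters=60):
    """Bisect on log T until softmax(logits / T) has about `target` nats."""
    if target >= math.log(len(logits)):
        return hi                  # uniform entropy is only approached as T → ∞
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if entropy_at(logits, math.exp(mid)) < target:
            a = mid                # too peaked: search higher temperatures
        else:
            b = mid                # too flat: search lower temperatures
    return math.exp(0.5 * (a + b))
```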
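### SAC entropy coefficient (self-adaptive)

```python
import torch
# Placeholders (in a real agent these come from the env and the policy):
dev = "cuda" if torch.cuda.is_available() else "cpu"
action_dim = 6
logp = torch.randn(256, device=dev)  # stand-in for log π(a|s) over a batch

# learnable log_alpha targets a fixed entropy
log_alpha = torch.zeros(1, requires_grad=True, device=dev)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_H = -action_dim  # heuristic: -dim(action space)

# per gradient step
alpha = log_alpha.exp().detach()  # used as a constant in the actor/critic losses
loss_alpha = -(log_alpha * (logp.detach() + target_H)).mean()
alpha_opt.zero_grad(); loss_alpha.backward(); alpha_opt.step()
```
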
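Why this loss drives the policy toward `target_H`: write $\bar{\mathcal H}$ for `target_H` and $\mathcal H(\pi) = -\mathbb{E}[\log\pi(a\mid s)]$ for the policy's current entropy. Differentiating the loss with respect to $\log\alpha$ gives

$$
J(\log\alpha) = -\,\mathbb{E}_{a\sim\pi}\big[\log\alpha\,\big(\log\pi(a\mid s) + \bar{\mathcal H}\big)\big]
$$

$$
\frac{\partial J}{\partial \log\alpha} = -\big(\mathbb{E}[\log\pi(a\mid s)] + \bar{\mathcal H}\big) = \mathcal H(\pi) - \bar{\mathcal H}
$$

A gradient-descent step therefore lowers $\log\alpha$ while the policy's entropy exceeds the target and raises it while entropy is below the target. In expectation, $\alpha$ stops moving once $\mathcal H(\pi) = \bar{\mathcal H}$.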
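### Dynamic difficulty (game)

```python
class DDA:
    """Nudges difficulty toward the level where the player wins `target_winrate` of rounds."""

    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5               # current difficulty, kept in [0, 1]
        self.target = target_winrate
        self.lr = lr

    def update(self, won: bool):
        # win → harder, loss → easier; the expected change is 0 when win rate == target
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))  # floor/ceiling prevent unbounded drift
```
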
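A quick sanity check for the controller is to simulate a player. The sketch assumes a hypothetical player who wins with probability 1 − difficulty. Under that model the win rate equals the 0.55 target at difficulty 0.45, so the controller should hover around 0.45. The class is repeated so the block runs on its own.

```python
import random

class DDA:
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5
        self.target = target_winrate
        self.lr = lr

    def update(self, won: bool):
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))

def simulate(rounds=5000, seed=0):
    """Hypothetical player model: P(win) = 1 - difficulty."""
    rng = random.Random(seed)
    dda = DDA()
    history = []
    for _ in range(rounds):
        won = rng.random() < 1.0 - dda.diff
        dda.update(won)
        history.append(dda.diff)
    return history

hist = simulate()
tail = hist[-2000:]
print(f"mean difficulty over the last 2000 rounds: {sum(tail) / len(tail):.3f}")
```

With `lr=0.05` the difficulty keeps jittering around the equilibrium. A smaller `lr` reduces the jitter but reacts more slowly when the player improves.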
### CFG scale schedule (diffusion)

```python
def cfg_schedule(t, total, base=7.5):
    # Linear taper: full `base` scale at step 0, falling toward 0.7 * base at the end.
    # t is the sampling-step index counting up from 0 (a heuristic; the best
    # schedule is model-dependent)
    progress = t / total
    return base * (1 - 0.3 * progress)
```

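For context, classifier-free guidance runs the denoiser both with and without the condition at each step. It then extrapolates from the unconditional prediction toward the conditional one: ε̂ = ε_uncond + w · (ε_cond − ε_uncond). Here w is the scale returned by `cfg_schedule`, in the convention where w = 1 means no guidance. The sketch uses plain lists in place of tensors. The helper `guided_eps` and the example numbers are hypothetical.

```python
def cfg_schedule(t, total, base=7.5):
    progress = t / total
    return base * (1 - 0.3 * progress)

def guided_eps(eps_uncond, eps_cond, w):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond), elementwise."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

total = 50
for t in (0, total // 2, total - 1):
    w = cfg_schedule(t, total)
    print(t, round(w, 3), guided_eps([0.0, 0.0], [1.0, 2.0], w))
```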
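### Plateau-based LR reduction

```python
import torch

model = torch.nn.Linear(16, 1)  # placeholder model
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="min", factor=0.5, patience=3, min_lr=1e-6)
for epoch in range(EPOCHS):   # EPOCHS: your epoch budget
    val_loss = evaluate()     # evaluate(): your validation loop, returns a float
    sched.step(val_loss)      # halve LR once >3 epochs pass without improvement
```
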
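The rule behind `ReduceLROnPlateau` is simple enough to write out. The class below is a simplified sketch that assumes `mode="min"`. It treats any strict decrease as an improvement and omits PyTorch's `threshold` and `cooldown` options, so its timing can differ slightly from the real scheduler. `PlateauLR` is a hypothetical name.

```python
import math

class PlateauLR:
    """Simplified ReduceLROnPlateau (mode="min", no threshold, no cooldown)."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr, self.factor = lr, factor
        self.patience, self.min_lr = patience, min_lr
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, metric):
        if metric < self.best:          # improvement: remember it, reset the counter
            self.best = metric
            self.bad_epochs = 0
        else:                           # no improvement this epoch
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0         # start counting again after a reduction
        return self.lr

sched = PlateauLR(1.0)
print([sched.step(v) for v in [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]])
# → [1.0, 1.0, 1.0, 1.0, 1.0, 0.5]
```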
## Decision criteria

| Situation | Approach |
|---|---|
| Known schedule | deterministic (cosine, exponential) |
| Unknown convergence | adaptive (ReduceLROnPlateau) |
| RL entropy / GAN balance | self-adaptive (learnable parameter) |
| LLM serving | per-request sampler config |
| Game | DDA with a target metric |

**Defaults**: warmup + cosine for training; temperature 0.7 / top-p 0.95 for chat LLMs.

## 🔗 Graph

- Parent: [[Hyperparameters|Hyperparameter-Tuning]] · [[Optimization]]
- Variant: [[Dynamic-Difficulty-Adjustment]]
- Applications: [[LLM-Sampling]] · [[Reinforcement-Learning]]
- Adjacent: [[CMA-ES]] · [[Bayesian-Optimization]]

## 🤖 LLM usage

**When**: branching inference profiles (creative vs factual); recommending a schedule for each training stage.

**When not**: a first prototype. Start with fixed defaults and add control later.

## ❌ Anti-patterns

- **No warmup**: starting directly at a large LR → loss spikes.
- **Fixed temperature for all tasks**: mismatches such as 0.9 for factual tasks or 0.1 for creative ones.
- **DDA without floor/ceiling**: difficulty drifts without bound.
- **Self-adaptive without a target**: the learnable parameter collapses (α → 0).

## 🧪 Verification / duplicates

- Verified (PyTorch 2.5, vLLM 0.6, SAC paper Haarnoja 2018).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: unified ML/inference/game parameter control |