---
id: wiki-2026-0508-parameter-control
title: Parameter Control
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Hyperparameter-Control, Sampler-Control, Adaptive-Parameters]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [hyperparameter, sampling, control, training, generative-ai]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pytorch-vllm
---
# Parameter Control
## One-line summary
> **"Fixed config → dynamic schedule."** Parameter control is the strategy of adaptively changing the knobs of a training, inference, or game system (learning rate, temperature, difficulty) as a function of time or state. Sampler parameters in generative AI (temperature/top-p/top-k), the exploration rate ε in RL, and dynamic balancing in games all fit the same framework.
## Core ideas
### Three domains
1. **ML training**: adaptive learning-rate schedules (cosine, warmup), weight decay, and dropout.
2. **Generative inference**: per-token adjustment of temperature, top-p, top-k, and repetition penalty.
3. **Game balance**: dynamic difficulty adjustment (DDA) and procedural-generation parameters.
### Three control modes (Eiben & Smit)
- **Deterministic**: a fixed schedule (cosine LR, ε-decay).
- **Adaptive**: feedback-based (loss plateau → reduce LR).
- **Self-adaptive**: the parameter is learned as part of the model state (CMA-ES σ, learnable temperature).
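The three modes can be put side by side in a minimal sketch. The function names, the constants (`decay=0.995`, `factor=1.22`, `tau=0.3`), and the use of a 1/5-success-rule style update for the adaptive case are illustrative assumptions, not values taken from this note.

```python
import math
import random

def deterministic_eps(step, eps0=1.0, eps_min=0.05, decay=0.995):
    """Deterministic: epsilon depends only on the step counter."""
    return max(eps_min, eps0 * decay ** step)

def adaptive_sigma(sigma, success_rate, factor=1.22):
    """Adaptive: feedback (observed success rate) drives the step size."""
    if success_rate > 0.2:
        return sigma * factor   # succeeding often: take bigger steps
    if success_rate < 0.2:
        return sigma / factor   # succeeding rarely: take smaller steps
    return sigma

def self_adaptive_mutation(x, sigma, tau=0.3, rng=random):
    """Self-adaptive: sigma is carried with the solution and mutates with it."""
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))  # stays positive
    new_x = x + new_sigma * rng.gauss(0.0, 1.0)
    return new_x, new_sigma
```

In the first function nothing is observed, in the second an external rule reacts to feedback, and in the third the parameter is subject to the same selection pressure as the solution itself.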
### Applications
1. Per-request sampler tuning in LLM serving.
2. Auto-tuning of the entropy coefficient in RL (SAC).
3. CFG scale schedules in diffusion models.
## 💻 Patterns
### LR scheduler (PyTorch)
```python
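import math

import torch
from torch.optim.lr_scheduler import LambdaLR

TOTAL = 100_000  # total optimizer steps (placeholder; set to your run length)
WARMUP = 1_000   # linear warmup steps

# `model` is assumed to be an existing torch.nn.Module
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Warmup + cosine: returns a multiplier applied to the base lr
def lr_lambda(step):
    if step < WARMUP:
        return step / WARMUP
    progress = min(1.0, (step - WARMUP) / (TOTAL - WARMUP))
    return 0.5 * (1 + math.cos(math.pi * progress))

sched = LambdaLR(opt, lr_lambda)  # call sched.step() after each opt.step()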
```
### vLLM sampler params (per-request)
```python
from vllm import LLM, SamplingParams
llm = LLM("meta-llama/Llama-3.3-70B-Instruct")
creative = SamplingParams(temperature=0.9, top_p=0.95, top_k=50,
                          repetition_penalty=1.1, max_tokens=512)
factual = SamplingParams(temperature=0.2, top_p=0.9, top_k=20,
                         max_tokens=256)
llm.generate("Write a poem.", creative)
llm.generate("Capital of FR?", factual)
```
### Adaptive temperature (entropy targeting)
```python
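import torch

def adaptive_temp(logits, target_entropy=2.5, iters=10):
    """Find T so that softmax(logits / T) has entropy near target_entropy (nats).

    Assumes 1-D logits and target_entropy below log(vocab_size).
    """
    T = 1.0
    for _ in range(iters):
        logp = torch.log_softmax(logits / T, dim=-1)
        H = -(logp.exp() * logp).sum(dim=-1)  # log_softmax avoids log(0)
        # entropy grows with T, so T is scaled up when H is below target
        T *= (target_entropy / H.clamp_min(1e-8)).item() ** 0.5
    return T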
```
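The multiplicative update above is a heuristic with no fixed convergence guarantee. Because softmax entropy is non-decreasing in temperature, bisection is a simple alternative. The sketch below is stdlib-only; the function names and the search bracket `[1e-2, 1e2]` are assumptions for illustration.

```python
import math

def softmax_entropy(logits, T):
    """Shannon entropy (nats) of softmax(logits / T)."""
    scaled = [x / T for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    Z = sum(exps)
    # H = log Z - sum_i p_i * (s_i - m), which never evaluates log(0)
    return math.log(Z) - sum(e * (s - m) for e, s in zip(exps, scaled)) / Z

def temp_for_entropy(logits, target_entropy, lo=1e-2, hi=1e2, iters=50):
    """Bisect on log T until the entropy matches the target."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if softmax_entropy(logits, mid) < target_entropy:
            lo = mid   # too peaked: raise the temperature
        else:
            hi = mid   # too flat: lower the temperature
    return math.sqrt(lo * hi)

logits = [2.0, 1.0, 0.0, -1.0]
T = temp_for_entropy(logits, target_entropy=1.0)
print(T, softmax_entropy(logits, T))
```

The target must lie between 0 and `log(len(logits))`; outside that range the search simply saturates at one end of the bracket.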
### SAC entropy coefficient (self-adaptive)
```python
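import torch

# learnable log_alpha drives the policy entropy toward a fixed target
# (`dev`, `action_dim`, and `logp` come from the surrounding SAC training loop)
log_alpha = torch.zeros(1, requires_grad=True, device=dev)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_H = -action_dim  # heuristic target entropy
# per gradient step; logp = log pi(a|s) for actions sampled from the policy
alpha = log_alpha.exp()  # weights the entropy term in the actor/critic losses
loss_alpha = -(log_alpha * (logp.detach() + target_H)).mean()
alpha_opt.zero_grad()
loss_alpha.backward()
alpha_opt.step()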
```
### Dynamic difficulty (game)
```python
class DDA:
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5  # difficulty in [0, 1]
        self.target = target_winrate
        self.lr = lr

    def update(self, won: bool):
        # a win above the target rate raises difficulty; a loss lowers it
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))
```
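To see the controller settle, it can be run against a toy player. The player model here (`P(win) = clamp(0.5 + skill - difficulty)`) and the `simulate` helper are hypothetical, chosen only so the loop has a stable equilibrium; the `DDA` class repeats the update rule above so the sketch runs on its own.

```python
import random

class DDA:
    """Same update rule as above, repeated so the sketch is self-contained."""
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5
        self.target = target_winrate
        self.lr = lr

    def update(self, won: bool):
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))

def simulate(skill=0.8, rounds=5000, seed=0):
    """Hypothetical player: win probability falls linearly with difficulty."""
    rng = random.Random(seed)
    dda = DDA(lr=0.01)
    wins = []
    for _ in range(rounds):
        p_win = max(0.0, min(1.0, 0.5 + skill - dda.diff))
        won = rng.random() < p_win
        wins.append(won)
        dda.update(won)
    recent = wins[rounds // 2:]  # ignore the initial transient
    return dda.diff, sum(recent) / len(recent)

final_diff, recent_winrate = simulate()
print(final_diff, recent_winrate)
```

Under this player model the difficulty drifts toward the point where the win probability equals `target_winrate`, and the win rate over the second half of the run stays close to the target. A smaller `lr` reduces jitter at the cost of slower adaptation.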
### CFG scale schedule (diffusion)
```python
def cfg_schedule(t, total, base=7.5):
    # t is the sampling step index (0 = first step);
    # higher CFG early, linear taper to 0.7 * base at the end (heuristic)
    progress = t / total
    return base * (1 - 0.3 * progress)
```
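The scheduled scale is what multiplies the guidance direction at each sampling step. The sketch below plugs it into the standard classifier-free guidance combination, `eps_uncond + scale * (eps_cond - eps_uncond)`. The two-element lists are toy stand-ins for model outputs, and `guided_noise` is a hypothetical helper name.

```python
def cfg_schedule(t, total, base=7.5):
    """Linear taper from base toward 0.7 * base (same rule as above)."""
    return base * (1 - 0.3 * t / total)

def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: extrapolate away from the unconditional prediction."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

total_steps = 4
for t in range(total_steps):
    s = cfg_schedule(t, total_steps)
    eps = guided_noise([0.0, 1.0], [1.0, 1.0], s)  # toy 2-element predictions
    print(t, round(s, 4), eps)
```

With `scale=1.0` the result equals the conditional prediction, so the taper moves each step closer to plain conditional sampling as `t` grows.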
### Plateau-based lr reduction
```python
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="min", factor=0.5, patience=3, min_lr=1e-6)
for epoch in range(EPOCHS):
    val_loss = evaluate()
    sched.step(val_loss)  # halves the lr after 3 epochs without improvement
```
## Decision criteria
| Situation | Approach |
|---|---|
| Known schedule | deterministic (cosine, exponential) |
| Unknown convergence behavior | adaptive (ReduceLROnPlateau) |
| RL entropy / GAN balance | self-adaptive (learnable param) |
| LLM serving | per-request sampler config |
| Game | DDA with target metric |
**Defaults**: warmup + cosine for training; temp=0.7 / top-p=0.95 for chat LLMs.
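For the LLM-serving row, one possible shape is a small profile table with a fallback to the chat default. `SamplerProfile`, `PROFILES`, and `pick_profile` are hypothetical names. The creative and factual values mirror the vLLM example earlier in this note, while `top_k` and `max_tokens` for the chat profile are assumptions.

```python
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class SamplerProfile:
    temperature: float
    top_p: float
    top_k: int
    max_tokens: int

PROFILES = {
    "creative": SamplerProfile(0.9, 0.95, 50, 512),
    "factual": SamplerProfile(0.2, 0.9, 20, 256),
    "chat": SamplerProfile(0.7, 0.95, 50, 512),  # top_k / max_tokens assumed
}

def pick_profile(task: str) -> SamplerProfile:
    """Return the profile for a task label, falling back to the chat default."""
    return PROFILES.get(task, PROFILES["chat"])

print(asdict(pick_profile("factual")))
print(asdict(pick_profile("unknown-task")))
```

Since the field names match vLLM's `SamplingParams` keyword arguments, the dict from `asdict(...)` could be unpacked into it per request.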
## 🔗 Graph
- Parent: [[Hyperparameters|Hyperparameter-Tuning]] · [[Optimization]]
- Variants: [[Dynamic-Difficulty-Adjustment]]
- Applications: [[LLM-Sampling]] · [[Reinforcement-Learning]]
- Adjacent: [[CMA-ES]] · [[Bayesian-Optimization]]
## 🤖 LLM usage
**When**: branching inference profiles (creative vs. factual); recommending schedules for each training stage.
**When not**: a first prototype. Start with fixed defaults and add control later.
## ❌ Anti-patterns
- **No warmup**: starting at a large LR → loss spike.
- **Fixed temp for all tasks**: mismatches such as 0.9 for factual tasks or 0.1 for creative ones.
- **DDA without floor/ceiling**: difficulty drifts without bound.
- **Self-adaptive without target**: the learnable parameter collapses (alpha → 0).
## 🧪 Verification / duplicates
- Verified (PyTorch 2.5, vLLM 0.6, SAC paper Haarnoja 2018).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — unified ML/inference/game parameter control |