---
id: wiki-2026-0508-parameter-control
title: Parameter Control
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Hyperparameter-Control, Sampler-Control, Adaptive-Parameters]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [hyperparameter, sampling, control, training, generative-ai]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pytorch-vllm
---
# Parameter Control
## One-liner
> **"fixed config → dynamic schedule"**. Parameter control is the strategy of adaptively changing the knobs of a training, inference, or game system (learning rate, temperature, difficulty) as a function of time or state. Sampler parameters in generative AI (temperature/top-p/top-k), the exploration ε in RL, and dynamic balancing in games all fit the same framework.
## Core
### Three domains
1. **ML training**: adaptive learning-rate schedules (cosine, warmup), weight decay, and dropout.
2. **Generative inference**: per-token adjustment of temperature, top-p, top-k, and repetition penalty.
3. **Game balance**: dynamic difficulty adjustment (DDA) and procedural-generation parameters.
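To make the inference knobs concrete, here is a minimal pure-Python sketch of how temperature, top-k, and top-p reshape a next-token distribution. The function name `filter_probs` is hypothetical, and real serving frameworks differ in details such as filter ordering and renormalization, so treat this as an illustration rather than any library's behavior.

```python
import math

def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the renormalized next-token distribution after the three filters."""
    # Temperature: divide logits before softmax (T < 1 sharpens, T > 1 flattens).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the surviving tokens; all other tokens get probability 0.
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / mass
    return out
```

Calling `filter_probs(logits, temperature=0.5)` concentrates mass on the top token, while `top_k=2` zeroes out everything beyond the two most likely tokens.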
### Three control modes (Eiben & Smit)
- **Deterministic**: fixed schedule (cosine LR, ε-decay).
- **Adaptive**: feedback-based (loss plateau → reduce lr).
- **Self-adaptive**: the parameter is learned as part of the model state (CMA-ES σ, learnable temperature).
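The three modes can be contrasted on a single knob. The sketch below uses a mutation step size σ from evolution strategies as the example; the function names and default constants are illustrative assumptions, not taken from a specific library. The adaptive rule follows the spirit of the classic 1/5 success rule, and the self-adaptive rule is the usual log-normal mutation of σ.

```python
import math
import random

def deterministic_sigma(step, sigma0=1.0, decay=0.99):
    """Deterministic: the value depends only on the step counter."""
    return sigma0 * decay ** step

def adaptive_sigma(sigma, success_rate, factor=0.85):
    """Adaptive: feedback from the search (a 1/5-success-style rule)."""
    if success_rate > 0.2:
        return sigma / factor   # succeeding often -> take bigger steps
    if success_rate < 0.2:
        return sigma * factor   # failing often -> take smaller steps
    return sigma

def self_adaptive_sigma(sigma, rng, tau=0.3):
    """Self-adaptive: sigma belongs to the individual and mutates with it."""
    return sigma * math.exp(tau * rng.gauss(0.0, 1.0))

rng = random.Random(0)
print(deterministic_sigma(100))        # shrinks on a fixed timetable
print(adaptive_sigma(1.0, 0.5))        # grows because successes are frequent
print(self_adaptive_sigma(1.0, rng))   # random perturbation; selection keeps good values
```

The difference lies in where the signal comes from: the clock, an external feedback statistic, or selection acting on the parameter itself.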
### Applications
1. Per-request sampler tuning in LLM serving.
2. Auto-tuning of the entropy coefficient in RL (SAC).
3. CFG scale schedules in diffusion.
## 💻 Patterns
### LR scheduler (PyTorch)
```python
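import math

import torch
from torch.optim.lr_scheduler import LambdaLR

TOTAL = 100_000  # total optimizer steps; placeholder, set to your run length
WARMUP = 1000

opt = torch.optim.AdamW(model.parameters(), lr=3e-4)  # `model` is defined elsewhere

# Warmup + cosine: linear ramp to the base lr, then cosine decay to 0
def lr_lambda(step):
    if step < WARMUP:
        return step / WARMUP
    # Clamp so the multiplier stays at 0 if training runs past TOTAL
    progress = min(1.0, (step - WARMUP) / (TOTAL - WARMUP))
    return 0.5 * (1 + math.cos(math.pi * progress))

sched = LambdaLR(opt, lr_lambda)  # call sched.step() after each opt.step()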
```
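The shape of the warmup + cosine multiplier can be checked without an optimizer. The standalone helper below (`warmup_cosine` is a hypothetical name, and `total=10_000` is an arbitrary choice for the example) evaluates the effective learning rate at a few milestones, assuming the base rate of 3e-4 used above.

```python
import math

def warmup_cosine(step, warmup=1000, total=10_000):
    """LR multiplier: linear warmup to 1.0, then cosine decay to 0.0 at `total`."""
    if step < warmup:
        return step / warmup
    progress = min(1.0, (step - warmup) / (total - warmup))
    return 0.5 * (1.0 + math.cos(math.pi * progress))

base_lr = 3e-4
for step in (0, 500, 1000, 5500, 10_000):
    # Effective learning rate = base rate times the schedule multiplier.
    print(step, base_lr * warmup_cosine(step))
```

The multiplier is 0 at step 0, reaches 1 at the end of warmup, passes through 0.5 halfway through the decay, and returns to 0 at `total`.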
### vLLM sampler params (per-request)
```python
from vllm import LLM, SamplingParams
llm = LLM("meta-llama/Llama-3.3-70B-Instruct")
creative = SamplingParams(temperature=0.9, top_p=0.95, top_k=50,
                          repetition_penalty=1.1, max_tokens=512)
factual = SamplingParams(temperature=0.2, top_p=0.9, top_k=20,
                         max_tokens=256)
llm.generate("Write a poem.", creative)
llm.generate("Capital of FR?", factual)
```
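One way to route requests to a profile is a small lookup table keyed by task label. The sketch below is framework-free: `PROFILES` and `sampler_kwargs` are hypothetical names, and the values simply mirror the two profiles above. The returned dict is meant to be unpacked into the sampler constructor, for example `SamplingParams(**sampler_kwargs("creative"))`.

```python
# Hypothetical profile table; the numbers mirror the two profiles above.
PROFILES = {
    "creative": {"temperature": 0.9, "top_p": 0.95, "top_k": 50,
                 "repetition_penalty": 1.1, "max_tokens": 512},
    "factual": {"temperature": 0.2, "top_p": 0.9, "top_k": 20,
                "max_tokens": 256},
}

def sampler_kwargs(task: str, **overrides) -> dict:
    """Return sampler settings for a task label, falling back to 'factual'."""
    params = dict(PROFILES.get(task, PROFILES["factual"]))  # copy, never mutate the table
    params.update(overrides)  # per-request overrides win over the profile
    return params

print(sampler_kwargs("creative"))
print(sampler_kwargs("factual", max_tokens=64))
```

Falling back to the conservative profile for unknown labels is a design choice; an explicit error may be preferable when task labels are controlled.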
### Adaptive temperature (entropy targeting)
```python
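import torch

def adaptive_temp(logits, target_entropy=2.5, iters=10):
    """Search for a temperature whose softmax entropy (nats) is near the target."""
    T = 1.0
    for _ in range(iters):
        logp = torch.log_softmax(logits / T, dim=-1)
        H = -(logp.exp() * logp).sum(dim=-1)  # entropy at the current T
        # H below target -> raise T; H above target -> lower T
        T *= (target_entropy / H.clamp_min(1e-8).item()) ** 0.5
    return T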
```
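An alternative to the multiplicative update above is bisection, which relies only on the fact that softmax entropy is non-decreasing in the temperature for fixed logits. The sketch below is pure Python; the function names and the search bounds `lo`/`hi` are assumptions chosen for illustration, and the target must lie below `log(len(logits))` to be reachable.

```python
import math

def entropy_at(logits, T):
    """Shannon entropy (nats) of softmax(logits / T)."""
    scaled = [x / T for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0.0)

def temp_for_entropy(logits, target, lo=0.05, hi=20.0, iters=40):
    """Bisect on T until the entropy matches the target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy_at(logits, mid) < target:
            lo = mid   # too peaked -> raise temperature
        else:
            hi = mid   # too flat -> lower temperature
    return 0.5 * (lo + hi)

logits = [3.0, 1.0, 0.5, 0.0, -1.0]
T = temp_for_entropy(logits, target=1.2)
print(T, entropy_at(logits, T))
```

Bisection costs more entropy evaluations per token but cannot overshoot, which may matter when the logits are very peaked.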
### SAC entropy coefficient (self-adaptive)
```python
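import torch

# Setup: a learnable log_alpha targets a fixed entropy.
# `dev` and `action_dim` come from the surrounding training code.
log_alpha = torch.zeros(1, requires_grad=True, device=dev)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_H = -action_dim  # heuristic: minus the action dimensionality

# Per gradient step; `logp` is log pi(a|s) for actions sampled from the current policy.
alpha = log_alpha.exp()  # weight of the entropy term in the actor/critic losses
loss_alpha = -(log_alpha * (logp.detach() + target_H)).mean()
alpha_opt.zero_grad()
loss_alpha.backward()
alpha_opt.step()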
```
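The direction of this update follows from the gradient of the loss with respect to $\log\alpha$. Writing the target as $\bar{\mathcal{H}}$ (`target_H`), treating $\log\pi$ as a constant (the `detach()`), and using $\mathbb{E}[\log\pi(a\mid s)] = -\mathcal{H}(\pi)$ as estimated on the batch:

$$
L(\log\alpha) = -\log\alpha \cdot \mathbb{E}\big[\log\pi(a\mid s) + \bar{\mathcal{H}}\big],
\qquad
\frac{\partial L}{\partial \log\alpha} = -\big(\mathbb{E}[\log\pi(a\mid s)] + \bar{\mathcal{H}}\big) = \mathcal{H}(\pi) - \bar{\mathcal{H}}.
$$

Gradient descent therefore raises $\log\alpha$ when the policy entropy is below the target (more pressure to explore) and lowers it when the entropy is above the target.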
### Dynamic difficulty (game)
```python
class DDA:
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5
        self.target = target_winrate
        self.lr = lr

    def update(self, won: bool):
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))
```
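To see where this update rule settles, the sketch below runs it against a toy player whose win probability is `1 - difficulty`. That player model is purely an assumption for illustration, and `simulate_dda` is a hypothetical helper. Under this model the rule should hover around the difficulty at which the win rate equals the target, here 0.45 for a 0.55 target.

```python
import random

def simulate_dda(steps=5000, target=0.55, lr=0.05, seed=0):
    """Run the DDA update against a toy player whose win chance is 1 - difficulty."""
    rng = random.Random(seed)
    diff, history = 0.5, []
    for _ in range(steps):
        won = rng.random() < (1.0 - diff)   # toy player model (assumption)
        diff += lr * (int(won) - target)    # same rule as DDA.update
        diff = max(0.0, min(1.0, diff))     # floor / ceiling
        history.append(diff)
    return history

history = simulate_dda()
tail = history[-2000:]
print(sum(tail) / len(tail))  # average difficulty late in the run
```

A smaller `lr` reduces the jitter around the equilibrium at the cost of slower adaptation to a change in player skill.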
### CFG scale schedule (diffusion)
```python
def cfg_schedule(t, total, base=7.5):
    # Higher CFG early, linear taper to 70% of base by the last step.
    # `t` is the step index counting up from 0 to `total`.
    progress = t / total
    return base * (1 - 0.3 * progress)
```
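The scale returned by a schedule like `cfg_schedule` is applied when the conditional and unconditional noise predictions are combined. The sketch below shows the standard classifier-free guidance combination on plain Python lists; `guided_noise` and `linear_cfg` are hypothetical names, and a real sampler would operate on tensors.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def linear_cfg(step, total, base=7.5, taper=0.3):
    """Linear taper: `base` at step 0, `base * (1 - taper)` at the last step."""
    return base * (1.0 - taper * step / total)

total = 50
eps_u, eps_c = [0.0, 0.0], [1.0, 2.0]   # stand-ins for model outputs
for step in (0, 25, 50):
    s = linear_cfg(step, total)
    print(step, s, guided_noise(eps_u, eps_c, s))
```

With `scale=1.0` the result equals the conditional prediction, and with `scale=0.0` it equals the unconditional one; larger values push further along the conditional direction.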
### Plateau-based lr reduction
```python
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="min", factor=0.5, patience=3, min_lr=1e-6)
for epoch in range(EPOCHS):
    val_loss = evaluate()
    sched.step(val_loss)
```
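The feedback logic behind plateau-based reduction can be sketched in a few lines. The class below is a toy re-implementation for illustration only (`PlateauReducer` is a hypothetical name); PyTorch's scheduler additionally supports an improvement threshold and a cooldown period, which are omitted here.

```python
class PlateauReducer:
    """Toy version of the plateau idea; not PyTorch's actual implementation."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr, self.factor = lr, factor
        self.patience, self.min_lr = patience, min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0   # improvement resets the counter
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:            # patience exhausted
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0
        return self.lr

reducer = PlateauReducer(lr=1e-3)
for loss in [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]:
    print(loss, reducer.step(loss))
```

With `patience=3`, the rate is halved on the fourth consecutive epoch without improvement and never drops below `min_lr`.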
## Decision criteria
| Situation | Approach |
|---|---|
| Known schedule | deterministic (cosine, exponential) |
| Unknown convergence | adaptive (ReduceLROnPlateau) |
| RL entropy / GAN balance | self-adaptive (learnable param) |
| LLM serving | per-request sampler config |
| Game | DDA with target metric |
**Defaults**: warmup + cosine for training; temperature 0.7 / top-p 0.95 for chat LLMs.
## 🔗 Graph
- Parent: [[Hyperparameter-Tuning]] · [[Optimization]]
- Variants: [[Learning-Rate-Schedule]] · [[Dynamic-Difficulty-Adjustment]]
- Applications: [[LLM-Sampling]] · [[Reinforcement-Learning]] · [[Diffusion-Sampling]]
- Adjacent: [[CMA-ES]] · [[SAC]] · [[Bayesian-Optimization]]
## 🤖 LLM usage
**When**: branching inference profiles (creative vs. factual) and recommending per-stage training schedules.
**When not**: a first prototype. Start with fixed defaults and add control later.
## ❌ Anti-patterns
- **No warmup**: starting at a large LR → loss spike.
- **Fixed temp for all tasks**: mismatches such as 0.9 for factual or 0.1 for creative.
- **DDA without floor/ceiling**: difficulty drifts without bound.
- **Self-adaptive without target**: the learnable parameter collapses (alpha → 0).
## 🧪 Verification / Duplicates
- Verified (PyTorch 2.5, vLLM 0.6, SAC paper Haarnoja 2018).
- Trust level A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: unified ML/inference/game parameter control |