---
id: wiki-2026-0508-parameter-control
title: Parameter Control
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Hyperparameter-Control, Sampler-Control, Adaptive-Parameters]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [hyperparameter, sampling, control, training, generative-ai]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pytorch-vllm
---

# Parameter Control

## One-liner
> **"Fixed config → dynamic schedule."** Parameter control is the strategy of adaptively changing the knobs of a training, inference, or game system (learning rate, temperature, difficulty) as a function of time or state. Sampler parameters in generative AI (temperature / top-p / top-k), the exploration rate ε in RL, and dynamic balancing in games all fit the same framework.

## Core

### Three domains
1. **ML training**: adaptive learning-rate schedules (cosine, warmup), weight decay, and dropout.
2. **Generative inference**: per-request or per-token adjustment of temperature, top-p, top-k, and repetition penalty.
3. **Game balance**: dynamic difficulty adjustment (DDA) and procedural-generation parameters.

### Three control modes (Eiben & Smit)
- **Deterministic**: a fixed schedule (cosine LR, ε-decay).
- **Adaptive**: feedback-based (loss plateau → reduce lr).
- **Self-adaptive**: the parameter is learned as part of the model state (CMA-ES σ, learnable temperature).
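
The three modes can be contrasted on a single knob. The sketch below uses the mutation step size σ of a toy evolution strategy; the function names, the decay constant, and the 1/5-success-rule factor are illustrative assumptions, not taken from a specific library.

```python
import math
import random

def deterministic_sigma(step, sigma0=1.0, decay=0.99):
    """Deterministic: sigma follows a fixed schedule and ignores search feedback."""
    return sigma0 * decay ** step

def adaptive_sigma(sigma, success_rate, factor=0.85):
    """Adaptive: 1/5 success rule. Widen the step when more than 1/5 of
    recent mutations improved fitness, shrink it when fewer did."""
    if success_rate > 0.2:
        return sigma / factor
    if success_rate < 0.2:
        return sigma * factor
    return sigma

def self_adaptive_mutation(x, sigma, tau=0.5, rng=random):
    """Self-adaptive: sigma is part of the individual. It is mutated
    log-normally first, then used to mutate x, so selection acts on both."""
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
    new_x = x + new_sigma * rng.gauss(0.0, 1.0)
    return new_x, new_sigma
```

In the deterministic case only the step counter matters, in the adaptive case an external rule reads a feedback signal, and in the self-adaptive case no external rule exists at all: good values of σ survive because the individuals carrying them survive.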

### Applications
1. Per-request sampler tuning in LLM serving.
2. Automatic tuning of the entropy coefficient in RL (SAC).
3. CFG scale scheduling in diffusion models.

## 💻 Patterns

### LR scheduler (PyTorch)
```python
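import math

import torch
from torch.optim.lr_scheduler import LambdaLR

TOTAL = 100_000   # placeholder: total number of optimizer steps
WARMUP = 1_000    # linear warmup steps

# `model` is assumed to be an existing torch.nn.Module
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Warmup + cosine: returns a multiplier applied to the base lr
def lr_lambda(step):
    if step < WARMUP:
        return step / WARMUP
    progress = min(1.0, (step - WARMUP) / (TOTAL - WARMUP))  # clamp past TOTAL
    return 0.5 * (1 + math.cos(math.pi * progress))

sched = LambdaLR(opt, lr_lambda)  # call sched.step() after each opt.step()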
```
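
The shape of the warmup + cosine multiplier can be checked without a model or an optimizer. The snippet below is a standalone sketch of the same formula; the `total=10_000` horizon is an assumed value chosen only for illustration.

```python
import math

def warmup_cosine(step, warmup=1000, total=10_000):
    """LR multiplier: linear warmup to 1.0, then cosine decay to 0.0."""
    if step < warmup:
        return step / warmup
    progress = min(1.0, (step - warmup) / (total - warmup))
    return 0.5 * (1.0 + math.cos(math.pi * progress))

# Print the multiplier at a few landmark steps.
for step in (0, 500, 1000, 5500, 10_000):
    print(step, round(warmup_cosine(step), 4))
```

The multiplier starts at 0, reaches 1 at the end of warmup, passes through 0.5 halfway through the decay phase, and returns to 0 at `total`. Because the multiplier is exactly 0 at step 0, the very first optimizer step makes no update; some setups use `(step + 1) / warmup` to avoid that.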

### vLLM sampler params (per-request)
```python
from vllm import LLM, SamplingParams

llm = LLM("meta-llama/Llama-3.3-70B-Instruct")

creative = SamplingParams(temperature=0.9, top_p=0.95, top_k=50,
                          repetition_penalty=1.1, max_tokens=512)
factual  = SamplingParams(temperature=0.2, top_p=0.9,  top_k=20,
                          max_tokens=256)

llm.generate("Write a poem.",  creative)
llm.generate("Capital of FR?", factual)
```
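
To see what these knobs do to a single next-token distribution, here is a minimal pure-Python sketch of temperature scaling followed by top-k and top-p truncation. It is an illustration of the idea, not vLLM's implementation: the function name is hypothetical, `temperature` is assumed to be strictly positive, and real engines differ in the exact order and tie-breaking of the filters.

```python
import math

def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: prob} after temperature scaling, top-k truncation,
    and top-p (nucleus) truncation, renormalized to sum to 1."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                               # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Rank token ids from most to least likely; top_k=0 disables the cut.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    k_mass = sum(probs[i] for i in order)

    # Keep the smallest prefix whose renormalized cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i] / k_mass
        if mass >= top_p:
            break

    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.0, -1.0]
print(filter_distribution(logits, temperature=0.2, top_p=0.9))   # sharp profile
print(filter_distribution(logits, temperature=0.9, top_p=0.95))  # broad profile
```

Lowering the temperature concentrates mass on the top token, so the same `top_p` keeps fewer candidates; that is why the factual profile above is close to greedy decoding while the creative one leaves several tokens in play.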

### Adaptive temperature (entropy targeting)
```python
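import torch

def adaptive_temp(logits, target_entropy=2.5, iters=10):
    """Search for a temperature whose softmax entropy (nats) is near the target.
    `logits` is a 1-D tensor; the target must lie below log(vocab_size)."""
    T = 1.0
    for _ in range(iters):
        logp = torch.log_softmax(logits / T, dim=-1)  # stable log-probabilities
        H = -(logp.exp() * logp).sum().item()         # entropy at the current T
        # Entropy grows with T: raise T when H is too low, lower it when too high
        T *= (target_entropy / max(H, 1e-8)) ** 0.5
    return T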
```

### SAC entropy coefficient (self-adaptive)
```python
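import torch

dev = "cuda" if torch.cuda.is_available() else "cpu"
action_dim = 6  # placeholder: dimensionality of the action space

# learnable log_alpha targets a fixed entropy
log_alpha = torch.zeros(1, requires_grad=True, device=dev)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_H = -float(action_dim)   # heuristic: -|A|

# per gradient step; logp = log pi(a|s) for actions sampled from the current policy
def update_alpha(logp):
    loss_alpha = -(log_alpha * (logp.detach() + target_H)).mean()
    alpha_opt.zero_grad(); loss_alpha.backward(); alpha_opt.step()
    return log_alpha.exp().detach()   # alpha for the actor and critic losses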
```

### Dynamic difficulty (game)
```python
class DDA:
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5
        self.target = target_winrate
        self.lr = lr
    def update(self, won: bool):
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))
```
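
A quick simulation shows where this controller settles. The sketch below repeats the `DDA` class so it runs on its own and pairs it with a hypothetical player model (`win_probability`, linear in difficulty); the model and the skill values are assumptions made only to exercise the update rule.

```python
import random

class DDA:
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5
        self.target = target_winrate
        self.lr = lr

    def update(self, won: bool):
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))

def win_probability(skill, diff):
    """Hypothetical player model: win chance falls linearly with difficulty."""
    return max(0.0, min(1.0, 0.5 + skill - diff))

def simulate(skill=0.7, rounds=5000, seed=0):
    """Return the mean difficulty over the last 1000 rounds."""
    rng = random.Random(seed)
    dda = DDA()
    tail = []
    for i in range(rounds):
        won = rng.random() < win_probability(skill, dda.diff)
        dda.update(won)
        if i >= rounds - 1000:
            tail.append(dda.diff)
    return sum(tail) / len(tail)

print(simulate(skill=0.7), simulate(skill=0.3))
```

Under this player model the controller hovers around the difficulty at which the win probability equals the target: 0.5 + skill - diff = 0.55, so roughly 0.65 for `skill=0.7` and 0.25 for `skill=0.3`. The clamp to [0, 1] is what keeps a very strong or very weak player from pushing the difficulty out of range.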

### CFG scale schedule (diffusion)
```python
def cfg_schedule(t, total, base=7.5):
    # t is the step index in [0, total]; scale tapers linearly from base to 0.7 * base
    progress = t / total
    return base * (1 - 0.3 * progress)
```
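
For context, the scheduled scale enters the sampler through the standard classifier-free guidance combination of the unconditional and conditional noise predictions. The sketch below uses plain lists in place of tensors; `guided_eps` is a hypothetical helper name and the 50-step loop is an assumed sampler length.

```python
def cfg_schedule(t, total, base=7.5):
    """Linear taper of the guidance scale from base to 0.7 * base."""
    progress = t / total
    return base * (1 - 0.3 * progress)

def guided_eps(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: start at the unconditional prediction and
    move toward (and, for scale > 1, past) the conditional one."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# One guidance scale per denoising step of a 50-step sampler.
total_steps = 50
scales = [cfg_schedule(t, total_steps) for t in range(total_steps + 1)]
print(scales[0], scales[-1])
print(guided_eps([0.0, 0.2], [1.0, 0.4], scale=scales[0]))
```

A scale of 1 reproduces the conditional prediction and a scale of 0 the unconditional one, so tapering the scale reduces how strongly the prompt steers the later steps.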

### Plateau-based lr reduction
```python
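import torch

# Assumes an existing optimizer `opt`; evaluate() is a placeholder returning validation loss
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="min", factor=0.5, patience=3, min_lr=1e-6)
for epoch in range(EPOCHS):
    val_loss = evaluate()   # run after this epoch's training pass
    sched.step(val_loss)    # halves lr after more than `patience` epochs without improvement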
```
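
The feedback rule behind plateau-based reduction is small enough to write out. The class below is a simplified sketch of the idea for `mode="min"`, not PyTorch's implementation: it omits the threshold and cooldown options, and the class name is hypothetical.

```python
class PlateauReducer:
    """Multiply lr by `factor` once the metric has failed to improve for
    more than `patience` consecutive evaluations."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss          # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0
        return self.lr

reducer = PlateauReducer(lr=1e-3)
for loss in [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]:
    print(loss, reducer.step(loss))
```

With `patience=3`, three stalled evaluations are tolerated and the fourth triggers the cut, so in the loop above the learning rate drops from 1e-3 to 5e-4 on the last value.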

## Decision criteria
| Situation | Approach |
|---|---|
| Known schedule | deterministic (cosine, exponential) |
| Unknown convergence | adaptive (ReduceLROnPlateau) |
| RL entropy / GAN balance | self-adaptive (learnable param) |
| LLM serving | per-request sampler config |
| Game | DDA with target metric |

**Defaults**: warmup + cosine for training; temp=0.7 / top-p=0.95 for chat LLMs.
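
For the LLM-serving row, one way to organize per-request configs is a small profile table with a chat default. This is a sketch under assumptions: the profile names and the `sampler_config` helper are hypothetical, and the values are the ones used elsewhere in this note.

```python
SAMPLER_PROFILES = {
    "chat":     {"temperature": 0.7, "top_p": 0.95},
    "creative": {"temperature": 0.9, "top_p": 0.95, "top_k": 50,
                 "repetition_penalty": 1.1},
    "factual":  {"temperature": 0.2, "top_p": 0.9, "top_k": 20},
}

def sampler_config(task, overrides=None):
    """Look up the profile for a task, fall back to the chat default,
    and apply any per-request overrides on top."""
    cfg = dict(SAMPLER_PROFILES.get(task, SAMPLER_PROFILES["chat"]))
    cfg.update(overrides or {})
    return cfg

print(sampler_config("factual"))
print(sampler_config("summarize"))                 # unknown task: chat default
print(sampler_config("creative", {"top_k": 100}))  # per-request override
```

Copying the profile before updating it keeps a per-request override from leaking into later requests.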

## 🔗 Graph
- Parent: [[Hyperparameters|Hyperparameter-Tuning]] · [[Optimization]]
- Variants: [[Dynamic-Difficulty-Adjustment]]
- Applications: [[LLM-Sampling]] · [[Reinforcement-Learning]]
- Adjacent: [[CMA-ES]] · [[Bayesian-Optimization]]

## 🤖 LLM usage
**When**: branching inference profiles (creative vs. factual); recommending per-stage training schedules.
**When not**: a first prototype. Start with fixed defaults and add control later.

## ❌ Anti-patterns
- **No warmup**: starting at a large LR → loss spike.
- **Fixed temp for all tasks**: mismatches such as 0.9 for factual tasks or 0.1 for creative ones.
- **DDA without floor/ceiling**: difficulty drifts without bound.
- **Self-adaptive without target**: the learnable parameter collapses (alpha → 0).

## 🧪 Verification / duplicates
- Verified (PyTorch 2.5, vLLM 0.6, SAC paper Haarnoja 2018).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — unified ML/inference/game parameter control |