---
id: wiki-2026-0508-parameter-control
title: Parameter Control
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Hyperparameter-Control, Sampler-Control, Adaptive-Parameters]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [hyperparameter, sampling, control, training, generative-ai]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pytorch-vllm
---

# Parameter Control

## One-liner

> **"fixed config → dynamic schedule"**. Parameter control is the strategy of adaptively changing the knobs of a training, inference, or game system (learning rate, temperature, difficulty) as a function of time or state. Sampler parameters in generative AI (temperature/top-p/top-k), the exploration ε in RL, and dynamic balancing in games all fit the same framework.

## Core

### Three domains

1. **ML training**: adaptive control of the LR schedule (cosine, warmup), weight decay, and dropout.
2. **Generative inference**: per-token adjustment of temperature, top-p, top-k, and repetition penalty.
3. **Game balance**: dynamic difficulty adjustment (DDA) and procedural-generation parameters.

### Three control modes (Eiben & Smit)

- **Deterministic**: a fixed schedule that depends only on the step count (cosine LR, ε-decay).
- **Adaptive**: feedback-based (loss plateau → reduce lr).
- **Self-adaptive**: the parameter is learned as part of the model state (CMA-ES σ, learnable temperature).

### Applications

1. Per-request sampler tuning in LLM serving.
2. Auto-tuning of the entropy coefficient in RL (SAC).
3. CFG scale schedules in diffusion.
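The deterministic mode above names ε-decay as an example. A minimal sketch of such a schedule follows, assuming an exponential decay toward a floor. The function name `epsilon_at` and its default values are illustrative choices, not taken from any particular library.

```python
import math


def epsilon_at(step: int,
               eps_start: float = 1.0,
               eps_end: float = 0.05,
               decay_steps: int = 10_000) -> float:
    """Deterministic control: epsilon depends only on the step count.

    Decays exponentially from eps_start toward the floor eps_end.
    All default values are hypothetical and should be tuned per task.
    """
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay_steps)


# Exploration rate at a few points in training
for s in (0, 10_000, 50_000):
    print(s, round(epsilon_at(s), 4))
```

Because the schedule ignores feedback, it is cheap and reproducible. Switching to the adaptive mode would mean replacing `step` with a measured signal such as recent reward.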
## 💻 Patterns

### LR scheduler (PyTorch)

```python
import math

import torch
from torch.optim.lr_scheduler import LambdaLR

TOTAL = 100_000   # total training steps (placeholder value)
WARMUP = 1_000    # linear warmup steps

model = torch.nn.Linear(16, 4)  # placeholder; substitute your own model
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Warmup + cosine: returns a multiplier applied to the base lr
def lr_lambda(step):
    if step < WARMUP:
        return step / WARMUP
    progress = min(1.0, (step - WARMUP) / (TOTAL - WARMUP))  # clamp past TOTAL
    return 0.5 * (1 + math.cos(math.pi * progress))

sched = LambdaLR(opt, lr_lambda)  # call sched.step() after each opt.step()
```

### vLLM sampler params (per-request)

```python
from vllm import LLM, SamplingParams

llm = LLM("meta-llama/Llama-3.3-70B-Instruct")

# One SamplingParams profile per task type
creative = SamplingParams(temperature=0.9, top_p=0.95, top_k=50,
                          repetition_penalty=1.1, max_tokens=512)
factual = SamplingParams(temperature=0.2, top_p=0.9, top_k=20,
                         max_tokens=256)

poem = llm.generate("Write a poem.", creative)
answer = llm.generate("Capital of FR?", factual)
print(answer[0].outputs[0].text)
```

### Adaptive temperature (entropy targeting)

```python
import torch

def adaptive_temp(logits, target_entropy=2.5, iters=10):
    """Heuristic search for a temperature T whose softmax entropy (in nats)
    is close to target_entropy. Expects 1-D logits; the target must be
    below log(vocab_size) to be reachable."""
    T = 1.0
    for _ in range(iters):
        logp = torch.log_softmax(logits / T, dim=-1)
        H = -(logp.exp() * logp).sum().item()        # log_softmax avoids log(0)
        T *= (target_entropy / max(H, 1e-8)) ** 0.5  # entropy too low -> raise T
    return T
```

### SAC entropy coefficient (self-adaptive)

```python
import torch

# dev, action_dim, and logp (log-prob of sampled actions) come from the
# surrounding SAC training loop.
# learnable log_alpha targets a fixed entropy
log_alpha = torch.zeros(1, requires_grad=True, device=dev)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_H = -action_dim  # heuristic

# per gradient step
alpha = log_alpha.exp()  # used to weight the entropy term in the actor/critic losses
loss_alpha = -(log_alpha * (logp.detach() + target_H)).mean()
alpha_opt.zero_grad()
loss_alpha.backward()
alpha_opt.step()
```

### Dynamic difficulty (game)

```python
class DDA:
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5                 # difficulty in [0, 1]
        self.target = target_winrate
        self.lr = lr

    def update(self, won: bool):
        # A win pushes difficulty up, a loss pushes it down
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))  # floor / ceiling
```

### CFG scale schedule (diffusion)

```python
def cfg_schedule(t, total, base=7.5):
    # t counts sampling steps from 0 to total.
    # Stronger guidance early, linear taper to 70% of base at the end (heuristic).
    progress = t / total
    return base * (1 - 0.3 * progress)
```

### Plateau-based lr reduction

```python
# opt, EPOCHS, and evaluate() are defined by the surrounding training code
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="min", factor=0.5, patience=3, min_lr=1e-6)

for epoch in range(EPOCHS):
    val_loss = evaluate()
    sched.step(val_loss)  # halves lr after 3 epochs without improvement
```

## Decision criteria

| Situation | Approach |
|---|---|
| Known schedule | deterministic (cosine, exponential) |
| Unknown convergence | adaptive (ReduceLROnPlateau) |
| RL entropy / GAN balance | self-adaptive (learnable param) |
| LLM serving | per-request sampler config |
| Game | DDA with target metric |

**Defaults**: warmup + cosine for training, temp=0.7/top-p=0.95 for chat LLM.

## 🔗 Graph

- Parent: [[Hyperparameters|Hyperparameter-Tuning]] · [[Optimization]]
- Variants: [[Dynamic-Difficulty-Adjustment]]
- Applications: [[LLM-Sampling]] · [[Reinforcement-Learning]]
- Adjacent: [[CMA-ES]] · [[Bayesian-Optimization]]

## 🤖 LLM usage

**When**: branching inference profiles (creative vs factual), recommending schedules for each training stage.
**When not**: a first prototype. Start with fixed defaults and add control later.

## ❌ Anti-patterns

- **No warmup**: starting at a large LR → loss spike.
- **Fixed temp for all tasks**: mismatches such as 0.9 for factual tasks or 0.1 for creative ones.
- **DDA without floor/ceiling**: difficulty drifts without bound.
- **Self-adaptive without target**: the learnable parameter collapses (alpha → 0).

## 🧪 Verification / duplicates

- Verified (PyTorch 2.5, vLLM 0.6, SAC paper Haarnoja 2018).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: unified ML/inference/game parameter control |