
id: wiki-2026-0508-parameter-control
title: Parameter Control
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Hyperparameter-Control, Sampler-Control, Adaptive-Parameters
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: hyperparameter, sampling, control, training, generative-ai
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language python, framework pytorch-vllm

Parameter Control

One-line summary

"From a fixed config to a dynamic schedule." Parameter control is the strategy of adaptively changing the knobs of a training, inference, or game system (learning rate, temperature, difficulty) over time or in response to state. Sampler parameters in generative AI (temperature/top-p/top-k), the exploration rate ε in RL, and dynamic balancing in games all fit the same framework.

Key points

Three domains

  1. ML training: LR schedules (cosine, warmup), plus adaptive weight decay and dropout.
  2. Generative inference: per-token adjustment of temperature, top-p, top-k, and repetition penalty.
  3. Game balance: dynamic difficulty adjustment (DDA) and procedural-generation parameters.

Three control modes (Eiben & Smit)

  • Deterministic: fixed schedule (cosine LR, ε-decay).
  • Adaptive: feedback-based (loss plateau → reduce LR).
  • Self-adaptive: the parameter is learned as part of the model state (CMA-ES σ, learnable temperature).
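
The three modes can be contrasted on small examples. The sketch below is illustrative only: the function names, the decay constants, and the choice of the 1/5 success rule and log-normal step-size mutation (both taken from evolution strategies) are assumptions made for this example, not something this note prescribes.

```python
import math
import random

def eps_deterministic(step, eps_start=1.0, eps_end=0.05, decay=0.999):
    """Deterministic: epsilon depends only on the step counter."""
    return max(eps_end, eps_start * decay ** step)

def sigma_adaptive(sigma, success_rate, factor=1.22):
    """Adaptive: 1/5 success rule, driven by observed feedback."""
    if success_rate > 0.2:
        return sigma * factor   # many successes: steps are too small
    if success_rate < 0.2:
        return sigma / factor   # few successes: steps are too large
    return sigma

def mutate_self_adaptive(x, sigma, rng, tau=0.3):
    """Self-adaptive: sigma belongs to the individual and mutates with it."""
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
    new_x = [xi + new_sigma * rng.gauss(0.0, 1.0) for xi in x]
    return new_x, new_sigma

rng = random.Random(0)
print(eps_deterministic(0), eps_deterministic(5000))
print(sigma_adaptive(1.0, success_rate=0.4))
print(mutate_self_adaptive([0.0, 0.0], 1.0, rng))
```

The distinction is where the signal comes from: a clock (deterministic), a measured statistic (adaptive), or selection acting on the parameter itself (self-adaptive).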

Applications

  1. Per-request sampler tuning in LLM serving.
  2. Auto-tuning of the entropy coefficient in RL (SAC).
  3. CFG scale schedules in diffusion models.

💻 Patterns

LR scheduler (PyTorch)

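import math
import torch
from torch.optim.lr_scheduler import LambdaLR

WARMUP = 1000     # warmup steps
TOTAL = 100_000   # total training steps (placeholder: set to your run length)

opt = torch.optim.AdamW(model.parameters(), lr=3e-4)  # `model` is defined elsewhere

# Linear warmup, then cosine decay to zero
def lr_lambda(step):
    if step < WARMUP:
        return step / WARMUP
    progress = min(1.0, (step - WARMUP) / (TOTAL - WARMUP))
    return 0.5 * (1 + math.cos(math.pi * progress))

sched = LambdaLR(opt, lr_lambda)  # call sched.step() after each opt.step()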
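
To sanity-check the shape of a warmup + cosine schedule before wiring it into an optimizer, the multiplier can be evaluated on its own. This is a standalone sketch: the name `warmup_cosine` and the step counts are illustrative values, not settings taken from this note.

```python
import math

def warmup_cosine(step, warmup=1000, total=10_000):
    """LR multiplier in [0, 1]: linear warmup, then cosine decay to 0."""
    if step < warmup:
        return step / warmup
    # Clamp so steps past `total` stay at the final value
    progress = min(1.0, (step - warmup) / (total - warmup))
    return 0.5 * (1.0 + math.cos(math.pi * progress))

for s in (0, 500, 1000, 5500, 10_000):
    print(s, round(warmup_cosine(s), 4))
```

The effective learning rate is the optimizer's base LR times this multiplier, so the peak is reached exactly at the end of warmup.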

vLLM sampler params (per-request)

from vllm import LLM, SamplingParams

llm = LLM("meta-llama/Llama-3.3-70B-Instruct")

creative = SamplingParams(temperature=0.9, top_p=0.95, top_k=50,
                          repetition_penalty=1.1, max_tokens=512)
factual  = SamplingParams(temperature=0.2, top_p=0.9,  top_k=20,
                          max_tokens=256)

llm.generate("Write a poem.",  creative)
llm.generate("Capital of FR?", factual)
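
To see what these sampler knobs do to a next-token distribution, the filtering can be sketched in plain Python. This is a simplified illustration, not vLLM's implementation: the function name `filter_probs` and the ordering (temperature, then top-k, then top-p on the renormalized mass) are assumptions, and real engines may order or batch these steps differently.

```python
import math

def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k, top-p."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                  # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]

    # top-k: keep the k most likely tokens (0 disables the filter)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    mass = sum(probs[i] for i in order)

    # top-p: smallest prefix whose renormalized mass reaches top_p
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i] / mass
        if cum >= top_p:
            break

    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
print(filter_probs(logits, temperature=1.0, top_k=3, top_p=0.8))
print(filter_probs(logits, temperature=0.2))
```

Lower temperature sharpens the distribution before any truncation happens, which is why a "factual" profile can combine a low temperature with a fairly loose top-p.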

Adaptive temperature (entropy targeting)

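import torch

def adaptive_temp(logits, target_entropy=2.5, iters=10):
    # Fixed-point search for the temperature whose softmax entropy (in nats)
    # is close to the target. `logits` is a 1-D tensor for a single position.
    T = 1.0
    for _ in range(iters):
        logp = torch.log_softmax(logits / T, dim=-1)   # avoids log(0)
        H = -(logp.exp() * logp).sum().item()
        T *= (target_entropy / max(H, 1e-8)) ** 0.5    # H below target raises T
    return T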
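
Because softmax entropy rises monotonically with temperature, the same target can also be hit by bisection, which is a different technique from the multiplicative update above and has a guaranteed bracket. The sketch below uses only the standard library; `entropy_at`, `temp_for_entropy`, and the search bounds are hypothetical names and values chosen for illustration.

```python
import math

def entropy_at(logits, T):
    """Shannon entropy (nats) of softmax(logits / T)."""
    scaled = [l / T for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0.0)

def temp_for_entropy(logits, target, lo=1e-2, hi=1e2, iters=60):
    """Bisect on T until the entropy matches `target`."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy_at(logits, mid) < target:
            lo = mid      # too peaked: raise T
        else:
            hi = mid      # too flat: lower T
    return 0.5 * (lo + hi)

logits = [2.0, 1.0, 0.5, 0.0, -1.0]
T = temp_for_entropy(logits, target=1.2)
print(T, entropy_at(logits, T))
```

The target has to be below `log(len(logits))`, the entropy of a uniform distribution; above that, no temperature can reach it and the search simply returns the upper bound.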

SAC entropy coefficient (self-adaptive)

# learnable log_alpha targets a fixed entropy
log_alpha = torch.zeros(1, requires_grad=True, device=dev)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_H = -action_dim   # heuristic

# per gradient step
alpha = log_alpha.exp()
loss_alpha = -(log_alpha * (logp.detach() + target_H)).mean()
alpha_opt.zero_grad(); loss_alpha.backward(); alpha_opt.step()
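
The direction of this update can be checked by hand without autograd. The sketch below applies one plain gradient-descent step to the same loss; `alpha_step`, the learning rate, and the sample log-probabilities are made-up values for illustration (log-probabilities of continuous actions are densities, so positive values are possible).

```python
import math

def alpha_step(log_alpha, logps, target_entropy, lr=0.1):
    """One gradient-descent step on the SAC temperature loss.

    loss = -log_alpha * mean(logp + target_entropy)
    d(loss)/d(log_alpha) = -mean(logp + target_entropy)
    """
    grad = -sum(lp + target_entropy for lp in logps) / len(logps)
    return log_alpha - lr * grad

target_H = -2.0                      # e.g. -action_dim for a 2-D action space
low_entropy = [3.0, 2.5, 3.5]        # entropy estimate -3.0, below target
high_entropy = [0.5, 1.0, 0.0]       # entropy estimate -0.5, above target

up = alpha_step(0.0, low_entropy, target_H)
down = alpha_step(0.0, high_entropy, target_H)
print(math.exp(up), math.exp(down))
```

When the policy's entropy estimate (the negative mean log-probability) falls below the target, alpha grows and the entropy bonus gets stronger; when it is above the target, alpha shrinks.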

Dynamic difficulty (game)

class DDA:
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5
        self.target = target_winrate
        self.lr = lr
    def update(self, won: bool):
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))
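
A quick simulation shows the controller settling near the target win rate. The player model here (win probability falling linearly with difficulty) is purely an assumption for the demo, and `simulate` is a hypothetical helper; the `DDA` class is repeated in compact form so the sketch runs on its own.

```python
import random

class DDA:
    """Compact copy of the controller above."""
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff, self.target, self.lr = 0.5, target_winrate, lr

    def update(self, won: bool):
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))

def simulate(steps=5000, seed=0):
    rng = random.Random(seed)
    dda, wins, diffs = DDA(), [], []
    for _ in range(steps):
        # Assumed player model: P(win) = 1 - difficulty
        won = rng.random() < 1.0 - dda.diff
        dda.update(won)
        wins.append(won)
        diffs.append(dda.diff)
    tail = steps // 2   # ignore the first half as burn-in
    n = steps - tail
    return sum(wins[tail:]) / n, sum(diffs[tail:]) / n

winrate, mean_diff = simulate()
print(round(winrate, 3), round(mean_diff, 3))
```

Under this player model the fixed point is the difficulty where `1 - diff` equals the target, so difficulty hovers around 0.45 while the observed win rate tracks 0.55. A smaller `lr` reduces the jitter at the cost of slower adaptation.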

CFG scale schedule (diffusion)

def cfg_schedule(t, total, base=7.5):
    # t: sampling step index, 0 = first (noisiest) step.
    # Linear taper: full CFG scale early, 70% of base by the last step.
    progress = t / total
    return base * (1 - 0.3 * progress)
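
For context, the scheduled scale is what multiplies the conditional/unconditional difference at each sampling step. The sketch below shows that combination on plain lists; `cfg_scale`, `guided`, the `taper` parameter, and the two-element "predictions" are stand-ins for illustration, and the step is normalized by `total - 1` (an assumption) so the last step gets the full taper.

```python
def cfg_scale(step, total, base=7.5, taper=0.3):
    """Linear taper: `base` at step 0, `base * (1 - taper)` at the last step."""
    progress = step / max(1, total - 1)
    return base * (1.0 - taper * progress)

def guided(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: start at the unconditional prediction and
    move toward (and past) the conditional one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

total = 30
eps_u, eps_c = [0.10, -0.20], [0.30, 0.00]   # stand-ins for model outputs
for step in (0, 15, 29):
    s = cfg_scale(step, total)
    print(step, round(s, 3), guided(eps_u, eps_c, s))
```

A scale of 1.0 reproduces the conditional prediction and 0.0 the unconditional one, so tapering the scale gradually weakens the push toward the prompt in later steps.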

Plateau-based lr reduction

sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="min", factor=0.5, patience=3, min_lr=1e-6)
for epoch in range(EPOCHS):
    val_loss = evaluate()
    sched.step(val_loss)
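
The patience logic behind plateau-based reduction is small enough to sketch by hand. This is a simplified stand-in, not PyTorch's implementation: `PlateauReducer` is a hypothetical class that omits the improvement threshold, cooldown, and "max" mode.

```python
class PlateauReducer:
    """Scale the LR by `factor` after more than `patience` epochs without improvement."""
    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr, self.factor = lr, factor
        self.patience, self.min_lr = patience, min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, metric):
        if metric < self.best:
            self.best, self.bad_epochs = metric, 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr = max(self.min_lr, self.lr * self.factor)
            self.bad_epochs = 0
        return self.lr

sched = PlateauReducer(lr=3e-4)
history = [sched.step(loss) for loss in (1.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7)]
print(history)
```

With `patience=3`, the reduction fires on the fourth consecutive epoch without improvement, and the counter then resets so the next reduction needs a fresh run of stalled epochs.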

Decision criteria

| Situation | Approach |
| --- | --- |
| Known schedule | Deterministic (cosine, exponential) |
| Unknown convergence | Adaptive (ReduceLROnPlateau) |
| RL entropy / GAN balance | Self-adaptive (learnable param) |
| LLM serving | Per-request sampler config |
| Game | DDA with target metric |

Defaults: warmup + cosine for training; temperature 0.7 / top-p 0.95 for chat LLMs.

🔗 Graph

🤖 LLM usage

When to use: branching inference profiles (creative vs. factual), or recommending schedules for per-stage training. When not to use: a first prototype. Start with fixed defaults and add control later.

Anti-patterns

  • No warmup: starting at a large LR causes a loss spike.
  • Fixed temp for all tasks: mismatches such as 0.9 for factual tasks or 0.1 for creative ones.
  • DDA without floor/ceiling: difficulty drifts without bound.
  • Self-adaptive without target: the learnable parameter collapses (alpha → 0).

🧪 Verification / duplicates

  • Verified (PyTorch 2.5, vLLM 0.6, SAC paper Haarnoja 2018).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: unified ML/inference/game parameter control |