Antigravity Agent 504fd5fb42 [G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00

Parameter Control

In one line

"Fixed config → dynamic schedule." Parameter control is the strategy of adaptively changing the knobs of a training, inference, or game system (learning rate, temperature, difficulty) over time or in response to state. Sampler parameters in generative AI (temperature/top-p/top-k), the exploration ε in RL, and dynamic balancing in games all fit the same framework.

Key points

Three domains

ML training: adaptive control of the lr schedule (cosine, warmup), weight decay, and dropout.
Generative inference: per-token adjustment of temperature, top-p, top-k, and repetition penalty.
Game balance: dynamic difficulty adjustment (DDA) and the parameters of procedural generation.

Three modes of control (Eiben & Smit)

Deterministic: fixed schedule (cosine LR, ε-decay).
Adaptive: feedback-based (loss plateau → reduce lr).
Self-adaptive: the parameter is learned as part of the model state (CMA-ES σ, learnable temperature).
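
The three modes can be told apart by what the update rule is allowed to read. The sketch below is illustrative only: the function names, default constants, and the log-normal σ mutation (the classic evolution-strategy form, not CMA-ES itself) are assumptions chosen for this note.

```python
import math
import random

def deterministic_eps(step, start=1.0, end=0.05, decay_steps=10_000):
    """Deterministic: the value depends only on the step counter."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

def adaptive_lr(lr, history, patience=3, factor=0.5):
    """Adaptive: a feedback rule reads the loss history and shrinks lr
    when the last `patience` losses failed to beat the earlier best."""
    if len(history) > patience and min(history[-patience:]) >= min(history[:-patience]):
        return lr * factor
    return lr

def self_adaptive_mutation(x, sigma, rng, tau=0.3):
    """Self-adaptive: sigma is mutated and carried along with the solution x,
    so selection on x implicitly selects good step sizes."""
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
    new_x = x + new_sigma * rng.gauss(0.0, 1.0)
    return new_x, new_sigma
```

The deterministic rule needs no feedback, the adaptive rule needs an external metric, and the self-adaptive rule needs a selection or gradient signal acting on the parameter itself.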

Applications

Per-request sampler tuning in LLM serving.
Auto-tuning of the entropy coefficient in RL (SAC).
CFG scale schedules in diffusion.

💻 Patterns

LR scheduler (PyTorch)

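import math
import torch
from torch.optim.lr_scheduler import LambdaLR

TOTAL = 100_000   # total optimizer steps (placeholder; set to your run length)
WARMUP = 1_000    # warmup steps

# `model` is assumed to be an existing torch.nn.Module
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Linear warmup, then cosine decay to 0; returns a multiplier on the base lr
def lr_lambda(step):
    if step < WARMUP:
        return step / WARMUP
    progress = min(1.0, (step - WARMUP) / (TOTAL - WARMUP))
    return 0.5 * (1 + math.cos(math.pi * progress))

sched = LambdaLR(opt, lr_lambda)  # call sched.step() after each opt.step()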

vLLM sampler params (per-request)

from vllm import LLM, SamplingParams

llm = LLM("meta-llama/Llama-3.3-70B-Instruct")

creative = SamplingParams(temperature=0.9, top_p=0.95, top_k=50,
                          repetition_penalty=1.1, max_tokens=512)
factual  = SamplingParams(temperature=0.2, top_p=0.9,  top_k=20,
                          max_tokens=256)

llm.generate("Write a poem.",  creative)
llm.generate("Capital of FR?", factual)
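
One way to keep per-request profiles manageable is a small lookup table keyed by task label. This is a sketch, not part of vLLM: `PROFILES`, `DEFAULT_PROFILE`, and `sampler_kwargs` are hypothetical names, and falling back to the factual profile for unknown tasks is an assumption. The dictionary keys mirror the `SamplingParams` keyword arguments used above.

```python
# Hypothetical profile table; values copied from the creative/factual configs above.
PROFILES = {
    "creative": dict(temperature=0.9, top_p=0.95, top_k=50,
                     repetition_penalty=1.1, max_tokens=512),
    "factual": dict(temperature=0.2, top_p=0.9, top_k=20, max_tokens=256),
}
DEFAULT_PROFILE = "factual"  # assumption: unknown tasks get the conservative profile

def sampler_kwargs(task: str, **overrides) -> dict:
    """Return sampler settings for a task label, with optional per-request overrides."""
    params = dict(PROFILES.get(task, PROFILES[DEFAULT_PROFILE]))  # copy, never mutate the table
    params.update(overrides)
    return params
```

With vLLM installed, the result can be passed along as `SamplingParams(**sampler_kwargs("creative"))`.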

Adaptive temperature (entropy targeting)

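import torch

def adaptive_temp(logits, target_entropy=2.5, iters=10):
    """Find T so that softmax(logits / T) has roughly target_entropy (nats)."""
    T = 1.0
    for _ in range(iters):
        logp = torch.log_softmax(logits / T, dim=-1)   # stable; avoids log(0)
        H = -(logp.exp() * logp).sum().item()
        # Higher T flattens the distribution and raises entropy
        T *= (target_entropy / max(H, 1e-8)) ** 0.5
    return T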
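
The multiplicative update above is a heuristic and is not guaranteed to converge. Because the entropy of softmax(logits / T) increases monotonically with T, a bisection search is a simple alternative. The sketch below uses only the standard library; the function names, search bounds, and example logits are assumptions, and the target must be below log(vocabulary size) to be reachable.

```python
import math

def entropy_at(logits, T):
    """Shannon entropy (nats) of softmax(logits / T)."""
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]   # shift by max for stability
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0.0)

def temp_for_entropy(logits, target_entropy, lo=0.05, hi=20.0, iters=50):
    """Bisection on T: too little entropy means T must go up."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy_at(logits, mid) < target_entropy:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

logits = [4.0, 2.0, 1.0, 0.5, 0.0, -1.0]   # toy 6-token vocabulary
T = temp_for_entropy(logits, target_entropy=1.2)
```

For these toy logits the entropy at T = 1 is below the 1.2 target, so the search settles on a temperature above 1.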

SAC entropy coefficient (self-adaptive)

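import torch

# Learnable log_alpha; alpha is tuned so policy entropy tracks a fixed target.
# `dev`, `action_dim`, and `logp` (log-prob of sampled actions) come from the surrounding SAC loop.
log_alpha = torch.zeros(1, requires_grad=True, device=dev)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_H = -action_dim   # heuristic: target entropy = -|A|

# per gradient step
alpha = log_alpha.exp()  # detach before using it in the actor/critic losses
loss_alpha = -(log_alpha * (logp.detach() + target_H)).mean()
alpha_opt.zero_grad(); loss_alpha.backward(); alpha_opt.step()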
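
For reference, the loss in the snippet corresponds to the temperature objective from the SAC paper, written here with the snippet's names ($\bar{\mathcal{H}}$ is `target_H`, $\log \pi$ is `logp`). Optimizing $\log\alpha$ in place of $\alpha$, as the code does, is a common implementation choice that keeps $\alpha$ positive.

```latex
J(\alpha) = \mathbb{E}_{a_t \sim \pi}\left[ -\alpha \left( \log \pi(a_t \mid s_t) + \bar{\mathcal{H}} \right) \right],
\qquad
\text{code: } \mathcal{L}(\log\alpha) = -\,\mathbb{E}\left[ \log\alpha \cdot \left( \log \pi(a_t \mid s_t) + \bar{\mathcal{H}} \right) \right]
```

When the policy's entropy $-\log\pi$ falls below the target, the bracketed term is positive, so minimizing the loss pushes $\log\alpha$ up and strengthens the entropy bonus; when entropy exceeds the target, $\alpha$ shrinks.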

Dynamic difficulty (game)

class DDA:
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5
        self.target = target_winrate
        self.lr = lr
    def update(self, won: bool):
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))
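
To see the controller settle, it can be run against a simulated player. The sketch below repeats the class so it runs on its own; the player model (win probability falls linearly with difficulty) and the seed are assumptions made purely for illustration.

```python
import random

class DDA:
    def __init__(self, target_winrate=0.55, lr=0.05):
        self.diff = 0.5
        self.target = target_winrate
        self.lr = lr
    def update(self, won: bool):
        # Winning more often than the target raises difficulty, and vice versa
        self.diff += self.lr * (int(won) - self.target)
        self.diff = max(0.0, min(1.0, self.diff))

rng = random.Random(0)
dda = DDA(target_winrate=0.55, lr=0.05)
N = 2000
wins = 0
for _ in range(N):
    # Hypothetical player: P(win) = 1 - difficulty
    won = rng.random() < 1.0 - dda.diff
    wins += won
    dda.update(won)
winrate = wins / N
print(f"difficulty={dda.diff:.2f} winrate={winrate:.3f}")
```

Under this player model the difficulty hovers near 0.45, where the win probability equals the 0.55 target, and the empirical win rate stays close to the target as long as the clamp is rarely hit.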

CFG scale schedule (diffusion)

def cfg_schedule(t, total, base=7.5):
    # t: current sampling step (0 .. total); higher CFG early, linear taper to 0.7 * base at the end
    progress = t / total
    return base * (1 - 0.3 * progress)
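
The scheduled scale is then used in the standard classifier-free guidance combination at each step. The sketch below uses plain lists in place of tensors; `guided_eps` is a hypothetical helper name, and the two noise predictions are assumed to come from the model's unconditional and conditional passes.

```python
def cfg_schedule(t, total, base=7.5):
    # Linear taper from base at t=0 to 0.7 * base at t=total
    progress = t / total
    return base * (1 - 0.3 * progress)

def guided_eps(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and past) the conditional one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

total_steps = 50
for t in (0, 25, 50):
    s = cfg_schedule(t, total_steps)
    eps = guided_eps([0.0, 0.1], [1.0, 0.3], s)  # toy 2-element predictions
```

A scale of 1 reproduces the conditional prediction, so the taper moves late steps closer to unguided conditional sampling.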

Plateau-based lr reduction

sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="min", factor=0.5, patience=3, min_lr=1e-6)
for epoch in range(EPOCHS):
    val_loss = evaluate()
    sched.step(val_loss)

Decision criteria

Situation	Approach
Known schedule	deterministic (cosine, exponential)
Unknown convergence	adaptive (ReduceLROnPlateau)
RL entropy / GAN balance	self-adaptive (learnable param)
LLM serving	per-request sampler config
Game	DDA with target metric

Defaults: warmup + cosine for training, temp=0.7/top-p=0.95 for chat LLM.

🔗 Graph

Parent: Hyperparameter-Tuning · Optimization
Variants: Learning-Rate-Schedule · Dynamic-Difficulty-Adjustment
Applications: LLM-Sampling · Reinforcement-Learning · Diffusion-Sampling
Adjacent: CMA-ES · SAC · Bayesian-Optimization

🤖 LLM usage

When: branching inference profiles (creative vs. factual), recommending per-stage training schedules. When not: a first prototype; start with fixed defaults and add control later.

❌ Anti-patterns

No warmup: starting at a large LR → loss spike.
Fixed temp for all tasks: mismatches such as 0.9 for factual or 0.1 for creative.
DDA without floor/ceiling: difficulty drifts without bound.
Self-adaptive without target: the learnable param collapses (alpha → 0).

🧪 Verification / duplicates

Verified (PyTorch 2.5, vLLM 0.6, SAC paper Haarnoja 2018).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — unified ML/inference/game parameter control
