Antigravity Agent 504fd5fb42 [G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00


id, title, category, status, canonical_id, aliases, duplicate_of, source_trust_level, confidence_score, verification_status, tags, raw_sources, last_reinforced, github_commit, tech_stack

title

Optimization

One-line summary

"minimize f(x) subject to constraints". Optimization is the universal language of ML, OR, control, finance, and engineering. As of 2026, the standard recipe for LLM training is AdamW + cosine schedule + gradient clipping + mixed precision. Convexity, smoothness, stochasticity, and constraint structure determine the choice of algorithm.

Core concepts

Classification axes

Convex vs Nonconvex: convex → global guarantee; nonconvex (deep nets) → local + heuristics.
Smooth vs Nonsmooth: smooth → gradient; nonsmooth → subgradient / proximal.
Constrained vs Unconstrained: KKT, Lagrangian, projection.
Deterministic vs Stochastic: full grad vs SGD/Adam.
First-order vs Second-order: GD/Adam vs Newton/L-BFGS/K-FAC.

Core theory

Convexity: f(λx+(1-λ)y) ≤ λf(x)+(1-λ)f(y).
Lipschitz smoothness: ‖∇f(x)-∇f(y)‖ ≤ L‖x-y‖.
Strong convexity μ: for a μ-strongly convex, L-smooth f, gradient descent with step size 1/L converges linearly, at rate O((1-μ/L)ᵏ).
KKT conditions: stationarity, primal/dual feasibility, complementary slackness.
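The linear rate above can be checked numerically. The following is a minimal sketch, assuming a two-dimensional diagonal quadratic whose curvatures are exactly μ and L (the values 1 and 10 are illustrative, not from the source). For this quadratic, gradient descent with step size 1/L shrinks the distance to the minimizer by at most a factor of 1-μ/L per step.

```python
# Gradient descent on f(x) = 0.5 * (mu * x1^2 + L * x2^2), minimized at the origin.
mu, L = 1.0, 10.0            # strong-convexity and smoothness constants (illustrative)

def grad(x):
    return (mu * x[0], L * x[1])

def norm(x):
    return (x[0] ** 2 + x[1] ** 2) ** 0.5

x = (1.0, 1.0)
r0 = norm(x)                 # initial distance to the minimizer
rate = 1.0 - mu / L          # predicted contraction factor per step

history = []                 # (iteration, actual distance, predicted bound)
for k in range(1, 51):
    g = grad(x)
    x = (x[0] - g[0] / L, x[1] - g[1] / L)   # step size 1/L
    history.append((k, norm(x), rate ** k * r0))

for k, dist, bound in history[::10]:
    print(f"k={k:2d}  distance={dist:.3e}  bound={bound:.3e}")
```

The printed distance stays below the bound at every iteration. A larger ratio L/μ (worse conditioning) pushes the factor 1-μ/L toward 1 and slows convergence accordingly.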

Applications

ML training (SGD/Adam/Lion/Sophia).
LP/MIP (Gurobi, HiGHS).
Optimal control (LQR, MPC).
Portfolio (Markowitz, Black-Litterman).
Hyperparameter tuning (Bayesian opt, Optuna).

💻 Patterns

AdamW with cosine schedule and gradient clipping (PyTorch)

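import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# model, loader, criterion, and total_steps are assumed to be defined by the caller.
opt = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1, betas=(0.9, 0.95))
sched = CosineAnnealingLR(opt, T_max=total_steps)  # decay the LR over total_steps
for x, y in loader:
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Clip the global gradient norm to 1.0 before the optimizer step.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    opt.step()
    sched.step()  # one scheduler step per optimizer step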
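To make the clipping step in the loop above concrete, here is a simplified sketch of global-norm clipping in plain Python. The function name `clip_by_global_norm` and the list-of-lists gradient layout are hypothetical, chosen for illustration; the rule itself (rescale all gradients by `max_norm / (total_norm + eps)` when that factor is below 1) mirrors what `clip_grad_norm_` is documented to do.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale a list of gradient vectors so their joint L2 norm is at most max_norm."""
    # Joint norm over every entry of every parameter's gradient.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Never scale up: small gradients pass through unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm

grads = [[3.0, 4.0], [12.0]]   # two "parameters"; joint norm is 13
clipped, total_norm = clip_by_global_norm(grads, max_norm=1.0)
print(total_norm, clipped)
```

Because every gradient is multiplied by the same factor, clipping changes the length of the update but not its direction.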

Convex optimization with CVXPY

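import cvxpy as cp

# n, A, b, and lam are assumed to be defined by the caller.
x = cp.Variable(n)
prob = cp.Problem(
    cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x)),
    # Simplex constraints. Note: on this set norm1(x) equals 1, so the
    # L1 term is a constant here and only matters if the constraints change.
    [x >= 0, cp.sum(x) == 1])
prob.solve(solver=cp.MOSEK)  # MOSEK requires a license; prob.solve() uses an installed default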

L-BFGS for moderate-scale smooth problems

from scipy.optimize import minimize

# f, grad_f, x0, and bounds are assumed to be defined by the caller.
res = minimize(f, x0, jac=grad_f, method='L-BFGS-B',
               bounds=bounds, options={'ftol': 1e-9})

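Proximal gradient (FISTA)

import numpy as np

def fista(grad_f, prox_g, x0, L, n_iter=200):
    # Minimizes f(x) + g(x): grad_f is the gradient of the smooth part f,
    # prox_g(v, step) is the proximal operator of step * g, L is the
    # Lipschitz constant of grad_f.
    x = y = x0.copy(); t = 1.0
    for k in range(n_iter):
        x_new = prox_g(y - grad_f(y)/L, 1/L)       # proximal gradient step at y
        t_new = 0.5*(1 + np.sqrt(1 + 4*t*t))       # momentum parameter update
        y = x_new + ((t-1)/t_new)*(x_new - x)      # extrapolation
        x, t = x_new, t_new
    return x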
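As a usage sketch for the routine above, the lasso problem 0.5‖Ax-b‖² + λ‖x‖₁ fits the pattern directly: the smooth part has gradient Aᵀ(Ax-b), and the proximal operator of the L1 term is soft-thresholding. The synthetic data, the value of `lam`, and the iteration count below are assumptions for illustration only.

```python
import numpy as np

def fista(grad_f, prox_g, x0, L, n_iter=200):
    # Same routine as in the pattern above, repeated so the example runs on its own.
    x = y = x0.copy(); t = 1.0
    for _ in range(n_iter):
        x_new = prox_g(y - grad_f(y) / L, 1 / L)
        t_new = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Synthetic sparse regression problem (illustrative data).
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
x_true = np.zeros(10)
x_true[[1, 6]] = [2.0, -3.0]
b = A @ x_true
lam = 0.5

def grad_f(x):
    return A.T @ (A @ x - b)                 # gradient of 0.5 * ||Ax - b||^2

def prox_g(v, step):
    # Soft-thresholding: proximal operator of step * lam * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

def objective(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

L = np.linalg.norm(A, 2) ** 2                # largest eigenvalue of A^T A
x_hat = fista(grad_f, prox_g, np.zeros(10), L, n_iter=2000)
print(np.round(x_hat, 3), objective(x_hat))
```

Swapping `prox_g` for a different proximal operator (for example, a projection onto a constraint set) reuses the same loop for other nonsmooth terms.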

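Bayesian optimization (Optuna)

import optuna

# train_and_eval(lr, wd) is assumed to be defined by the caller and to
# return the validation loss to minimize.
def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)   # log-uniform search
    wd = trial.suggest_float('wd', 1e-4, 1e-1, log=True)
    return train_and_eval(lr, wd)
study = optuna.create_study(direction='minimize')  # default sampler is TPE
study.optimize(objective, n_trials=100)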

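Projected gradient (projection onto the probability simplex)

import numpy as np

def proj_simplex(v):
    # Euclidean projection of v onto {x : x >= 0, sum(x) = 1}.
    n = len(v); u = np.sort(v)[::-1]               # sort in descending order
    cssv = np.cumsum(u) - 1
    # Last index whose sorted entry stays positive after the shift.
    rho = np.where(u - cssv/np.arange(1, n+1) > 0)[0][-1]
    theta = cssv[rho] / (rho+1)                    # shift that makes the result sum to 1
    return np.maximum(v - theta, 0)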
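The projection above is only one half of the method: projected gradient alternates a gradient step with a projection back onto the constraint set. A minimal sketch follows, assuming a toy objective 0.5‖x-c‖² and a fixed step size; the helper name `projected_gradient` and the vector `c` are hypothetical.

```python
import numpy as np

def proj_simplex(v):
    # Euclidean projection onto {x : x >= 0, sum(x) = 1}; same routine as above.
    n = len(v); u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - 1
    rho = np.where(u - cssv / np.arange(1, n + 1) > 0)[0][-1]
    theta = cssv[rho] / (rho + 1)
    return np.maximum(v - theta, 0)

def projected_gradient(grad_f, x0, step, n_iter=500):
    x = proj_simplex(x0)                           # start from a feasible point
    for _ in range(n_iter):
        x = proj_simplex(x - step * grad_f(x))     # gradient step, then project
    return x

# Toy problem: the closest point on the simplex to c, i.e. min 0.5 * ||x - c||^2.
c = np.array([0.8, 0.6, -0.3])
x_star = projected_gradient(lambda x: x - c, np.ones(3) / 3, step=0.5)
print(x_star)
```

Every iterate stays feasible by construction, which is the main appeal when the projection is cheap, as it is for the simplex, boxes, and norm balls.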

Decision criteria

Situation	Approach
Smooth convex, small	Newton / L-BFGS
Smooth convex, large	GD / accelerated GD
Nonsmooth convex	Subgradient / proximal / ADMM
Stochastic, deep net	AdamW (default) / Lion / Sophia
LP / QP	Simplex / interior-point (Gurobi/Mosek)
Black-box / expensive eval	Bayesian opt (Optuna)
Combinatorial	MIP / metaheuristic / CP-SAT

Defaults: AdamW + cosine schedule for ML training; CVXPY for convex problems; Optuna for black-box problems.

🔗 Graph

Parent: Mathematics · Calculus
Variants: Convex-Optimization · Nonconvex-Optimization · Stochastic-Optimization
Applications: Deep-Learning-Training · Operations-Research · Optimal-Control-Theory
Adjacent: Linear-Algebra · Numerical-Methods

🤖 LLM usage

When to use: optimizer recipe selection, priors for hyperparameter search, explaining KKT/Lagrangian derivations. When not to use: actual numerical solving (use PyTorch/CVXPY/Gurobi for that).

❌ Anti-patterns

Adam everywhere: using Adam on small-data or convex problems, where SGD or L-BFGS often works better.
No grad clipping for transformers: gradient explosion is close to inevitable without it.
Constant LR: warmup and cosine decay almost always help.
Local minimum panic: in deep nets the real problem is saddle points, not local minima.
Convex assumption violation: applying a convex solver to a nonconvex problem gives a wrong answer.
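For the constant-LR item above, a warmup-plus-cosine schedule can be written as a small pure function. This is a sketch under stated assumptions: the linear warmup shape, the `warmup_steps` default, and the function name `lr_at` are illustrative choices, not taken from the source; the cosine part uses the standard half-cosine decay from `base_lr` to `min_lr`.

```python
import math

def lr_at(step, total_steps, base_lr=3e-4, warmup_steps=100, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

total_steps = 1000
schedule = [lr_at(s, total_steps) for s in range(total_steps + 1)]
print(schedule[0], schedule[100], schedule[550], schedule[1000])
```

The warmup keeps early updates small while the optimizer's moment estimates are still noisy, and the cosine tail lets the run settle with a small step size.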

🧪 Verification / duplicates

Verified (Boyd & Vandenberghe "Convex Optimization", Nocedal & Wright).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — full optimization landscape
