2026-05-10 22:08:15 +09:00


id: wiki-2026-0508-optimization
title: Optimization
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Mathematical Optimization, Numerical Optimization
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: optimization, convex, gradient-descent, ml-training
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: Python; framework: PyTorch/JAX/CVXPY

Optimization

One-liner

"Minimize f(x) subject to constraints." Optimization is the common language of ML, operations research, control, finance and engineering. As of 2026, the standard recipe for LLM training is AdamW + a cosine schedule + gradient clipping + mixed precision. The problem's convexity, smoothness, stochasticity and constraint structure determine which algorithm to choose.

Core concepts

Classification axes

  • Convex vs Nonconvex: convex → every local minimum is global, so guarantees are global; nonconvex (deep nets) → only local guarantees, plus heuristics.
  • Smooth vs Nonsmooth: smooth → gradient; nonsmooth → subgradient / proximal.
  • Constrained vs Unconstrained: KKT, Lagrangian, projection.
  • Deterministic vs Stochastic: full grad vs SGD/Adam.
  • First-order vs Second-order: GD/Adam vs Newton/L-BFGS/K-FAC.
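Consider the smooth/nonsmooth axis. The function f(x) = ‖Ax − b‖₁ has no gradient wherever a residual is exactly zero, so plain gradient descent does not apply, but a subgradient method does. The following is a minimal sketch of that method. It assumes normalized steps of size 1/√(k+1), which is one common diminishing schedule among several. The function name and the toy data are illustrative and do not come from any library.

```python
import numpy as np

def subgradient_l1_regression(A, b, x0, n_iter=2000):
    """Normalized subgradient method for the nonsmooth convex f(x) = ||Ax - b||_1."""
    def f(x):
        return np.abs(A @ x - b).sum()

    x = np.asarray(x0, dtype=float).copy()
    best_x, best_f = x.copy(), f(x)
    for k in range(n_iter):
        g = A.T @ np.sign(A @ x - b)   # one valid subgradient (uses sign(0) = 0)
        g_norm = np.linalg.norm(g)
        if g_norm == 0.0:              # 0 is a subgradient, so x is optimal
            break
        x = x - (1.0 / np.sqrt(k + 1)) * g / g_norm  # diminishing step size
        fx = f(x)
        if fx < best_f:                # subgradient steps need not decrease f,
            best_x, best_f = x.copy(), fx  # so keep the best iterate seen
    return best_x, best_f

# Toy data (illustrative): a consistent system, so min f = 0
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 5))
b = A @ np.arange(1.0, 6.0)
x_best, f_best = subgradient_l1_regression(A, b, np.zeros(5))
```

A subgradient step can increase f, so the method returns the best iterate it has seen rather than the last one.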

Core theory

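  • Convexity: f(λx+(1-λ)y) ≤ λf(x)+(1-λ)f(y) for all x, y and λ ∈ [0, 1].
  • Lipschitz smoothness: ‖∇f(x)-∇f(y)‖ ≤ L‖x-y‖.
  • Strong convexity (parameter μ > 0): f(y) ≥ f(x) + ∇f(x)ᵀ(y−x) + (μ/2)‖y−x‖². For μ-strongly convex, L-smooth f, gradient descent with step 1/L converges linearly: f(xₖ) − f* ≤ (1−μ/L)ᵏ (f(x₀) − f*).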
  • KKT conditions: stationarity, primal/dual feasibility, complementary slackness.
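Two of the statements above can be checked numerically with a short numpy sketch. The function names are illustrative.

- **Linear rate.** The sketch runs gradient descent with step 1/L on a strongly convex quadratic and compares the result with the (1 − μ/L)ᵏ bound.
- **KKT system.** It then solves a small equality-constrained QP by writing its KKT conditions as one linear system.

The sketch assumes Q is symmetric positive definite. With inequality constraints, dual feasibility and complementary slackness would also have to be enforced, and a single linear solve cannot do that.

```python
import numpy as np

def gd_on_quadratic(Q, x0, n_iter):
    """Gradient descent with step 1/L on f(x) = 0.5 x^T Q x (minimum f* = 0 at x = 0)."""
    L = np.linalg.eigvalsh(Q).max()        # smoothness constant = largest eigenvalue of Q
    x, values = x0.copy(), []
    for _ in range(n_iter + 1):
        values.append(0.5 * x @ Q @ x)     # record f(x_k) before each step
        x = x - (Q @ x) / L
    return np.array(values)

def solve_eq_qp_kkt(Q, c, A, b):
    """Solve min 0.5 x^T Q x + c^T x  s.t.  A x = b  via the KKT linear system.

    Stationarity: Q x + c + A^T nu = 0; primal feasibility: A x = b.
    With no inequality constraints, dual feasibility and complementary
    slackness are vacuous, so one linear solve suffices.
    """
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-c, b]))
    return sol[:n], sol[n:]                # primal x, multipliers nu

# Linear rate: mu = smallest eigenvalue, L = largest eigenvalue
mu, L = 0.1, 1.0
Q = np.diag([mu, 0.5, L])
vals = gd_on_quadratic(Q, np.ones(3), n_iter=50)
bound = (1 - mu / L) ** np.arange(51) * vals[0]   # (1 - mu/L)^k (f(x0) - f*)

# KKT: min x1^2 + x2^2  s.t.  x1 + x2 = 1
x_kkt, nu_kkt = solve_eq_qp_kkt(np.diag([2.0, 2.0]), np.zeros(2),
                                np.array([[1.0, 1.0]]), np.array([1.0]))
```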

Applications

  1. ML training (SGD/Adam/Lion/Sophia).
  2. LP/MIP (Gurobi, HiGHS).
  3. Optimal control (LQR, MPC).
  4. Portfolio (Markowitz, Black-Litterman).
  5. Hyperparameter tuning (Bayesian opt, Optuna).
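Application 4 also gives a small instance of the KKT conditions above. When short sales are allowed, the global minimum-variance Markowitz portfolio has a closed form. The sketch below assumes a known, positive-definite covariance matrix, and its numbers are made up for illustration rather than taken from market data.

```python
import numpy as np

def min_variance_weights(Sigma):
    """Global minimum-variance portfolio: min w^T Sigma w  s.t.  sum(w) = 1.

    Shorting is allowed (no w >= 0 constraint), so the KKT conditions give
    w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1).
    """
    ones = np.ones(Sigma.shape[0])
    z = np.linalg.solve(Sigma, ones)   # Sigma^{-1} 1 without forming the inverse
    return z / z.sum()

# Toy covariance matrix (illustrative numbers)
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w = min_variance_weights(Sigma)
```

Adding a target-return constraint or w ≥ 0 removes the closed form. In that case a QP solver such as CVXPY is the usual route.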

💻 Patterns

AdamW + cosine schedule + gradient clipping (PyTorch)

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Assumes model, loader, criterion and total_steps (number of optimizer steps) are defined.
opt = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1, betas=(0.9, 0.95))
sched = CosineAnnealingLR(opt, T_max=total_steps)  # stepped once per batch below
for x, y in loader:
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip global grad norm to 1.0
    opt.step()
    sched.step()
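The snippet above decays the learning rate but has no warmup. One common warmup + cosine shape can be written as a plain function. The name `warmup_cosine_lr` and its parameters are hypothetical, and the warmup length is a tuning choice rather than a fixed rule.

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Hypothetical schedule: linear warmup to base_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps          # linear ramp
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)                           # hold min_lr after total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

To drive PyTorch with it, one option is `torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lambda s: warmup_cosine_lr(s, 1.0, warmup_steps, total_steps))`. `LambdaLR` multiplies the optimizer's base `lr` by the returned factor, which is why `base_lr=1.0` here.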

Convex optimization with CVXPY

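import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((50, 20)), rng.standard_normal(50), 0.1  # example data
x = cp.Variable(A.shape[1])
# Nonnegative lasso. With the extra constraint cp.sum(x) == 1 (probability simplex),
# norm1(x) would equal 1 on the feasible set, so the L1 term would have no effect.
prob = cp.Problem(
    cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x)),
    [x >= 0])
prob.solve()  # default installed solver; solver=cp.MOSEK requires a MOSEK license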

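L-BFGS-B for moderate-scale smooth problems (box bounds optional)

from scipy.optimize import minimize
# f, grad_f (its gradient), x0 and bounds (list of (low, high) pairs) are user-supplied
res = minimize(f, x0, jac=grad_f, method='L-BFGS-B',
               bounds=bounds, options={'ftol': 1e-9})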
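In the snippet above, `f`, `grad_f`, `x0` and `bounds` are placeholders. The following runnable variant uses SciPy's built-in Rosenbrock function (`scipy.optimize.rosen`) and its gradient (`rosen_der`). It assumes a box of [−2, 2]², which contains the minimizer (1, 1).

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])                 # classic Rosenbrock starting point
bounds = [(-2.0, 2.0)] * len(x0)           # box constraints handled by L-BFGS-B
res = minimize(rosen, x0, jac=rosen_der, method='L-BFGS-B',
               bounds=bounds, options={'ftol': 1e-12, 'gtol': 1e-10})
```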

Proximal gradient (FISTA)

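import numpy as np

def fista(grad_f, prox_g, x0, L, n_iter=200):
    """FISTA for min f(x) + g(x): f is L-smooth, prox_g(v, step) is the prox of step*g at v."""
    x = y = x0.copy(); t = 1.0
    for _ in range(n_iter):
        x_new = prox_g(y - grad_f(y)/L, 1/L)    # proximal gradient step at extrapolated point y
        t_new = 0.5*(1 + np.sqrt(1 + 4*t*t))
        y = x_new + ((t-1)/t_new)*(x_new - x)   # Nesterov-style momentum
        x, t = x_new, t_new
    return x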
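Here is a usage sketch for `fista` on the lasso, min ½‖Ax − b‖² + λ‖x‖₁:

- The gradient of the smooth part is Aᵀ(Ax − b), with Lipschitz constant L = ‖A‖₂² (the largest singular value squared).
- The prox of λ‖·‖₁ with step s is soft-thresholding at λs.

The block repeats `fista` so that it runs on its own. The data, λ = 0.1 and the helper names are illustrative assumptions.

```python
import numpy as np

def fista(grad_f, prox_g, x0, L, n_iter=200):
    x = y = x0.copy(); t = 1.0
    for _ in range(n_iter):
        x_new = prox_g(y - grad_f(y) / L, 1 / L)
        t_new = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1: elementwise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Toy sparse-recovery data (illustrative)
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 50))
x_true = np.zeros(50); x_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
b = A @ x_true
lam = 0.1

grad_f = lambda x: A.T @ (A @ x - b)                      # f(x) = 0.5 ||Ax - b||^2
prox_g = lambda v, step: soft_threshold(v, lam * step)    # g(x) = lam ||x||_1
L = np.linalg.norm(A, 2) ** 2                             # Lipschitz constant of grad f
x_hat = fista(grad_f, prox_g, np.zeros(50), L, n_iter=500)

def F(x):
    """Full lasso objective."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum()
```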

Bayesian optimization (Optuna)

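import optuna

def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)
    wd = trial.suggest_float('wd', 1e-4, 1e-1, log=True)
    return train_and_eval(lr, wd)  # user-supplied: trains and returns validation loss

# With no sampler given, a single-objective study uses TPE (a sequential model-based method)
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)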

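Projection onto the probability simplex (the projection step of projected gradient)

import numpy as np

def proj_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}, O(n log n) via sorting."""
    n = len(v); u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - 1
    rho = np.where(u - cssv/np.arange(1, n+1) > 0)[0][-1]
    theta = cssv[rho] / (rho+1)
    return np.maximum(v - theta, 0)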
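`proj_simplex` only computes the projection. A projected gradient loop wraps it around a gradient step, xₖ₊₁ = P(xₖ − ∇f(xₖ)/L). The following is a minimal sketch of least squares over the probability simplex. It assumes a step of 1/L with L = ‖A‖₂². The loop function and the random data are illustrative.

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    n = len(v); u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - 1
    rho = np.nonzero(u - cssv / np.arange(1, n + 1) > 0)[0][-1]
    theta = cssv[rho] / (rho + 1)
    return np.maximum(v - theta, 0)

def projected_gradient(grad_f, proj, x0, step, n_iter=1000):
    """x_{k+1} = proj(x_k - step * grad_f(x_k)); step = 1/L for L-smooth f."""
    x = proj(x0)
    for _ in range(n_iter):
        x = proj(x - step * grad_f(x))
    return x

# Least squares restricted to the simplex (illustrative data)
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)
step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = largest singular value squared
x0 = np.full(5, 0.2)                        # uniform starting point on the simplex
x_star = projected_gradient(grad_f, proj_simplex, x0, step)
```

With step 1/L on a convex, L-smooth f, each projected step does not increase f, so the final iterate is at least as good as the starting point.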

Decision criteria

| Situation | Approach |
|---|---|
| Smooth convex, small | Newton / L-BFGS |
| Smooth convex, large | GD / accelerated GD |
| Nonsmooth convex | Subgradient / proximal / ADMM |
| Stochastic, deep net | AdamW (default) / Lion / Sophia |
| LP / QP | Simplex / interior-point (Gurobi/Mosek) |
| Black-box / expensive eval | Bayesian opt (Optuna) |
| Combinatorial | MIP / metaheuristic / CP-SAT |

Defaults: AdamW + cosine schedule for ML training; CVXPY for convex problems; Optuna for black-box tuning.

🔗 Graph

🤖 LLM usage

When to use: picking an optimizer recipe, setting priors for a hyperparameter search, explaining KKT/Lagrangian derivations. When not to use: the actual numerical solving (use PyTorch/CVXPY/Gurobi for that).

Anti-patterns

  • Adam everywhere: on small-data or convex problems, SGD or L-BFGS is often the better choice.
  • No grad clipping for transformers: exploding gradients and loss spikes become likely.
  • Constant LR: a cosine schedule and/or warmup almost always helps.
  • Local minimum panic: in deep nets, saddle points, not local minima, are usually the real obstacle.
  • Convex assumption violation: applying a convex-only method to a nonconvex problem can return a wrong answer, because the global-optimality guarantee no longer holds.
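The saddle-point item can be seen on a toy function, f(x, y) = ½x² + ¼y⁴ − ½y². It has a saddle at the origin, where the Hessian is diag(1, −1), and minima at (0, ±1).

- Plain gradient descent started exactly at the saddle never moves, because the gradient there is zero.
- A 10⁻⁶ nudge along y escapes to a minimum.

This is a 2-D illustration only, not a model of real network loss surfaces. The function and the step size are assumptions.

```python
import numpy as np

def grad(p):
    """Gradient of f(x, y) = 0.5 x^2 + 0.25 y^4 - 0.5 y^2."""
    x, y = p
    return np.array([x, y**3 - y])

def gd(p0, lr=0.1, n_iter=500):
    p = np.array(p0, dtype=float)
    for _ in range(n_iter):
        p = p - lr * grad(p)
    return p

# Analytic Hessian at the origin: diag(1, 3*0^2 - 1)
hessian_at_origin = np.array([[1.0, 0.0], [0.0, -1.0]])
eigs = np.linalg.eigvalsh(hessian_at_origin)   # one negative, one positive -> saddle
stuck = gd([0.0, 0.0])       # gradient is exactly zero, so GD never moves
escaped = gd([0.5, 1e-6])    # tiny perturbation along the negative-curvature direction
```

In practice, gradient noise from SGD plays the role of the perturbation.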

🧪 Verification / duplicates

  • Verified against Boyd & Vandenberghe, "Convex Optimization", and Nocedal & Wright, "Numerical Optimization".
  • Confidence: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full optimization landscape |