---
id: wiki-2026-0508-optimization
title: Optimization
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Mathematical Optimization, Numerical Optimization]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [optimization, convex, gradient-descent, ml-training]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: PyTorch/JAX/CVXPY
---

# Optimization

## In One Line

> **"minimize f(x) subject to constraints"**. Optimization is the universal language of ML, OR, control, finance, and engineering. As of 2026, the standard recipe for LLM training is AdamW + cosine schedule + gradient clipping + mixed precision. Convexity, smoothness, stochasticity, and constraint structure determine the choice of algorithm.

## Core Concepts

### Classification Axes

- **Convex vs Nonconvex**: convex → global optimality guarantee; nonconvex (deep nets) → local solutions plus heuristics.
- **Smooth vs Nonsmooth**: smooth → gradient methods; nonsmooth → subgradient / proximal methods.
- **Constrained vs Unconstrained**: KKT conditions, Lagrangian, projection.
- **Deterministic vs Stochastic**: full gradient vs SGD/Adam.
- **First-order vs Second-order**: GD/Adam vs Newton/L-BFGS/K-FAC.

### Core Theory

- Convexity: f(λx+(1-λ)y) ≤ λf(x)+(1-λ)f(y) for all x, y and λ ∈ [0, 1].
- Lipschitz smoothness: ‖∇f(x)-∇f(y)‖ ≤ L‖x-y‖.
- Strong convexity (μ): for an L-smooth, μ-strongly convex f, gradient descent with step size 1/L converges linearly, with suboptimality O((1-μ/L)ᵏ).
- KKT conditions: stationarity, primal/dual feasibility, complementary slackness.

### Applications

1. ML training (SGD/Adam/Lion/Sophia).
2. LP/MIP (Gurobi, HiGHS).
3. Optimal control (LQR, MPC).
4. Portfolio optimization (Markowitz, Black-Litterman).
5. Hyperparameter tuning (Bayesian optimization, Optuna).
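
The linear rate O((1-μ/L)ᵏ) quoted under the theory for strong convexity can be checked numerically. The sketch below is a minimal illustration, not a reference implementation: it assumes a diagonal quadratic f(x) = ½ Σ dᵢxᵢ², where the smallest and largest entries of `d` play the roles of μ and L. The function name `gd_quadratic_gaps` and the values in `d` are hypothetical, chosen only for this example.

```python
import numpy as np

def gd_quadratic_gaps(d, x0, n_iter):
    """Gradient descent with step 1/L on f(x) = 0.5 * sum(d * x**2).

    Returns the suboptimality gaps f(x_k) - f* for k = 0..n_iter (f* = 0).
    """
    L = d.max()                      # smoothness constant = largest curvature
    x = x0.astype(float).copy()
    gaps = [0.5 * np.sum(d * x**2)]
    for _ in range(n_iter):
        x = x - (d * x) / L          # the gradient of f is d * x
        gaps.append(0.5 * np.sum(d * x**2))
    return gaps

# Illustrative problem (assumed values): mu = 1, L = 10.
d = np.array([1.0, 4.0, 10.0])
mu, L = d.min(), d.max()
gaps = gd_quadratic_gaps(d, np.ones(3), n_iter=50)

# Theoretical envelope: f(x_k) - f* <= (1 - mu/L)**k * (f(x_0) - f*)
bounds = [(1 - mu / L) ** k * gaps[0] for k in range(len(gaps))]
within_bound = all(g <= b + 1e-12 for g, b in zip(gaps, bounds))
print(within_bound)
```

Moving the smallest entry of `d` closer to zero worsens the condition number L/μ and visibly slows the decay, which is the practical reason the condition number drives the choice between plain and accelerated or second-order methods.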
## 💻 Patterns

### AdamW with cosine schedule and gradient clipping (PyTorch)

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# model, loader, criterion, and total_steps are assumed to be defined elsewhere.
opt = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1, betas=(0.9, 0.95))
sched = CosineAnnealingLR(opt, T_max=total_steps)

for x, y in loader:
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Clip the global gradient norm to 1.0 before the update.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    opt.step()
    sched.step()  # stepped once per batch, so T_max counts optimizer steps
```

### Convex optimization with CVXPY

```python
import cvxpy as cp

# A, b, n, and lam are assumed to be defined elsewhere.
x = cp.Variable(n)
# Note: on the simplex (x >= 0, sum(x) == 1) norm1(x) is constant at 1,
# so the L1 penalty only has an effect if one of the constraints is dropped.
prob = cp.Problem(
    cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x)),
    [x >= 0, cp.sum(x) == 1],
)
# MOSEK requires a license; call prob.solve() to use the default installed solver.
prob.solve(solver=cp.MOSEK)
```

### L-BFGS for moderate-scale smooth problems

```python
from scipy.optimize import minimize

# f, grad_f, x0, and bounds are assumed to be defined elsewhere.
res = minimize(f, x0, jac=grad_f, method='L-BFGS-B',
               bounds=bounds, options={'ftol': 1e-9})
```

### Proximal gradient (FISTA)

```python
import numpy as np

def fista(grad_f, prox_g, x0, L, n_iter=200):
    """Minimize f(x) + g(x); L is the Lipschitz constant of grad_f."""
    x = y = x0.copy()
    t = 1.0
    for k in range(n_iter):
        # Proximal gradient step taken from the extrapolated point y.
        x_new = prox_g(y - grad_f(y) / L, 1 / L)
        # Momentum sequence t_k and extrapolation.
        t_new = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

### Bayesian optimization (Optuna)

```python
import optuna

def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)
    wd = trial.suggest_float('wd', 1e-4, 1e-1, log=True)
    # train_and_eval is assumed to be defined elsewhere and to return a loss.
    return train_and_eval(lr, wd)

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
```

### Projected gradient (constraint set)

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    n = len(v)
    u = np.sort(v)[::-1]          # sort in descending order
    cssv = np.cumsum(u) - 1
    # Last index at which the shifted component is still positive.
    rho = np.where(u - cssv / np.arange(1, n + 1) > 0)[0][-1]
    theta = cssv[rho] / (rho + 1)
    return np.maximum(v - theta, 0)
```

## Decision Criteria

| Situation | Approach |
|---|---|
| Smooth convex, small | Newton / L-BFGS |
| Smooth convex, large | GD / accelerated GD |
| Nonsmooth convex | Subgradient / proximal / ADMM |
| Stochastic, deep net | AdamW (default) / Lion / Sophia |
| LP / QP | Simplex / interior-point (Gurobi/Mosek) |
| Black-box / expensive eval | Bayesian opt (Optuna) |
| Combinatorial | MIP / metaheuristic / CP-SAT |

**Defaults**: AdamW + cosine schedule for ML training; CVXPY for convex problems; Optuna for black-box problems.

## 🔗 Graph

- Applications: [[Operations-Research]] · [[Optimal-Control-Theory]]
- Adjacent: [[Linear-Algebra-Foundations|Linear-Algebra]]

## 🤖 LLM Usage

**When to use**: optimizer recipe selection, priors for hyperparameter search, explaining KKT/Lagrangian derivations.

**When not to use**: actual numerical solving (use PyTorch/CVXPY/Gurobi instead).

## ❌ Anti-patterns

- **Adam everywhere**: using Adam on small-data or convex problems, where SGD or L-BFGS is often the better choice.
- **No grad clipping for transformers**: without clipping, gradient explosions and loss spikes become likely.
- **Constant LR**: warmup and cosine decay almost always help.
- **Local minimum panic**: in deep nets the real obstacle is saddle points, not local minima.
- **Convex assumption violation**: applying a convex solver to a nonconvex problem gives a result with no optimality guarantee, and it may simply be wrong.

## 🧪 Verification / Duplicates

- Verified (Boyd & Vandenberghe "Convex Optimization", Nocedal & Wright).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full optimization landscape |