---
id: wiki-2026-0508-optimization
title: Optimization
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Mathematical Optimization, Numerical Optimization]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [optimization, convex, gradient-descent, ml-training]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: PyTorch/JAX/CVXPY
---

# Optimization

## One-liner
> **"minimize f(x) subject to constraints"**. Optimization is the universal language of ML, OR, control, finance, and engineering. As of 2026, the standard recipe for LLM training is AdamW + cosine schedule + gradient clipping + mixed precision. Convexity, smoothness, stochasticity, and constraint structure determine the choice of algorithm.

## Core concepts

### Classification axes
- **Convex vs Nonconvex**: convex → global guarantee; nonconvex (deep nets) → local + heuristics.
- **Smooth vs Nonsmooth**: smooth → gradient; nonsmooth → subgradient / proximal.
- **Constrained vs Unconstrained**: KKT, Lagrangian, projection.
- **Deterministic vs Stochastic**: full grad vs SGD/Adam.
- **First-order vs Second-order**: GD/Adam vs Newton/L-BFGS/K-FAC.

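### Core theory
- Convexity: f(λx+(1-λ)y) ≤ λf(x)+(1-λ)f(y) for all x, y and λ ∈ [0, 1].
- Lipschitz smoothness: ‖∇f(x)-∇f(y)‖ ≤ L‖x-y‖ for all x, y.
- Strong convexity with parameter μ: gradient descent with step size 1/L converges linearly, f(x_k) - f* ≤ (1-μ/L)ᵏ (f(x_0) - f*).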
- KKT conditions: stationarity, primal/dual feasibility, complementary slackness.
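
The linear rate under strong convexity can be checked numerically. The sketch below is illustrative only: the diagonal quadratic `Q`, the starting point, and the 50-step horizon are assumptions chosen so that μ and L can be read off directly. For this quadratic, the distance to the minimizer x* = 0 stays under the (1-μ/L)ᵏ envelope.

```python
import numpy as np

# Hypothetical test problem: f(x) = 0.5 * x^T Q x with diagonal Q,
# so mu and L are the smallest and largest diagonal entries.
Q = np.diag([1.0, 4.0, 10.0])
mu, L = 1.0, 10.0

def grad_f(x):
    """Gradient of the quadratic f(x) = 0.5 * x^T Q x."""
    return Q @ x

x = np.array([1.0, 1.0, 1.0])
x0_norm = np.linalg.norm(x)

errors = []
for k in range(1, 51):
    x = x - grad_f(x) / L              # gradient step with step size 1/L
    errors.append(np.linalg.norm(x))   # distance to the minimizer x* = 0

# Envelope for this quadratic: ||x_k - x*|| <= (1 - mu/L)^k * ||x_0 - x*||
bounds = [(1 - mu / L) ** k * x0_norm for k in range(1, 51)]
```

Each entry of `errors` should sit at or below the matching entry of `bounds`; a larger condition number L/μ pushes the envelope toward 1 and slows convergence.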

### Applications
1. ML training (SGD/Adam/Lion/Sophia).
2. LP/MIP (Gurobi, HiGHS).
3. Optimal control (LQR, MPC).
4. Portfolio (Markowitz, Black-Litterman).
5. Hyperparameter tuning (Bayesian opt, Optuna).

## 💻 Patterns

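### AdamW with cosine schedule and gradient clipping (PyTorch)
```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Assumes `model`, `loader`, `criterion`, and `total_steps` are defined by the caller.
opt = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1, betas=(0.9, 0.95))
sched = CosineAnnealingLR(opt, T_max=total_steps)  # stepped once per batch below
for x, y in loader:
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Clip the global gradient norm to 1.0 before the parameter update.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    opt.step()
    sched.step()
```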

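### Convex optimization with CVXPY
```python
import cvxpy as cp
# Assumes A (m x n), b (length m), n, and lam >= 0 are defined by the caller.
x = cp.Variable(n)
# Note: with x >= 0 and sum(x) == 1, norm1(x) is always 1, so the L1 term
# only changes the solution if those constraints are relaxed.
prob = cp.Problem(
    cp.Minimize(cp.sum_squares(A@x - b) + lam*cp.norm1(x)),
    [x >= 0, cp.sum(x) == 1])
prob.solve(solver=cp.MOSEK)  # MOSEK needs a license; omit `solver` to use CVXPY's default
```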

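### L-BFGS for moderate-scale smooth problems
```python
from scipy.optimize import minimize
# Assumes f, grad_f, x0, and bounds (a list of (low, high) pairs) are defined by the caller.
res = minimize(f, x0, jac=grad_f, method='L-BFGS-B',
               bounds=bounds, options={'ftol': 1e-9})
```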

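### Proximal gradient (FISTA)
```python
import numpy as np

def fista(grad_f, prox_g, x0, L, n_iter=200):
    # Minimizes f(x) + g(x). grad_f is the gradient of the smooth part f,
    # prox_g(v, step) is the proximal operator of step*g, and L is the
    # Lipschitz constant of grad_f.
    x = y = x0.copy(); t = 1.0
    for k in range(n_iter):
        x_new = prox_g(y - grad_f(y)/L, 1/L)      # proximal gradient step at y
        t_new = 0.5*(1 + np.sqrt(1 + 4*t*t))      # momentum sequence
        y = x_new + ((t-1)/t_new)*(x_new - x)     # extrapolation
        x, t = x_new, t_new
    return x
```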
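
A usage sketch for the FISTA pattern on a small lasso problem, min 0.5‖Ax-b‖² + λ‖x‖₁. The data (`A`, `x_true`, `b`), the value of `lam`, and the helper names `soft_threshold` and `objective` are assumptions made for illustration; `fista` is repeated so the block runs on its own.

```python
import numpy as np

def fista(grad_f, prox_g, x0, L, n_iter=200):
    x = y = x0.copy(); t = 1.0
    for _ in range(n_iter):
        x_new = prox_g(y - grad_f(y) / L, 1 / L)
        t_new = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Hypothetical synthetic data: a sparse signal observed without noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
x_true = np.zeros(10)
x_true[[1, 6]] = [2.0, -3.0]
b = A @ x_true
lam = 0.1

def grad_f(x):
    """Gradient of the smooth part 0.5 * ||Ax - b||^2."""
    return A.T @ (A @ x - b)

def soft_threshold(v, step):
    """Proximal operator of step * lam * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam * step, 0.0)

def objective(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

# Lipschitz constant of grad_f: largest eigenvalue of A^T A.
L = np.linalg.norm(A, 2) ** 2

x_hat = fista(grad_f, soft_threshold, np.zeros(10), L, n_iter=500)
```

The soft-threshold step is what produces exact zeros in `x_hat`; swapping in a different `prox_g` (for example a projection) reuses the same loop for other nonsmooth terms.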

### Bayesian optimization (Optuna)
```python
import optuna
# Assumes train_and_eval(lr, wd) is defined by the caller and returns a validation loss.
def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)
    wd = trial.suggest_float('wd', 1e-4, 1e-1, log=True)
    return train_and_eval(lr, wd)
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
```

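### Projected gradient (Euclidean projection onto the simplex)
```python
import numpy as np

def proj_simplex(v):
    # Euclidean projection of v onto {x : x >= 0, sum(x) == 1}.
    n = len(v); u = np.sort(v)[::-1]               # sort in descending order
    cssv = np.cumsum(u) - 1
    rho = np.where(u - cssv/np.arange(1, n+1) > 0)[0][-1]   # last index kept positive
    theta = cssv[rho] / (rho+1)                    # shift that makes the result sum to 1
    return np.maximum(v - theta, 0)
```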
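
The projection is one half of projected gradient descent; the other half is an ordinary gradient step. A minimal sketch, assuming a hypothetical diagonal quadratic objective 0.5·xᵀQx over the simplex (a toy minimum-variance allocation) and a fixed step size 1/L. `proj_simplex` is repeated so the block runs on its own.

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) == 1}."""
    n = len(v)
    u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - 1
    rho = np.where(u - cssv / np.arange(1, n + 1) > 0)[0][-1]
    theta = cssv[rho] / (rho + 1)
    return np.maximum(v - theta, 0)

# Hypothetical objective: f(x) = 0.5 * x^T Q x with diagonal Q.
Q = np.diag([1.0, 2.0, 4.0])
L = 4.0                                   # largest eigenvalue of Q

x = np.full(3, 1.0 / 3.0)                 # start at the simplex center
for _ in range(500):
    grad = Q @ x
    x = proj_simplex(x - grad / L)        # gradient step, then project back

x_opt = x
```

For this `Q` the constrained minimizer has weights proportional to 1/Q_ii, so `x_opt` should approach (4/7, 2/7, 1/7).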

## Decision criteria
| Situation | Approach |
|---|---|
| Smooth convex, small | Newton / L-BFGS |
| Smooth convex, large | GD / accelerated GD |
| Nonsmooth convex | Subgradient / proximal / ADMM |
| Stochastic, deep net | AdamW (default) / Lion / Sophia |
| LP / QP | Simplex / interior-point (Gurobi/Mosek) |
| Black-box / expensive eval | Bayesian opt (Optuna) |
| Combinatorial | MIP / metaheuristic / CP-SAT |

**Defaults**: AdamW + cosine for ML training; CVXPY for convex problems; Optuna for black-box search.
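
The LP row of the table can be illustrated with SciPy's `linprog`, which can call the HiGHS solver. The problem data below is a made-up toy example: maximize x + 2y subject to x + y ≤ 4, 0 ≤ x ≤ 3, 0 ≤ y ≤ 2. Since `linprog` minimizes, the objective is negated.

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP (hypothetical data): maximize x + 2y  <=>  minimize -x - 2y
c = np.array([-1.0, -2.0])
A_ub = np.array([[1.0, 1.0]])     # x + y <= 4
b_ub = np.array([4.0])
var_bounds = [(0, 3), (0, 2)]     # 0 <= x <= 3, 0 <= y <= 2

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=var_bounds, method="highs")
```

`res.x` holds the optimal point and `-res.fun` the maximized objective; for larger or mixed-integer models a dedicated modeling layer or a commercial solver is the more usual choice.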

## 🔗 Graph
- Applications: [[Operations-Research]] · [[Optimal-Control-Theory]]
- Adjacent: [[Linear-Algebra-Foundations|Linear-Algebra]]

## 🤖 LLM usage
**When**: optimizer recipe selection, priors for hyperparameter search, explaining KKT/Lagrangian derivations.
**When not**: actual numerical solving (use PyTorch/CVXPY/Gurobi).

## ❌ Anti-patterns
- **Adam everywhere**: using Adam on small-data or convex problems, where SGD or L-BFGS is often better.
- **No grad clipping for transformers**: gradient spikes and divergence become likely.
- **Constant LR**: warmup plus cosine decay almost always helps.
- **Local minimum panic**: in deep nets the real problem is saddle points, not local minima.
- **Convex assumption violation**: applying a convex solver to a nonconvex problem gives a wrong answer.
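
As an alternative to a constant learning rate, linear warmup followed by cosine decay can be written as a plain function of the step. This is a sketch under assumptions: the function name `lr_at` and the defaults (`peak_lr=3e-4`, `warmup_steps=100`, `min_lr=0.0`) are illustrative, not values prescribed by this note.

```python
import math

def lr_at(step, total_steps, peak_lr=3e-4, warmup_steps=100, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay from peak_lr down to min_lr.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Example: sample the schedule over a hypothetical 1000-step run.
schedule = [lr_at(s, total_steps=1000) for s in range(1001)]
```

The same shape can be passed to an optimizer through a per-step multiplier; keeping it as a pure function makes it easy to plot and unit-test before training.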

## 🧪 Verification / Duplicates
- Verified (Boyd & Vandenberghe "Convex Optimization", Nocedal & Wright).
- Trust level A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full optimization landscape |