id: wiki-2026-0508-joint-optimization
title: Joint Optimization
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Multi-Objective Optimization, Co-Optimization, End-to-End Optimization
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: optimization, ML, multi-objective
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language python, framework pytorch-jax

Joint Optimization

One-line summary

"Optimize multiple objectives or variables simultaneously." Compared with separate or sequential optimization, joint optimization can reach a globally better solution. The cost is higher complexity, and the risk is conflicting gradients. It is central to modern DL (end-to-end training), RL (actor-critic), and chip design (DSE).

Key points

Why optimize jointly?

  • Coupling: when variables interact strongly, solving for each one separately is suboptimal.
  • Information sharing: shared representations and gradients let the objectives benefit each other.
  • End-to-end: loss does not accumulate across pipeline stages.
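
To make the coupling point concrete, here is a minimal sketch on a toy quadratic. The function `f` and its coefficients are assumptions chosen for illustration, not taken from any benchmark. The `x * y` term couples the two variables, so a single sequential pass (solve for `x` with `y` fixed, then `y` with `x` fixed) ends at a worse point than solving both stationarity conditions together.

```python
def f(x, y):
    # Toy objective; the x*y term couples the two variables.
    return (x - 1) ** 2 + (y - 2) ** 2 + x * y

def sequential_pass(y0=0.0):
    # df/dx = 2(x - 1) + y = 0  ->  x = 1 - y/2  (y held fixed at y0)
    x = 1 - y0 / 2
    # df/dy = 2(y - 2) + x = 0  ->  y = 2 - x/2  (x held fixed)
    y = 2 - x / 2
    return x, y

def joint_solution():
    # Solve both conditions at once: 2x + y = 2 and x + 2y = 4 (Cramer's rule).
    det = 2 * 2 - 1 * 1
    x = (2 * 2 - 1 * 4) / det
    y = (2 * 4 - 1 * 2) / det
    return x, y

print(f(*sequential_pass()), f(*joint_solution()))  # 1.75 1.0
```

On this convex toy problem, repeating the alternating passes would eventually approach the joint solution. The gap shown here is for one pass only, which is what a fixed sequential pipeline amounts to.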

Challenges

  • Conflicting gradients: objectives push the shared parameters in opposite directions.
  • Scaling: mismatched loss magnitudes let one loss dominate the others.
  • Local minima: the joint loss landscape is more rugged.
  • Compute: optimizing N variables jointly makes the search space grow exponentially.
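
The dominant-loss problem can be seen on two one-dimensional quadratic losses. This is a sketch under stated assumptions: the `(a, c)` terms are made up, and dividing each loss by its initial value is one simple balancing heuristic used here for illustration, not GradNorm itself.

```python
def weighted_minimizer(terms, weights):
    """Minimizer of sum_i w_i * a_i * (x - c_i)^2 for terms given as (a_i, c_i)."""
    num = sum(w * a * c for w, (a, c) in zip(weights, terms))
    den = sum(w * a for w, (a, c) in zip(weights, terms))
    return num / den

def loss_values(terms, x):
    """Value of each individual loss a_i * (x - c_i)^2 at x."""
    return [a * (x - c) ** 2 for a, c in terms]

# Task 1 prefers x = 1 and has a 1000x larger scale; task 2 prefers x = -1.
terms = [(1000.0, 1.0), (1.0, -1.0)]

# Equal weights: the large-magnitude loss decides the answer almost alone.
raw = weighted_minimizer(terms, [1.0, 1.0])

# Weight each loss by the inverse of its value at the starting point x = 0.
init = loss_values(terms, 0.0)
balanced = weighted_minimizer(terms, [1.0 / l for l in init])

print(round(raw, 3), round(balanced, 3))
```

With equal weights the solution sits almost exactly at task 1's optimum. After rescaling, both tasks pull equally and the solution lands midway between the two optima.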

Applications

  1. Multi-task learning: a shared encoder with multiple task heads.
  2. Actor-critic RL: the policy and the value function are trained jointly.
  3. HW/SW co-design: chip floorplan and scheduler are optimized jointly.
  4. Pareto front: the cost vs latency frontier.

💻 Patterns

Weighted sum (simplest)

import torch

def joint_loss(pred1, pred2, y1, y2, w=(0.5, 0.5)):
    l1 = torch.nn.functional.cross_entropy(pred1, y1)
    l2 = torch.nn.functional.mse_loss(pred2, y2)
    return w[0] * l1 + w[1] * l2
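
As a usage sketch, the weighted-sum loss can be wired to a shared encoder with two heads. The model `TwoHeadNet`, its layer sizes, and the random batch are hypothetical and exist only to show one joint training step. `joint_loss` is repeated so the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadNet(nn.Module):
    """Hypothetical model: shared encoder, one classification and one regression head."""
    def __init__(self, in_dim=16, hidden=32, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.cls_head = nn.Linear(hidden, n_classes)
        self.reg_head = nn.Linear(hidden, 1)

    def forward(self, x):
        z = self.encoder(x)  # representation shared by both tasks
        return self.cls_head(z), self.reg_head(z).squeeze(-1)

def joint_loss(pred1, pred2, y1, y2, w=(0.5, 0.5)):
    l1 = F.cross_entropy(pred1, y1)
    l2 = F.mse_loss(pred2, y2)
    return w[0] * l1 + w[1] * l2

torch.manual_seed(0)
model = TwoHeadNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Random stand-in batch: 8 samples, 3 classes, scalar regression target.
x = torch.randn(8, 16)
y_cls = torch.randint(0, 3, (8,))
y_reg = torch.randn(8)

logits, reg = model(x)
loss = joint_loss(logits, reg, y_cls, y_reg)
opt.zero_grad()
loss.backward()  # both task losses send gradients into the shared encoder
opt.step()
```

The weights `w` are ordinary hyperparameters here. Nothing in this sketch adapts them during training.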

GradNorm (auto-balance)

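# Chen et al. 2018: dynamic loss weighting
import torch

class GradNorm:
    def __init__(self, n_tasks, alpha=1.5):
        self.weights = torch.ones(n_tasks, requires_grad=True)
        self.alpha = alpha
        self.initial_losses = None  # L_i(0), recorded on the first call
    def update(self, losses, shared_params):
        """losses: list of scalar task losses; shared_params: list of shared tensors."""
        losses = torch.stack(list(losses))
        if self.initial_losses is None:
            self.initial_losses = losses.detach()
        # Gradient norm of each weighted loss w_i * L_i w.r.t. the shared parameters;
        # create_graph=True keeps the norms differentiable w.r.t. self.weights
        norms = torch.stack([
            torch.cat([g.flatten() for g in torch.autograd.grad(
                w * l, shared_params, retain_graph=True, create_graph=True)]).norm()
            for w, l in zip(self.weights, losses)])
        # Relative inverse training rate: slower tasks get a larger target norm
        ratio = losses.detach() / self.initial_losses
        target = norms.mean() * (ratio / ratio.mean()) ** self.alpha
        # Step only self.weights on this loss, then renormalize them to sum to n_tasks
        return (norms - target.detach()).abs().sum()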

MGDA (Multiple-Gradient Descent Algorithm)

# Sener & Koltun 2018: find a common descent direction toward Pareto stationarity
import numpy as np
from scipy.optimize import minimize

def mgda_solver(grads):
    """grads: list of per-task gradient arrays. Returns the task weights alpha."""
    # Stack the flattened gradients, one row per task
    G = np.stack([np.asarray(g).ravel() for g in grads])
    # Minimum-norm point in the convex hull of the gradients:
    # min ||sum_i alpha_i g_i||^2  s.t.  alpha >= 0, sum_i alpha_i = 1
    def obj(a): return np.linalg.norm(a @ G) ** 2
    a0 = np.ones(len(grads)) / len(grads)
    cons = [{"type": "eq", "fun": lambda a: a.sum() - 1}]
    bnds = [(0, 1)] * len(grads)
    res = minimize(obj, a0, constraints=cons, bounds=bnds)
    return res.x  # task weights; the common descent direction is -(res.x @ G)
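
For exactly two tasks, the same minimum-norm problem has a closed form, which follows from setting the derivative of ||α·g1 + (1−α)·g2||² with respect to α to zero and clipping to [0, 1]. The sketch below assumes flat gradient vectors. The function name `mgda_two_task` is hypothetical.

```python
import numpy as np

def mgda_two_task(g1, g2):
    """Weight alpha on g1 (and 1 - alpha on g2) minimizing ||alpha*g1 + (1-alpha)*g2||^2."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        # Identical gradients: every alpha gives the same combined vector.
        return 0.5
    alpha = float((g2 - g1) @ g2) / denom
    # Clip so that alpha stays a valid convex-combination weight.
    return min(1.0, max(0.0, alpha))

alpha = mgda_two_task([1.0, 0.0], [0.0, 1.0])
print(alpha)  # 0.5
```

When one gradient is simply a longer copy of the other, the clip sends all weight to the shorter one, which is the minimum-norm point of the segment between them.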

Actor-critic joint update

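# A2C-style joint loss; PPO keeps the same three terms but replaces the actor
# term with a clipped probability-ratio objective
def actor_critic_loss(states, actions, advantages, returns, policy, value):
    log_p = policy.log_prob(states, actions)
    # Detach advantages so the actor term does not backpropagate into the critic
    actor_loss = -(log_p * advantages.detach()).mean()
    # squeeze(-1) avoids (N, 1) vs (N,) broadcasting when the critic outputs a column
    critic_loss = (value(states).squeeze(-1) - returns).pow(2).mean()
    entropy = policy.entropy(states).mean()
    # 0.5 and 0.01 are the value-loss and entropy-bonus coefficients
    return actor_loss + 0.5 * critic_loss - 0.01 * entropy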
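
The loss above takes `returns` and `advantages` as inputs without showing where they come from. One minimal choice, assumed here for illustration, is Monte Carlo discounted returns with the critic's value as a baseline. PPO implementations commonly use GAE instead. Both helper names are hypothetical.

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1} for one finished episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]

def advantages_from(returns, values):
    """Baseline-subtracted advantage: A_t = G_t - V(s_t)."""
    return [g - v for g, v in zip(returns, values)]

rets = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
advs = advantages_from(rets, [1.0, 1.0, 1.0])
print(rets, advs)  # [1.75, 1.5, 1.0] [0.75, 0.5, 0.0]
```

The critic is regressed toward `rets`, while the actor is weighted by `advs`, which is how one batch of experience drives both parts of the joint loss.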

Pareto frontier sampling

# Find the Pareto frontier of a two-objective problem
def pareto_front(solutions):
    """solutions: list of (obj1, obj2) tuples (minimize both)."""
    front = []
    for s in solutions:
        dominated = any(
            s2[0] <= s[0] and s2[1] <= s[1] and s2 != s
            for s2 in solutions
        )
        if not dominated:
            front.append(s)
    return front
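
The pairwise check above is quadratic in the number of solutions. For two objectives, a sort-then-sweep variant gives the same frontier in O(n log n). This is a sketch with a hypothetical name, `pareto_front_sorted`, and made-up (cost, latency) points. Unlike the version above, it keeps only one copy of exact duplicates.

```python
def pareto_front_sorted(solutions):
    """solutions: list of (obj1, obj2) tuples, both minimized."""
    front = []
    best_obj2 = float("inf")
    # Sort by obj1, then obj2; a point is on the frontier only if its obj2
    # is strictly better than every point with a smaller or equal obj1.
    for s in sorted(solutions):
        if s[1] < best_obj2:
            front.append(s)
            best_obj2 = s[1]
    return front

# Hypothetical (cost, latency) candidates.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
print(pareto_front_sorted(candidates))  # [(1, 5), (2, 3), (4, 1)]
```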

Decision criteria

| Situation | Strategy |
| --- | --- |
| Objectives are aligned | Weighted sum (simple) |
| Objectives conflict | MGDA / PCGrad |
| Loss magnitudes are mismatched | GradNorm |
| Trade-offs need to be explored | Pareto frontier sweep |
| RL actor + critic | Joint PPO/SAC |

Default: start with a weighted sum, then introduce GradNorm once an imbalance shows up.
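
PCGrad is listed for conflicting objectives but has no pattern of its own. A minimal two-task sketch of its projection step, assuming flat gradient vectors, could look like this. The function name `pcgrad_two_task` is hypothetical, and the full method handles more than two tasks by visiting the other tasks in random order.

```python
import numpy as np

def pcgrad_two_task(g1, g2):
    """Combine two task gradients, removing the conflicting component if any."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    dot = float(g1 @ g2)
    if dot >= 0.0:
        # No conflict: the plain sum is used unchanged.
        return g1 + g2
    # Conflict: project each gradient onto the normal plane of the other one.
    p1 = g1 - dot / float(g2 @ g2) * g2
    p2 = g2 - dot / float(g1 @ g1) * g1
    return p1 + p2

update = pcgrad_two_task([1.0, 0.0], [-1.0, 1.0])
print(update)
```

After projection, each modified gradient is orthogonal to the other task's original gradient, so neither one directly undoes the other's progress.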

🔗 Graph

🤖 LLM usage

When to use: designing multi-objective loss functions, diagnosing gradient conflicts, explaining Pareto analysis.
When not to use: single-objective optimization, where this is over-complication.

Anti-patterns

  • Random weight tuning: hand-tuning or grid-searching loss weights without a balancing method such as GradNorm tends to be unstable.
  • Ignoring gradient conflict: overlooking cosine(g1, g2) < 0 leads to destructive interference.
  • Premature joint training: separate pretraining followed by joint fine-tuning is often better.
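
The gradient-conflict check mentioned above can be sketched as a pairwise cosine matrix over per-task gradients. It assumes each gradient has already been flattened to a vector. The helper names are hypothetical.

```python
import numpy as np

def grad_cosine(g1, g2, eps=1e-12):
    """Cosine similarity between two gradients; eps guards against zero norms."""
    g1 = np.ravel(g1).astype(float)
    g2 = np.ravel(g2).astype(float)
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + eps))

def conflict_matrix(grads):
    """Pairwise cosine matrix; negative entries mark conflicting task pairs."""
    n = len(grads)
    return np.array([[grad_cosine(grads[i], grads[j]) for j in range(n)]
                     for i in range(n)])

# Made-up gradients: tasks 0 and 2 point in opposite directions.
grads = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
C = conflict_matrix(grads)
conflicting = [(i, j) for i in range(3) for j in range(i + 1, 3) if C[i, j] < 0]
print(conflicting)  # [(0, 2)]
```

Logging this matrix during training is a cheap way to decide whether a weighted sum is enough or whether a method such as MGDA or PCGrad is worth the extra cost.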

🧪 Verification / duplicates

  • Verified against Chen et al. 2018 (GradNorm), Sener & Koltun 2018 (MGDA), Yu et al. 2020 (PCGrad), and Schulman et al. 2017 (PPO).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: multi-objective optimization patterns + Pareto |