"매 multiple objectives / variables 를 동시에 optimize". 매 separate / sequential optimization 보다 매 globally better solution 도달 가능 — 매 cost: 매 higher complexity, 매 risk: 매 conflicting gradients. 매 modern DL (end-to-end training), 매 RL (actor-critic), 매 chip design (DSE) 의 매 핵심.
Key points
Why jointly?
Coupling: when variables interact strongly, solving for each one separately is suboptimal.
Information sharing: shared representations and gradients let the objectives benefit each other.
End-to-end: errors do not accumulate across pipeline stages.
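The coupling argument can be made concrete with a small worked example. The objective below is hypothetical, chosen only because its cross term `x*y` couples the two variables. It compares a sequential solve (fix `x` while ignoring `y`, then solve for `y`) against the joint stationary point.

```python
# Hypothetical coupled objective (an assumption for illustration):
#   f(x, y) = (x - 1)^2 + (y - 1)^2 + x*y
def f(x, y):
    return (x - 1) ** 2 + (y - 1) ** 2 + x * y

# Sequential solve: choose x as if y did not exist, then solve y with x fixed.
x_seq = 1.0                   # argmin_x (x - 1)^2
y_seq = 1.0 - x_seq / 2.0     # argmin_y (y - 1)^2 + x_seq * y

# Joint solve: set both partial derivatives to zero.
#   2(x - 1) + y = 0 and 2(y - 1) + x = 0  =>  x = y = 2/3
x_joint = y_joint = 2.0 / 3.0

f_seq = f(x_seq, y_seq)       # value reached by the sequential solve
f_joint = f(x_joint, y_joint) # value at the joint optimum
print(f_seq, f_joint)
```

The sequential solve commits to `x = 1` before the interaction with `y` is taken into account, so it ends at a higher objective value than the joint solution.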
Challenges
Conflicting gradients: objectives push the shared parameters in opposite directions.
Scaling: mismatched loss magnitudes let one loss dominate the others.
Local minima: the joint landscape tends to be more rugged.
Compute: optimizing N variables jointly can make the search space grow exponentially in N.
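The dominant-loss problem from the Scaling item can be reproduced on a single shared parameter. This is a minimal sketch with two hypothetical quadratic losses whose magnitudes differ by a factor of 1000; the function names and the step-size rule are assumptions made for the example.

```python
# Two hypothetical losses on one shared scalar parameter w:
#   loss_a(w) = 1000 * (w - 1)^2   (large magnitude, wants w = +1)
#   loss_b(w) = (w + 1)^2          (small magnitude, wants w = -1)
def grad_a(w):
    return 2000.0 * (w - 1.0)

def grad_b(w):
    return 2.0 * (w + 1.0)

def minimize_sum(weight_a, weight_b, w0=0.5, steps=200):
    """Plain gradient descent on weight_a * loss_a + weight_b * loss_b."""
    curvature = 2000.0 * weight_a + 2.0 * weight_b
    lr = 0.4 / curvature  # safely below 1 / curvature, so the iteration converges
    w = w0
    for _ in range(steps):
        w -= lr * (weight_a * grad_a(w) + weight_b * grad_b(w))
    return w

w_unweighted = minimize_sum(1.0, 1.0)          # loss_a dominates
w_rescaled = minimize_sum(1.0 / 1000.0, 1.0)   # magnitudes equalized by hand
print(w_unweighted, w_rescaled)
```

With equal weights the solution sits almost exactly at the optimum of the larger loss. After rescaling, it lands midway between the two optima.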
Applications
Multi-task learning: a shared encoder with multiple task heads.
Actor-critic RL: policy and value function trained jointly.
HW/SW co-design: chip floorplan and scheduler optimized jointly.
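GradNorm
```python
# Chen et al. 2018: dynamic loss weighting
import torch

class GradNorm:
    def __init__(self, n_tasks, alpha=1.5):
        self.weights = torch.ones(n_tasks, requires_grad=True)
        self.alpha = alpha
        self.initial_losses = None  # L_i(0), recorded on the first call

    def update(self, losses, shared_params):
        """losses: list of scalar task losses; shared_params: list of shared tensors."""
        losses_t = torch.stack(list(losses)).detach()
        if self.initial_losses is None:
            self.initial_losses = losses_t
        # Gradient norm of each task loss w.r.t. the shared parameters
        grads = [torch.autograd.grad(l, shared_params, retain_graph=True) for l in losses]
        raw = torch.stack([torch.cat([g.flatten() for g in gs]).norm() for gs in grads])
        # ||grad(w_i * L_i)|| = w_i * ||grad L_i|| for w_i >= 0, so norms depend on the weights
        norms = self.weights * raw.detach()
        # Relative inverse training rate: slower tasks get a larger target norm
        ratio = losses_t / self.initial_losses
        target = norms.mean() * (ratio / ratio.mean()) ** self.alpha
        gradnorm_loss = (norms - target.detach()).abs().sum()
        return gradnorm_loss  # backpropagate this into self.weights only
```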
MGDA (Multiple-Gradient Descent Algorithm)
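```python
# Sener & Koltun 2018: find the Pareto descent direction
import numpy as np
from scipy.optimize import minimize

def mgda_solver(grads):
    """grads: list of gradient arrays, one per task. Returns convex weights alpha."""
    # Stack flattened gradients into an (n_tasks, n_params) matrix
    G = np.stack([np.asarray(g).ravel() for g in grads])

    # Minimum-norm point in the convex hull of the gradients:
    # min ||sum_i alpha_i g_i||^2  s.t.  alpha >= 0, sum(alpha) = 1
    def obj(a):
        return np.linalg.norm(a @ G) ** 2

    a0 = np.ones(len(grads)) / len(grads)
    cons = [{"type": "eq", "fun": lambda a: a.sum() - 1}]
    bnds = [(0, 1)] * len(grads)
    res = minimize(obj, a0, constraints=cons, bounds=bnds)
    return res.x  # task weights; the shared update direction is -(res.x @ G)
```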
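For exactly two tasks, the minimum-norm problem above has a closed-form solution, so no numerical solver is needed. The sketch below assumes gradients are plain Python lists of floats; `mgda_two_task` and `dot` are names made up for this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mgda_two_task(g1, g2):
    """Return alpha in [0, 1] minimizing ||alpha*g1 + (1 - alpha)*g2||^2."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every alpha gives the same point
    # Unconstrained minimizer of the convex quadratic in alpha
    alpha = dot([b - a for a, b in zip(g1, g2)], g2) / denom
    # Clipping to [0, 1] gives the constrained minimizer because the objective is convex
    return min(1.0, max(0.0, alpha))

g1, g2 = [1.0, 0.0], [0.0, 1.0]
alpha = mgda_two_task(g1, g2)
direction = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
print(alpha, direction)
```

When one gradient is much larger and points the same way as the other, the clip activates and the result is simply the smaller gradient.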
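Pareto front
```python
# Find the frontier of a multi-objective problem
def pareto_front(solutions):
    """solutions: list of (obj1, obj2) tuples (minimize both)."""
    front = []
    for s in solutions:
        # s is dominated if a different point is no worse in both objectives
        dominated = any(
            s2[0] <= s[0] and s2[1] <= s[1] and s2 != s for s2 in solutions
        )
        if not dominated:
            front.append(s)
    return front
```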
Decision criteria

| Situation | Strategy |
| --- | --- |
| Objectives are aligned | Weighted sum (simple) |
| Objectives conflict | MGDA / PCGrad |
| Loss magnitudes are mismatched | GradNorm |
| Trade-offs need to be explored | Pareto frontier sweep |
| RL actor + critic | Joint PPO/SAC |
Default: start with a weighted sum, then introduce GradNorm once an imbalance between the losses shows up.
When to use: designing a multi-objective loss function, diagnosing gradient conflicts, explaining a Pareto analysis.
When not to use: single-objective optimization, where these techniques are over-complication.
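PCGrad is named above as an option for conflicting objectives but is not shown elsewhere in these notes. The following is a minimal sketch of its projection step (often called gradient surgery), assuming gradients are flat Python lists. The function name `pcgrad` and the `seed` parameter are assumptions for illustration, not part of any library API.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcgrad(grads, seed=0):
    """Project each task gradient off the gradients it conflicts with, then sum."""
    rng = random.Random(seed)
    projected = []
    for i, g in enumerate(grads):
        g = list(g)
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)  # the other tasks are visited in random order
        for j in others:
            gj = grads[j]
            inner = dot(g, gj)
            if inner < 0.0:  # conflict: remove the component of g along gj
                scale = inner / dot(gj, gj)
                g = [a - scale * b for a, b in zip(g, gj)]
        projected.append(g)
    # The combined update is the sum of the projected task gradients
    return [sum(col) for col in zip(*projected)]

g1, g2 = [1.0, 0.0], [-1.0, 1.0]   # cosine < 0, so the two tasks conflict
update = pcgrad([g1, g2])
print(update)
```

In this example the combined update has a non-negative inner product with both original gradients, whereas the plain sum `[0.0, 1.0]` makes no progress on the first task.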
❌ Anti-patterns
Random weight tuning: grid-searching loss weights without GradNorm tends to be unstable.
Ignoring gradient conflict: overlooking cosine(g1, g2) < 0 leads to destructive interference.
Premature joint training: separate pretraining followed by joint fine-tuning is often the better choice.
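The cosine check from the gradient-conflict item is cheap to run during training. A minimal sketch, assuming the two task gradients have already been flattened into Python lists; `cosine` and `conflicts` are hypothetical helper names.

```python
import math

def cosine(g1, g2):
    """Cosine similarity between two flat gradient vectors."""
    inner = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # a zero gradient cannot conflict with anything
    return inner / (n1 * n2)

def conflicts(g1, g2, threshold=0.0):
    """True when the gradients point in opposing directions."""
    return cosine(g1, g2) < threshold

print(cosine([1.0, 0.0], [-1.0, 1.0]), conflicts([1.0, 0.0], [-1.0, 1.0]))
```

Logging this value per step shows whether conflicts are occasional noise or persistent, which helps decide whether a weighted sum is enough or whether MGDA or PCGrad is worth the extra cost.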