Optuna / scikit-optimize / Ray Tune / BoTorch / nevergrad
Black-Box Optimization
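📌 One-line insight
"Finding the best point without gradients." When each evaluation of the function is expensive (one trial can take an hour), the goal is to reach the best result with the fewest samples. This is the standard setting for hyperparameter tuning, drug discovery, robotics, and circuit design. Bayesian optimization with a Gaussian process (GP) surrogate is the dominant approach.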
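📖 Core concepts
Setting
f(x) is expensive to evaluate (minutes to days per call).
Gradients are unavailable, or the observations are noisy.
The evaluation budget is limited (roughly 10 to 1,000 trials).
Goal: minimize or maximize f.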
Methods
Random / Grid search
Simple; the usual baseline.
Random search generally beats grid search in high dimensions.
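One way to see why random search tends to win is to count how many distinct values each method tries per dimension under the same budget. The sketch below is illustrative only: `objective` is a hypothetical function in which `x` matters far more than `y`, and the 3×3 grid size is an arbitrary choice.

```python
import itertools
import random

def grid_points(points_per_dim):
    """Evenly spaced grid on [0, 1]^2 with points_per_dim**2 points."""
    axis = [i / (points_per_dim - 1) for i in range(points_per_dim)]
    return list(itertools.product(axis, axis))

def random_points(n_points, seed=0):
    """Uniform random samples on [0, 1]^2."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_points)]

def objective(x, y):
    # Hypothetical objective: x matters a lot, y barely matters.
    return (x - 0.37) ** 2 + 0.001 * y

grid = grid_points(3)    # 9 evaluations, but only 3 distinct x values
rand = random_points(9)  # 9 evaluations, 9 distinct x values

print("distinct x values (grid):  ", len({x for x, _ in grid}))
print("distinct x values (random):", len({x for x, _ in rand}))
print("best f (grid):  ", min(objective(x, y) for x, y in grid))
print("best f (random):", min(objective(x, y) for x, y in rand))
```

When only a few dimensions really matter, the grid wastes most of its budget repeating the same values along the important axis, while random sampling probes a new value on every trial.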
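Bayesian Optimization (BO)
Fit a surrogate model (Gaussian process or TPE) to the trials observed so far.
An acquisition function (EI, UCB, PI) decides which point to evaluate next.
✅ Sample-efficient.
❌ Exact GP inference scales as O(N³) in the number of observations.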
Evolutionary
CMA-ES: covariance matrix adaptation; suited to continuous spaces.
GA (genetic algorithm): suited to discrete spaces.
Differential Evolution: robust.
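To make the evolutionary family concrete, here is a minimal sketch of differential evolution in its common rand/1/bin form. The hyperparameters (`pop_size`, `F`, `CR`, `generations`) and the `sphere` test function are assumptions chosen for illustration, not tuned recommendations.

```python
import random

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(xi ** 2 for xi in x)

def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=150, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    value = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][j]
                trial.append(value)
            f_trial = f(trial)
            # Greedy selection: the trial replaces its parent only if it is no worse.
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 5)
print(best_f)
```

The greedy selection step means the best fitness in the population never gets worse, which is one reason the method is considered robust.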
Simulated Annealing
Random walk plus a cooling schedule.
Can escape local minima.
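The random-walk-plus-cooling idea can be sketched in a few lines. This is an illustrative version using the Metropolis acceptance rule with geometric cooling; the step size, initial temperature, cooling rate, and the multimodal `rastrigin` test function are all assumptions made for the example.

```python
import math
import random

def rastrigin(x):
    """Multimodal toy objective; global minimum of 0 at the origin."""
    return sum(xi ** 2 + 10.0 * (1.0 - math.cos(2.0 * math.pi * xi)) for xi in x)

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995,
                        n_iter=5000, seed=0):
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    t = t0
    for _ in range(n_iter):
        # Random-walk proposal around the current point.
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = f(cand)
        # Metropolis rule: always accept improvements; accept a worse point
        # with probability exp(-(fc - fx) / t), which shrinks as t cools.
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f

best_x, best_f = simulated_annealing(rastrigin, [3.0])
print(best_x, best_f)
```

Early on, the high temperature lets the walk accept uphill moves and leave a local basin; as the temperature drops the search behaves more and more like plain hill climbing.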
Population-based
Particle Swarm Optimization (PSO).
Population-Based Training (PBT, DeepMind).
TPE (Tree-structured Parzen Estimator)
Optuna's default sampler.
Handles conditional parameters.
NES (Natural Evolution Strategy)
OpenAI ES is a well-known variant.
Distributed-friendly.
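A rough sketch of the evolution-strategy update follows. It is a simplified illustration in the spirit of OpenAI ES, not a reproduction of it: it perturbs the parameters with Gaussian noise, standardizes the returns as a simple baseline, and moves along the noise-weighted average. The values of `sigma`, `lr`, and `pop_size` are assumptions.

```python
import random

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(xi ** 2 for xi in x)

def evolution_strategy(f, x0, sigma=0.1, lr=0.02, pop_size=50,
                       n_iter=200, seed=0):
    rng = random.Random(seed)
    theta = list(x0)
    dim = len(theta)
    for _ in range(n_iter):
        # Each perturbation could be evaluated on a separate worker.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
        # Reward is the negative objective, because we minimize f.
        rewards = [-f([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noises]
        mean = sum(rewards) / pop_size
        std = (sum((r - mean) ** 2 for r in rewards) / pop_size) ** 0.5 or 1.0
        advantages = [(r - mean) / std for r in rewards]
        # Search-gradient estimate: noise directions weighted by advantage.
        grad = [sum(a * eps[j] for a, eps in zip(advantages, noises))
                / (pop_size * sigma) for j in range(dim)]
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

theta = evolution_strategy(sphere, [2.0] * 5)
print(sphere(theta))
```

Because workers only need to exchange a scalar reward (and a random seed to regenerate the noise), this kind of update parallelizes well, which is what "distributed-friendly" refers to.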
Acquisition functions (BO)
Expected Improvement (EI): the expected gain over the best value so far.
UCB (Upper Confidence Bound): trades off exploitation and exploration through κ.
PI (Probability of Improvement): simple.
TS (Thompson Sampling): draw a sample from the posterior and optimize it.
q-EI: batch version of EI for parallel evaluation.
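For a Gaussian posterior with mean `mu` and standard deviation `sigma` at a candidate point, EI, PI, and UCB have closed forms. The sketch below assumes a maximization convention (improvement means exceeding `best_f`); the two candidates `a` and `b` are made-up numbers used only to show how the criteria can disagree.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best_f):
    """E[max(f - best_f, 0)] for f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best_f, 0.0)
    z = (mu - best_f) / sigma
    return (mu - best_f) * norm_cdf(z) + sigma * norm_pdf(z)

def probability_of_improvement(mu, sigma, best_f):
    """P(f > best_f) for f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 1.0 if mu > best_f else 0.0
    return norm_cdf((mu - best_f) / sigma)

def upper_confidence_bound(mu, sigma, kappa=2.0):
    """Optimistic estimate; larger kappa means more exploration."""
    return mu + kappa * sigma

best_f = 1.0
a = (1.0, 0.1)  # hypothetical candidate: mean equal to the best, low uncertainty
b = (0.8, 1.0)  # hypothetical candidate: lower mean, high uncertainty

for name, (mu, sigma) in (("a", a), ("b", b)):
    print(name,
          round(expected_improvement(mu, sigma, best_f), 4),
          round(probability_of_improvement(mu, sigma, best_f), 4),
          round(upper_confidence_bound(mu, sigma), 4))
```

In this example PI prefers the safe candidate `a`, while EI and UCB prefer the uncertain candidate `b`, because they account for how large an improvement could be and not only how likely it is.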
Applications
Hyperparameter tuning: Optuna, Ray Tune.
AutoML: architecture plus hyperparameters.
Drug discovery: molecule design.
Robotics: policy parameters.
A/B testing: Thompson sampling.
Materials design: alloy composition.
Compilers: optimization flags.
Neural architecture search (NAS).
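The A/B-testing entry above can be illustrated with a Beta-Bernoulli Thompson sampler. This is a minimal sketch under stated assumptions: the conversion rates in `true_rates` are invented for the simulation, and each arm starts from a uniform Beta(1, 1) prior.

```python
import random

def thompson_ab_test(true_rates, n_rounds=5000, seed=0):
    """Simulate Thompson sampling over variants with Bernoulli outcomes.

    Returns the number of times each variant was shown.
    """
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(n_rounds):
        # Draw one plausible conversion rate per variant from its Beta posterior.
        samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        # Show the variant whose sampled rate is highest and record the outcome.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]

pulls = thompson_ab_test([0.1, 0.3])  # hypothetical conversion rates
print(pulls)
```

Unlike a fixed 50/50 split, traffic shifts toward the better variant as evidence accumulates, so fewer users see the worse one.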
High-dimensional / structured problems
Trust-region BO: local search.
Multi-fidelity: cheap proxies.
Constrained BO: feasibility constraints.
Multi-objective: Pareto front.
Categorical / mixed: SMAC, TPE.
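For the multi-objective case, the Pareto front is the set of points that no other point beats on every objective. The sketch below assumes all objectives are minimized; the `points` list is made-up data for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the non-dominated points (O(n^2) comparison, fine for small n)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 2.0)]
front = pareto_front(points)
print(front)  # (3.0, 4.0) and (5.0, 2.0) are dominated and dropped
```

Choosing among the remaining points is a preference decision, which is why a multi-objective optimizer returns the whole front instead of a single answer.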
Modern compute
Parallel batch: q-acquisition functions.
Async: propose a new point as soon as a worker finishes.
Warm-start: transfer from prior tasks.
Multi-fidelity (Hyperband, BOHB): budget allocation.
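The budget-allocation idea behind Hyperband can be sketched with its inner loop, successive halving: evaluate many configurations at a small budget, keep the best fraction, and give the survivors more budget. Everything specific here is an assumption for illustration: `toy_loss` is a hypothetical stand-in for "validation loss after `budget` epochs", and `eta=3` is just a common choice.

```python
import math
import random

def toy_loss(lr, budget):
    # Hypothetical loss: best near lr = 1e-2, improving as budget grows.
    return (math.log10(lr) + 2.0) ** 2 + 1.0 / budget

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Return (best_config, total_budget_spent)."""
    survivors = list(configs)
    budget = min_budget
    total_cost = 0
    while len(survivors) > 1:
        # Score every surviving config at the current (cheap) budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        total_cost += len(survivors) * budget
        # Keep the best 1/eta and give them eta times more budget.
        survivors = scored[:max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0], total_cost

rng = random.Random(0)
configs = [10 ** rng.uniform(-5.0, 0.0) for _ in range(27)]  # learning rates
best, cost = successive_halving(configs, toy_loss)
print(best, cost)
```

With 27 configurations the rungs are 27 → 9 → 3 → 1, and each rung costs the same total budget. In real problems low-budget rankings are noisy, which is the failure mode Hyperband hedges against by running several brackets with different starting budgets.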
CMA-ES (cma package)

    import cma

    def objective(x):
        return sum(xi ** 2 for xi in x)  # minimized

    es = cma.CMAEvolutionStrategy(x0=[1.0] * 10, sigma0=0.5)
    es.optimize(objective, iterations=100)
    print(es.result.xbest)
BoTorch (PyTorch BO)
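    import torch
    from botorch.models import SingleTaskGP
    from botorch.fit import fit_gpytorch_mll
    from botorch.acquisition import ExpectedImprovement
    from botorch.optim import optimize_acqf
    from gpytorch.mlls import ExactMarginalLogLikelihood

    # X, Y are the training data. The values below are placeholders
    # (an assumption for illustration); substitute your own observations.
    X = torch.rand(10, 2, dtype=torch.double)          # n x d inputs
    Y = -(X - 0.5).pow(2).sum(dim=-1, keepdim=True)    # n x 1 outcomes
    bounds = torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double)

    # Fit the GP surrogate.
    gp = SingleTaskGP(X, Y)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_mll(mll)

    # BoTorch maximizes, so best_f is the largest observed value.
    ei = ExpectedImprovement(model=gp, best_f=Y.max())
    candidate, _ = optimize_acqf(
        ei,
        bounds=bounds,
        q=1,
        num_restarts=10,
        raw_samples=512,
    )
    # Evaluate the candidate, then update the GP with the new observation.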
→ Explore many configurations cheaply (few epochs), then spend more budget exploiting the promising ones.
Multi-objective (Pareto)
    import optuna

    def objective(trial):
        x = trial.suggest_float('x', 0, 5)
        y = trial.suggest_float('y', 0, 5)
        return x ** 2, (x - 2) ** 2 + y ** 2  # both are minimized

    study = optuna.create_study(directions=['minimize', 'minimize'])
    study.optimize(objective, n_trials=100)

    # Visualize the Pareto front.
    optuna.visualization.plot_pareto_front(study).show()
When to use: expensive functions; hyperparameter tuning; systems without gradients; design-space search.
When not to use: cheap functions (gradient-based methods are faster); problems with a closed-form solution.
❌ Anti-patterns
Grid search in high dimensions: curse of dimensionality.
Always using EI, even under high noise: UCB is often the better choice.
No warm-start when a related task exists: wasted samples.
A GP with 1,000+ trials: cubic scaling.
No multi-fidelity when a cheap proxy is available: wasted budget.
Forcing a single objective on a multi-criteria problem: the weights are easy to get wrong.
🧪 Verification / duplicates
Verified (Snoek et al. BO, Hansen CMA-ES, Optuna paper).