---
id: wiki-2026-0508-hyperparameters
title: Hyperparameters
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [hyperparameters, HPO, learning rate, batch size, AutoML, Optuna, Bayesian opt]
duplicate_of: none
source_trust_level: A
confidence_score: 0.96
verification_status: applied
tags: [machine-learning, hyperparameters, hpo, automl, optuna, bayesian-opt]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Optuna / Ray Tune / Hyperopt / Wandb Sweeps
---

# Hyperparameters

## In one line

> **"The parameters of a model's training that are set from outside, not learned."**

Examples: learning rate, batch size, depth, regularization. Modern HPO tooling: Optuna (Bayesian/TPE), Ray Tune (distributed), Wandb Sweeps. Search cost has to be weighed against payoff, so do not search without a limit.

## Core

### Types

- **Optimizer**: lr, momentum, weight decay.
- **Architecture**: depth, width, head count.
- **Regularization**: dropout, label smoothing.
- **Training**: batch size, epochs, warmup.
- **Data**: augmentation strength.

### Search methods

- **Grid**.
- **Random** (Bergstra 2012; more efficient than grid when only a few hyperparameters matter).
- **Bayesian** (TPE, GP).
- **Hyperband / ASHA**: early stopping of weak trials.
- **PBT** (Population-Based Training, DeepMind).
- **NAS** (Neural Architecture Search).

### Applications

1. **Tabular ML**: large impact.
2. **DL**: critical for medium-size models.
3. **LLM fine-tuning**: mainly lr and LoRA rank r.

### Modern best practices

- **Random > grid**.
- **Bayesian when trials are expensive**.
- **Hyperband** for many configs.
- **Log-scale lr**.
- **Track everything** (Wandb).
- **Cap the budget** in time or $.
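The "log-scale lr" practice can be illustrated with a minimal sketch using only the standard library. The helper names (`sample_log_uniform`, `sample_linear`, `frac_below`) and the 1e-5 to 1e-1 range are illustrative assumptions, not part of any HPO library.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    # Sample uniformly in log10 space, then map back to the original scale.
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

def sample_linear(rng, low, high):
    # Plain uniform sampling, shown for comparison.
    return rng.uniform(low, high)

def frac_below(samples, threshold):
    # Fraction of samples strictly below the threshold.
    return sum(s < threshold for s in samples) / len(samples)

rng = random.Random(0)
log_samples = [sample_log_uniform(rng, 1e-5, 1e-1) for _ in range(10_000)]
lin_samples = [sample_linear(rng, 1e-5, 1e-1) for _ in range(10_000)]

# 1e-3 is the geometric midpoint of the range [1e-5, 1e-1].
log_frac = frac_below(log_samples, 1e-3)
lin_frac = frac_below(lin_samples, 1e-3)
print(f"log-uniform below 1e-3: {log_frac:.3f}")
print(f"linear below 1e-3:      {lin_frac:.3f}")
```

Log-uniform sampling spends about half of the trials below 1e-3, while linear sampling spends roughly 1% there, so a linear search rarely tries the small learning rates at all.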
## 💻 Patterns

### Optuna (Bayesian TPE)

```python
import optuna

def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)  # log-scale search
    bs = trial.suggest_categorical('bs', [16, 32, 64, 128])
    dropout = trial.suggest_float('dropout', 0.0, 0.5)
    n_layers = trial.suggest_int('n_layers', 2, 8)
    # build() and train() are project-specific helpers (not shown here)
    model = build(n_layers, dropout)
    val_loss = train(model, lr, bs)
    return val_loss

study = optuna.create_study(direction='minimize')  # TPE is the default sampler
study.optimize(objective, n_trials=100)
print(study.best_params)
```

### Random search (sklearn)

```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform, uniform

# model must be an sklearn-compatible estimator exposing these parameter names
params = {
    'lr': loguniform(1e-5, 1e-1),
    'bs': [16, 32, 64],
    'dropout': uniform(0, 0.5),  # uniform(loc, scale) covers [0, 0.5]
}
search = RandomizedSearchCV(model, params, n_iter=50, cv=5)
search.fit(X, y)
```

### Hyperband (ASHA)

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_fn(config):
    for epoch in range(config['epochs']):
        loss = train_step(config)  # project-specific helper (not shown here)
        # Legacy reporting call; recent Ray releases expect a metrics dict
        # instead, so check the installed version.
        tune.report(loss=loss)

tune.run(
    train_fn,
    config={
        'lr': tune.loguniform(1e-5, 1e-1),
        'bs': tune.choice([16, 32, 64]),
        'epochs': 10,  # fixed value read by train_fn
    },
    scheduler=ASHAScheduler(metric='loss', mode='min'),
    num_samples=100,
)
```

### Population-Based Training (PBT)

```python
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr='training_iteration',
    metric='loss',
    mode='min',
    perturbation_interval=5,
    # Pass the search-space object directly; it is resampled on perturbation
    hyperparam_mutations={'lr': tune.loguniform(1e-5, 1e-1)},
)
```

### Wandb Sweeps

```yaml
# sweep.yaml
program: train.py
method: bayes
metric: { name: val_loss, goal: minimize }
parameters:
  lr: { min: 1e-5, max: 1e-1, distribution: log_uniform_values }
  bs: { values: [16, 32, 64, 128] }
```

```bash
wandb sweep sweep.yaml
wandb agent <sweep_id>  # use the ID printed by the sweep command
```

### Default starting points

```python
DEFAULTS = {
    'transformer': {'lr': 3e-4, 'bs': 32, 'warmup': 4000, 'wd': 0.01},
    'cnn': {'lr': 1e-3, 'bs': 256, 'momentum': 0.9, 'wd': 1e-4},
    'lora': {'lr': 1e-4, 'r': 16, 'alpha': 32, 'dropout': 0.05},
}
```

### LR finder (Smith)
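```python
import numpy as np

def find_lr(optimizer, train_loader, train_one_batch, lr_min=1e-7, lr_max=1.0, steps=100):
    """LR range test: raise lr geometrically each batch and record the loss."""
    lrs = np.geomspace(lr_min, lr_max, steps)
    losses = []
    batches = iter(train_loader)
    for lr in lrs:
        for group in optimizer.param_groups:
            group['lr'] = lr
        try:
            batch = next(batches)
        except StopIteration:  # restart the loader when it runs out
            batches = iter(train_loader)
            batch = next(batches)
        losses.append(train_one_batch(batch))  # caller-supplied step returning a loss
    # Plot lrs vs losses and pick an lr just before the loss diverges
    return lrs, losses
```

### LoRA hyperparameters

```python
from peft import LoraConfig

def lora_config(trial):
    # Call from inside an Optuna objective, where trial is available
    return LoraConfig(
        r=trial.suggest_categorical('r', [8, 16, 32, 64]),
        lora_alpha=trial.suggest_categorical('alpha', [16, 32, 64, 128]),
        lora_dropout=trial.suggest_float('dropout', 0, 0.2),
    )
```

### Cost cap

```python
import time

class BudgetedStudy:
    """Wall-clock budget for a search; check should_continue() between trials."""

    def __init__(self, budget_hours=4):
        self.start = time.time()
        self.budget = budget_hours * 3600  # seconds

    def should_continue(self):
        return time.time() - self.start < self.budget

# With Optuna, study.optimize(objective, timeout=seconds) gives a similar cap
```

### Early stopping per trial

```python
import optuna

def objective_with_pruning(trial):
    for epoch in range(50):
        loss = train_step()  # project-specific helper (not shown here)
        trial.report(loss, epoch)  # intermediate value seen by the pruner
        if trial.should_prune():
            raise optuna.TrialPruned()
    return loss
```

### Track best config (Wandb)

```python
import wandb

def track_trial(trial, train_step, epochs=50):  # train_step: caller-supplied, returns (loss, lr)
    run = wandb.init(project='hpo', config=trial.params)
    for epoch in range(epochs):
        loss, lr = train_step()
        wandb.log({'loss': loss, 'lr': lr, 'epoch': epoch})
    run.finish()  # close the run so the next trial starts a fresh one
```

### NAS-Bench

```python
# NAS-Bench-101/201: pre-computed architecture results
from nasbench import api

nasbench_model = api.NASBench('nasbench_only108.tfrecord')
arch = sample_architecture()  # project-specific helper returning an api.ModelSpec
metrics = nasbench_model.query(arch)
```

## Decision criteria

| Situation | Method |
|---|---|
| < 100 trials | Random |
| Expensive trial | Bayesian (Optuna) |
| Many configs | Hyperband / ASHA |
| Long-running | PBT |
| Default start | Architecture-specific defaults |
| Tight budget | LR finder + few trials |

**Default**: Optuna TPE + Hyperband pruning + Wandb tracking + log-scale lr + budget cap. Stay cost-aware.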
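The Hyperband / ASHA choice for many configs rests on successive halving, which can be sketched in plain Python. This is a simplified, synchronous illustration rather than Ray Tune's or Optuna's implementation; `successive_halving`, `toy_loss`, and the `eta=3` reduction factor are assumptions made for the example.

```python
import math
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs at each rung and grow the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    rungs = []  # (number of configs evaluated, budget per config) for each rung
    while len(survivors) > 1:
        rungs.append((len(survivors), budget))
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        survivors = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return survivors[0], rungs

def toy_loss(cfg, budget):
    # Hypothetical stand-in for validation loss after `budget` epochs:
    # lowest near lr = 1e-3 and improving as the budget grows.
    return abs(math.log10(cfg["lr"]) + 3) + 1.0 / budget

rng = random.Random(0)
configs = [{"lr": 10 ** rng.uniform(-5, -1)} for _ in range(27)]
best, rungs = successive_halving(configs, toy_loss)
print(rungs)
print(best)
```

With 27 starting configs, most are evaluated only at the smallest budget and only three reach the largest one, which is where the savings come from. In a real run `evaluate` would train for `budget` epochs, so early rankings can be noisy.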
## 🔗 Graph

- Parent: [[Machine-Learning]] · [[AutoML]]
- Variants: [[Hyperparameters|Hyperparameter-Optimization]] · [[NAS]]
- Applications: [[Optuna]]
- Adjacent: [[Bayesian-Optimization]] · [[Gaussian-Processes]] · [[Fine-tuning]]

## 🤖 LLM usage

**When**: production models, fine-tuning, architecture sweeps.

**When not**: throwaway work or a quick PoC.

## ❌ Anti-patterns

- **Grid for many params**: cost grows exponentially with the number of parameters.
- **No log-scale lr**: wastes trials on a narrow band of large values.
- **Ignoring early stopping**: wastes budget on trials that are already losing.
- **No baseline**: the value of HPO cannot be measured.
- **Test set leak**: tuning against the test set invalidates it; tune on a validation split.

## 🧪 Verification / duplicates

- Verified (Bergstra 2012, Optuna docs, Ray Tune docs).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: methods plus Optuna / Ray / Wandb / LR finder code |