---
id: wiki-2026-0508-seed
title: Seed
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Random Seed, RNG Seed, Reproducibility Seed]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [reproducibility, random, ml-training, image-gen, determinism]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: PyTorch
---

# Seed

## In one line
> **"A seed is the anchor of reproducibility: same seed + same code + same hardware → same result."** The idea goes back to von Neumann's 1949 middle-square method. Today it is standard practice in ML training (PyTorch, JAX), image generation (seed lock in Stable Diffusion and FLUX), and paper reproducibility.

## Key points

### What a seed affects
- **Data shuffling**: DataLoader sampler order.
- **Weight init**: the random draws in Xavier/He initialization.
- **Augmentation**: random crop/flip/color.
- **Dropout / BatchNorm noise**: stochastic during training (dropout masks are sampled; BatchNorm statistics depend on which samples land in each batch).
- **Image gen**: latent noise (z) sampling.
- **MC simulation**: Monte Carlo sample order.

### Hardware non-determinism (the limits of a seed)
- **CUDA atomics**: floating-point atomics, such as those behind `scatter_add`, accumulate in a non-deterministic order.
- **cuDNN heuristic**: convolution algorithm selection.
- **TF32 / mixed precision**: floating-point rounding differences.
- **Multi-GPU all-reduce**: NCCL ring order.
- → A seed alone is not enough; deterministic settings are also required (e.g. `torch.backends.cudnn.deterministic = True`, `torch.use_deterministic_algorithms(True)`).

### Applications
1. ML training reproducibility (papers).
2. Seed lock in image generation (consistent characters, A/B tests).
3. Statistical simulation (bootstrap, MC).
4. Bug reproduction (pin the seed of a flaky run).

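The CUDA-atomics and all-reduce items in the hardware non-determinism list come down to one fact: floating-point addition is not associative, so the order of accumulation changes the rounded result even when every input is identical. The following is a minimal CPU-only sketch in plain Python (not CUDA); `accumulate` is a hypothetical helper introduced only for illustration.

```python
import itertools


def accumulate(values):
    """Sum left to right, the way a sequential reduction would."""
    total = 0.0
    for v in values:
        total += v
    return total


# Grouping alone changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# The same three numbers, accumulated in every possible order.
values = [1e16, 1.0, -1e16]
distinct = sorted({accumulate(order) for order in itertools.permutations(values)})
print(distinct)  # [0.0, 1.0]
```

A seed fixes which numbers are drawn, not the order in which hardware adds them, which is why the deterministic settings are needed on top of seeding.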
## 💻 Patterns

### Full PyTorch reproducibility (2026)
```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42):
    # Affects subprocesses only: hash randomization of the running
    # interpreter is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # deterministic cuBLAS
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Deterministic cuDNN (trades speed for reproducibility)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    # PyTorch 2.x deterministic algorithms; warn_only=True warns instead of
    # raising when an op has no deterministic implementation
    torch.use_deterministic_algorithms(True, warn_only=True)


seed_everything(42)
```

### DataLoader seed (a different seed per worker)
```python
import random

import numpy as np
import torch


def worker_init_fn(worker_id):
    # Inside a worker, torch.initial_seed() is already base_seed + worker_id,
    # so each worker gets a distinct seed without adding worker_id again
    # (adding it could also push the value past NumPy's 2**32 - 1 limit).
    seed = torch.initial_seed() % 2**32
    np.random.seed(seed)
    random.seed(seed)


g = torch.Generator()
g.manual_seed(42)

# `dataset` is any torch.utils.data.Dataset defined elsewhere.
loader = torch.utils.data.DataLoader(
    dataset, batch_size=32, shuffle=True, num_workers=4,
    worker_init_fn=worker_init_fn, generator=g,
)
```

### Seed lock in Stable Diffusion / FLUX
```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A cyberpunk samurai at neon market, 4k photo"

# Same seed → same image (on the same hardware)
gen = torch.Generator(device="cuda").manual_seed(20260510)
img = pipe(prompt, generator=gen, num_inference_steps=28).images[0]
img.save("samurai_seed20260510.png")

# Seed sweep: look for a seed that gives the character you want to keep
for s in range(1000, 1010):
    g = torch.Generator(device="cuda").manual_seed(s)
    pipe(prompt, generator=g).images[0].save(f"sweep_{s}.png")
```

### JAX (functional seed, split)
```python
import jax

key = jax.random.PRNGKey(42)
key, subkey1, subkey2 = jax.random.split(key, 3)

x = jax.random.normal(subkey1, (1000, 128))
y = jax.random.normal(subkey2, (1000,))

# Purely functional: no implicit global state
# Same key chain → exactly the same numbers
```

### NumPy's new Generator API (1.17+)
```python
import numpy as np

# Legacy: global state, not thread-safe
np.random.seed(42); np.random.randn(3)  # not recommended (in 2026)

# Modern: explicit Generator
rng = np.random.default_rng(seed=42)
rng.standard_normal(3)  # array([ 0.30471708, -1.03998411, ...])
rng.choice([1, 2, 3], size=10)
```

### JS (seedable RNG for the web; Math.random is not seedable)
```js
// seedrandom (V8's Math.random cannot be seeded)
import seedrandom from "seedrandom";

const rng = seedrandom("2026-05-10");
console.log(rng());        // deterministic
console.log(rng.int32());  // deterministic int
```

### Reproducibility checklist (paper / experiment)
```python
# Dump at the start of every run:
import hashlib
import json
import subprocess
import sys

import torch


def _git_sha() -> str:
    # Commit hash of the current checkout (assumes the run starts inside a git repo)
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


manifest = {
    "seed": 42,
    "python": sys.version,
    "torch": torch.__version__,
    "cuda": torch.version.cuda,
    "cudnn": torch.backends.cudnn.version(),
    "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
    "code_sha": _git_sha(),
    "data_sha": hashlib.sha256(open("data.bin", "rb").read()).hexdigest(),
    "hyperparams": {"lr": 3e-4, "batch": 64, "epochs": 30},
}
with open("run_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```

### Multi-seed eval (robust results for a paper)
```python
import numpy as np

# seed_everything is the helper defined above; train() and evaluate()
# are the project's own functions, defined elsewhere.
results = []
for seed in [42, 123, 2024, 31337, 7]:
    seed_everything(seed)
    model = train()
    acc = evaluate(model)
    results.append(acc)

# Report mean ± std (NOT the best single seed)
print(f"Acc = {np.mean(results):.3f} ± {np.std(results):.3f} (n=5 seeds)")
# Reviewers tend to reject single-seed claims.
```

## Decision criteria
| Situation | Approach |
|---|---|
| Paper experiment | seed_everything + multi-seed (≥3) + manifest dump |
| Image gen consistency | seed lock + sweep |
| Production ML training | seed + log; weigh the performance cost of deterministic mode |
| Hyperparameter sweep | pin the seed per run, vary the hyperparameters |
| MC simulation | log the seed per run so every run is reproducible |

**Default**: `seed_everything(42)` + manifest JSON + multi-seed mean ± std for any paper claim.

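The MC-simulation row and the multi-seed default can be combined in one small, stdlib-only sketch: each run owns a seeded generator, the seed is stored next to its result, and the summary is reported as mean ± std over seeds. `estimate_pi` and the sample count are hypothetical choices made for illustration, not part of any library.

```python
import random
import statistics


def estimate_pi(seed: int, n_samples: int = 20_000) -> float:
    """Monte Carlo estimate of pi driven by its own seeded generator."""
    rng = random.Random(seed)  # local generator: global state is untouched
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


seeds = [42, 123, 2024, 31337, 7]
estimates = {seed: estimate_pi(seed) for seed in seeds}  # seed logged with its result

# Re-running with a logged seed reproduces that run exactly.
rerun_matches = estimate_pi(42) == estimates[42]

values = list(estimates.values())
mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation across seeds
print(f"pi ≈ {mean:.3f} ± {std:.3f} (n={len(seeds)} seeds)")
```

Using a local `random.Random(seed)` instead of the module-level functions keeps each run independent of whatever else in the process draws random numbers.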
## 🔗 Graph
- Parent: [[Reproducibility]]
- Application: [[Monte Carlo]]

## 🤖 LLM usage
**When**: the `seed` parameter of LLM APIs (OpenAI's `seed` argument; with Anthropic, `temperature=0` as an approximation) gives partial reproducibility. It is useful for deterministic evaluation of prompts.

**When not**: LLMs are not fully reproducible (provider routing, kernel non-determinism). Adjust expectations accordingly.

## ❌ Anti-patterns
- **Single-seed paper**: the result is fragile. Report N≥3 seeds.
- **Seed pin without manifest**: breaks when hardware or library versions change.
- **Forget DataLoader workers**: each worker has its own random state, so a `worker_init_fn` is needed.
- **`np.random.seed` global**: not thread-safe; use `default_rng`.
- **Determinism off-by-default**: with cuDNN `benchmark=True`, results can differ between runs.

## 🧪 Verification / duplicates
- Verified (PyTorch reproducibility docs 2026, JAX PRNG design notes, Pineau "ML Reproducibility Checklist" NeurIPS).
- Trust level A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: PyTorch + JAX + FLUX seed + multi-seed eval |