---
id: wiki-2026-0508-determinism-in-computing
title: Determinism in Computing
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Reproducibility, Bit-Exact, 결정론적 실행]
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: [determinism, reproducibility, concurrency, ML, distributed-systems]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pytorch/cuda
---

# Determinism in Computing

## One-line summary
> **"Same input + same code = same output, every run, every machine."** From Turing's deterministic state machine (1936) to bit-exact reproducibility in ML training (2026), distributed consensus (Raft), and blockchain virtual machines, determinism is the foundation of trust and debugging.

## Core concepts

### Levels
- **Bit-exact**: output is identical at the byte level, so cryptographic hashes match.
- **Numerically reproducible**: results agree within a tolerance ε; the differences come from floating-point operation order.
- **Statistically reproducible**: same distribution but different samples; only the RNG seed is controlled.
- **Behaviorally reproducible**: the high-level outcome is the same (the same tests pass or fail).
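
The gap between the first two levels can be checked mechanically. The following is a minimal sketch, not a prescribed method: the helper names `digest`, `bit_exact`, and `numerically_close` are hypothetical, and the tolerance `eps=1e-9` is an arbitrary assumption.

```python
import hashlib
import math
import struct

def digest(values):
    """SHA-256 over the IEEE-754 bytes of each float (byte-level comparison)."""
    payload = b"".join(struct.pack("<d", v) for v in values)
    return hashlib.sha256(payload).hexdigest()

def bit_exact(a, b):
    """True only if both runs produced byte-identical floats."""
    return digest(a) == digest(b)

def numerically_close(a, b, eps=1e-9):
    """True if both runs agree element-wise within the tolerance eps."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=eps, abs_tol=eps) for x, y in zip(a, b)
    )

# Two "runs" that differ only in the order of floating-point additions.
run_a = [(0.1 + 0.2) + 0.3]
run_b = [0.1 + (0.2 + 0.3)]

print(bit_exact(run_a, run_b))          # False
print(numerically_close(run_a, run_b))  # True
```

A pipeline that only needs numerical reproducibility would compare with the tolerance check; one that publishes hashes of its outputs needs the stricter byte-level check.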

### Causes of nondeterminism
- **FP non-associativity**: (a+b)+c ≠ a+(b+c) in floating point, so reduction order matters.
- **GPU atomic ops**: the ordering of CUDA `atomicAdd` operations is nondeterministic.
- **Thread scheduling**: the OS scheduler varies thread interleavings between runs, which exposes race conditions.
- **Hash randomization**: Python `PYTHONHASHSEED`, Go map iteration order.
- **Wall-clock dependency**: timestamps, `time.time()`, unseeded `random()`.
- **Hardware**: cosmic-ray bit flips, TLB/cache state.
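
To make the reduction-order cause concrete, here is a small sketch in plain Python. The helper `naive_sum` is a hypothetical name; it uses an explicit loop because the built-in `sum()` may apply compensated summation on newer Python versions, which would hide the effect.

```python
import math

def naive_sum(values):
    """Left-to-right accumulation, the way a simple reduction loop works."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [1e16, 1.0, -1e16, 1.0]
reordered = [1.0, 1.0, 1e16, -1e16]  # same numbers, different order

print(naive_sum(values))     # 1.0 (the first 1.0 is absorbed by 1e16)
print(naive_sum(reordered))  # 2.0

# math.fsum tracks exact partial sums, so its result does not depend on order.
print(math.fsum(values), math.fsum(reordered))  # 2.0 2.0
```

A parallel reduction on a GPU or across threads effectively picks one of these orders at runtime, which is why the same data can yield slightly different sums from run to run.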

### Applications
1. **ML training reproduction**: the reproducibility crisis in paper benchmarks.
2. **Blockchain consensus**: all nodes must reach an identical state.
3. **Distributed log replay**: deterministic projections in event sourcing.
4. **Game engine replays**: lockstep multiplayer (RTS, fighting games).
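
Consensus and log replay both rest on the same idea: a pure transition function folded over an ordered log. The sketch below assumes a toy ledger; the event shape and the names `apply`, `project`, and `state_hash` are hypothetical and chosen only for illustration.

```python
import hashlib
import json

def apply(state, event):
    """Pure transition: no clock, no RNG, no I/O. Returns a new state."""
    balances = dict(state)
    account, amount = event["account"], event["amount"]
    if event["kind"] == "deposit":
        balances[account] = balances.get(account, 0) + amount
    elif event["kind"] == "withdraw":
        balances[account] = balances.get(account, 0) - amount
    return balances

def project(events):
    """Replay the log from an empty state."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state

def state_hash(state):
    # sort_keys yields a canonical serialization, so equal states hash equally.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

log = [
    {"kind": "deposit", "account": "alice", "amount": 100},
    {"kind": "deposit", "account": "bob", "amount": 50},
    {"kind": "withdraw", "account": "alice", "amount": 30},
]

replica_a = project(log)
replica_b = project(log)
print(state_hash(replica_a) == state_hash(replica_b))  # True
```

Comparing state hashes rather than full states is a cheap way for replicas to detect divergence after applying the same log prefix.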

## 💻 Patterns

### PyTorch Bit-Exact Setup
```python
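import os
import random

import numpy as np
import torch

def set_full_determinism(seed=42):
    # Only affects child processes; to fix hashing in this interpreter,
    # export PYTHONHASHSEED before Python starts.
    os.environ['PYTHONHASHSEED'] = str(seed)
    # Set before any CUDA work starts (CUDA 10.2+).
    os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # disable autotuned kernel selection
    # Raise an error when an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=False)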

set_full_determinism()
```

### Deterministic DataLoader
```python
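import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Derive NumPy/random seeds from the per-worker torch seed.
    s = torch.initial_seed() % 2**32
    np.random.seed(s); random.seed(s)

# `ds` is assumed to be a Dataset defined elsewhere.
g = torch.Generator(); g.manual_seed(42)
loader = DataLoader(ds, batch_size=64, shuffle=True,
                    num_workers=4, worker_init_fn=seed_worker, generator=g)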
```

### Lockstep Game Loop (Fixed-Point Math)
```rust
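// All clients run the identical simulation; only inputs are synchronized.
// `GameState` and `Input` are assumed to be defined elsewhere.
// Q47.16 fixed point stored in an i64: 1.0 == 1 << 16.
const FIXED_DT: i64 = (1 << 16) / 60;

fn tick(state: &mut GameState, inputs: &[Input]) {
    // Apply inputs in a canonical order so every client agrees.
    let mut ordered: Vec<&Input> = inputs.iter().collect();
    ordered.sort_by_key(|i| i.player_id);
    for input in ordered {
        state.apply(input, FIXED_DT);  // fixed-point, no f32!
    }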
    state.tick += 1;
}
```

### Content-Addressable Build (Bazel-style)
```python
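import hashlib

cache = {}  # in-memory stand-in for a persistent artifact cache

def build_artifact(sources, deps, command):
    h = hashlib.sha256()
    for src in sorted(sources):
        with open(src, 'rb') as f:  # close the file handle deterministically
            h.update(f.read())
    for d in sorted(deps, key=lambda d: d.hash):
        h.update(d.hash.encode())
    h.update(command.encode())
    cache_key = h.hexdigest()
    if cache_key in cache:
        return cache[cache_key]
    # run_and_cache is assumed to execute the command and store its output.
    return run_and_cache(command, cache_key)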
```

### Deterministic Hash for Sets
```python
import hashlib

# Set iteration order is not stable across runs, so sort before hashing.
def stable_hash_set(items):
    return hashlib.sha256(
        b'\n'.join(sorted(repr(x).encode() for x in items))
    ).hexdigest()
```

### Replay Test
```python
def test_replay_is_deterministic():
    seed = 12345
    out1 = run_simulation(seed)
    out2 = run_simulation(seed)
    assert out1 == out2, "Nondeterminism detected!"
    # for ML: torch.testing.assert_close(out1, out2, atol=0, rtol=0)
```

## Decision criteria
| Situation | Approach |
|---|---|
| ML reproducibility paper | bit-exact (CUBLAS config + cudnn.deterministic) |
| Distributed sim / lockstep | fixed-point arithmetic |
| Build system | content-addressable hashing |
| Statistical study | seed-only (statistical determinism) |
| Performance critical | relax to "numerically close" |

**Default**: seed everything and log the seeds in the artifact metadata.
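
As a rough illustration of that default, the sketch below seeds a local generator and writes the seed next to the result. The function names, the metadata fields, and the `run_metadata.json` file name are all assumptions made for this example, not an established convention.

```python
import json
import os
import platform
import random
import sys
import tempfile

def run_experiment(seed):
    """Toy experiment: uses a local generator, so no hidden global RNG state."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

def write_run_metadata(path, seed, result):
    """Store the seed and environment details alongside the result."""
    metadata = {
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "result": result,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2, sort_keys=True)
    return metadata

seed = 42
result = run_experiment(seed)
path = os.path.join(tempfile.gettempdir(), "run_metadata.json")
write_run_metadata(path, seed, result)

# Later: reload the logged seed and confirm the run can be reproduced.
with open(path, encoding="utf-8") as f:
    logged = json.load(f)
print(run_experiment(logged["seed"]) == logged["result"])  # True
```

Recording the interpreter version and platform next to the seed helps explain mismatches when a rerun happens in a different environment.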

## 🔗 Graph
- Parent: [[Theoretical-Computer-Science]] · [[Reproducibility]]
- Adjacent: [[Idempotency]]

## 🤖 LLM usage
**When to use**: evaluation harnesses, ground truth for regression tests, paper code releases.
**When not to use**: LLM sampling itself (temperature > 0) is random by design; only a fixed seed combined with temperature=0 is reproducible.
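
The difference between greedy decoding and sampling can be shown with a toy next-token picker. This is a sketch in plain Python, not any real LLM API: `next_token`, `sample_sequence`, and the logits are hypothetical.

```python
import math
import random

def next_token(logits, temperature, rng):
    """Pick a token index from raw logits."""
    if temperature == 0:
        # Greedy decoding: argmax, ties broken by the lowest index.
        return max(range(len(logits)), key=lambda i: (logits[i], -i))
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def sample_sequence(logits, temperature, seed, n=10):
    """Draw n tokens using a generator seeded with `seed` (None = OS entropy)."""
    rng = random.Random(seed)
    return [next_token(logits, temperature, rng) for _ in range(n)]

logits = [2.0, 1.5, 0.3, -1.0]

# temperature=0 ignores the RNG entirely, so the seed does not matter.
print(sample_sequence(logits, 0, seed=None))

# temperature>0 is repeatable only when the seed is fixed.
print(sample_sequence(logits, 1.0, seed=7) == sample_sequence(logits, 1.0, seed=7))  # True
```

In an evaluation harness, pinning both the temperature and the seed keeps regression diffs attributable to model or prompt changes instead of sampling noise.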

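## ❌ Anti-patterns
- **Forgetting `CUBLAS_WORKSPACE_CONFIG`**: CUDA matmul can be nondeterministic, so training results differ from run to run.
- **Using `set()` in a pipeline**: set iteration order is not guaranteed in any Python version and varies with `PYTHONHASHSEED` for strings; only `dict` preserves insertion order (guaranteed since Python 3.7).
- **Wall-clock as seed**: the run cannot be reproduced or debugged.
- **Mixing CPU/GPU reductions**: differing summation order accumulates ε-level divergence.
- **Ignoring hardware drift**: different GPU architectures (A100 vs H100) can produce different results.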
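
One way to avoid the wall-clock anti-pattern is to pass the clock value and the RNG in as arguments instead of reading them inside the function. The sketch below is illustrative only; `make_order_id` and its ID format are hypothetical.

```python
import random
import time

def make_order_id(now, rng):
    """Build an ID from an injected timestamp and an injected RNG."""
    return f"{int(now)}-{rng.randrange(10**6):06d}"

seed = 12345  # chosen explicitly and logged, never derived from the clock

# Production: real clock, explicit seed.
prod_id = make_order_id(time.time(), random.Random(seed))

# Test or replay: a fixed timestamp and the logged seed give the same ID every time.
replay_a = make_order_id(1_700_000_000, random.Random(seed))
replay_b = make_order_id(1_700_000_000, random.Random(seed))
print(replay_a == replay_b)  # True
```

Because the function never calls `time.time()` or the global `random` module itself, a failing run can be replayed from its logged timestamp and seed.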

## 🧪 Verification / Duplicates
- Verified (PyTorch reproducibility docs 2026; Raft paper 2014; Bazel hermetic build docs).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full content with PyTorch, lockstep, build patterns |