---
id: wiki-2026-0508-pytorch-foundations
title: PyTorch Foundations
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [PyTorch Basics, PyTorch Core, torch fundamentals]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [pytorch, deep-learning, tensors, autograd]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: PyTorch-2.x
---

# PyTorch Foundations

## One-liner

> **"Tensor + autograd + nn.Module + DataLoader"**. First released in 2016 by Soumith Chintala and colleagues at Meta (then Facebook). A NumPy-like array library with GPU support and automatic differentiation. As of 2026, PyTorch 2.x (`torch.compile`, FSDP2, MPS backend, torch.func) is the default DL framework.

## Core

### 4 pillars

- **Tensor**: N-d array on GPU/CPU/MPS, tracked by autograd.
- **Autograd**: reverse-mode AD, triggered by `.backward()`.
- **nn.Module**: layer + state container.
- **DataLoader**: batched, parallel data pipeline.

### Devices

- **CUDA**: NVIDIA. The production default.
- **MPS**: Apple Silicon. For dev machines.
- **ROCm**: AMD. Growing.
- **XPU**: Intel.

### Applications

1. Vision (timm, torchvision).
2. NLP / LLM (transformers, the backend of vLLM).
3. Diffusion (diffusers).
4. RL (cleanrl, torchrl).
5. Scientific ML (PINN, geometric DL).
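The device backends listed above (CUDA, MPS, ROCm, XPU) can be probed at runtime, so the same script runs on a workstation, a laptop, or a CPU-only box. The following is a minimal sketch, assuming a PyTorch 2.x build; `pick_device` is a hypothetical helper name (not a PyTorch API), and the priority order is just one reasonable choice.

```python
import torch


def pick_device() -> torch.device:
    """Hypothetical helper: return the first available accelerator, else CPU."""
    # NVIDIA CUDA; ROCm builds of PyTorch are also driven through torch.cuda.
    if torch.cuda.is_available():
        return torch.device("cuda")
    # Apple Silicon (guarded with getattr in case the build lacks the module).
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    # Intel XPU; torch.xpu only exists in newer releases, hence the guard.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")


device = pick_device()

# Tensor + autograd on whichever device was found.
x = torch.randn(3, 4, device=device)
w = torch.randn(4, 2, device=device, requires_grad=True)
loss = (x @ w).pow(2).mean()
loss.backward()  # fills w.grad with d(loss)/dw

print(device, w.grad.shape)
```

Writing `device=device` instead of hard-coding `.cuda()` keeps the rest of the code backend-agnostic; the snippets elsewhere in this note use `.cuda()` for brevity.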
## 💻 Patterns

### Tensor basics

```python
import torch

x = torch.randn(3, 4, device="cuda", dtype=torch.float32)
y = torch.arange(12).reshape(3, 4).float().cuda()
z = x @ y.T          # matmul: (3, 4) @ (4, 3) -> (3, 3)
w = x.mean(dim=0)    # reduction over rows -> shape (4,)
print(x.shape, x.dtype, x.device)
```

### Autograd

```python
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad)  # tensor([2., 4., 6.])
```

### nn.Module

```python
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, d_in, d_h, d_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_h), nn.GELU(),
            nn.Linear(d_h, d_h), nn.GELU(),
            nn.Linear(d_h, d_out),
        )

    def forward(self, x):
        return self.net(x)

model = MLP(784, 256, 10).cuda()
```

### Training loop (canonical)

```python
from torch.utils.data import DataLoader

opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
# `dataset` is any torch.utils.data.Dataset yielding (input, label) pairs.
loader = DataLoader(dataset, batch_size=128, shuffle=True,
                    num_workers=4, pin_memory=True)

for epoch in range(10):
    for x, y in loader:
        x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
        opt.zero_grad(set_to_none=True)
        logits = model(x)
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
```

### torch.compile (2.x)

```python
# Opt-in, one-line change. Often cited as a 30-50% speedup; actual gains depend on model and hardware.
model = torch.compile(model, mode="reduce-overhead")
# mode: "default" | "reduce-overhead" | "max-autotune"
```

### Mixed precision (bf16 / amp)

```python
from torch.amp import autocast

# bf16 has the same exponent range as fp32, so no GradScaler is needed.
# Use torch.amp.GradScaler("cuda") only when autocasting to torch.float16.
for x, y in loader:
    x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
    opt.zero_grad(set_to_none=True)
    with autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

### Custom Dataset

```python
import pandas as pd
from torch.utils.data import Dataset

class CSVDataset(Dataset):
    """Numeric CSV: every column but the last is a feature, the last is the class label."""

    def __init__(self, path, transform=None):
        self.df = pd.read_csv(path)
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, i):
        row = self.df.iloc[i]
        # .iloc keeps access positional regardless of the column labels.
        x = torch.tensor(row.iloc[:-1].to_numpy(dtype="float32"))
        y = torch.tensor(int(row.iloc[-1]), dtype=torch.long)
        return (self.transform(x), y) if self.transform else (x, y)
```

### Save / load

```python
# state_dict (recommended)
torch.save(model.state_dict(), "model.pt")
model.load_state_dict(torch.load("model.pt", weights_only=True))

# safetensors (preferred for sharing, no pickle RCE)
from safetensors.torch import save_file, load_file
save_file(model.state_dict(), "model.safetensors")
model.load_state_dict(load_file("model.safetensors"))
```

### Distributed (FSDP2, 2026 default for large models)

```python
import os
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard  # exported here in recent 2.x releases

# Launch with torchrun so LOCAL_RANK and the rendezvous variables are set.
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
model = MLP(...).cuda()
fully_shard(model)  # FSDP2 API: shards the module's parameters in place
```

### torch.func (functional API)

```python
from torch.func import functional_call, grad, vmap

params = {k: v.detach() for k, v in model.named_parameters()}

def model_fn(params, x):
    return functional_call(model, params, (x,))  # stateless forward pass

def loss(params, x, y):
    return ((model_fn(params, x) - y) ** 2).mean()

# X, Y are batched inputs/targets; vmap maps over dim 0 while params are shared.
per_sample_grads = vmap(grad(loss), in_dims=(None, 0, 0))(params, X, Y)
```

## Decision criteria

| Situation | Approach |
|---|---|
| Single-GPU training | `model.cuda()` + `torch.compile` |
| Multi-GPU, same node | DDP |
| Model > GPU memory | **FSDP2** |
| Apple Silicon dev | MPS backend |
| LLM-scale inference | vLLM / TensorRT-LLM |
| Quick prototype | Lightning or a pure loop |

**Default**: PyTorch 2.x + bf16 +
torch.compile + AdamW.

## 🔗 Graph

- Parent: [[Deep-Learning]]
- Variants: [[JAX]] · [[TensorFlow]]
- Applications: [[Transformer_Architecture_and_LLM_Foundations|Transformers]] · [[Diffusion-Models]] · [[Reinforcement-Learning]]
- Adjacent: [[Lightning]] · [[Triton]]

## 🤖 LLM usage

**When**: boilerplate training loops, shape debugging, custom op skeletons.

**When not**: do not trust hot-path numerical code without review. Watch for hallucinated APIs (e.g., an incorrect custom autograd op).

## ❌ Anti-patterns

- **`backward()` without `zero_grad()`**: gradients accumulate across steps, a silent bug.
- **Forgetting `with torch.no_grad()` at eval**: extra memory for the autograd graph. Pair it with `model.eval()`, since layers such as BatchNorm and Dropout otherwise produce wrong statistics.
- **CPU↔GPU transfer on every step**: PCIe bottleneck. Use `pin_memory` + `non_blocking`.
- **In-place ops on autograd-tracked tensors**: `x += 1` can break the backward pass (it raises on a leaf that requires grad, or when the overwritten value is needed for the gradient).
- **`weights_only=False`**: pickle RCE risk. The default changed to `True` in PyTorch 2.6; on older versions, always pass `weights_only=True` explicitly.
- **Not using `set_to_none=True`**: zero-filling gradients is wasteful (it has been the `zero_grad()` default since PyTorch 2.0).

## 🧪 Verification / duplicates

- Verified (pytorch.org docs, PyTorch 2.x release notes).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: PyTorch 2.x foundations canonical. |