| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-pytorch-foundations | PyTorch Foundations | 10_Wiki/Topics | verified | self | | none | A | 0.95 | applied | | | 2026-05-10 | pending | |
PyTorch Foundations
In one line
"Tensor + autograd + nn.Module + DataLoader". First released in 2016 by a team at Facebook AI Research (now Meta AI) that included Soumith Chintala. A NumPy-like array library with GPU support and automatic differentiation. As of 2026, PyTorch 2.x (torch.compile, FSDP2, the MPS backend, torch.func) is the default deep learning framework.
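Core concepts
The 4 pillars
- Tensor: N-dimensional array on CPU, GPU, or MPS, optionally tracked by autograd.
- Autograd: reverse-mode automatic differentiation, triggered by .backward().
- nn.Module: container for layers and their state (parameters and buffers).
- DataLoader: batched, parallel data pipeline.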
Devices
- CUDA: NVIDIA GPUs. The production default.
- MPS: Apple Silicon. Typical for development machines.
- ROCm: AMD GPUs. Support is growing.
- XPU: Intel GPUs.
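The device list above can be turned into a small selection helper. This is a minimal sketch, not an official recipe: `pick_device` is a hypothetical name, XPU is left out for brevity, and ROCm builds of PyTorch report through the same `torch.cuda` API as NVIDIA builds.

```python
import torch

def pick_device() -> torch.device:
    """Return the best available device: CUDA first, then MPS, then CPU."""
    if torch.cuda.is_available():          # NVIDIA, and ROCm builds as well
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")             # always-available fallback

device = pick_device()
x = torch.randn(3, 4, device=device)       # allocate directly on the chosen device
print(device, x.device.type)
```

Creating tensors with `device=device` avoids hard-coding `.cuda()` calls, so the same script can run on a laptop and on a GPU server.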
Applications
- Vision (timm, torchvision).
- NLP / LLM (transformers; also the backend of vLLM).
- Diffusion (diffusers).
- RL (cleanrl, torchrl).
- Scientific ML (PINN, geometric DL).
💻 Patterns
Tensor basics
import torch
x = torch.randn(3, 4, device="cuda", dtype=torch.float32)
y = torch.arange(12).reshape(3, 4).float().cuda()
z = x @ y.T # matmul
w = x.mean(dim=0) # reduction
print(x.shape, x.dtype, x.device)
Autograd
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad) # tensor([2., 4., 6.])
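One way to build trust in autograd is to compare it against a derivative worked out by hand. The sketch below assumes a toy function, f(x) = x·sin(x), chosen only for illustration; its analytic derivative is sin(x) + x·cos(x).

```python
import torch

x = torch.tensor([0.5, 1.0, 2.0], requires_grad=True)
y = (x * torch.sin(x)).sum()   # scalar output, so backward() needs no arguments
y.backward()

# Analytic derivative of x*sin(x), computed without building a graph.
with torch.no_grad():
    expected = torch.sin(x) + x * torch.cos(x)

grad_matches = torch.allclose(x.grad, expected)
print(grad_matches)
```

The same pattern (autograd result versus a closed-form expression) is a cheap check when writing a custom loss.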
nn.Module
import torch.nn as nn
class MLP(nn.Module):
    def __init__(self, d_in, d_h, d_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_h), nn.GELU(),
            nn.Linear(d_h, d_h), nn.GELU(),
            nn.Linear(d_h, d_out),
        )
    def forward(self, x):
        return self.net(x)
model = MLP(784, 256, 10).cuda()
Training loop (canonical)
from torch.utils.data import DataLoader
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(dataset, batch_size=128, shuffle=True,
                    num_workers=4, pin_memory=True)
for epoch in range(10):
    for x, y in loader:
        x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
        opt.zero_grad(set_to_none=True)
        logits = model(x)
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
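The canonical loop above assumes an existing `dataset` and a CUDA device. For a version that runs anywhere, here is a minimal sketch on synthetic data; the data, the model size, the learning rate, and the `mean_loss` helper are all illustrative assumptions, not part of the original pattern.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy task: 512 samples, 20 features, label = sign of the feature sum.
X = torch.randn(512, 20)
y = (X.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(20, 32), nn.GELU(), nn.Linear(32, 2)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def mean_loss() -> float:
    """Full-dataset loss in eval mode, with no graph built."""
    model.eval()
    with torch.no_grad():
        return loss_fn(model(X.to(device)), y.to(device)).item()

loss_before = mean_loss()
for epoch in range(5):
    model.train()
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        opt.zero_grad(set_to_none=True)
        loss_fn(model(xb), yb).backward()
        opt.step()
loss_after = mean_loss()
print(f"{loss_before:.3f} -> {loss_after:.3f}")
```

Comparing the loss before and after a few epochs on a task this easy is a quick smoke test that the loop is wired correctly.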
torch.compile (2.x default)
# Often quoted as a 30-50% speedup for a one-line change; actual gains depend on the model and hardware.
model = torch.compile(model, mode="reduce-overhead")
# mode: "default" | "reduce-overhead" | "max-autotune"
Mixed precision (bf16 / amp)
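from torch.amp import autocast, GradScaler
# GradScaler guards against fp16 gradient underflow. bf16 keeps the fp32
# exponent range, so scaling is only enabled when fp16 is chosen.
use_fp16 = False
amp_dtype = torch.float16 if use_fp16 else torch.bfloat16
scaler = GradScaler("cuda", enabled=use_fp16)
for x, y in loader:
    x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
    opt.zero_grad(set_to_none=True)
    with autocast(device_type="cuda", dtype=amp_dtype):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()  # passes the loss through unchanged when disabled
    scaler.step(opt)               # plain opt.step() when disabled
    scaler.update()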
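To see what autocast actually changes, it helps to inspect dtypes. The sketch below uses the CPU autocast backend so it runs without a GPU; that choice is an assumption made for portability, and the CUDA behavior for a linear layer is analogous.

```python
import torch
import torch.nn as nn

layer = nn.Linear(8, 4)   # parameters are created in float32
x = torch.randn(2, 8)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = layer(x)        # the matmul runs in bfloat16 inside this region

param_dtype = layer.weight.dtype   # weights themselves are not converted
out_dtype = out.dtype
print(param_dtype, out_dtype)
```

The parameters stay in float32 while the activations drop to bfloat16, which is why the optimizer still updates full-precision weights.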
Custom Dataset
from torch.utils.data import Dataset
class CSVDataset(Dataset):
    """Rows of a CSV file; the last column is the integer class label."""
    def __init__(self, path, transform=None):
        import pandas as pd
        self.df = pd.read_csv(path)
        self.transform = transform
    def __len__(self):
        return len(self.df)
    def __getitem__(self, i):
        row = self.df.iloc[i]
        # Use .iloc for positional access; plain row[-1] is deprecated on a labeled Series.
        x = torch.tensor(row.iloc[:-1].to_numpy(dtype="float32"))
        y = torch.tensor(int(row.iloc[-1]), dtype=torch.long)
        return (self.transform(x), y) if self.transform else (x, y)
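For small files, a common alternative is to parse the CSV once and index tensors afterwards, which avoids per-row DataFrame access. The following is a sketch under stated assumptions: `InMemoryCSVDataset` is a hypothetical class name, the file is fully numeric with a header row, and the standard `csv` module replaces pandas to keep the example dependency-free.

```python
import csv
import os
import tempfile

import torch
from torch.utils.data import DataLoader, Dataset

class InMemoryCSVDataset(Dataset):
    """Hypothetical variant: parse once, keep features and labels as tensors."""
    def __init__(self, path):
        with open(path, newline="") as f:
            rows = list(csv.reader(f))[1:]      # skip the header row
        data = torch.tensor([[float(v) for v in r] for r in rows])
        self.x = data[:, :-1]                   # every column except the last
        self.y = data[:, -1].long()             # last column is the class label
    def __len__(self):
        return len(self.y)
    def __getitem__(self, i):
        return self.x[i], self.y[i]

# Write a tiny CSV so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["f0", "f1", "label"])
        writer.writerows([i, i * 0.5, i % 2] for i in range(10))
    ds = InMemoryCSVDataset(path)

xb, yb = next(iter(DataLoader(ds, batch_size=4)))
print(xb.shape, yb.shape, yb.dtype)
```

The default collate function stacks the per-item tensors, so a batch of 4 items yields a (4, 2) feature tensor and a (4,) label tensor.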
Save / load
# state_dict (recommended)
torch.save(model.state_dict(), "model.pt")
model.load_state_dict(torch.load("model.pt", weights_only=True))
# safetensors (preferred for sharing, no pickle RCE)
from safetensors.torch import save_file, load_file
save_file(model.state_dict(), "model.safetensors")
model.load_state_dict(load_file("model.safetensors"))
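A round trip is an easy way to confirm that a checkpoint restores exactly what was saved. This sketch uses an in-memory buffer in place of a file on disk and two small `nn.Linear` layers as stand-in models; both are illustrative assumptions.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(0)
src = nn.Linear(4, 2)
dst = nn.Linear(4, 2)   # same architecture, different random initialization

buf = io.BytesIO()      # stands in for "model.pt"
torch.save(src.state_dict(), buf)
buf.seek(0)

state = torch.load(buf, weights_only=True)  # refuses arbitrary pickled objects
result = dst.load_state_dict(state)         # strict key matching by default

same = all(torch.equal(a, b) for a, b in zip(src.parameters(), dst.parameters()))
print(result, same)
```

`load_state_dict` returns the missing and unexpected keys, which is worth inspecting whenever `strict=False` is used.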
Distributed (FSDP2, 2026 default for large)
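import torch.distributed as dist
from torch.distributed.fsdp import FSDPModule, fully_shard
# Launch with torchrun so that the rank and world-size environment variables are set.
dist.init_process_group("nccl")
model = MLP(...).cuda()
fully_shard(model) # FSDP2 API: shards parameters in place; model becomes an FSDPModule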
torch.func (functional API)
from torch.func import vmap, grad
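# model_fn, params, X, and Y are placeholders: a functional model, its parameters, and a batch.
def loss(params, x, y):
    return ((model_fn(params, x) - y) ** 2).mean()
# grad differentiates w.r.t. params; vmap maps over the batch dimension of X and Y only.
per_sample_grads = vmap(grad(loss), in_dims=(None, 0, 0))(params, X, Y)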
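The snippet above leaves `model_fn`, `params`, `X`, and `Y` undefined. One way to fill them in, sketched here with a single `nn.Linear` as an assumed stand-in model, is `torch.func.functional_call`, which runs a module with an explicit parameter dictionary.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, vmap

torch.manual_seed(0)
model = nn.Linear(3, 1)
params = {k: v.detach() for k, v in model.named_parameters()}

def loss(params, x, y):
    # x: (3,), y: (1,) is a single sample; add and remove a batch dim around the call.
    pred = functional_call(model, params, (x.unsqueeze(0),)).squeeze(0)
    return ((pred - y) ** 2).mean()

X, Y = torch.randn(8, 3), torch.randn(8, 1)
per_sample_grads = vmap(grad(loss), in_dims=(None, 0, 0))(params, X, Y)
shapes = {k: tuple(v.shape) for k, v in per_sample_grads.items()}

# Sanity check: averaging per-sample grads recovers the grad of the batch-mean loss.
batch_loss = ((model(X) - Y) ** 2).mean()
(batch_grad_w,) = torch.autograd.grad(batch_loss, model.weight)
matches = torch.allclose(per_sample_grads["weight"].mean(dim=0), batch_grad_w, atol=1e-6)
print(shapes, matches)
```

Each gradient gains a leading dimension of size 8, one slice per sample, which is the shape needed for techniques such as per-example gradient clipping.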
Decision criteria
| Situation | Approach |
|---|---|
| Single-GPU train | model.cuda() + torch.compile |
| Multi-GPU same node | DDP |
| Model > GPU mem | FSDP2 |
| Apple Silicon dev | MPS backend |
| Inference, llm-scale | vLLM / TensorRT-LLM |
| Quick prototype | Lightning or pure loop |
Default: PyTorch 2.x + bf16 + torch.compile + AdamW.
🔗 Graph
- Parent: Deep Learning
- Variants: JAX · TensorFlow
- Applications: Transformer_Architecture_and_LLM_Foundations · Diffusion-Models · Reinforcement-Learning
- Adjacent: Lightning · Triton
🤖 Using LLMs
When to use: boilerplate training loops, shape debugging, custom op skeletons.
When not to: do not trust hot-path numerical code without review, and watch for hallucinated APIs (e.g., an incorrect autograd custom op).
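❌ Anti-patterns
- backward() without zero_grad(): gradients accumulate across steps, a silent bug.
- Forgetting with torch.no_grad() at eval: memory is wasted on an unused graph. Wrong BatchNorm or Dropout statistics come from forgetting model.eval(), which is a separate call.
- CPU↔GPU transfer on every step: PCIe becomes the bottleneck. Use pin_memory + non_blocking.
- In-place op on an autograd-tracked tensor: x += 1 can break backward.
- weights_only=False (the default before PyTorch 2.6): pickle RCE risk. Always pass weights_only=True.
- Passing set_to_none=False to zero_grad(): zero-filling is wasteful. set_to_none=True has been the default since PyTorch 2.0.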
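Two of the anti-patterns above are easy to reproduce on CPU. The sketch below is illustrative only; the tensor values and the tiny Dropout network are assumptions chosen to make the effects visible.

```python
import torch
import torch.nn as nn

# 1) backward() without zero_grad(): gradients add up across calls.
w = torch.tensor([1.0, 2.0], requires_grad=True)
(w ** 2).sum().backward()
first = w.grad.clone()         # 2 * w
(w ** 2).sum().backward()      # no zero_grad() in between
accumulated = w.grad.clone()   # silently doubled to 4 * w

# 2) no_grad() and eval() do different jobs.
net = nn.Sequential(nn.Linear(2, 2), nn.Dropout(p=0.5))
x = torch.ones(1, 2)
with torch.no_grad():
    tracked = net(x).requires_grad   # no graph is built inside no_grad()
net.eval()                           # disables dropout; no_grad() alone does not
a, b = net(x), net(x)
deterministic = torch.equal(a, b)
print(first, accumulated, tracked, deterministic)
```

An evaluation routine therefore usually needs both calls: `model.eval()` for layer behavior and `torch.no_grad()` for memory.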
🧪 Verification / duplicates
- Verified (pytorch.org docs, PyTorch 2.x release notes).
- Trust level A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — PyTorch 2.x foundations canonical. |