2nd/10_Wiki/Topics/AI_and_ML/PyTorch-Foundations.md
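koriweb d8a80f6272 chore(wiki): canonicalize dangling links (768 files / 1,200 links)
Replaced [[wikilinks]] that differed only by spelling variant with the canonical title of the target document,
reconnecting 1,200 broken links. Only title/filename normalization matches were applied; alias matching
was excluded because of the over-merge risk (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs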

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00


| Field | Value |
|---|---|
| id | wiki-2026-0508-pytorch-foundations |
| title | PyTorch Foundations |
| category | 10_Wiki/Topics |
| status | verified |
| canonical_id | self |
| aliases | PyTorch Basics, PyTorch Core, torch fundamentals |
| duplicate_of | none |
| source_trust_level | A |
| confidence_score | 0.95 |
| verification_status | applied |
| tags | pytorch, deep-learning, tensors, autograd |
| raw_sources | |
| last_reinforced | 2026-05-10 |
| github_commit | pending |
| tech_stack | language: Python; framework: PyTorch-2.x |

PyTorch Foundations

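In one line

"Tensor + autograd + nn.Module + DataLoader." PyTorch was first released in 2016 by Facebook AI Research (now Meta), and Soumith Chintala was one of its co-creators. It combines a NumPy-like tensor API, GPU acceleration, and automatic differentiation. As of 2026, PyTorch 2.x (torch.compile, FSDP2, the MPS backend, torch.func) is the de facto default deep-learning framework.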

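Core

The four pillars

  • Tensor: an N-dimensional array on CPU, CUDA, or MPS. Autograd records operations on it when requires_grad=True.
  • Autograd: reverse-mode automatic differentiation. Calling .backward() on a scalar fills .grad on the leaf tensors.
  • nn.Module: the base class for layers and models. It holds submodules, parameters, and buffers (state).
  • DataLoader: wraps a Dataset in a batched, optionally shuffled, multi-worker data pipeline.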
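The short sketch below shows the Tensor and Autograd pillars alongside the "NumPy-like + GPU + automatic differentiation" summary. It assumes NumPy is installed next to PyTorch. `torch.from_numpy` shares memory with the source array. `.to(device)` moves the data to a GPU when one is present and leaves it on the CPU otherwise. Autograd then differentiates through ordinary array math.

```python
import numpy as np
import torch

arr = np.arange(4, dtype=np.float32)   # [0., 1., 2., 3.]
t = torch.from_numpy(arr)              # zero-copy: t and arr share one CPU buffer
arr[0] = 10.0                          # the write is visible through t as well
shared = t[0].item()                   # 10.0

device = "cuda" if torch.cuda.is_available() else "cpu"
x = t.to(device).requires_grad_()      # on CPU, .to() is a no-op and returns t itself
loss = (x ** 2).sum()
loss.backward()                        # d/dx sum(x^2) = 2x
grad = x.grad.cpu().numpy()            # [20., 2., 4., 6.]
print(shared, grad)
```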

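Devices

  • CUDA: NVIDIA GPUs. This is the production default.
  • MPS: Apple Silicon GPUs, used mainly on development machines.
  • ROCm: AMD GPUs, with growing support. ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API and the "cuda" device string.
  • XPU: Intel GPUs.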
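Here is one way to choose among these backends at runtime. The helper name `pick_device` and the priority order (CUDA/ROCm, then XPU, then MPS, then CPU) are assumptions made for illustration. The `hasattr` guard is there because older builds may not ship the `torch.xpu` module.

```python
import torch

def pick_device() -> torch.device:
    """Return the first available accelerator, falling back to CPU (order is an assumption)."""
    if torch.cuda.is_available():          # NVIDIA CUDA, and AMD GPUs on ROCm builds
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():  # Intel GPUs
        return torch.device("xpu")
    if torch.backends.mps.is_available():  # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.ones(2, 3, device=device)
total = x.sum().item()  # 6.0 on every backend
print(device, total)
```

Writing code against a `device` variable instead of hard-coded `.cuda()` calls lets the same script run on a laptop and on a GPU server.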

Applications

  1. Vision (timm, torchvision).
  2. NLP / LLMs (Hugging Face transformers; the backend for vLLM).
  3. Diffusion (diffusers).
  4. Reinforcement learning (CleanRL, TorchRL).
  5. Scientific ML (PINNs, geometric deep learning).

💻 Patterns

Tensor basics

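```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU without a GPU

x = torch.randn(3, 4, device=device, dtype=torch.float32)
y = torch.arange(12, device=device).reshape(3, 4).float()

z = x @ y.T          # matmul: (3, 4) @ (4, 3) -> (3, 3)
w = x.mean(dim=0)    # reduction over dim 0 -> shape (4,)
print(x.shape, x.dtype, x.device)
```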

Autograd

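```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # scalar output, so backward() needs no gradient argument
y.backward()         # writes dy/dx = 2x into x.grad
print(x.grad)  # tensor([2., 4., 6.])
```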

nn.Module

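```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, d_in, d_h, d_out):
        super().__init__()
        # Two hidden layers with GELU activations; the last layer outputs raw logits.
        self.net = nn.Sequential(
            nn.Linear(d_in, d_h), nn.GELU(),
            nn.Linear(d_h, d_h), nn.GELU(),
            nn.Linear(d_h, d_out),
        )

    def forward(self, x):
        return self.net(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = MLP(784, 256, 10).to(device)
```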

Training loop (canonical)

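```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# `model` is the MLP above; random placeholder data stands in for a real Dataset.
device = next(model.parameters()).device
dataset = TensorDataset(torch.randn(1024, 784), torch.randint(0, 10, (1024,)))

opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
# num_workers > 0 needs an `if __name__ == "__main__":` guard on macOS/Windows (spawn start method).
loader = DataLoader(dataset, batch_size=128, shuffle=True,
                    num_workers=4, pin_memory=(device.type == "cuda"))

model.train()
for epoch in range(10):
    for x, y in loader:
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        opt.zero_grad(set_to_none=True)
        logits = model(x)
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
```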

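torch.compile (2.x, opt-in)

```python
import torch

# Speedups around 30-50% are often cited; actual gains vary by model and hardware, and the first calls pay a compile cost.
model = torch.compile(model, mode="reduce-overhead")  # reduce-overhead uses CUDA graphs to cut launch overhead
# mode: "default" | "reduce-overhead" | "max-autotune"
```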

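Mixed precision (torch.amp: bf16 / fp16)

```python
import torch
from torch.amp import GradScaler, autocast

# bf16 (Ampere or newer GPUs) has fp32's exponent range, so no loss scaling is needed.
for x, y in loader:
    x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
    opt.zero_grad(set_to_none=True)
    with autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# fp16 has a narrower range, so GradScaler scales the loss to avoid gradient underflow.
scaler = GradScaler("cuda")
for x, y in loader:
    x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
    opt.zero_grad(set_to_none=True)
    with autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
```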

Custom Dataset

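```python
import pandas as pd
import torch
from torch.utils.data import Dataset

class CSVDataset(Dataset):
    """Assumes every column is numeric and the last column is the integer class label."""

    def __init__(self, path, transform=None):
        df = pd.read_csv(path)
        # Convert once up front; per-row df.iloc lookups in __getitem__ are slow.
        self.X = torch.tensor(df.iloc[:, :-1].to_numpy(), dtype=torch.float32)
        self.y = torch.tensor(df.iloc[:, -1].to_numpy(), dtype=torch.long)
        self.transform = transform

    def __len__(self):
        return len(self.y)

    def __getitem__(self, i):
        x, y = self.X[i], self.y[i]
        return (self.transform(x), y) if self.transform else (x, y)
```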

Save / load

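```python
import torch
from safetensors.torch import load_file, save_file

# state_dict (recommended over pickling the whole module)
torch.save(model.state_dict(), "model.pt")
model.load_state_dict(torch.load("model.pt", weights_only=True))  # True is the default since 2.6

# safetensors (preferred for sharing: no pickle, so loading cannot execute code)
save_file(model.state_dict(), "model.safetensors")
model.load_state_dict(load_file("model.safetensors"))
```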

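Distributed (FSDP2, the 2026 default for large models)

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard  # public import path in recent 2.x releases

dist.init_process_group("nccl")  # launched via torchrun, which sets LOCAL_RANK
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
model = MLP(...).cuda()
for layer in model.net:  # shard submodules first, then the root
    if isinstance(layer, torch.nn.Linear):
        fully_shard(layer)
fully_shard(model)  # FSDP2 API; model is now also an FSDPModule
```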

torch.func (functional API)

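```python
import torch
from torch.func import functional_call, grad, vmap

model = torch.nn.Linear(3, 1)
params = {k: v.detach() for k, v in model.named_parameters()}

def loss(params, x, y):  # x, y are single samples; unsqueeze adds a batch dim
    pred = functional_call(model, params, (x.unsqueeze(0),))
    return ((pred - y.unsqueeze(0)) ** 2).mean()

X, Y = torch.randn(8, 3), torch.randn(8, 1)
per_sample_grads = vmap(grad(loss), in_dims=(None, 0, 0))(params, X, Y)  # {name: (8, *param.shape)}
```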

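Decision criteria

| Situation | Approach |
|---|---|
| Single-GPU training | model.cuda() + torch.compile |
| Multi-GPU, single node | DDP |
| Model larger than one GPU's memory | FSDP2 |
| Development on Apple Silicon | MPS backend |
| LLM-scale inference | vLLM / TensorRT-LLM |
| Quick prototype | Lightning or a plain training loop |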

Default: PyTorch 2.x + bf16 + torch.compile + AdamW.
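The decision table recommends DDP for multi-GPU training on a single node but does not show it. The sketch below assumes launch via torchrun (for example `torchrun --nproc_per_node=4 train_ddp.py`, where the file name is hypothetical). The function name `train_ddp`, the tiny `nn.Linear` model, the random batches, and the environment-variable defaults (including port 29512) are all placeholders. The defaults exist only so the same file can also run as a one-process CPU smoke test with the gloo backend.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train_ddp(steps: int = 3) -> float:
    # torchrun sets these; the defaults only allow a single-process smoke test.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29512")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))

    use_cuda = torch.cuda.is_available()
    dist.init_process_group("nccl" if use_cuda else "gloo")  # env:// rendezvous
    device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    model = DDP(nn.Linear(16, 4).to(device), device_ids=[local_rank] if use_cuda else None)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(steps):
        # Placeholder batch; real code would use a DataLoader with a DistributedSampler.
        x = torch.randn(32, 16, device=device)
        y = torch.randint(0, 4, (32,), device=device)
        opt.zero_grad(set_to_none=True)
        loss = loss_fn(model(x), y)
        loss.backward()  # DDP all-reduces (averages) gradients across ranks here
        opt.step()

    dist.destroy_process_group()
    return loss.item()

final_loss = train_ddp()
print(f"final loss: {final_loss:.4f}")
```

Unlike FSDP2, DDP keeps a full model replica on every GPU, which is why the table reserves FSDP2 for models that do not fit in one GPU's memory.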

🔗 Graph

🤖 Using LLMs

When to use: boilerplate training loops, debugging tensor shapes, skeletons for custom ops. When not to use: do not trust hot-path numerical code without review, and watch for hallucinated APIs (e.g., an incorrectly written custom autograd op).
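`torch.autograd.gradcheck` is one way to review a model-generated custom op before trusting it. It compares the hand-written `backward` against finite differences. `Square` and `BadSquare` are hypothetical example ops, and the second has a deliberately wrong gradient. gradcheck expects double-precision inputs.

```python
import torch

class Square(torch.autograd.Function):
    """y = x**2 with a hand-written backward (the part worth double-checking)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * 2 * x  # chain rule: dL/dx = dL/dy * 2x

class BadSquare(torch.autograd.Function):
    """Same forward, but the backward forgets the factor of 2."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * x  # wrong on purpose

inp = torch.randn(5, dtype=torch.float64, requires_grad=True)
ok = torch.autograd.gradcheck(Square.apply, (inp,))  # raises on mismatch, returns True otherwise
bad_ok = torch.autograd.gradcheck(BadSquare.apply, (inp,), raise_exception=False)
print(ok, bad_ok)
```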

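Anti-patterns

  • Calling backward() without zero_grad(): gradients silently accumulate across steps.
  • Evaluating without model.eval() and torch.no_grad(): without no_grad, the autograd graph is still built, which wastes memory. Without eval(), Dropout stays active and BatchNorm keeps updating its running statistics.
  • Needless CPU↔GPU transfers every step, including .item() or .cpu() calls inside the loop: each one costs a PCIe copy and a device sync. Use pin_memory=True with non_blocking=True for batch copies.
  • In-place ops on autograd-tracked tensors: x += 1 on a leaf with requires_grad=True raises immediately, and modifying a tensor that autograd saved for backward makes backward() fail.
  • torch.load(..., weights_only=False) on untrusted files: unpickling can execute arbitrary code. weights_only=True has been the default since PyTorch 2.6; pass it explicitly on older versions.
  • zero_grad(set_to_none=False): zero-fills every .grad tensor instead of freeing it, which is wasted work. set_to_none=True has been the default since PyTorch 2.0.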
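Three of these pitfalls can be reproduced in a few lines. The sketch uses toy tensors and a throwaway model, and all names are illustrative. It shows gradients doubling when `zero_grad()` is skipped, `backward()` failing after an in-place edit of a tensor that autograd saved, and `eval()` plus `no_grad()` producing an output with no graph attached.

```python
import torch
import torch.nn as nn

# 1) No zero_grad(): .grad accumulates across backward() calls.
w = torch.tensor([1.0, 2.0], requires_grad=True)
for _ in range(2):
    (w * 3).sum().backward()  # each call adds d/dw = 3 per element
accumulated = w.grad.clone()  # tensor([6., 6.]), not tensor([3., 3.])

# 2) In-place edit of a tensor that autograd saved for backward.
a = torch.tensor([1.0, 2.0], requires_grad=True)
b = a.sigmoid()  # sigmoid's backward reuses its output b
b += 1           # in-place: bumps b's version counter
try:
    b.sum().backward()
    inplace_error = None
except RuntimeError as e:  # "...modified by an inplace operation"
    inplace_error = str(e)

# 3) Evaluation: eval() disables Dropout, no_grad() skips graph construction.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.5))
model.eval()
with torch.no_grad():
    out = model(torch.randn(2, 4))

print(accumulated, inplace_error is not None, out.requires_grad)
```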

🧪 Verification / duplicates

  • Verified (pytorch.org docs, PyTorch 2.x release notes).
  • Confidence: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: PyTorch 2.x foundations canonical. |