PyTorch Foundations

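In one line

"Tensor + autograd + nn.Module + DataLoader". PyTorch was first released in 2016 by a team at Facebook AI Research (now Meta AI) that included Soumith Chintala. It offers a NumPy-like tensor API with GPU acceleration and automatic differentiation. As of 2026, PyTorch 2.x (torch.compile, FSDP2, the MPS backend, torch.func) is the de facto default deep learning framework.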

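Core concepts

The four pillars

Tensor: an N-dimensional array that lives on CPU, GPU, or MPS and can be tracked by autograd.
Autograd: reverse-mode automatic differentiation, triggered by .backward().
nn.Module: a container for layers and their state (parameters and buffers).
DataLoader: a batched, parallel data pipeline.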

Devices

CUDA: NVIDIA GPUs. The production default.
MPS: Apple Silicon. Suited to development machines.
ROCm: AMD GPUs. Support is growing.
XPU: Intel GPUs.
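
The device list above can be turned into a small selection helper. This is a minimal sketch: `pick_device` is a hypothetical name, not a PyTorch API, and it checks only CUDA and MPS before falling back to CPU. ROCm builds of PyTorch expose AMD GPUs through the same `"cuda"` device type, and XPU is left out here for brevity.

```python
import torch

def pick_device() -> torch.device:
    """Return the best available device: CUDA first, then MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is missing on very old builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(3, 4, device=device)  # allocate directly on the chosen device
print(x.device.type)
```

Passing `device=` at creation time avoids building the tensor on CPU and copying it afterwards.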

Applications

Vision (timm, torchvision).
NLP / LLM (transformers; also the backend of vLLM).
Diffusion (diffusers).
RL (cleanrl, torchrl).
Scientific ML (PINN, geometric DL).

💻 Patterns

Tensor basics

import torch

x = torch.randn(3, 4, device="cuda", dtype=torch.float32)
y = torch.arange(12).reshape(3, 4).float().cuda()

z = x @ y.T          # matmul
w = x.mean(dim=0)    # reduction
print(x.shape, x.dtype, x.device)

Autograd

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad)  # tensor([2., 4., 6.])

nn.Module

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, d_in, d_h, d_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_h), nn.GELU(),
            nn.Linear(d_h, d_h), nn.GELU(),
            nn.Linear(d_h, d_out),
        )
    def forward(self, x):
        return self.net(x)

model = MLP(784, 256, 10).cuda()

Training loop (canonical)

from torch.utils.data import DataLoader

opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(dataset, batch_size=128, shuffle=True,
                    num_workers=4, pin_memory=True)

for epoch in range(10):
    for x, y in loader:
        x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
        opt.zero_grad(set_to_none=True)
        logits = model(x)
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
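
The loop above assumes that `dataset` and `model` already exist. For a version that runs end to end, here is a minimal sketch on synthetic data with an evaluation pass added. The data, layer sizes, learning rate, and the `evaluate` helper are all illustrative assumptions, and it falls back to CPU when CUDA is unavailable.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Synthetic stand-in data (assumption): the label is the index of the
# largest of the first three features, so the task is learnable.
X = torch.randn(256, 20)
Y = X[:, :3].argmax(dim=1)
loader = DataLoader(TensorDataset(X, Y), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(20, 32), nn.GELU(), nn.Linear(32, 3)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def evaluate() -> float:
    """Mean loss over the loader, in eval mode and without autograd."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            total += loss_fn(model(x), y).item() * len(y)
            n += len(y)
    return total / n

loss_before = evaluate()
for epoch in range(20):
    model.train()  # switch back after evaluate() set eval mode
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad(set_to_none=True)
        loss_fn(model(x), y).backward()
        opt.step()
loss_after = evaluate()
print(f"loss {loss_before:.3f} -> {loss_after:.3f}")
```

A real setup would evaluate on a held-out split; the training data is reused here only to keep the sketch short.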

torch.compile (2.x, opt-in)

# Often 30-50% faster with no model changes; the actual gain depends on
# the model, input shapes, and hardware.
model = torch.compile(model, mode="reduce-overhead")
# mode: "default" | "reduce-overhead" | "max-autotune"

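Mixed precision (bf16 / amp)

from torch.amp import autocast

# bf16 keeps fp32's exponent range, so no GradScaler is needed.
for x, y in loader:
    x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
    opt.zero_grad(set_to_none=True)
    with autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# With dtype=torch.float16, add loss scaling:
#   scaler = torch.amp.GradScaler("cuda")
#   scaler.scale(loss).backward(); scaler.step(opt); scaler.update()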
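
To see what autocast actually changes, the sketch below runs one forward pass under bf16 autocast and inspects the dtypes. The layer sizes are arbitrary, and the fallback to CPU autocast when CUDA is unavailable is an assumption made for portability.

```python
import torch
import torch.nn as nn

device_type = "cuda" if torch.cuda.is_available() else "cpu"
layer = nn.Linear(16, 8).to(device_type)
x = torch.randn(4, 16, device=device_type)

# Inside the context, eligible ops such as linear run in bfloat16.
with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
    out = layer(x)

print(layer.weight.dtype)  # parameters are not converted by autocast
print(out.dtype)           # dtype of the activation produced under autocast
```

Only the computation is cast. The parameters, and therefore the optimizer state, remain in full precision.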

Custom Dataset

import pandas as pd
import torch
from torch.utils.data import Dataset

class CSVDataset(Dataset):
    """Rows of a numeric CSV; the last column is the integer class label."""
    def __init__(self, path, transform=None):
        df = pd.read_csv(path)
        # Convert once up front; per-item .iloc lookups are slow.
        self.x = torch.tensor(df.iloc[:, :-1].to_numpy(dtype="float32"))
        self.y = torch.tensor(df.iloc[:, -1].to_numpy(dtype="int64"))
        self.transform = transform
    def __len__(self):
        return len(self.y)
    def __getitem__(self, i):
        x, y = self.x[i], self.y[i]
        return (self.transform(x), y) if self.transform else (x, y)

Save / load

# state_dict (recommended)
torch.save(model.state_dict(), "model.pt")
model.load_state_dict(torch.load("model.pt", weights_only=True))

# safetensors (preferred for sharing: no pickle, so no RCE on load)
from safetensors.torch import save_file, load_file
save_file(model.state_dict(), "model.safetensors")
model.load_state_dict(load_file("model.safetensors"))
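
Resuming training usually needs more than the model weights. The sketch below round-trips a checkpoint that also holds the optimizer state and an epoch counter. The dictionary keys (`"model"`, `"opt"`, `"epoch"`), the toy layer, and the temporary path are illustrative assumptions, not a fixed PyTorch format.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Bundle everything needed to resume into one plain dict.
ckpt = {"model": model.state_dict(), "opt": opt.state_dict(), "epoch": 3}
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(ckpt, path)

# Restore into freshly constructed objects of the same architecture.
restored = nn.Linear(4, 2)
restored_opt = torch.optim.AdamW(restored.parameters(), lr=3e-4)
state = torch.load(path, map_location="cpu", weights_only=True)
restored.load_state_dict(state["model"])
restored_opt.load_state_dict(state["opt"])
start_epoch = state["epoch"] + 1  # continue from the next epoch
```

Loading with `map_location="cpu"` first avoids a failure when the checkpoint was written on a GPU that is absent at load time.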

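Distributed (FSDP2, the 2026 default for large models)

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard  # FSDP2 API (PyTorch 2.6+)

# Launch with torchrun, which sets LOCAL_RANK for each process.
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
dist.init_process_group("nccl")
model = MLP(...).cuda()
fully_shard(model)  # shards the parameters in place across ranks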

torch.func (functional API)

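from torch.func import vmap, grad

# model_fn(params, x) is a placeholder for a pure function of the parameters;
# params, X, and Y are assumed to be defined elsewhere.
def loss(params, x, y):
    return ((model_fn(params, x) - y) ** 2).mean()

# grad differentiates w.r.t. params; vmap maps over the batch dim of X and Y.
per_sample_grads = vmap(grad(loss), in_dims=(None, 0, 0))(params, X, Y)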
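
The snippet above leaves `model_fn`, `params`, `X`, and `Y` undefined. One way to fill them in, sketched under assumptions, is to use `torch.func.functional_call` so that an ordinary `nn.Module` can be called with explicit parameters. The `nn.Linear(3, 1)` model and the random data are illustrative only.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, vmap

torch.manual_seed(0)
model = nn.Linear(3, 1)
# Detached copies of the parameters, keyed by name, act as the functional state.
params = {k: v.detach() for k, v in model.named_parameters()}

def loss(params, x, y):
    # x has shape (3,) and y has shape (1,): one sample; vmap supplies the batch.
    pred = functional_call(model, params, (x.unsqueeze(0),)).squeeze(0)
    return ((pred - y) ** 2).mean()

X, Y = torch.randn(8, 3), torch.randn(8, 1)
per_sample_grads = vmap(grad(loss), in_dims=(None, 0, 0))(params, X, Y)

for name, g in per_sample_grads.items():
    print(name, tuple(g.shape))  # leading dimension is the batch size
```

Each entry holds one gradient per sample, so averaging over the leading dimension recovers the usual mini-batch gradient.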

Decision criteria

| Situation | Approach |
|---|---|
| Single-GPU training | `model.cuda()` + `torch.compile` |
| Multi-GPU, same node | DDP |
| Model larger than GPU memory | FSDP2 |
| Apple Silicon development | MPS backend |
| LLM-scale inference | vLLM / TensorRT-LLM |
| Quick prototype | Lightning or a pure loop |

Default: PyTorch 2.x + bf16 + torch.compile + AdamW.

🔗 Graph

Parent: Deep Learning
Variants: JAX · TensorFlow
Applications: Transformer_Architecture_and_LLM_Foundations · Diffusion-Models · Reinforcement-Learning
Adjacent: Lightning · Triton

🤖 Using LLMs

When: boilerplate training loops, shape debugging, custom op skeletons.
When not: do not trust hot-path numerical code without review. Watch for hallucinated APIs (e.g., an incorrect autograd custom op).

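❌ Anti-patterns

Calling backward() without zero_grad(): gradients accumulate across steps, which is a silent bug.
Forgetting model.eval() and torch.no_grad() at eval time: without no_grad() the autograd graph wastes memory; without eval() Dropout stays active and BatchNorm keeps updating its running statistics.
Unnecessary CPU↔GPU transfers on every step (e.g., calling .item() or .cpu() each iteration): a PCIe and synchronization bottleneck. For input batches, use pin_memory=True and non_blocking=True.
In-place ops on autograd-tracked tensors: x += 1 can break backward().
weights_only=False (the default before PyTorch 2.6): pickle RCE risk. Always pass weights_only=True explicitly.
Passing set_to_none=False to zero_grad(): zero-filling the gradients is wasteful. set_to_none=True has been the default since PyTorch 2.0.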
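
Two of the anti-patterns above are easy to reproduce on CPU. The following is a minimal sketch. The toy tensor, the layer sizes, and the variable names are chosen only for illustration.

```python
import torch
import torch.nn as nn

# 1) Gradients accumulate when the reset between backward() calls is skipped.
w = torch.tensor([1.0, 2.0], requires_grad=True)
(w ** 2).sum().backward()
first = w.grad.clone()       # d/dw sum(w^2) = 2w
(w ** 2).sum().backward()    # no reset in between
second = w.grad.clone()      # 4w: the second gradient was added to the first
w.grad = None                # per-parameter effect of zero_grad(set_to_none=True)

# 2) eval() makes Dropout a no-op; no_grad() only stops graph recording.
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)
net.eval()
with torch.no_grad():
    a, b = net(x), net(x)
deterministic = torch.equal(a, b)  # same output on repeated calls in eval mode
no_graph = not a.requires_grad     # nothing was recorded for backward
print(first, second, deterministic, no_graph)
```

The two switches are independent: eval() changes layer behavior, while no_grad() changes what autograd records, so an evaluation pass typically needs both.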

🧪 Verification / duplicates

Verified (pytorch.org docs, PyTorch 2.x release notes).
Confidence: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: PyTorch 2.x foundations canonical. |
