2nd/10_Wiki/Topics/AI_and_ML/GPU Infrastructure.md

---
id: wiki-2026-0508-gpu-infrastructure
title: GPU Infrastructure
category: 10_Wiki/Topics
status: duplicate
canonical_id: wiki-2026-0508-gpu
duplicate_of: "[[GPU]]"
aliases: [GPU infra, GPU cluster, AI infra, NVLink, Infiniband]
source_trust_level: A
confidence_score: 0.92
verification_status: redirected
tags: [duplicate, gpu, infrastructure, ai-infra]
last_reinforced: 2026-05-10
github_commit: pending
---

# GPU Infrastructure

> **This document is a specialization of [[GPU]].** It redirects to the canonical document.

## Key summary (infrastructure-specific)
- Multi-GPU nodes: intra-node interconnect (NVLink, NVSwitch).
- Multi-node clusters: inter-node fabric (InfiniBand, RoCE).
- Cloud instances (AWS p5/p4d, Azure ND, GCP A3).
- Colocation / on-prem and GPU-specialist providers (Lambda, CoreWeave).
- Distributed training strategies (FSDP, ZeRO, TP, PP).
- Cost optimization with spot / preemptible instances.
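
As a rough illustration of why the sharding strategies listed above matter for cluster sizing, the sketch below estimates per-GPU model-state memory under the ZeRO stages. It assumes mixed-precision Adam (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state), as in the ZeRO paper's accounting. The function name `zero_bytes_per_gpu` and its parameters are hypothetical, and activations, communication buffers, and fragmentation are deliberately left out. FSDP with full sharding behaves roughly like stage 3 here.

```python
def zero_bytes_per_gpu(params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Approximate model-state bytes per GPU (illustrative helper, not a library API).

    Assumes mixed-precision Adam: fp16 weights (2 bytes/param), fp16 gradients
    (2 bytes/param), and k bytes/param of fp32 optimizer state (k = 12 covers
    master weights, momentum, and variance). Activations are NOT included.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")

    weights = 2 * params  # fp16 parameters
    grads = 2 * params    # fp16 gradients
    optim = k * params    # fp32 optimizer state

    if stage >= 1:        # stage 1 shards optimizer state
        optim /= num_gpus
    if stage >= 2:        # stage 2 also shards gradients
        grads /= num_gpus
    if stage >= 3:        # stage 3 also shards parameters
        weights /= num_gpus
    return weights + grads + optim


if __name__ == "__main__":
    # Example: a 7.5B-parameter model on 64 GPUs.
    for s in range(4):
        gb = zero_bytes_per_gpu(7.5e9, 64, s) / 1e9
        print(f"ZeRO stage {s}: {gb:.1f} GB of model state per GPU")
```

Under these assumptions, the unsharded baseline needs 16 bytes per parameter on every GPU, while stage 3 divides that total evenly across the data-parallel group.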

## 🔗 Graph
- Parent: [[GPU]] (canonical)
- Adjacent: [[Distributed-Systems]]

## 🕓 Change history
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Marked as duplicate; redirected to the canonical document |