[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -1,89 +1,172 @@
 ---
 id: wiki-2026-0508-singular-value-decomposition
-title: Singular Value Decomposition
+title: Singular Value Decomposition (SVD)
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [MATH-LA-SVD-001]
+aliases: [SVD, Matrix Factorization]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 1.0
-tags: [math, Linear-Algebra, svd, Dimensionality-Reduction, pca, Recommendation-Systems, Matrix-Factorization]
+confidence_score: 0.95
+verification_status: applied
+tags: [linear-algebra, matrix-factorization, dimensionality-reduction, machine-learning]
 raw_sources: []
-last_reinforced: 2026-04-26
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
 tech_stack:
-  language: unspecified
-  framework: unspecified
+  language: Python
+  framework: NumPy/SciPy
 ---

-# Singular Value Decomposition (SVD, 특이값 분해)
+# Singular Value Decomposition (SVD)

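-## 📌 One-line insight (The Karpathy Summary)
-> "Take a complex data matrix apart in order of its essential energy (the singular values), strip away the minor noise, and expose only the 'core structure' the data was hiding." A powerful linear-algebra technique that factors an arbitrary matrix into a product of three special matrices ($U, \Sigma, V^T$) for feature extraction and dimensionality reduction.
+## One-line summary
+> **"The universal factorization of any matrix."** Originating with Beltrami (1873) and Jordan (1874), it is a foundational tool of modern ML/DL: PCA, recommendation, and LLM weight compression (LoRA, SVD-based pruning in 2026 vLLM/MLX).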

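-## 📖 Structured knowledge (Synthesized Content)
- **Extracted pattern:** "Energy-based Rank Reduction and Latent Structure Discovery": compute the directions that best explain the matrix's information (singular vectors) and their importance (singular values), then set the small singular values to 0, which drastically shrinks the data while preserving its essential information.
- **Mathematical form:** $A = U\Sigma V^T$
-    - **$U$:** left singular vectors (relationships among rows / row space).
-    - **$\Sigma$:** singular values (importance / energy of each component), sorted in descending order.
-    - **$V^T$:** right singular vectors (relationships among columns / features).
- **Significance:** the mathematical standard wherever the 'hidden meaning' of data must be found: recommender systems (user-item preference analysis), image compression, and latent semantic analysis (LSA) in natural language processing.
+## Core concepts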

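-## ⚠️ Contradictions & Updates
- **Conflict with earlier data:** High computational cost used to limit SVD on large datasets, but approximation techniques such as Truncated SVD and Randomized SVD, which quickly extract only the needed components, have matured, so it is now a core tool in big-data settings as well.
- **Policy change:** For dimensionality reduction of large knowledge graphs and for discovering latent similarity between documents, the Antigravity project runs an SVD-based algorithm as an internal library, minimizing information loss while improving computational efficiency.
+### Decomposition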
+- `A = U Σ Vᵀ`
+- `A` (m×n), `U` (m×m, orthogonal), `Σ` (m×n, diagonal, σ₁≥σ₂≥...≥0), `Vᵀ` (n×n, orthogonal).
+- σᵢ = singular values (≥ 0). Columns of U = left singular vectors. Columns of V = right singular vectors.

-## 🔗 Knowledge links (Graph)
- Principal-Component-[[Analysis|Analysis]]-PCA, Dimensionality-Reduction-Strategies, [[Recommendation-Systems|Recommendation-Systems]], [[Scientific-Computing-with-Python|Scientific-Computing-with-Python]]
- **Raw Source:** 10_Wiki/Topics/AI/Singular-Value-Decomposition.md
+### Key properties
+- Always exists (any matrix, even non-square / singular).
+- The nonzero σᵢ² are the nonzero eigenvalues of `AᵀA` (and of `AAᵀ`).
+- rank(A) = number of non-zero σᵢ.
+- ||A||₂ = σ₁ (largest singular value).
+- ||A||_F = √(Σᵢ σᵢ²) (Frobenius norm).
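
The properties above can be checked numerically. The following is a minimal sketch, assuming a small random test matrix (the shape, seed, and variable names are illustrative and not from the original note); the rank tolerance mirrors the default used by `np.linalg.matrix_rank`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # arbitrary test matrix (assumption)

# Economy SVD: U is (6, 4), s is (4,), Vt is (4, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# sigma_i^2 are the eigenvalues of A^T A (eigvalsh returns ascending order)
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]

# rank(A) = number of singular values above a small tolerance
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
rank_from_s = int(np.sum(s > tol))

spectral_norm = np.linalg.norm(A, 2)       # expected to equal s[0]
frobenius_norm = np.linalg.norm(A, "fro")  # expected to equal sqrt(sum(s**2))
```

In floating point, "non-zero σᵢ" has to mean "above a tolerance", which is why the rank is computed with a cutoff rather than an exact comparison with zero.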

-## 🤖 LLM usage hints (How to Use This Knowledge)
+### Applications
+1. PCA — top-k SVD of centered X.
+2. Pseudoinverse `A⁺ = V Σ⁺ Uᵀ`.
+3. Low-rank approximation (Eckart-Young theorem).
+4. Recommender systems (Netflix, Funk SVD).
+5. LoRA / weight compression (2026 LLM fine-tuning).
+6. Image compression.
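
Of the applications listed, Funk SVD is the one that is not a true SVD: it fits two factor matrices by stochastic gradient descent on the observed ratings only, with no orthogonality constraint. A rough sketch follows, assuming made-up toy ratings and hypothetical hyperparameters (`n_factors`, `lr`, `reg`, `n_epochs`); the function names `funk_svd` and `rmse` are illustrative, not a library API.

```python
import numpy as np

def funk_svd(ratings, n_factors=2, lr=0.01, reg=0.02, n_epochs=200, seed=0):
    """Fit r_ui ~ P[u] @ Q[i] by SGD over observed (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    n_users = max(u for u, _, _ in ratings) + 1
    n_items = max(i for _, i, _ in ratings) + 1
    P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item factors
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()  # use the pre-update user vector for Q's step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

def rmse(ratings, P, Q):
    """Root-mean-square error over the observed ratings."""
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings])))

# Toy data, invented for illustration: (user, item, rating)
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

P0, Q0 = funk_svd(ratings, n_epochs=0)  # untrained initial factors
P, Q = funk_svd(ratings, n_epochs=200)
rmse_before = rmse(ratings, P0, Q0)
rmse_after = rmse(ratings, P, Q)
```

Unobserved entries never enter the loss, which is what makes this approach usable on sparse rating matrices where a dense SVD would treat missing values as zeros.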

-**When to use this knowledge:**
- *(TODO)*
+## 💻 Patterns

-**When not to use it:**
- *(TODO)*
+### NumPy SVD
+```python
+import numpy as np

-## 🧪 Validation
-
- **Status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 auto-normalization. Body needs verification.)*
-
-## 🧬 Duplicate Check
-
- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (auto-normalization)
- **Reason:** Phase 1 normalization: fill in the old template and missing fields.
-
-## 🕓 Changelog
-
-| Date | Change | Handling | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
-
-## 💻 Code Patterns
-
-**Pattern 1:** *(TODO: structural skeleton reflecting this project's conventions)*
-
-```text
-# TODO
+A = np.random.randn(100, 50)
+U, s, Vt = np.linalg.svd(A, full_matrices=False)
+# U: (100,50), s: (50,), Vt: (50,50)
+# Reconstruct: A ≈ U @ np.diag(s) @ Vt
 ```

-## 🤔 Decision Criteria
+### Truncated SVD (low-rank approx)
+```python
+from sklearn.decomposition import TruncatedSVD

-**When to choose option A:**
- *(TODO)*
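+# Top-k components. The exact truncated SVD is the Eckart-Young optimal rank-k
+# approximation; TruncatedSVD's default randomized solver approximates it.
+# X is assumed to be an (n_samples, n_features) array with n_features > 10.
+svd = TruncatedSVD(n_components=10)
+X_reduced = svd.fit_transform(X)  # (n_samples, 10)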
+print(svd.explained_variance_ratio_.sum())
+```

-**When to choose option B:**
- *(TODO)*
+### PCA via SVD
+```python
+def pca_svd(X, k):
+    X_centered = X - X.mean(axis=0)
+    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
+    # Principal components = rows of Vt
+    return X_centered @ Vt[:k].T  # project to k-dim
+```

-**Default:**
-> *(TODO)*
+### Pseudoinverse
+```python
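+def pinv_svd(A, rcond=1e-10):
+    # Assumes a real-valued A (a complex A would need conjugate transposes).
+    U, s, Vt = np.linalg.svd(A, full_matrices=False)
+    # Invert only singular values above the cutoff. Masking avoids the
+    # divide-by-zero warning that np.where(..., 1/s, 0) emits when s has zeros.
+    keep = s > rcond * s.max()
+    s_inv = np.zeros_like(s)
+    s_inv[keep] = 1.0 / s[keep]
+    return Vt.T @ (s_inv[:, None] * U.T)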

-## ❌ Anti-Patterns
+# Solve least squares: x = A⁺ b
+x = pinv_svd(A) @ b
+```

- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
+### Image compression
+```python
+from PIL import Image
+import numpy as np
+
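+# Load as grayscale and convert to float so the SVD output can be clipped later
+img = np.array(Image.open("photo.jpg").convert("L"), dtype=np.float64)
+U, s, Vt = np.linalg.svd(img, full_matrices=False)
+# Keep top-k singular values
+k = 50
+compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
+# Values may fall slightly outside [0, 255]; clip before casting back to uint8.
+# Storage: m*k + k + k*n vs m*n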
+```
+
+### Randomized SVD (large matrices)
+```python
+from sklearn.utils.extmath import randomized_svd
+
+# Halko et al. 2011: roughly O(mnk) with a Gaussian test matrix
+# (O(mn log k) with a structured one), versus O(mn·min(m,n)) for a full SVD
+U, s, Vt = randomized_svd(X_huge, n_components=20, random_state=42)
+```
+
+### LoRA-style low-rank weight update (2026)
+```python
+import torch
+
+# Original frozen weight W (d_out, d_in)
+# Learn ΔW = B @ A where B (d_out, r), A (r, d_in), r << min(d_out, d_in)
+class LoRALayer(torch.nn.Module):
+    def __init__(self, d_in, d_out, rank=8):
+        super().__init__()
+        self.A = torch.nn.Parameter(torch.randn(rank, d_in) * 0.01)
+        self.B = torch.nn.Parameter(torch.zeros(d_out, rank))
+
+    def forward(self, x, W_frozen):
+        return x @ W_frozen.T + x @ self.A.T @ self.B.T
+```
+
+### Eckart-Young error bound
+```python
+# Best rank-k approx error in Frobenius norm
+U, s, Vt = np.linalg.svd(A, full_matrices=False)
+k = 5
+A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
+error_frob = np.sqrt(np.sum(s[k:]**2))
+assert np.isclose(np.linalg.norm(A - A_k, 'fro'), error_frob)
+```
+
+## Decision criteria
+| Situation | Approach |
+|---|---|
+| Dense small matrix | `np.linalg.svd` |
+| Top-k only, large | `randomized_svd` / `TruncatedSVD` |
+| Sparse matrix | `scipy.sparse.linalg.svds` |
+| LLM weight adapter | LoRA (low-rank ΔW) |
+| Recommender (sparse ratings) | Funk SVD / ALS |
+
+**Default**: full SVD via NumPy for small dense matrices; randomized SVD for large ones; sparse SVD for sparse ones.
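
The sparse row of the table has no pattern of its own above, so here is a minimal sketch using `scipy.sparse.linalg.svds`. The matrix shape, density, seed, and `k` are assumptions chosen for illustration. Because `svds` does not promise singular values in descending order, the sketch sorts them explicitly.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Illustrative sparse matrix: 200 x 100 with about 5% non-zero entries
X_sparse = sparse_random(200, 100, density=0.05, format="csr", random_state=0)

k = 5  # number of singular triplets; must be smaller than min(X_sparse.shape)
U_k, s_k, Vt_k = svds(X_sparse, k=k)

# Sort into the usual descending order
order = np.argsort(s_k)[::-1]
U_k, s_k, Vt_k = U_k[:, order], s_k[order], Vt_k[order]
```

The matrix is never densified here, which is the point of choosing `svds` over `np.linalg.svd` for sparse input.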
+
+## 🔗 Graph
+- Parent: [[Linear-Algebra]] · [[Matrix-Factorization]]
+- Variants: [[Truncated-SVD]] · [[Randomized-SVD]] · [[QR-Decomposition]] · [[Eigendecomposition]]
+- Applications: [[Principal-Component-Analysis]] · [[LoRA]] · [[Recommender-Systems]] · [[Latent-Semantic-Analysis]]
+- Adjacent: [[Ridge-Regression]] · [[Pseudoinverse]] · [[Eckart-Young-Theorem]]
+
+## 🤖 LLM usage
+**When to use**: Dimensionality reduction (PCA). Computing a pseudoinverse. Low-rank approximation (compression, denoising). LoRA / SVD pruning of LLM weights. Spectral analysis.
+**When not to use**: Dense full SVD on a very large sparse matrix (use iterative methods such as Lanczos instead). Streaming data (use online PCA / incremental SVD).
+
+## ❌ Anti-patterns
+- **`full_matrices=True` for tall or wide matrices**: wastes memory; use `full_matrices=False`.
+- **Eigendecomposition of `AᵀA`**: numerically unstable (it squares the condition number); use a direct SVD.
+- **Forgetting to center for PCA**: an SVD of uncentered X yields a leading component dominated by the mean direction.
+- **Naive SVD on a huge sparse matrix**: O(mn²); use randomized or Lanczos-based methods.
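
The `AᵀA` anti-pattern is easy to see on a matrix with known singular values. The sketch below assumes a synthetic matrix with condition number 1e10 (sizes, seed, and variable names are illustrative); forming `AᵀA` squares that to 1e20, beyond what float64 can resolve, so the smallest singular value is lost.

```python
import numpy as np

rng = np.random.default_rng(0)
true_s = np.array([1.0, 1e-5, 1e-10])  # known singular values, cond(A) = 1e10

# Build A = U diag(true_s) V^T from random orthonormal factors
U, _ = np.linalg.qr(rng.standard_normal((50, 3)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = U @ np.diag(true_s) @ V.T

# Direct SVD of A
s_svd = np.linalg.svd(A, compute_uv=False)

# Anti-pattern: square roots of the eigenvalues of A^T A (cond about 1e20)
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
s_eig = np.sqrt(np.clip(eigvals, 0.0, None))  # clip tiny negative values

# Relative error on the smallest singular value
rel_err_svd = abs(s_svd[-1] - true_s[-1]) / true_s[-1]
rel_err_eig = abs(s_eig[-1] - true_s[-1]) / true_s[-1]
```

Comparing `rel_err_svd` with `rel_err_eig` shows the gap: the direct SVD keeps the small singular value to several digits, while the eigenvalue route does not recover it.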
+
+## 🧪 Validation / Duplicates
+- Verified (Trefethen & Bau "Numerical Linear Algebra", Strang "Linear Algebra and Learning from Data", LAPACK gesdd).
+- Trust level: A+.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup: SVD with PCA, pseudoinverse, randomized, LoRA applications |