---
id: wiki-2026-0508-spectral-clustering
title: Spectral Clustering
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Graph Spectral Clustering, Laplacian Clustering, Normalized Cuts]
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: [clustering, graph, unsupervised, laplacian, eigendecomposition]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: scikit-learn/scipy/networkx
---
# Spectral Clustering
## In one line
> **"Embed the data with the eigenvectors of the graph Laplacian, then run k-means."** Spectral clustering finds clusters in an affinity graph, so it handles non-convex and manifold-shaped data where centroid-based methods break down (concentric circles, two moons). The canonical reference is von Luxburg's 2007 tutorial; modern large-scale variants rely on the Nyström approximation and GPU eigensolvers.
## Core
### 3-step recipe
1. **Affinity matrix** $W$: $w_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ or a k-NN graph.
2. **Laplacian**, with degree matrix $D = \mathrm{diag}(d_i)$, $d_i = \sum_j w_{ij}$:
   - Unnormalized: $L = D - W$
   - Symmetric normalized (Ng-Jordan-Weiss): $L_{sym} = I - D^{-1/2} W D^{-1/2}$
   - Random-walk: $L_{rw} = I - D^{-1} W$
3. **Eigendecompose** → take the eigenvectors of the k smallest eigenvalues → row-normalize (for $L_{sym}$) → k-means on the rows.
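
To make step 2 of the recipe above concrete, here is a minimal sketch that builds all three Laplacians for a toy graph. The 4-node path graph and its unit weights are illustrative assumptions, not data from this note.

```python
import numpy as np

# Toy affinity matrix for a 4-node path graph 0-1-2-3 (illustrative values).
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

deg = W.sum(axis=1)                        # node degrees d_i
D = np.diag(deg)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))

L = D - W                                        # unnormalized
L_sym = np.eye(4) - D_inv_sqrt @ W @ D_inv_sqrt  # symmetric normalized
L_rw = np.eye(4) - np.diag(1.0 / deg) @ W        # random-walk

# L_sym and L_rw are similar matrices, so they share eigenvalues.
eig_sym = np.sort(np.linalg.eigvalsh(L_sym))
eig_rw = np.sort(np.linalg.eigvals(L_rw).real)
print(np.allclose(eig_sym, eig_rw))
```

The rows of $L$ and $L_{rw}$ sum to zero, which is why the constant vector is always an eigenvector with eigenvalue 0 for both.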
### Why eigenvectors?
- Minimizing a balanced graph cut (RatioCut / NCut) is NP-hard.
- The spectral relaxation makes the problem continuous: the sign of the eigenvector for the 2nd-smallest eigenvalue (the Fiedler vector) approximates the binary cut.
- For k clusters, use the eigenvectors of the k smallest eigenvalues.
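
The Fiedler-vector argument can be checked on a graph small enough to inspect by hand. The sketch below assumes a hypothetical "barbell" graph (two triangles joined by a single edge) and uses the unnormalized Laplacian; the sign of the Fiedler vector should split the graph at the bridge.

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge 2-3 (toy graph).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

L = np.diag(W.sum(axis=1)) - W        # unnormalized Laplacian L = D - W
vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order

fiedler = vecs[:, 1]                  # eigenvector of the 2nd-smallest eigenvalue
side = fiedler > 0                    # sign pattern = relaxed binary cut
print(side)
```

Which triangle gets `True` depends on the arbitrary sign of the eigenvector; only the grouping is meaningful.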
### Variants
- **Ng-Jordan-Weiss (2002)**: $L_{sym}$ + row-normalize.
- **Shi-Malik (2000)**: Normalized Cuts, $L_{rw}$, image segmentation.
- **Self-tuning** (Zelnik-Manor & Perona 2004): per-point sigma.
- **Power Iteration Clustering** (Lin-Cohen 2010): a cheap approximation.
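
As an illustration of the self-tuning variant listed above, the following is a minimal sketch of a local-scaling affinity, $w_{ij} = \exp(-d_{ij}^2 / (\sigma_i \sigma_j))$, where $\sigma_i$ is the distance from point $i$ to its k-th nearest neighbor. The function name `self_tuning_affinity`, the default `k=7`, and the random test data are assumptions made for this example.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.neighbors import NearestNeighbors


def self_tuning_affinity(X, k=7):
    """Local-scaling affinity with one sigma per point (hypothetical helper)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)                 # column 0 is the point itself
    sigma = np.maximum(dist[:, k], 1e-12)      # distance to the k-th neighbor
    d2 = euclidean_distances(X, squared=True)  # pairwise squared distances
    W = np.exp(-d2 / np.outer(sigma, sigma))
    np.fill_diagonal(W, 0.0)                   # no self-loops
    return W


rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 2))          # illustrative data only
W_demo = self_tuning_affinity(X_demo)
```

The resulting dense matrix can be passed to `SpectralClustering(affinity="precomputed")`; because it is N×N, this sketch only suits small N.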
### Applications
1. Image segmentation (NCut on pixel graph).
2. Community detection (small social nets).
3. Manifold-aware clustering (Swiss-roll, moons).
4. Speaker diarization (utterance affinity).
5. Document clustering (TF-IDF cosine graph).
## 💻 Patterns
### scikit-learn
```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=400, noise=0.05)
sc = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",  # k-NN graph
    n_neighbors=10,
    assign_labels="kmeans",
    random_state=42,
)
labels = sc.fit_predict(X)
```
### From scratch (numpy + scipy)
```python
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph


def spectral_cluster(X, k, n_neighbors=10):
    # 1. k-NN affinity (sparse, directed)
    W = kneighbors_graph(X, n_neighbors=n_neighbors, mode='connectivity')
    W = 0.5 * (W + W.T)  # symmetrize
    # 2. Symmetric normalized Laplacian
    L = csgraph.laplacian(W, normed=True)
    # 3. Eigenvectors of the k smallest eigenvalues
    vals, vecs = eigsh(L, k=k, which='SM')
    # 4. Row-normalize (guard against zero rows)
    norm = np.linalg.norm(vecs, axis=1, keepdims=True)
    vecs = vecs / np.clip(norm, 1e-10, None)
    # 5. k-means on the embedded rows
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)
```
### RBF affinity
```python
from sklearn.metrics.pairwise import rbf_kernel


def rbf_affinity(X, sigma=1.0):
    # sklearn parameterizes the RBF kernel as exp(-gamma * ||x - y||^2)
    gamma = 1.0 / (2.0 * sigma**2)
    return rbf_kernel(X, gamma=gamma)
```
### Sigma auto-tuning (k-th NN distance)
```python
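import numpy as np
from sklearn.neighbors import NearestNeighbors


def auto_sigma(X, k=7):
    # k + 1 neighbors because each point is returned as its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    d, _ = nn.kneighbors(X)
    # Global sigma: median distance to the k-th nearest neighbor
    return np.median(d[:, k])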
```
### Eigengap heuristic (choose k)
```python
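import numpy as np
from scipy.sparse.linalg import eigsh


def eigengap_k(L, max_k=15):
    # L: symmetric Laplacian; max_k must be smaller than L.shape[0]
    vals, _ = eigsh(L, k=max_k, which='SM')
    vals = np.sort(vals)
    gaps = np.diff(vals)
    # k = number of eigenvalues that lie below the largest gap
    return int(np.argmax(gaps)) + 1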
```
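
A quick sanity check of the eigengap heuristic on a graph whose answer is known by construction. This is a dense sketch for small graphs; `eigengap_k_dense` and the toy weights (three 5-node blocks with strong internal links and weak 0.01 links between blocks) are assumptions made for this example.

```python
import numpy as np
from scipy.sparse import csgraph


def eigengap_k_dense(W, max_k=10):
    """Dense eigengap estimate for a small affinity matrix (hypothetical helper)."""
    L = csgraph.laplacian(W, normed=True)   # symmetric normalized Laplacian
    vals = np.linalg.eigvalsh(L)[:max_k]    # ascending eigenvalues
    gaps = np.diff(vals)
    return int(np.argmax(gaps)) + 1, vals


# Three tightly connected blocks of 5 nodes, weakly linked to each other.
n_blocks, size = 3, 5
W = np.full((n_blocks * size, n_blocks * size), 0.01)
for b in range(n_blocks):
    W[b * size:(b + 1) * size, b * size:(b + 1) * size] = 1.0
np.fill_diagonal(W, 0.0)

k_est, vals = eigengap_k_dense(W)
print(k_est, np.round(vals[:4], 3))
```

On real data the gap is rarely this clean, so the estimate is best treated as a starting point rather than a final answer.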
### Large-scale Nyström approximation
```python
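from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans

# For N >> 10k. Assumes X (n_samples, n_features) and k are already defined.
# Nystroem builds a low-rank RBF feature map from 300 landmark points;
# k-means on that map stands in for the full N×N affinity eigendecomposition.
nys = Nystroem(kernel='rbf', gamma=0.1, n_components=300, random_state=0)
X_low = nys.fit_transform(X)
labels = KMeans(n_clusters=k, n_init=10).fit_predict(X_low)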
```
### Image segmentation (NCut)
```python
from skimage import data, segmentation, color
from skimage import graph  # scikit-image >= 0.20; older releases: skimage.future.graph

img = data.coffee()
# Oversegment into superpixels, then cut the region adjacency graph (RAG)
labels1 = segmentation.slic(img, compactness=30, n_segments=400)
g = graph.rag_mean_color(img, labels1, mode='similarity')
labels2 = graph.cut_normalized(labels1, g)
out = color.label2rgb(labels2, img, kind='avg')
```
### Diarization affinity (cosine)
```python
def speaker_affinity(embeddings):
    # (N, D) speaker embeddings, L2-normalized
    sim = embeddings @ embeddings.T  # cosine similarity in [-1, 1]
    sim = (sim + 1) / 2  # [0,1]
    return sim
```
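
A cosine affinity like the one above can be fed to scikit-learn through `affinity="precomputed"`. The sketch below is a minimal example under stated assumptions: the synthetic embeddings (two "speakers", each a noisy copy of one axis direction) are made up for illustration and do not come from a real diarization model.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
dim = 16
centers = np.eye(dim)[:2]                    # two orthogonal "speaker" directions
truth = np.repeat([0, 1], 40)                # 40 utterances per speaker
emb = centers[truth] + 0.05 * rng.standard_normal((80, dim))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # L2-normalize rows

affinity = (emb @ emb.T + 1) / 2             # cosine similarity mapped to [0, 1]

sc = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = sc.fit_predict(affinity)
ari = adjusted_rand_score(truth, labels)     # agreement with the synthetic truth
print(f"ARI: {ari:.2f}")
```

With `affinity="precomputed"`, scikit-learn expects a symmetric, non-negative similarity matrix, which is why the similarity is shifted into [0, 1] first.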
## Decision criteria
| Situation | Approach |
|---|---|
| Convex blob clusters | k-means (faster) |
| Non-convex / manifold | Spectral (k-NN affinity) |
| N < 5k | Full eigendecomp |
| 5k < N < 50k | k-NN sparse + eigsh |
| N > 50k | Nyström / mini-batch |
| Image seg | NCut + SLIC superpixels |
| Speaker diar | Cosine affinity + spectral |

**Default**: sklearn `SpectralClustering(affinity='nearest_neighbors', n_neighbors=10)`.
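
The first two rows of the table can be illustrated with a small comparison on the two-moons data, using the default configuration above. This is a sketch, not a benchmark; the dataset size, noise level, and seeds are arbitrary choices, and the adjusted Rand index (ARI) is used only because ground-truth labels are available for synthetic data.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=400, noise=0.05, random_state=0)

# Centroid-based baseline: assumes roughly convex clusters.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Graph-based clustering on a k-NN affinity graph.
sc_labels = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", n_neighbors=10, random_state=0
).fit_predict(X)

ari_kmeans = adjusted_rand_score(y, km_labels)
ari_spectral = adjusted_rand_score(y, sc_labels)
print(f"k-means ARI:  {ari_kmeans:.2f}")
print(f"spectral ARI: {ari_spectral:.2f}")
```

scikit-learn may warn that the k-NN graph is not fully connected; for well-separated moons this usually just means each moon forms its own component.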
## 🔗 Graph
- Parent: [[Clustering]]
- Variant: [[K-Means]]
- Application: [[Image-Segmentation]]
- Adjacent: [[Normalized-Cuts]]
## 🤖 LLM usage
**When**: explaining the rationale for an affinity choice, interpreting eigengaps, scaffolding sklearn pipelines.
**When not**: the numerical eigendecomposition itself (use scipy/PyTorch), or cluster validation that needs ground truth.
## ❌ Anti-patterns
- **Dense N×N for N>10k**: runs out of memory (OOM). Use a sparse k-NN graph.
- **Untuned sigma**: makes the RBF kernel useless. Use the median-distance heuristic.
- **Hand-picking k**: try the eigengap heuristic first.
- **No symmetrization**: a k-NN graph is directed, so its Laplacian is asymmetric and can have complex eigenvalues.
- **Wrong Laplacian for unbalanced clusters**: the unnormalized Laplacian is sensitive to cluster size. Default to $L_{sym}$.
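
The symmetrization pitfall is easy to reproduce. The sketch below uses three hand-picked 1-D points (an assumption chosen so the result can be checked by hand): points 0 and 1 are mutual nearest neighbors, while point 3 points at 1 without being pointed back at.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Toy data: nearest neighbors are 0 -> 1, 1 -> 0, 3 -> 1.
X = np.array([[0.0], [1.0], [3.0]])
W = kneighbors_graph(X, n_neighbors=1, mode="connectivity")

asym_before = abs(W - W.T).sum()         # > 0: the raw k-NN graph is directed
W_sym = 0.5 * (W + W.T)                  # same symmetrization as the from-scratch recipe
asym_after = abs(W_sym - W_sym.T).sum()  # 0: safe to pass to eigsh
print(asym_before, asym_after)
```

Symmetric eigensolvers such as `eigsh` assume a symmetric input, so this step belongs before the Laplacian is built.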
## 🧪 Verification / Duplicates
- Verified (von Luxburg "A Tutorial on Spectral Clustering" 2007; Ng-Jordan-Weiss NIPS 2002; sklearn docs 1.5).
- Trust level A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full content (Laplacian variants + sklearn/scipy + Nyström patterns) |