| Field | Value |
| --- | --- |
| id | wiki-2026-0508-spectral-clustering |
| title | Spectral Clustering |
| category | 10_Wiki/Topics |
| status | verified |
| canonical_id | self |
| aliases | Graph Spectral Clustering, Laplacian Clustering, Normalized Cuts |
| duplicate_of | none |
| source_trust_level | A |
| confidence_score | 0.93 |
| verification_status | applied |
| tags | clustering, graph, unsupervised, laplacian, eigendecomposition |
| raw_sources | |
| last_reinforced | 2026-05-10 |
| github_commit | pending |
| tech_stack | language: Python; framework: scikit-learn/scipy/networkx |

Spectral Clustering

In one line

"Embed the points using eigenvectors of the graph Laplacian, then run k-means on the embedding." Spectral clustering detects clusters in an affinity graph, so it handles non-convex and manifold-shaped clusters (concentric circles, two moons). The canonical reference is von Luxburg's 2007 tutorial; modern large-scale variants rely on the Nyström approximation and GPU eigensolvers.

Core ideas

3-step recipe

  1. Affinity matrix W: w_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2)), or a k-NN graph.
  2. Laplacian:
    • Unnormalized: L = D - W
    • Symmetric normalized (Ng-Jordan-Weiss): L_{sym} = I - D^{-1/2} W D^{-1/2}
    • Random-walk: L_{rw} = I - D^{-1} W
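  3. Eigendecompose the Laplacian → take the eigenvectors of the k smallest eigenvalues as the columns of an n × k matrix → row-normalize (the Ng-Jordan-Weiss step for L_{sym}) → run k-means on the rows.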
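
The three Laplacians from step 2 can be built directly with NumPy. The sketch below assumes a small hand-made affinity matrix (the weights are illustrative, not taken from any dataset) and checks two standard facts: rows of the unnormalized Laplacian sum to zero, and L_{sym} and L_{rw} share the same eigenvalues because they are similar matrices.

```python
import numpy as np

# Toy symmetric affinity matrix (hypothetical values): two tight pairs, weakly linked.
W = np.array([
    [0.0, 1.0, 0.2, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.2, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
])

d = W.sum(axis=1)                 # node degrees
n = len(d)

L = np.diag(d) - W                # unnormalized: L = D - W
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt   # symmetric normalized
L_rw = np.eye(n) - np.diag(1.0 / d) @ W           # random-walk

# L_rw = D^{-1/2} L_sym D^{1/2}, so the two spectra coincide.
eig_sym = np.sort(np.linalg.eigvalsh(L_sym))
eig_rw = np.sort(np.linalg.eigvals(L_rw).real)
```

Only L and L_{sym} are symmetric, which is why symmetric solvers such as `eigsh` are normally applied to them rather than to L_{rw}.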

Why eigenvectors?

  • Minimizing a balanced graph cut (RatioCut / NCut) is NP-hard.
  • The spectral relaxation makes the problem continuous: the sign of the eigenvector for the 2nd-smallest eigenvalue (the Fiedler vector) approximates the binary cut.
  • For k clusters, use the eigenvectors of the k smallest eigenvalues.
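
To make the Fiedler-vector argument concrete, here is a minimal sketch on an assumed toy graph: two triangles joined by a single weak edge. The graph and its weights are made up for illustration. Thresholding the Fiedler vector at zero recovers the two triangles.

```python
import numpy as np

# Two fully connected triangles (nodes 0-2 and 3-5), weight 1 inside each.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
# One weak bridge between the triangles.
W[2, 3] = W[3, 2] = 0.1

L = np.diag(W.sum(axis=1)) - W      # unnormalized Laplacian
vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order

fiedler = vecs[:, 1]                # eigenvector of the 2nd-smallest eigenvalue
labels = (fiedler > 0).astype(int)  # sign gives the two-way cut
```

The overall sign of an eigenvector is arbitrary, so which triangle gets label 0 can differ between runs or libraries; only the grouping is meaningful.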

Variants

  • Ng-Jordan-Weiss (2002): L_{sym} + row-normalize.
  • Shi-Malik (2000): Normalized Cuts, L_{rw}, image segmentation.
  • Self-tuning (Zelnik-Manor and Perona 2004): per-point sigma (local scaling).
  • Power Iteration Clustering (Lin-Cohen 2010): cheap approximation.
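
As an illustration of the self-tuning variant, the sketch below builds a locally scaled affinity, assuming the commonly cited form w_{ij} = \exp(-d_{ij}^2 / (\sigma_i \sigma_j)) with \sigma_i set to the distance from point i to its k-th nearest neighbor. The function name `local_scaling_affinity` is hypothetical, and the dense pairwise computation is only meant for small N.

```python
import numpy as np

def local_scaling_affinity(X, k=7):
    """Dense affinity with a per-point scale (requires more than k points)."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    dist = np.sqrt(sq_dist)
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    sigma = np.sort(dist, axis=1)[:, k]
    sigma = np.maximum(sigma, 1e-12)     # guard against duplicate points
    W = np.exp(-sq_dist / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)             # no self-loops
    return W

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 2))        # synthetic points for the demo
W_demo = local_scaling_affinity(X_demo, k=7)
```

The result can be passed to any routine that accepts a precomputed affinity matrix.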

Applications

  1. Image segmentation (NCut on pixel graph).
  2. Community detection (small social nets).
  3. Manifold-aware clustering (Swiss-roll, moons).
  4. Speaker diarization (utterance affinity).
  5. Document clustering (TF-IDF cosine graph).

💻 Patterns

scikit-learn

from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=400, noise=0.05)
sc = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",  # k-NN graph
    n_neighbors=10,
    assign_labels="kmeans",
    random_state=42,
)
labels = sc.fit_predict(X)

From scratch (numpy + scipy)

import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def spectral_cluster(X, k, n_neighbors=10):
    # 1. k-NN affinity
    W = kneighbors_graph(X, n_neighbors=n_neighbors, mode='connectivity')
    W = 0.5 * (W + W.T)  # symmetrize
    # 2. Symmetric normalized Laplacian
    L = csgraph.laplacian(W, normed=True)
    # 3. k smallest eigenvectors
    vals, vecs = eigsh(L, k=k, which='SM')
    # 4. Row-normalize
    norm = np.linalg.norm(vecs, axis=1, keepdims=True)
    vecs = vecs / np.clip(norm, 1e-10, None)
    # 5. k-means
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)

RBF affinity

from sklearn.metrics.pairwise import rbf_kernel

def rbf_affinity(X, sigma=1.0):
    gamma = 1.0 / (2.0 * sigma**2)
    return rbf_kernel(X, gamma=gamma)

Sigma auto-tuning (k-th NN distance)

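import numpy as np
from sklearn.neighbors import NearestNeighbors

def auto_sigma(X, k=7):
    # Query k+1 neighbors: column 0 is the point itself (distance 0).
    nn = NearestNeighbors(n_neighbors=k+1).fit(X)
    d, _ = nn.kneighbors(X)
    # Median over all points of the distance to the k-th nearest neighbor.
    return np.median(d[:, k])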

Eigengap heuristic (choose k)

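import numpy as np
from scipy.sparse.linalg import eigsh

def eigengap_k(L, max_k=15):
    # L: symmetric Laplacian with more than max_k rows.
    vals, _ = eigsh(L, k=max_k, which='SM')
    vals = np.sort(vals)
    # The largest jump between consecutive eigenvalues marks the cluster count.
    gaps = np.diff(vals)
    return int(np.argmax(gaps)) + 1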
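
A quick way to see the eigengap heuristic at work is a graph with an obvious block structure. The sketch below is a dense NumPy variant (the name `eigengap_k_dense` and the toy weights are assumptions for illustration): three groups of four nodes with strong within-group and weak between-group affinity, for which the largest gap in the spectrum of L_{sym} falls after the third eigenvalue.

```python
import numpy as np

def eigengap_k_dense(W, max_k=10):
    """Estimate the number of clusters from the spectrum of L_sym (dense)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals = np.linalg.eigvalsh(L_sym)[:max_k]   # ascending, keep the smallest
    gaps = np.diff(vals)
    return int(np.argmax(gaps)) + 1, vals

# Toy affinity: 3 blocks of 4 nodes, weight 1.0 inside a block, 0.01 across.
n_blocks, size = 3, 4
n = n_blocks * size
W = np.full((n, n), 0.01)
for b in range(n_blocks):
    s = slice(b * size, (b + 1) * size)
    W[s, s] = 1.0
np.fill_diagonal(W, 0.0)

k_est, vals = eigengap_k_dense(W)
```

On real data the gap is rarely this clean, so the estimate is best treated as a starting point rather than a final answer.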

Large-scale Nyström approximation

from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans

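# For N >> 10k. Assumes X is an (N, D) array and k is the number of clusters.
# Note: k-means on Nyström features approximates kernel k-means; it is used
# here as a scalable stand-in for the full spectral embedding.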
nys = Nystroem(kernel='rbf', gamma=0.1, n_components=300, random_state=0)
X_low = nys.fit_transform(X)
labels = KMeans(n_clusters=k, n_init=10).fit_predict(X_low)

Image segmentation (NCut)

from skimage import data, segmentation, color
# scikit-image >= 0.20; on older releases use: from skimage.future import graph
from skimage import graph

img = data.coffee()
labels1 = segmentation.slic(img, compactness=30, n_segments=400)
g = graph.rag_mean_color(img, labels1, mode='similarity')
labels2 = graph.cut_normalized(labels1, g)
out = color.label2rgb(labels2, img, kind='avg')

Diarization affinity (cosine)

def speaker_affinity(embeddings):
    # (N, D) speaker embeddings, L2-normalized
    sim = embeddings @ embeddings.T
    sim = (sim + 1) / 2  # [0,1]
    return sim

Decision criteria

| Situation | Approach |
| --- | --- |
| Convex blob clusters | k-means (faster) |
| Non-convex / manifold | Spectral (k-NN affinity) |
| N < 5k | Full eigendecomposition |
| 5k < N < 50k | Sparse k-NN graph + eigsh |
| N > 50k | Nyström / mini-batch |
| Image segmentation | NCut + SLIC superpixels |
| Speaker diarization | Cosine affinity + spectral |

Default: sklearn SpectralClustering(affinity='nearest_neighbors', n_neighbors=10).
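
The size-based rows of the decision criteria can be encoded as a small helper. This is only a sketch: the function name `choose_clustering` is hypothetical, and the handling of the exact boundaries (N = 5k and N = 50k) is an assumption, since the criteria leave them open.

```python
def choose_clustering(n_samples, convex=False):
    """Map dataset size and cluster shape to a suggested approach."""
    if convex:
        return "k-means"
    if n_samples < 5_000:
        return "spectral: full eigendecomposition"
    if n_samples <= 50_000:   # boundary handling is an assumption
        return "spectral: sparse k-NN graph + eigsh"
    return "Nystroem approximation / mini-batch"

suggestion = choose_clustering(20_000)
```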

🔗 Graph

🤖 LLM usage

When to use: rationale for the affinity choice, interpreting the eigengap, scaffolding a sklearn pipeline. When not to use: numerical eigendecomposition (use scipy/PyTorch); cluster validation, which needs ground truth.

Anti-patterns

  • Dense N×N matrix for N > 10k: runs out of memory. Use a sparse k-NN graph.
  • Untuned sigma: the RBF kernel becomes useless. Use the median-distance heuristic.
  • Hand-picking k: try the eigengap heuristic first.
  • No symmetrization: a raw k-NN graph is directed, so the Laplacian is non-symmetric and can have complex eigenvalues.
  • Wrong Laplacian for unbalanced clusters: the unnormalized Laplacian is sensitive to cluster size. Default to L_{sym}.
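
The symmetrization pitfall is easy to reproduce. In the sketch below (three assumed points on a line, with a hypothetical NumPy helper `knn_adjacency`), the point at 3.0 picks 1.0 as its nearest neighbor, but 1.0 picks 0.0, so the raw k-NN adjacency is not symmetric until it is explicitly symmetrized.

```python
import numpy as np

def knn_adjacency(X, n_neighbors):
    """Directed k-NN adjacency: A[i, j] = 1 if j is among i's nearest neighbors."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self-matches
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    A = np.zeros_like(dist)
    A[np.arange(len(X))[:, None], idx] = 1.0
    return A

X = np.array([[0.0], [1.0], [3.0]])
A = knn_adjacency(X, n_neighbors=1)   # directed: 2 -> 1 exists, 1 -> 2 does not
W = np.maximum(A, A.T)                # keep an edge if either direction has it

L = np.diag(W.sum(axis=1)) - W
eigenvalues = np.linalg.eigvalsh(L)   # symmetric input, real spectrum
```

Averaging, `0.5 * (A + A.T)`, is an equally valid choice; it keeps one-directional edges at half weight instead of full weight.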

🧪 Verification / duplicates

  • Verified (von Luxburg "A Tutorial on Spectral Clustering" 2007; Ng-Jordan-Weiss NIPS 2002; sklearn docs 1.5).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (Laplacian variants + sklearn/scipy + Nyström patterns) |