id: wiki-2026-0508-selective-state-space-models-mam
title: Selective State Space Models (Mamba)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Mamba, S6, Selective SSM, State Space Model
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: architecture, ssm, sequence-modeling, llm
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: PyTorch / mamba-ssm

Selective State Space Models (Mamba)

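One-line summary

"The hidden state is updated selectively, depending on the input." Mamba (Gu & Dao, 2023) introduced the selective scan (S6), which removes the time-invariance limitation of S4. It offers linear-time sequence modeling with long-context efficiency that can compete with Transformers. As of 2026, Mamba-2 and hybrid Transformer-Mamba models (Jamba, Zamba2) have entered production.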

Key points

SSM basics

  • Continuous: x'(t) = Ax(t) + Bu(t), y(t) = Cx(t).
  • Discretized (zero-order hold): xₖ = Āxₖ₋₁ + B̄uₖ, yₖ = Cxₖ.
  • S4: A is HiPPO-initialized and time-invariant, which allows the whole sequence to be computed as an efficient FFT convolution.
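
The equivalence between the recurrent and convolutional views can be checked on a tiny example. The following is a minimal sketch, assuming a scalar (one-dimensional, diagonal) state so that plain Python floats suffice; the function names and the values of `a`, `b`, `c`, and `delta` are illustrative and not taken from any library.

```python
import math

def discretize_zoh(a, b, delta):
    """Zero-order hold for a scalar SSM x' = a*x + b*u. Returns (a_bar, b_bar)."""
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b  # exact ZOH formula, valid for a != 0
    return a_bar, b_bar

def run_recurrence(a_bar, b_bar, c, u):
    """x_k = a_bar * x_{k-1} + b_bar * u_k,  y_k = c * x_k, starting from x = 0."""
    x, ys = 0.0, []
    for u_k in u:
        x = a_bar * x + b_bar * u_k
        ys.append(c * x)
    return ys

def run_convolution(a_bar, b_bar, c, u):
    """Same output via the kernel K_j = c * a_bar**j * b_bar (time-invariant case only)."""
    kernel = [c * (a_bar ** j) * b_bar for j in range(len(u))]
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]

a_bar, b_bar = discretize_zoh(a=-0.5, b=1.0, delta=0.1)
u = [1.0, 0.0, 2.0, -1.0]
y_rec = run_recurrence(a_bar, b_bar, 1.0, u)
y_conv = run_convolution(a_bar, b_bar, 1.0, u)
print(y_rec)
print(y_conv)
```

The convolution form only exists because `a_bar` and `b_bar` are the same at every step; S4 computes that convolution with an FFT rather than the direct sum used here.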

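Selective (S6)

  • B, C, and Δ are functions of the input, so they change at every token.
  • The FFT convolution no longer applies, so Mamba uses a hardware-aware parallel scan (kernel fusion, state kept in SRAM).
  • Benefit: selective recall, copying, and induction become possible, which time-invariant S4 cannot do.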
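
The effect of an input-dependent Δ described in the bullets above can be illustrated with a scalar toy. This is a sketch under simplifying assumptions: the step sizes are set by hand rather than computed from the input by a learned projection, and Δ = 0 is used as an idealized "ignore this token" value (a softplus-parameterized Δ would only approach it). All names and numbers are hypothetical.

```python
import math

def selective_step(x, u_k, delta_k, a=-1.0, b=1.0):
    """One ZOH-discretized update with a per-token step size delta_k."""
    a_bar = math.exp(delta_k * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar * x + b_bar * u_k

def run(u, deltas):
    """Run the scalar selective recurrence from x = 0 and return every state."""
    x, states = 0.0, []
    for u_k, d_k in zip(u, deltas):
        x = selective_step(x, u_k, d_k)
        states.append(x)
    return states

u = [5.0, 9.0, 9.0, 9.0]
deltas = [10.0, 0.0, 0.0, 0.0]  # large delta: overwrite state; zero delta: keep state
states = run(u, deltas)
print(states)
```

A large Δ drives `a_bar` toward 0 and `b_bar` toward 1, so the state is replaced by the current input; Δ near 0 leaves the state untouched. A time-invariant model has one fixed Δ and cannot make that choice per token.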

vs Transformer

  • Compute: O(L) vs O(L²) in sequence length L. A large advantage for long context.
  • Memory: a constant-size state vs a KV cache that grows with L. Inference is very cheap.
  • Quality: similar at the 7B scale; at 14B+ Transformers hold a slight edge, so a hybrid is the sweet spot.
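
The memory bullet above can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes a hypothetical configuration (32 layers, 32 KV heads of dimension 128, `d_inner=8192`, `d_state=16`, `d_conv=4`, 16-bit elements, batch size 1); these numbers are illustrative and do not describe any particular released model.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Keys and values (factor 2) are stored for every layer and every past token.
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

def ssm_state_bytes(n_layers, d_inner, d_state, d_conv, bytes_per_elem=2):
    # Recurrent state (d_inner x d_state) plus the short conv buffer (d_inner x d_conv).
    return n_layers * d_inner * (d_state + d_conv) * bytes_per_elem

GIB, MIB = 2**30, 2**20
for seq_len in (8_192, 32_768, 131_072):
    kv = kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128)
    print(f"L={seq_len:>7}: KV cache {kv / GIB:.1f} GiB")

ssm = ssm_state_bytes(n_layers=32, d_inner=8192, d_state=16, d_conv=4)
print(f"SSM state: {ssm / MIB:.1f} MiB at any L")
```

Under these assumptions the KV cache grows by 0.5 MiB per token, while the SSM state stays fixed regardless of how many tokens have been processed.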

Applications

  1. Long-context LLM (Codestral Mamba, Jamba 1.5, Zamba2).
  2. Genomic sequence (HyenaDNA → Caduceus → Evo).
  3. Audio / time series.
  4. State tracking, retrieval (induction heads).

💻 Patterns

Using a Mamba block (mamba-ssm)

from mamba_ssm import Mamba
import torch

block = Mamba(d_model=1024, d_state=16, d_conv=4, expand=2).cuda()
x = torch.randn(2, 4096, 1024).cuda()
y = block(x)  # (2, 4096, 1024), O(L)

Selective scan (toy)

import torch

def selective_scan(u, delta, A, B, C):
    # Shapes: u, delta: (batch, L, D); A: (D, N); B, C: (batch, L, N)
    dA = torch.exp(delta.unsqueeze(-1) * A)        # discretized A: (batch, L, D, N)
    dB = delta.unsqueeze(-1) * B.unsqueeze(2)      # simplified discretized B: (batch, L, D, N)
    x = torch.zeros(u.shape[0], u.shape[2], A.shape[1], device=u.device, dtype=u.dtype)
    ys = []
    for t in range(u.shape[1]):                    # sequential reference loop over time
        x = dA[:, t] * x + dB[:, t] * u[:, t].unsqueeze(-1)
        ys.append((x * C[:, t].unsqueeze(1)).sum(-1))  # y_t = C_t · x_t, shape (batch, D)
    return torch.stack(ys, dim=1)                  # (batch, L, D)
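
The loop above is sequential, but each step is an affine update x → a·x + b, and composing affine updates is associative. That is what makes a parallel scan possible. The following is a dependency-free sketch of the idea for a scalar state, using a Hillis-Steele style scan simulated in plain Python; it illustrates the principle only and is not how the fused CUDA kernel in mamba-ssm is written.

```python
def combine(left, right):
    """Compose two affine updates x -> a*x + b, with `left` applied first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def sequential_scan(pairs):
    """Reference: x_t = a_t * x_{t-1} + b_t with x = 0 initially."""
    x, out = 0.0, []
    for a, b in pairs:
        x = a * x + b
        out.append(x)
    return out

def parallel_style_scan(pairs):
    """Inclusive scan in O(log L) passes; each pass could run in parallel."""
    vals = list(pairs)
    step = 1
    while step < len(vals):
        vals = [combine(vals[i - step], vals[i]) if i >= step else vals[i]
                for i in range(len(vals))]
        step *= 2
    # With x = 0 initially, the state after step t is the b component of the prefix.
    return [b for _, b in vals]

pairs = [(0.5, 1.0), (0.9, 2.0), (0.1, -1.0), (0.7, 0.5), (0.3, 4.0)]
seq = sequential_scan(pairs)
par = parallel_style_scan(pairs)
print(seq)
print(par)
```

In the selective case each pair corresponds to one token's `(dA[:, t], dB[:, t] * u[:, t])`, so the per-token coefficients can differ freely without breaking associativity.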

Mamba-2 block (SSD)

from mamba_ssm import Mamba2
b = Mamba2(d_model=2048, d_state=128, d_conv=4, expand=2, headdim=64).cuda()

Hybrid stack (Jamba-style)

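import torch.nn as nn
from mamba_ssm import Mamba

class HybridLayer(nn.Module):
    # SwiGLU is assumed to be a feed-forward module (d -> d) defined elsewhere.
    # Normalization layers and the causal attention mask are omitted for brevity.
    def __init__(self, d, attn_every=4, idx=0):
        super().__init__()
        self.use_attn = (idx % attn_every) == 0  # attention on every attn_every-th layer
        self.mix = nn.MultiheadAttention(d, 8, batch_first=True) if self.use_attn else Mamba(d_model=d)
        self.ffn = SwiGLU(d)
    def forward(self, x):
        # nn.MultiheadAttention returns (output, weights); Mamba returns the output tensor.
        h = self.mix(x, x, x)[0] if self.use_attn else self.mix(x)
        x = x + h               # residual around the sequence mixer
        return x + self.ffn(x)  # residual around the feed-forward block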
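
To see which layers end up as attention under the `idx % attn_every == 0` rule used above, the schedule can be listed without building any model. This is a small sketch; `layer_schedule` is a hypothetical helper, and the 8-layer depth is chosen only for illustration.

```python
def layer_schedule(n_layers, attn_every=4):
    """Return the mixer type for each layer index, mirroring idx % attn_every == 0."""
    return ["attention" if idx % attn_every == 0 else "mamba"
            for idx in range(n_layers)]

schedule = layer_schedule(n_layers=8, attn_every=4)
attn_fraction = schedule.count("attention") / len(schedule)
print(schedule)
print(attn_fraction)
```

Raising `attn_every` lowers the share of attention layers, and with it the KV cache footprint, at the cost of fewer layers that can do exact lookup over the context.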

1M context inference

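# Mamba: no KV cache → constant memory
# `model.step` and `chunks_of_1M_tokens` are placeholders, not a specific library API.
import torch

model.eval()
with torch.no_grad():
    state = None
    for chunk in chunks_of_1M_tokens:
        out, state = model.step(chunk, state)  # carry the recurrent state across chunks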
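
Why carrying `state` between chunks is enough can be shown with a scalar toy recurrence. The sketch below is an assumption-laden stand-in for the placeholder loop above: `ssm_chunk` and `stream` are hypothetical names, and the fixed coefficients `a_bar`, `b_bar`, `c` replace a real model's learned, input-dependent parameters.

```python
def ssm_chunk(u_chunk, state, a_bar=0.9, b_bar=0.1, c=1.0):
    """Process one chunk and return (outputs, final_state)."""
    ys = []
    for u_k in u_chunk:
        state = a_bar * state + b_bar * u_k
        ys.append(c * state)
    return ys, state

def stream(u, chunk_size):
    """Feed the sequence chunk by chunk, keeping only the running state."""
    state, ys = 0.0, []
    for start in range(0, len(u), chunk_size):
        out, state = ssm_chunk(u[start:start + chunk_size], state)
        ys.extend(out)
    return ys

u = [float(i % 7) for i in range(20)]
full, _ = ssm_chunk(u, 0.0)       # whole sequence in one pass
chunked = stream(u, chunk_size=3) # same sequence, three tokens at a time
print(full == chunked)
```

The chunked pass reproduces the single pass because the state summarizes everything seen so far, so memory depends on the state size rather than on the number of tokens already consumed.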

Decision criteria

| Situation | Approach |
| --- | --- |
| Long context (>32k), inference cost critical | Mamba / Jamba |
| Need strong in-context reasoning | Transformer or Hybrid |
| Genomic / audio, million-length sequences | Mamba family |
| Standard chat, 8k context | Transformer (mature tooling) |
| Edge device, low memory | Mamba (no KV cache) |

Default: Hybrid (Jamba/Zamba2), which combines the strengths of both.

🔗 Graph

🤖 LLM usage

When to use: very long context, streaming, or workloads where inference cost is critical; genomic and audio data.
When not to use: tasks that need strong needle-in-a-haystack recall. Pure Mamba is weak there, so a hybrid is needed.

Anti-patterns

  • Pure Mamba for retrieval: induction works, but exact recall is weak.
  • Naive scan implementation: without an SRAM-aware kernel, it tends to run slower than attention in practice.
  • S4 (non-selective) for LLMs: superseded by S6/Mamba.

🧪 Verification / duplicates

  • Verified (Gu & Dao 2023 "Mamba", Mamba-2 2024, Jamba 2024).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Mamba/Mamba-2/hybrid 2026 state |