---
id: wiki-2026-0508-selective-state-space-models-mam
title: Selective State Space Models (Mamba)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Mamba, S6, Selective SSM, State Space Model]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [architecture, ssm, sequence-modeling, llm]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: PyTorch / mamba-ssm
---

# Selective State Space Models (Mamba)

## One-line summary

> **"The hidden state is updated selectively, depending on the input."** Mamba (Gu & Dao, 2023) introduces the selective scan (S6), which removes the time-invariance limitation of S4. It offers linear-time sequence modeling with long-context efficiency competitive with Transformers. As of 2026, Mamba-2 and hybrid Transformer-Mamba models (Jamba, Zamba2) have entered production.

## Core concepts

### SSM basics

- Continuous: x'(t) = Ax(t) + Bu(t), y(t) = Cx(t).
- Discretized (zero-order hold, step size Δ): xₖ = Āxₖ₋₁ + B̄uₖ, yₖ = Cxₖ.
- S4: A is HiPPO-initialized and the system is time-invariant, so the whole sequence can be computed as an efficient FFT convolution.

### Selective (S6)

- B, C, and Δ become functions of the input, so the dynamics change at every token.
- The FFT convolution no longer applies, so Mamba uses a hardware-aware parallel scan (kernel fusion, state kept in SRAM).
- Benefit: enables selective recall, copying, and induction, tasks where time-invariant S4 fails.

### vs Transformer

- Compute: O(L) vs O(L²) in sequence length L, a large advantage at long context.
- Memory: fixed-size state vs a KV cache that grows with L, which makes inference very cheap.
- Quality: roughly comparable around the 7B scale; at 14B+ Transformers appear to keep a slight edge, which makes hybrids the sweet spot.

### Applications

1. Long-context LLMs (Codestral Mamba, Jamba 1.5, Zamba2).
2. Genomic sequences (HyenaDNA → Caduceus → Evo; of these, Caduceus is Mamba-based, while HyenaDNA and Evo build on Hyena-style long convolutions).
3. Audio / time series.
4. State tracking, retrieval (induction heads).
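
To make the SSM basics above concrete, here is a minimal sketch in plain Python of a time-invariant diagonal SSM. It discretizes with zero-order hold, runs the recurrence, and checks that the result matches a convolution with the kernel Kⱼ = C·Āʲ·B̄. The parameter values and function names are hypothetical and chosen only for illustration. S4 would evaluate the convolution with an FFT; a direct sum is used here for readability.

```python
import math

def discretize_zoh(a, b, delta):
    """Zero-order-hold discretization of one diagonal state (requires a != 0)."""
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def run_recurrence(a_bars, b_bars, c, u):
    """x_k = a_bar * x_{k-1} + b_bar * u_k, y_k = sum_n c_n * x_k[n]."""
    x = [0.0] * len(a_bars)
    ys = []
    for u_k in u:
        x = [ab * xi + bb * u_k for ab, bb, xi in zip(a_bars, b_bars, x)]
        ys.append(sum(ci * xi for ci, xi in zip(c, x)))
    return ys

def run_convolution(a_bars, b_bars, c, u):
    """Same output via a fixed kernel K_j = sum_n c_n * a_bar_n**j * b_bar_n."""
    length = len(u)
    kernel = [
        sum(ci * ab**j * bb for ci, ab, bb in zip(c, a_bars, b_bars))
        for j in range(length)
    ]
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(length)]

# Hypothetical toy parameters: three diagonal states, scalar input.
a = [-0.5, -1.0, -2.0]
b = [1.0, 1.0, 1.0]
c = [0.3, -0.2, 0.5]
delta = 0.1
u = [1.0, 0.0, -1.0, 2.0, 0.5]

a_bars, b_bars = zip(*(discretize_zoh(ai, bi, delta) for ai, bi in zip(a, b)))
y_rec = run_recurrence(a_bars, b_bars, c, u)
y_conv = run_convolution(a_bars, b_bars, c, u)

# The two views agree because a_bar and b_bar are the same at every step.
assert all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(y_rec, y_conv))
```

The equivalence depends on Ā and B̄ being constant over time. Once Δ, B, and C depend on the input, as in S6, there is no single kernel to convolve with, which is why the selective variant falls back to a scan.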
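
## 💻 Patterns

### Using a Mamba block (mamba-ssm)

```python
import torch
from mamba_ssm import Mamba  # the fused kernels require a CUDA GPU

block = Mamba(d_model=1024, d_state=16, d_conv=4, expand=2).cuda()
x = torch.randn(2, 4096, 1024, device="cuda")  # (batch, length, d_model)
y = block(x)  # same shape as x: (2, 4096, 1024); cost grows as O(L)
```

### Selective scan (toy)

```python
import torch

def selective_scan(u, delta, A, B, C):
    # Sequential reference loop for clarity, not speed.
    # u, delta: (batch, L, D); A: (D, N); B, C: (batch, L, N)
    dA = torch.exp(delta.unsqueeze(-1) * A)    # discretized A: (batch, L, D, N)
    dB = delta.unsqueeze(-1) * B.unsqueeze(2)  # simplified discretized B (delta * B): (batch, L, D, N)
    x = torch.zeros(u.shape[0], u.shape[2], A.shape[1], device=u.device, dtype=u.dtype)
    ys = []
    for t in range(u.shape[1]):
        x = dA[:, t] * x + dB[:, t] * u[:, t].unsqueeze(-1)  # state update: (batch, D, N)
        ys.append((x * C[:, t].unsqueeze(1)).sum(-1))         # readout: (batch, D)
    return torch.stack(ys, dim=1)              # (batch, L, D)
```

### Mamba-2 block (SSD)

```python
from mamba_ssm import Mamba2  # available in mamba-ssm 2.x

b = Mamba2(d_model=2048, d_state=128, d_conv=4, expand=2, headdim=64).cuda()
```

### Hybrid stack (Jamba-style)

```python
import torch.nn as nn
from mamba_ssm import Mamba

class HybridLayer(nn.Module):
    # SwiGLU is assumed to be a user-defined gated feed-forward module (d -> d);
    # it is not provided by PyTorch or mamba-ssm. Norms and the causal mask are omitted for brevity.
    def __init__(self, d, attn_every=4, idx=0):
        super().__init__()
        self.use_attn = (idx % attn_every) == 0  # one attention layer per attn_every layers
        self.mix = nn.MultiheadAttention(d, 8, batch_first=True) if self.use_attn else Mamba(d_model=d)
        self.ffn = SwiGLU(d)

    def forward(self, x):
        h = self.mix(x, x, x, need_weights=False)[0] if self.use_attn else self.mix(x)
        x = x + h               # residual around the token mixer
        return x + self.ffn(x)  # residual around the feed-forward block
```

### 1M context inference

```python
# Mamba keeps no KV cache: the recurrent state has a fixed size regardless of context length.
# `model.step` and `chunks_of_1M_tokens` are placeholders for a stateful inference wrapper
# and a chunked token stream; they are not part of the mamba-ssm API.
model.eval()
with torch.no_grad():
    state = None
    for chunk in chunks_of_1M_tokens:
        out, state = model.step(chunk, state)  # carry the state across chunks
```

## Decision criteria

| Situation | Approach |
|---|---|
| Long context (>32k), inference cost critical | Mamba / Jamba |
| Need strong in-context reasoning | Transformer or hybrid |
| Genomic / audio, million-length sequences | Mamba family |
| Standard chat, 8k context | Transformer (mature tooling) |
| Edge device, low memory | Mamba (no KV cache) |

**Default**: Hybrid (Jamba/Zamba2), which combines the strengths of both.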
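
The memory-related rows in the table above can be sanity-checked with back-of-envelope arithmetic. The sketch below compares the KV cache of a Transformer with the recurrent state of a Mamba stack. Both model configurations are hypothetical (32 layers, d_model 4096, fp16), and the Mamba state size is approximated as `d_inner * (d_state + d_conv)` per layer, so treat the numbers as orders of magnitude rather than measurements.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Keys and values for every layer and every cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

def ssm_state_bytes(n_layers, d_model, d_state=16, d_conv=4, expand=2, bytes_per_elem=2):
    """Approximate recurrent state: SSM state plus a short convolution buffer per layer."""
    d_inner = expand * d_model
    return n_layers * d_inner * (d_state + d_conv) * bytes_per_elem

# Hypothetical 32-layer models in fp16 (2 bytes per element).
kv_32k = kv_cache_bytes(seq_len=32_768, n_layers=32, n_kv_heads=32, head_dim=128)
kv_64k = kv_cache_bytes(seq_len=65_536, n_layers=32, n_kv_heads=32, head_dim=128)
ssm = ssm_state_bytes(n_layers=32, d_model=4096)

print(f"KV cache at 32k tokens: {kv_32k / 2**30:.1f} GiB")  # 16.0 GiB
print(f"KV cache at 64k tokens: {kv_64k / 2**30:.1f} GiB")  # 32.0 GiB
print(f"SSM state, any length:  {ssm / 2**20:.1f} MiB")     # 10.0 MiB
```

The KV cache doubles when the context doubles, while the SSM state does not depend on sequence length at all. Grouped-query attention or cache quantization would shrink the Transformer figure, but it would still grow linearly with context.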

## 🔗 Graph

- Parent: [[State-Space-Models]] · [[Sequence-to-Sequence-Models]]
- Variants: [[Mamba-2]] · [[S4]] · [[Hyena]] · [[RWKV]]
- Applications: [[Long-Context-LLM]] · [[Genomic-Models]]
- Adjacent: [[Transformer]] · [[Linear-Attention]]

## 🤖 LLM usage

**When to use**: very long context, streaming, or inference cost is critical. Genomic / audio data.

**When not to use**: strong needle-in-a-haystack recall is required. Pure Mamba is weak here, so a hybrid is needed.

## ❌ Anti-patterns

- **Pure Mamba for retrieval**: induction works, but exact recall is weak.
- **Naive scan implementation**: without an SRAM-aware fused kernel, the scan is often slower than attention in practice.
- **S4 (non-selective) for LLMs**: superseded by S6/Mamba.

## 🧪 Verification / duplicates

- Verified against Gu & Dao 2023 ("Mamba"), Mamba-2 (2024), and Jamba (2024).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Mamba / Mamba-2 / hybrid state as of 2026 |