2nd/10_Wiki/Topics/AI_and_ML/KV Cache Compression.md
2026-05-10 22:08:15 +09:00


id: wiki-2026-0508-kv-cache-compression
title: KV Cache Compression
category: 10_Wiki/Topics
status: duplicate
canonical_id: key-value-kv-cache
duplicate_of: Key-Value (KV) Cache
aliases:
source_trust_level: A
confidence_score: 0.9
verification_status: redirected
tags: duplicate, kv-cache, llm-inference, compression
last_reinforced: 2026-05-10
github_commit: pending

KV Cache Compression

This document is a duplicate of Key-Value (KV) Cache. It redirects to the canonical document.

Key summary (specialized aspects)

  • KV cache compression techniques (quantization: INT8/INT4; eviction: H2O/StreamingLLM; sharing: GQA/MQA; low-rank) are consolidated in the "Compression" section of the canonical Key-Value (KV) Cache document.
  • For the compression-specific decision table (memory vs. accuracy trade-off), see the canonical document.
  • As of 2026, PagedAttention and vLLM's KV reuse are covered in the canonical document.

🔗 Graph

🕓 Change history

Date Change
2026-05-08 Phase 1
2026-05-10 Marked as duplicate; redirected to the canonical document