---
id: [[P-Reinforce|P-Reinforce]]-AUTO-PATT-001
category: Unified
confidence_score: 1.00
tags: [auto-reinforced, paged-attention, vllm, kv-cache, memory-management, fragmentation]
last_reinforced: 2026-05-04
---

# [[PagedAttention|PagedAttention]]

## 📌 One-Line Insight (The Karpathy Summary)

> "OS wisdom applied to AI: by bringing the operating system's virtual-memory paging technique to KV cache management, PagedAttention eliminates memory fragmentation and pushes utilization above 96%, a revolution for inference engines."

## 📖 Structured Knowledge (Synthesized Content)

PagedAttention is a technique proposed to manage KV cache memory efficiently during LLM inference. Instead of allocating one contiguous region per request, it allocates memory in non-contiguous, fixed-size blocks.

1. **Core mechanisms**:
    * **Virtual-memory paging**: The KV cache is divided into fixed-size "logical blocks", which are mapped dynamically onto actual "physical blocks".
    * **Block table**: Stores the mapping between logical and physical blocks, so data that is physically scattered can still be processed as if it were logically contiguous.
2. **Key advantages**:
    * **Fragmentation removal**: No large region has to be reserved in advance. Because every block is the same size, external fragmentation disappears, and internal fragmentation is limited to the last, partially filled block of each sequence, which maximizes memory utilization.
    * **Memory sharing**: When several requests share the same prompt (for example, parallel sampling), the common KV blocks are stored physically only once and shared between them, using copy-on-write.
3. **Performance gains**:
    * Higher memory efficiency means the same GPU can hold far more concurrent requests in a batch, which translates directly into higher throughput.

## ⚖️ Trade-offs & Caveats

* **Complex kernel implementation**: Dedicated CUDA kernels are needed to read and write non-contiguous memory blocks quickly, which makes the implementation difficult.
* **Block-size sensitivity**: The block size setting (for example, 8 or 16 tokens) creates a trade-off between GPU parallel-processing efficiency and metadata overhead.

## 🔗 Knowledge Links (Graph)

* **Parent concepts**: [[Key-Value (KV) Cache|Key-Value (KV) Cache]], [[Virtual Memory Paging|Virtual Memory Paging]]
* **Representative frameworks**: [[vLLM|vLLM]], [[TensorRT-LLM|TensorRT-LLM]]
* **Related techniques**: [[KV Cache Compression|KV Cache Compression]], [[ThinKV|ThinKV]]

---
*Last updated: 2026-05-04*
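
The block table and copy-on-write sharing described in this note can be illustrated with a small, CPU-only Python sketch. It is not vLLM's implementation: `BlockAllocator`, `Sequence`, `fork`, `physical_slot`, and the 4-token `BLOCK_SIZE` are hypothetical names and values chosen for illustration, and only the bookkeeping is modeled (no tensors and no CUDA kernels).

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per block; illustrative value only


class BlockAllocator:
    """Hypothetical pool of fixed-size physical blocks with reference counts."""

    def __init__(self, num_blocks: int) -> None:
        # Reverse order so that pop() hands out block 0 first, then 1, 2, ...
        self.free = list(range(num_blocks - 1, -1, -1))
        self.ref_count = [0] * num_blocks

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("no free physical blocks")
        block = self.free.pop()
        self.ref_count[block] = 1
        return block

    def share(self, block: int) -> None:
        self.ref_count[block] += 1

    def release(self, block: int) -> None:
        self.ref_count[block] -= 1
        if self.ref_count[block] == 0:
            self.free.append(block)


@dataclass
class Sequence:
    allocator: BlockAllocator
    # Block table: index = logical block number, value = physical block id.
    block_table: list = field(default_factory=list)
    num_tokens: int = 0

    def append_token(self) -> None:
        if self.num_tokens % BLOCK_SIZE == 0:
            # No block yet, or the last block is full: allocate on demand.
            self.block_table.append(self.allocator.allocate())
        else:
            last = self.block_table[-1]
            if self.allocator.ref_count[last] > 1:
                # Copy-on-write: the partially filled block is shared, so
                # switch to a private block before writing. A real engine
                # would also copy the existing KV entries into it.
                new_block = self.allocator.allocate()
                self.allocator.release(last)
                self.block_table[-1] = new_block
        self.num_tokens += 1

    def fork(self) -> "Sequence":
        # Share every physical block instead of duplicating the prompt's KV data.
        for block in self.block_table:
            self.allocator.share(block)
        return Sequence(self.allocator, list(self.block_table), self.num_tokens)

    def physical_slot(self, token_index: int) -> tuple:
        # Translate a logical token position into (physical block, offset).
        return self.block_table[token_index // BLOCK_SIZE], token_index % BLOCK_SIZE


allocator = BlockAllocator(num_blocks=8)
prompt = Sequence(allocator)
for _ in range(6):          # a 6-token prompt occupies 2 blocks
    prompt.append_token()

sample = prompt.fork()      # parallel sample sharing the prompt's blocks
sample.append_token()       # first write triggers copy-on-write on the last block

print(prompt.block_table)   # [0, 1]
print(sample.block_table)   # [0, 2]
print(allocator.ref_count[:3])  # [2, 1, 1]
```

In this sketch the full first block stays shared by both sequences, while only the partially filled last block is duplicated when one of them writes to it. The unused slots are confined to each sequence's last block, which is why internal fragmentation stays small.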