diff --git a/10_Wiki/Topics/AI_and_ML/AI Evaluation & Benchmarks.md b/10_Wiki/Topics/AI_and_ML/AI Evaluation & Benchmarks.md
new file mode 100644
index 00000000..828fc003
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/AI Evaluation & Benchmarks.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-EVBM-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, ai-evaluation, benchmarks, niah, ruler, mmlu, lmsys, evaluation-metrics]
+last_reinforced: 2026-05-04
+---
+
+# [[AI Evaluation & Benchmarks|AI Evaluation & Benchmarks]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The yardstick of intelligence: rather than simply calling a model 'good', a standardized exam that measures its real weight class through quantitative metrics such as math, coding, common sense, and recall across a million tokens."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Standardized evaluation metrics for objectively comparing AI model capabilities and identifying their limits.
+
+1. **Traditional benchmarks**:
+    * **MMLU (Massive Multitask Language Understanding)**: A standard exam that measures knowledge across 57 subjects, including the humanities, social sciences, and mathematics.
+    * **HumanEval / MBPP**: Evaluate a model's ability to generate Python code.
+    * **GSM8K**: Measures the ability to solve grade-school-level, multi-step math word problems.
+2. **Long-context benchmarks**:
+    * **Needle In A Haystack (NIAH)**: Checks the ability to retrieve a specific piece of information from a very large context, with results shown as a visual chart.
+    * **RULER**: Goes beyond simple retrieval to evaluate complex long-context use, such as summarization and reasoning, in a combined score.
+3. **Real-world and agent evaluation**:
+    * **LMSYS Chatbot Arena**: An Elo rating system built on blind tests by real users.
+    * **MCP-Atlas**: Measures tool integration and orchestration performance using [[Model Context Protocol (MCP)|MCP]].
+    * **SWE-bench**: Measures whether a model can resolve real open-source GitHub issues on its own.
+
+## ⚖️ Trade-offs & Caveats
+* **Data contamination**: When evaluation data is included in a model's training data, scores come out higher than the model's actual ability. This "memorized score" problem is serious.
+* **Goodhart's Law**: The moment a measure becomes a target, it stops being a good measure. (Shortcut training aimed only at raising scores is widespread.)
+
+## 🔗 Knowledge Links (Graph)
+* **Performance-related**: [[LLM Capabilities|LLM Capabilities]], [[Reasoning Models|Reasoning Models]]
+* **Technology-related**: [[Context Window & Long-Context LLMs|Context Window]], [[Tool Use & Function Calling|Tool Use]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/AI Safety & Constitutional AI.md b/10_Wiki/Topics/AI_and_ML/AI Safety & Constitutional AI.md
new file mode 100644
index 00000000..bd405b9a
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/AI Safety & Constitutional AI.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-SAFE-001
+category: Unified
+confidence_score: 0.98
+tags: [auto-reinforced, ai-safety, constitutional-ai, alignment, anthropic, ethics]
+last_reinforced: 2026-05-04
+---
+
+# [[AI Safety & Constitutional AI|AI Safety & Constitutional AI]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "A machine with a conscience: instead of humans stepping in to nag at every turn, a systemic ethical guardrail that has the model internalize a set of core principles called a 'constitution', filtering out harm and aligning the model with human values."
+
+## 📖 Structured Knowledge (Synthesized Content)
+AI safety is the discipline of controlling models so that they do not harm humanity, and Constitutional AI is one of the most advanced methodologies for achieving it.
+
+1. **Constitutional AI (Anthropic)**:
+    * **Principle**: Instead of having humans rate every response, a written "constitution" (a set of principles) is provided, and the model evaluates and revises its own responses (self-critique).
+    * **Stages**: [Generate AI feedback] $\rightarrow$ [Train on the revised responses (RLAIF)].
+    * **Effect**: Rather than refusing blindly, the model understands context, avoids risk flexibly, and admits uncertainty instead of hallucinating.
+2. **Key safety challenges**:
+    * **CBRN defense**: Align the model so that it does not generate dangerous information related to chemical (C), biological (B), radiological (R), and nuclear (N) threats.
+    * **Jailbreak prevention**: Block attempts to disable safety guidelines through malicious prompt injection.
+    * **Mitigating over-refusal**: Reducing cases where an overly cautious model refuses even harmless questions is an open problem for modern safety techniques.
+3. **RLAIF (RL from AI Feedback)**:
+    * A technique that aligns large models efficiently by using feedback from another strong model (a teacher model) in place of humans.
+
+## ⚖️ Trade-offs & Caveats
+* **Balancing capability and safety**: If safety guardrails are too strong, the model's creativity and problem-solving ability can degrade.
+* **Value bias**: Depending on who defines the "constitution" and how, there is a risk that particular cultural or political values are injected into the model.
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concepts**: [[AI Governance|AI Governance]], [[Alignment|Alignment]]
+* **Related models**: [[Claude|Claude]] (a pioneer of Constitutional AI)
+* **Related technologies**: [[RLHF & DPO|RLHF & DPO]], [[Prompt Injection|Prompt Injection]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Agent Memory Systems.md b/10_Wiki/Topics/AI_and_ML/Agent Memory Systems.md
new file mode 100644
index 00000000..be45a518
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Agent Memory Systems.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-AMMS-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, agent-memory, long-term-memory, short-term-memory, episodic-memory, vector-db]
+last_reinforced: 2026-05-04
+---
+
+# [[Agent Memory Systems|Agent Memory Systems]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Continuity of intelligence across time: an agent's second brain that goes beyond short-term conversational context, stores all past experience and knowledge, recalls it semantically when needed (long-term), and so gets smarter as time passes."
+
+## 📖 Structured Knowledge (Synthesized Content)
+An agent memory system is a scheme for retaining and managing information persistently, beyond the model's limited context window.
+
+1. **Memory hierarchy**:
+    * **Short-term Memory**: The record of the current conversation session. It lives inside the context window and is referenced most quickly and accurately.
+    * **Long-term Memory**: Records of past sessions or external knowledge. It is stored in a [[Vector Database|vector database]], and only the needed parts are loaded through retrieval.
+    * **Episodic Memory**: Records the process and outcome (success or failure) of specific tasks the agent has carried out, for reference in similar future tasks.
+    * **Procedural Memory**: Stores how the agent uses tools or carries out a particular workflow (its know-how).
+2. **Memory management strategies**:
+    * **Eviction**: Deletes low-importance or old information to manage limited resources.
+    * **Summarization**: Condenses long conversation histories to their key points to optimize token usage.
+    * **Semantic Search**: Finds relevant memories by meaning rather than by keyword.
+
+## ⚖️ Trade-offs & Caveats
+* **Context Rot**: Loading too many memories can leave the model unable to focus on the current task, or confused.
+* **Infrastructure complexity**: A vector DB, a semantic search server, a caching system, and similar components add build and maintenance costs.
+* **Privacy**: When a user's personal conversations or sensitive information are stored in long-term memory, security and data protection become important concerns.
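As a rough illustration of the eviction strategy listed above, the following minimal sketch keeps a bounded store and drops the least important entry first, breaking ties by age. The `BoundedMemory` class, the `importance` score, and the capacity value are hypothetical names chosen for this example; they do not come from any particular agent framework.

```python
import itertools


class BoundedMemory:
    """Hypothetical bounded memory store with importance-then-age eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []  # each entry: (importance, insertion_step, text)
        self._clock = itertools.count()

    def add(self, text, importance):
        """Store a memory, then evict until the store fits its capacity."""
        self.items.append((importance, next(self._clock), text))
        while len(self.items) > self.capacity:
            # min() over (importance, step) picks the least important entry,
            # and the oldest one when importance is tied.
            victim = min(self.items, key=lambda item: (item[0], item[1]))
            self.items.remove(victim)

    def texts(self):
        """Return the surviving memories in insertion order."""
        return [text for _, _, text in self.items]


memory = BoundedMemory(capacity=2)
memory.add("user prefers concise answers", importance=0.9)
memory.add("small talk about the weather", importance=0.1)
memory.add("project deadline is Friday", importance=0.5)
remaining = memory.texts()
```

A real system would likely combine this with summarization, compressing evicted entries rather than deleting them outright; that choice is left open here.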
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concepts**: [[Autonomous Agents & Workflows|Autonomous Agents & Workflows]]
+* **Foundational technologies**: [[Vector Database|Vector Database]], [[Retrieval-Augmented Generation (RAG)|RAG]]
+* **Related technologies**: [[KV Cache Management|KV Cache Management]], [[Context Window Management|Context Window Management]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Autonomous Agents & Workflows.md b/10_Wiki/Topics/AI_and_ML/Autonomous Agents & Workflows.md
new file mode 100644
index 00000000..0a7b80ac
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Autonomous Agents & Workflows.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-AGWF-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, agentic-ai, autonomous-agents, reasoning-loop, planning, task-execution]
+last_reinforced: 2026-05-04
+---
+
+# [[Autonomous Agents & Workflows|Autonomous Agents & Workflows]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "From passive tool to active partner: beyond simple question answering, the whole of an autonomous execution loop that plans on its own, uses tools, and verifies and revises its results in order to reach a goal."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Agentic AI refers to a system architecture in which a model carries out multi-step tasks with autonomy.
+
+1. **Core components**:
+    * **Planning**: Breaks a complex goal into smaller sub-tasks and decides the order of execution.
+    * **Reasoning**: At each step, analyzes the current state and decides the next action logically (using [[Chain-of-Thought (CoT)|Chain-of-Thought]]).
+    * **Action**: Calls external tools (APIs, browsers, code executors, and so on) to make real changes.
+    * **Memory**: Stores and recalls past experience and interaction history to maintain consistency.
+2. **Representative workflow patterns**:
+    * **Reflection**: A loop in which the agent critiques and revises its own output to raise quality.
+    * **Multi-agent Collaboration**: Several agents with different roles cooperate to solve a problem (for example, a coding agent plus a review agent).
+    * **ReAct**: Alternates reasoning (Reason) and acting (Act), folding feedback in as it arrives.
+
+## ⚖️ Trade-offs & Caveats
+* **Complexity and cost**: Multi-step loops and repeated model calls take far more money and time than a single-shot request.
+* **Error propagation**: If the agent makes a bad plan or fails at tool use in an early step, the error can be amplified in later steps and produce a completely off-target result.
+* **Loop fixation**: Without a clear termination condition, the agent can fall into an infinite loop or waste resources.
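The loop-fixation caveat above can be made concrete with a small sketch: a ReAct-style loop that alternates a decision and a tool call, and stops either when the policy reports it is finished or when a step budget runs out. The `run_agent` function, the `policy` callable, and the `lookup` tool are assumptions made for illustration; the scripted policy stands in for what would be a model call in a real agent.

```python
def run_agent(policy, tools, goal, max_steps=5):
    """Alternate reasoning (policy) and acting (tools) until done or out of budget."""
    history = []  # list of (action, argument, observation) tuples
    for _ in range(max_steps):
        action, argument = policy(goal, history)
        if action == "finish":
            return argument, history
        observation = tools[action](argument)
        history.append((action, argument, observation))
    # Explicit termination condition: give up instead of looping forever.
    return None, history


def scripted_policy(goal, history):
    """Hypothetical stand-in for a model: look something up once, then finish."""
    if not history:
        return "lookup", goal
    return "finish", history[-1][2]


# Hypothetical tool table with a single toy lookup tool.
tools = {"lookup": lambda key: {"capital of France": "Paris"}.get(key, "unknown")}
answer, trace = run_agent(scripted_policy, tools, "capital of France")
```

Returning `None` together with the partial history when the budget is exhausted lets the caller decide whether to retry, escalate, or report failure, rather than leaving the agent to spin.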
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concepts**: [[Artificial General Intelligence (AGI)|AGI]], [[Reasoning Models|Reasoning Models]]
+* **Component technologies**: [[Tool Use & Function Calling|Tool Use & Function Calling]], [[Agent Memory Systems|Agent Memory Systems]], [[Model Context Protocol (MCP)|Model Context Protocol (MCP)]]
+* **Frameworks**: LangChain, AutoGPT, CrewAI, Antigravity Astra
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Chain-of-Thought (CoT) & Reasoning.md b/10_Wiki/Topics/AI_and_ML/Chain-of-Thought (CoT) & Reasoning.md
new file mode 100644
index 00000000..b0a27272
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Chain-of-Thought (CoT) & Reasoning.md
@@ -0,0 +1,36 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-COTR-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, chain-of-thought, cot, reasoning, prompt-engineering, logic]
+last_reinforced: 2026-05-04
+---
+
+# [[Chain-of-Thought (CoT) & Reasoning|Chain-of-Thought (CoT) & Reasoning]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The chain of thought: a technique for internalizing intelligence that has the model write out its process step by step before answering, reducing logical errors and sharply improving its ability to solve complex problems."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Chain-of-Thought (CoT) is a prompting and training technique that leads a model to generate its intermediate reasoning steps explicitly when solving complex problems.
+
+1. **Core principles**:
+    * **Step-by-step reasoning**: An instruction such as "Let's think step by step" keeps the model from jumping straight to a conclusion and makes it follow a logical flow.
+    * **Error detection**: Because intermediate steps are recorded, it becomes easier for the model itself, or for an outside reviewer, to find where the logic went wrong and fix it.
+2. **Main variants**:
+    * **Self-Consistency**: Generates several different reasoning paths, then selects the conclusion that appears most often to raise accuracy.
+    * **Least-to-Most Prompting**: Solves the easiest part of the problem first and raises the difficulty gradually.
+3. **Trained models (Reasoning Models)**:
+    * Recent [[Reasoning Models|Reasoning Models]] (o1, R1, and others) go beyond prompting: they are trained with reinforcement learning to generate and optimize large-scale CoT from the training stage onward.
+
+## ⚖️ Trade-offs & Caveats
+* **Token consumption**: Because every intermediate step is emitted, the number of output tokens grows sharply, and cost and latency rise with it.
+* **Loss of intermediate information**: With a very long CoT, the model can forget the goal it started with or drift toward an unrelated conclusion, a phenomenon called "reasoning drift".
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concepts**: [[Autonomous Agents & Workflows|Autonomous Agents & Workflows]], [[Reasoning Models|Reasoning Models]]
+* **Related technologies**: [[ReAct|ReAct]], [[Self-Correction|Self-Correction]]
+* **Applications**: Solving complex math problems, debugging code, multi-step strategy planning
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Chunking & Pre-processing.md b/10_Wiki/Topics/AI_and_ML/Chunking & Pre-processing.md
new file mode 100644
index 00000000..e9941113
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Chunking & Pre-processing.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-CHKP-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, chunking, data-preprocessing, rag-optimization, context-window]
+last_reinforced: 2026-05-04
+---
+
+# [[Chunking & Pre-processing|Chunking & Pre-processing]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Cutting knowledge into pieces: the invisible groundwork that determines RAG retrieval quality, splitting vast documents into the size a model digests best and joining the pieces carefully so that context is not broken."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Chunking is the process of splitting large documents into small units that are easy to retrieve and reason over.
+
+1. **Chunking strategies**:
+    * **Fixed-size Chunking**: Splits by a fixed number of characters or tokens. It is fast, but carries a high risk of destroying context, for example by cutting a sentence in the middle.
+    * **Recursive Character Chunking**: Splits while preserving logical structure, trying paragraph, then sentence, then word boundaries in priority order.
+    * **Semantic Chunking**: Measures semantic similarity between sentences and splits the document where the topic changes.
+    * **Agentic Chunking**: An agent reads the document, judges its units of meaning, and splits at the best points.
+2. **Pre-processing**:
+    * **Cleaning**: Removes unnecessary special characters, HTML tags, and duplicate text.
+    * **Metadata injection**: Tags each chunk with a title, summary, source, related keywords, and similar fields to improve retrieval.
+3. **Overlap**:
+    * Adjacent chunks share a portion of their content (for example, a 10% overlap), so that the context of a sentence cut at a boundary is kept in both chunks.
+
+## ⚖️ Trade-offs & Caveats
+* **The chunk-size dilemma**: Chunks that are too small lack context; chunks that are too large add noise to retrieval results and waste the model's context window.
+* **Compute cost**: Semantic Chunking and Agentic Chunking require model calls, so processing cost and time increase.
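To make the fixed-size and overlap ideas above concrete, here is a minimal sketch of fixed-size chunking with overlap over a list of tokens. The `chunk_with_overlap` function and its `size` and `overlap` parameters are names assumed for this example; libraries that provide chunking typically expose similar knobs under their own names.

```python
def chunk_with_overlap(tokens, size, overlap):
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


tokens = list(range(10))  # stand-in for real token IDs
chunks = chunk_with_overlap(tokens, size=4, overlap=1)
```

With `size=4` and `overlap=1`, each chunk begins with the last token of the previous chunk, so text cut at a boundary appears on both sides. Larger overlaps preserve more context at the cost of storing and embedding more duplicated text.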
+
+## 🔗 Knowledge Links (Graph)
+* **Parent system**: [[Retrieval-Augmented Generation (RAG)|Retrieval-Augmented Generation (RAG)]]
+* **Subsystems**: [[Vector Databases & Search|Vector Databases & Search]], [[Embedding Models & MRL|Embedding Models & MRL]]
+* **Related phenomena**: [[Lost in the middle|Lost in the middle]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Context Window & Long-Context LLMs.md b/10_Wiki/Topics/AI_and_ML/Context Window & Long-Context LLMs.md
new file mode 100644
index 00000000..233db95a
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Context Window & Long-Context LLMs.md
@@ -0,0 +1,40 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-CWLC-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, context-window, long-context-llm, niah, ruler, infinite-context]
+last_reinforced: 2026-05-04
+---
+
+# [[Context Window & Long-Context LLMs|Context Window & Long-Context LLMs]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The field of view of intelligence: the amount of information a model can see and understand at once. The expansion from thousands of tokens to millions is the process by which AI evolves from a simple tool into an expert that takes in an entire repository or dozens of books as a whole."
+
+## 📖 Structured Knowledge (Synthesized Content)
+The context window is the maximum number of tokens an LLM can process at once, and extending it is a central challenge of modern AI research.
+
+1. **Stages of development**:
+    * **Early**: 2,048 to 4,096 tokens (mostly short conversations).
+    * **Transitional**: 32,000 to 128,000 tokens (long-document analysis becomes possible).
+    * **Current**: 1 million (1M) to 10 million (10M) tokens or more (entire codebases and hours of video can be analyzed).
+2. **Evaluation metrics**:
+    * **Needle In A Haystack (NIAH)**: Tests how accurately a model finds a small piece of information (the needle) hidden in a very large body of text (the haystack).
+    * **RULER**: A recent benchmark that goes beyond simple retrieval and evaluates complex reasoning and summarization over long contexts in a combined score.
+3. **Techniques for overcoming the limits**:
+    * **Architecture optimization**: [[Attention Mechanisms|FlashAttention]], [[Sparse Attention|Sparse Attention]].
+    * **Memory management**: [[Key-Value (KV) Cache|KV Cache]] optimization and [[PagedAttention|PagedAttention]].
+    * **Positional encoding extension**: Extending context beyond the trained range through [[Positional Embeddings (RoPE & Variants)|RoPE, YaRN]], and similar methods.
+
+## ⚖️ Trade-offs & Caveats
+* **Lost in the middle**: As context grows longer, the model remembers information at the beginning and end well but ignores or forgets information located in the middle.
+* **Compute cost explosion**: Attention compute scales with the square of the sequence length ($O(n^2)$), so doubling the context roughly quadruples attention compute, and in a naive implementation the memory for the attention score matrix as well.
+* **Accuracy degradation**: Be wary of "fake context extension" models whose context window is large but whose actual recall of the information inside it is poor.
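The quadratic-cost point above is easy to check with a little arithmetic. The sketch below counts entries in the attention score matrix, which is quadratic in sequence length, and contrasts that with the size of a KV cache, which grows linearly. Both helper functions, the layer count, and the hidden size are illustrative assumptions (standard multi-head attention where keys and values each have `hidden_size` values per token per layer), not measurements of any specific model.

```python
def attention_score_entries(seq_len, num_heads=1):
    """Entries in the seq_len x seq_len score matrix, per layer."""
    return num_heads * seq_len * seq_len


def kv_cache_entries(seq_len, num_layers, hidden_size):
    """Cached key and value entries: 2 tensors per layer, linear in seq_len."""
    return 2 * num_layers * seq_len * hidden_size


base, doubled = 4096, 8192
score_ratio = attention_score_entries(doubled) / attention_score_entries(base)
# 32 layers and hidden size 4096 are assumed values for illustration only.
cache_ratio = kv_cache_entries(doubled, 32, 4096) / kv_cache_entries(base, 32, 4096)
```

Doubling the context multiplies the score-matrix work by four but the KV cache by only two, which is one reason the techniques listed in this note target the score matrix (by never materializing it in full) and the KV cache separately.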
+
+## 🔗 Knowledge Links (Graph)
+* **Technical foundations**: [[Positional Embeddings (RoPE & Variants)|Positional Embeddings]], [[Attention Mechanisms|Attention Mechanisms]]
+* **Physical constraints**: [[KV Cache|KV Cache]], [[GPU Infrastructure|GPU Infrastructure]]
+* **Mitigation strategies**: [[Retrieval-Augmented Generation (RAG)|RAG]], [[Lost in the Middle & Context Rot|Lost in the Middle & Context Rot]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Deployment Frameworks.md b/10_Wiki/Topics/AI_and_ML/Deployment Frameworks.md
new file mode 100644
index 00000000..dd155b7a
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Deployment Frameworks.md
@@ -0,0 +1,39 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-DFWK-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, vllm, tensorrt-llm, ollama, serving, inference-engine]
+last_reinforced: 2026-05-04
+---
+
+# [[Deployment Frameworks|Deployment Frameworks]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The field command post for deploying the latest AI: high-performance inference engines that accelerate research-stage models to production-grade speed and connect infrastructure with software so that thousands of concurrent users can be served."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Optimized frameworks for running and serving LLMs efficiently across a range of hardware environments.
+
+1. **[[vLLM|vLLM]]**:
+    * **Strengths**: The pioneer of [[PagedAttention|PagedAttention]], with excellent memory efficiency and throughput. It is the most widely used option in the open-source community.
+    * **Best for**: General-purpose LLM serving and handling requests from many users.
+2. **TensorRT-LLM (NVIDIA)**:
+    * **Strengths**: A low-level acceleration library optimized for NVIDIA hardware. It offers strong C++-based performance and advanced kernel optimization.
+    * **Best for**: Enterprise-grade high-performance services and NVIDIA-only cloud infrastructure.
+3. **Ollama**:
+    * **Strengths**: A user-friendly tool that runs LLMs immediately on a local PC (macOS, Linux, Windows) without complex setup.
+    * **Best for**: Local development, personal AI assistants, and lightweight test environments.
+4. **TGI (Text Generation Inference)**:
+    * **Strengths**: A production inference engine developed by Hugging Face, known for stability and broad model support.
+
+## ⚖️ Trade-offs & Caveats
+* **Flexibility vs. performance**: Ollama is very easy to use but hard to tune finely; TensorRT-LLM offers top performance, but its build process and configuration are very complex.
+* **Hardware dependence**: TensorRT-LLM runs only on NVIDIA GPUs. vLLM is expanding its AMD GPU support, but NVIDIA optimization still dominates.
+
+## 🔗 Knowledge Links (Graph)
+* **Core technologies**: [[PagedAttention|PagedAttention]], [[Continuous Batching|Continuous Batching]], [[Quantization|Quantization]]
+* **Related infrastructure**: [[GPU Infrastructure|GPU Infrastructure]], [[Docker|Docker]]
+* **Project applications**: An agent for local development ([[Ollama|Ollama]]), a high-performance RAG serving engine ([[vLLM|vLLM]])
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Distributed Processing (Context & Sequence Parallelism).md b/10_Wiki/Topics/AI_and_ML/Distributed Processing (Context & Sequence Parallelism).md
new file mode 100644
index 00000000..bf98afcd
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Distributed Processing (Context & Sequence Parallelism).md
@@ -0,0 +1,36 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-DPRC-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, context-parallelism, sequence-parallelism, distributed-training, deepspeed, ring-attention]
+last_reinforced: 2026-05-04
+---
+
+# [[Distributed Processing (Context & Sequence Parallelism)|Distributed Processing (Context & Sequence Parallelism)]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The division-of-labor principle for giant models: to get past the memory limit of a single GPU, go beyond splitting the model and split the sequence itself across devices, exchanging data at very high speed. This is the essence of distributed computation."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Distributed processing techniques for overcoming the memory limits imposed by sequence length and parameter count when training or running inference on large language models.
+
+1. **Context Parallelism**:
+    * **Principle**: Splits a long input sequence into several pieces and processes each piece on a different GPU.
+    * **Significance**: With techniques such as [[Ring Attention|Ring Attention]], data is circulated between GPUs, making it possible to process a million tokens or more, which a single GPU cannot do.
+2. **Sequence Parallelism**:
+    * **Principle**: Splits data along the sequence dimension to reduce the duplicated memory footprint of the parts outside matrix multiplication (Layer Norm, Dropout, and so on).
+    * **Effect**: Combined with [[Tensor Parallelism|Tensor Parallelism]], it maximizes memory efficiency.
+3. **USP (Unified Sequence Parallelism)**:
+    * A recent hybrid approach that combines the strengths of DeepSpeed Ulysses and Ring Attention to optimize communication patterns and maximize training performance on very long contexts.
+
+## ⚖️ Trade-offs & Caveats
+* **Communication overhead**: Because data is split for processing, GPUs must communicate frequently. Without high-speed interconnects such as [[NVLink|NVLink]], time spent waiting on communication can exceed compute time, and performance drops sharply.
+* **Complex infrastructure management**: Clusters of tens to hundreds of GPUs must be synchronized and managed precisely, so the engineering difficulty is very high.
+
+## 🔗 Knowledge Links (Graph)
+* **Physical foundations**: [[GPU Infrastructure|GPU Infrastructure]], [[NVLink|NVLink]], [[InfiniBand|InfiniBand]]
+* **Core algorithms**: [[Ring Attention|Ring Attention]], [[Attention Mechanisms|Attention Mechanisms]]
+* **Related technologies**: [[Tensor Parallelism|Tensor Parallelism]], [[DeepSpeed|DeepSpeed]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Embedding Models & MRL.md b/10_Wiki/Topics/AI_and_ML/Embedding Models & MRL.md
new file mode 100644
index 00000000..36c6bafa
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Embedding Models & MRL.md
@@ -0,0 +1,35 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-EMRL-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, embedding-models, mrl, dimensionality-reduction, vector-compression]
+last_reinforced: 2026-05-04
+---
+
+# [[Embedding Models & MRL|Embedding Models & MRL]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The mapmaker of data: a technique that places complex real-world information in a mathematical space where semantic distance is preserved and, through MRL in particular, concentrates the most important information at the front of the vector to balance efficiency and performance."
+
+## 📖 Structured Knowledge (Synthesized Content)
+An embedding model is a core AI model that converts data such as text or images into high-dimensional vectors, and MRL is a recent technique for using those vectors more efficiently.
+
+1. **Embedding Models**:
+    * **Role**: Goes beyond simple word matching and lets a system understand mathematically that "king" and "monarch" have similar meanings.
+    * **Evolution**: Embeddings are moving beyond text alone toward multimodal embeddings that understand images and text together.
+2. **MRL (Matryoshka Representation Learning)**:
+    * **Principle**: Like a matryoshka doll, the model is trained so that most of the meaning is preserved even when only the leading dimensions of the vector are kept (for example, the first 256 of 3072 dimensions).
+    * **Advantage**: Storage can be cut by 10x or more while retrieval quality loss is held to a small margin, cited in this note as under 1%; the actual loss depends on the model and the task.
+    * **Main supporting models**: OpenAI text-embedding-3, Voyage-3, Gemini embedding-001.
+
+## ⚖️ Trade-offs & Caveats
+* **Limits of dimensionality reduction**: Reducing dimensions too aggressively weakens the ability to distinguish fine differences in meaning (nuance).
+* **Model dependence**: The MRL effect appears only in models trained specifically with this technique. Simply truncating the vectors of an ordinary model degrades performance severely.
+
+## 🔗 Knowledge Links (Graph)
+* **Subsystems**: [[Vector Databases & Search|Vector Databases & Search]]
+* **Optimization techniques**: [[Quantization|Quantization]], [[Model Compression & Quantization|Model Compression & Quantization]]
+* **Use cases**: Large-scale RAG systems, local [[Second Brain|Second Brain]] infrastructure
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Fine-Tuning & Alignment.md b/10_Wiki/Topics/AI_and_ML/Fine-Tuning & Alignment.md
new file mode 100644
index 00000000..3112de57
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Fine-Tuning & Alignment.md
@@ -0,0 +1,40 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-FTAL-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, fine-tuning, alignment, sft, rlhf, dpo, llm-training]
+last_reinforced: 2026-05-04
+---
+
+# [[Fine-Tuning & Alignment|Fine-Tuning & Alignment]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Turning a wild model into a gentleman: a fine sculpting craft that teaches a pre-trained model, which has absorbed vast knowledge, the norms of human language and the ability to follow instructions, then aligns its values to finish it as a tool that is actually usable."
+
+## 📖 Structured Knowledge (Synthesized Content)
+The post-training and alignment processes that are essential for maximizing the performance of large language models (LLMs) and adapting them to specific purposes.
+
+1. **SFT (Supervised Fine-Tuning)**:
+    * **Definition**: The stage in which the model learns to follow instructions using high-quality [question, answer] pairs.
+    * **Role**: Gets the model "talking" so that it can draw on the knowledge it already has, and teaches it a particular style or format.
+2. **RLHF (Reinforcement Learning from Human Feedback)**:
+    * **Definition**: A technique that aligns the model to be more useful and safer by reflecting human preferences.
+    * **Process**: [SFT] $\rightarrow$ [Train a reward model] $\rightarrow$ [Optimize the model with a reinforcement learning algorithm such as PPO].
+3. **DPO (Direct Preference Optimization)**:
+    * **Definition**: An efficient alternative that optimizes the model directly on preference data, without a complex reward model and reinforcement learning loop.
+    * **Advantage**: The pipeline is simple and training is stable, and it has been adopted as the standard alignment method for major models such as the recent Llama series.
+4. **Grokking**:
+    * The phenomenon in which a model moves past memorizing its training data (overfitting) and, at some point, suddenly grasps the actual rule (algorithm) behind the data, with generalization performance jumping sharply.
+
+## ⚖️ Trade-offs & Caveats
+* **Catastrophic Forgetting**: Fine-tuning too hard on a specific task can make the model lose general knowledge or other abilities it originally had.
+* **Alignment Tax**: Over-aligning a model toward safety alone has side effects: it may refuse legitimate questions with "I can't answer that", or lose creativity.
+* **Smiling Facade**: A critical view holds that RLHF may not fix the model's internal flaws but merely put a "mask" on it so that its answers look plausible on the surface.
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concept**: [[LLM Training Pipeline|LLM Training Pipeline]]
+* **Specific techniques**: [[PEFT & LoRA|PEFT & LoRA]], [[RLHF & DPO|RLHF & DPO]], [[Constitutional AI|Constitutional AI]]
+* **Related models**: [[DeepSeek-R1|DeepSeek-R1]], [[Claude|Claude]], [[Llama|Llama]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/GPU Infrastructure.md b/10_Wiki/Topics/AI_and_ML/GPU Infrastructure.md
new file mode 100644
index 00000000..9271f958
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/GPU Infrastructure.md
@@ -0,0 +1,37 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-GPUF-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, gpu-infrastructure, hbm, nvlink, infiniband, distributed-computing]
+last_reinforced: 2026-05-04
+---
+
+# [[GPU Infrastructure|GPU Infrastructure]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The nerves and muscles that sustain giant intelligence: the physical body of modern AI, combining memory (HBM) that streams terabytes of data per second with a nervous system (NVLink) that links GPUs at near light speed."
+
+## 📖 Structured Knowledge (Synthesized Content)
+The core elements of the physical hardware architecture that make large language model training and very long context processing possible.
+
+1. **HBM (High Bandwidth Memory)**:
+    * **Definition**: Ultra-fast stacked memory, built up vertically and placed next to the GPU chip.
+    * **Significance**: Its bandwidth is far higher than that of ordinary GDDR memory, which makes it a decisive factor in relieving the data bottleneck that arises during attention computation.
+2. **NVLink**:
+    * **Definition**: NVIDIA's proprietary high-speed interconnect that links the GPUs within a single server.
+    * **Role**: When hundreds of billions of parameters are split across multiple GPUs for training (model parallelism), it maximizes GPU-to-GPU data exchange speed and minimizes communication latency.
+3. **InfiniBand**:
+    * **Definition**: A data-center-grade high-speed networking technology that connects server to server (between nodes).
+    * **Significance**: It lets thousands of GPUs be tied into one large cluster for training giant models while keeping the network from becoming the bottleneck.
+
+## ⚖️ Trade-offs & Caveats
+* **Cost and power**: GPU systems equipped with the latest HBM3e and NVLink (e.g., NVIDIA HGX) cost hundreds of millions of won per unit and consume enormous amounts of power.
+* **Communication bottleneck**: No matter how fast the GPUs compute, if NVLink or InfiniBand bandwidth cannot keep up, the GPUs sit idle waiting for data and overall efficiency drops sharply.
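The communication bottleneck can be made concrete with a back-of-envelope estimate. The sketch below uses the usual bandwidth-only cost of a ring all-reduce, in which each GPU moves about 2(N-1)/N times the payload. The function names and every number in the demo (payload size, GPU count, link speeds, compute time) are illustrative assumptions, not vendor specifications; latency and compute/communication overlap are ignored.

```python
def ring_allreduce_seconds(payload_bytes: float, num_gpus: int, link_bytes_per_s: float) -> float:
    """Bandwidth-only estimate of one ring all-reduce (per-hop latency ignored)."""
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single GPU
    # Each GPU sends and receives roughly 2 * (N - 1) / N times the payload.
    traffic = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    return traffic / link_bytes_per_s


def busy_fraction(compute_s: float, comm_s: float) -> float:
    """Share of a step the GPU spends computing if communication is NOT overlapped."""
    total = compute_s + comm_s
    return compute_s / total if total > 0 else 1.0


if __name__ == "__main__":
    gradient_bytes = 10e9   # hypothetical: 10 GB of gradients per step
    compute_seconds = 0.5   # hypothetical compute time per step
    # Hypothetical link speeds in bytes/s, chosen only to contrast a fast and a slow link.
    for label, bandwidth in [("fast link", 400e9), ("slow link", 25e9)]:
        comm = ring_allreduce_seconds(gradient_bytes, 8, bandwidth)
        print(f"{label}: comm={comm:.3f}s, GPU busy={busy_fraction(compute_seconds, comm):.0%}")
```

Under these assumed numbers the slower link leaves the GPUs waiting for most of each step, which is the idle state described in the bullet above. Real systems hide part of this cost by overlapping communication with computation.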
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concepts**: [[Distributed Training|Distributed Training]], [[Hardware Acceleration|Hardware Acceleration]]
+* **Related techniques**: [[Context Parallelism|Context Parallelism]], [[Ring Attention|Ring Attention]], [[Flash Attention|Flash Attention]]
+* **Devices**: NVIDIA H100/H200, B100/B200 (Blackwell)
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/LLM Inference Optimization.md b/10_Wiki/Topics/AI_and_ML/LLM Inference Optimization.md
new file mode 100644
index 00000000..79eab5d7
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/LLM Inference Optimization.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-IFOP-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, inference-optimization, speculative-decoding, continuous-batching, throughput]
+last_reinforced: 2026-05-04
+---
+
+# [[LLM Inference Optimization|LLM Inference Optimization]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The war on waiting: technical wizardry that squeezes every layer, from model architecture to kernel operations to batching strategy, to give users faster responses (low latency) and server operators more throughput (high throughput)."
+
+## 📖 Structured Knowledge (Synthesized Content)
+A comprehensive set of optimization techniques for raising LLM response speed and cutting operating costs in real serving environments.
+
+1. **Speculative Decoding**:
+    * **Principle**: A small, fast model (the draft model) predicts several tokens ahead, and the large model (the target model) verifies them all at once.
+    * **Effect**: Output quality stays the same as the target model's while generation speeds up, often by a reported 2-3x or more.
+2. **Continuous Batching**:
+    * **Principle**: Instead of waiting until every request in the batch has finished, a new request is admitted into the batch as soon as any request finishes generating its tokens.
+    * **Significance**: It removes GPU idle time and can raise overall system throughput several-fold.
+3. **Kernel Optimization**:
+    * **FlashAttention**: Speeds up attention computation by reducing memory reads and writes.
+    * **PagedAttention**: Maximizes KV cache utilization by eliminating memory fragmentation.
+4. **Inference-time Compute**:
+    * For reasoning models ([[Reasoning Models|Reasoning Models]]), a recent trend is to let the model go through more thinking steps, trading speed for higher answer quality.
+
+## ⚖️ Trade-offs & Caveats
+* **Latency vs Throughput**: Techniques that speed up an individual request (speculative decoding) and techniques that raise total system volume (batching) compete for the same hardware resources, so there is a trade-off in how those resources are allocated.
+* **Extra memory consumption**: Speculative decoding requires loading an additional draft model into memory, so spare VRAM is needed.
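The draft-then-verify loop of speculative decoding can be sketched with toy models. This is a minimal greedy variant written for illustration only: `draft_model` and `target_model` are hypothetical stand-in functions over integer tokens, not real LLMs, and the per-position verification loop stands in for what a real system does in one batched forward pass of the target model. Sampling-based variants use a rejection-sampling rule instead of exact matching.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy "model": context -> next token


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """Propose k tokens with the draft, keep the prefix the target agrees with."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        token = draft(ctx)
        proposal.append(token)
        ctx.append(token)

    # 2) The target checks every proposed position.
    accepted, ctx = [], list(prefix)
    for token in proposal:
        expected = target(ctx)
        if expected != token:
            accepted.append(expected)  # first mismatch: use the target's token and stop
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target(ctx))  # all k accepted: the target adds one bonus token
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken, k: int, max_new: int) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]


def greedy(prefix: List[int], model: NextToken, max_new: int) -> List[int]:
    out = list(prefix)
    for _ in range(max_new):
        out.append(model(out))
    return out


# Hypothetical toy models: the draft agrees with the target most of the time.
def target_model(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 11


def draft_model(ctx: List[int]) -> int:
    guess = (ctx[-1] * 3 + 1) % 11
    return guess if ctx[-1] % 4 else (guess + 1) % 11  # deliberately wrong sometimes


if __name__ == "__main__":
    print(generate([1], draft_model, target_model, k=4, max_new=12))
    print(greedy([1], target_model, max_new=12))
```

Both printed sequences are identical: in this greedy variant the draft model only affects how many tokens are accepted per step, never which tokens are produced. That is the sense in which accuracy is preserved.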
+
+## 🔗 Knowledge Links (Graph)
+* **Core techniques**: [[Key-Value (KV) Cache|KV Cache]], [[Flash Attention|Flash Attention]], [[Model Compression & Quantization|Model Compression & Quantization]]
+* **Frameworks**: [[vLLM|vLLM]], [[TensorRT-LLM|TensorRT-LLM]], [[Ollama|Ollama]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Lost in the Middle & Context Rot.md b/10_Wiki/Topics/AI_and_ML/Lost in the Middle & Context Rot.md
new file mode 100644
index 00000000..1b730fce
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Lost in the Middle & Context Rot.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-LIMC-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, lost-in-the-middle, context-rot, long-context-failure, attention-dilution]
+last_reinforced: 2026-05-04
+---
+
+# [[Lost in the Middle & Context Rot|Lost in the Middle & Context Rot]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The information swamp: a physical limit of intelligence in which, however wide the field of view (context window), the model fails to find important information buried in the middle, or its context becomes polluted over time until it talks nonsense."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Degradations in cognitive performance that occur when large language models process long contexts.
+
+1. **Lost in the middle**:
+    * **Phenomenon**: The model makes good use of information at the very beginning and very end of the prompt, but recall drops sharply for information located in the middle.
+    * **Cause**: Numerical and structural limitations that arise when the Transformer attention mechanism has to distribute importance across thousands to tens of thousands of tokens.
+2. **Context Rot**:
+    * **Phenomenon**: As a conversation grows longer or reasoning steps repeat, earlier key instructions and facts are diluted, and the context is polluted by new (sometimes wrong) tokens.
+    * **Impact**: The agent forgets its initial goal or falls into a loop of repeating the same answer.
+3. **Mitigation strategies**:
+    * **Information repositioning**: Strategically place the most important evidence at the very beginning or very end of the prompt.
+    * **[[Agentic RAG|Agentic RAG]]**: Instead of injecting everything, select and pass only the key chunks, reducing the model's cognitive load.
+    * **[[KV Cache Compression|KV Cache Compression]]**: Preserve the cache mainly for important tokens to keep the context sharp.
+
+## ⚖️ Trade-offs & Caveats
+* **Physical size vs effective intelligence**: Even a model advertised with a 1M-token context window may, because of the lost-in-the-middle problem, struggle in practice to handle more than about 100K tokens of information at once.
+* **Limits of prompt engineering**: Simply repeating instructions is not enough to fully overcome this fundamental architectural limitation.
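The information repositioning strategy can be sketched as a small reordering step applied to retrieved chunks before they are placed in the prompt. This is one possible heuristic, assuming the input list is already sorted from most to least relevant; the function name `edges_first` and the example chunk labels are made up for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def edges_first(ranked: Sequence[T]) -> List[T]:
    """Reorder items ranked best-first so the strongest sit at both ends of the prompt.

    Items alternate between the front and the back, which pushes the weakest
    items toward the middle, where recall tends to be lowest.
    """
    front: List[T] = []
    back: List[T] = []
    for index, item in enumerate(ranked):
        (front if index % 2 == 0 else back).append(item)
    return front + back[::-1]


if __name__ == "__main__":
    chunks = ["rank1", "rank2", "rank3", "rank4", "rank5"]  # hypothetical chunks, best first
    print(edges_first(chunks))
```

With five chunks the best one opens the prompt, the second best closes it, and the lowest ranked chunk ends up in the middle.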
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concept**: [[Context Window & Long-Context LLMs|Context Window & Long-Context LLMs]]
+* **Related metric**: [[Needle In A Haystack (NIAH)|Needle In A Haystack (NIAH)]]
+* **Related techniques**: [[Attention Mechanisms|Attention Mechanisms]], [[Agentic RAG|Agentic RAG]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Mechanistic Interpretability & Steering.md b/10_Wiki/Topics/AI_and_ML/Mechanistic Interpretability & Steering.md
new file mode 100644
index 00000000..592f5388
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Mechanistic Interpretability & Steering.md
@@ -0,0 +1,36 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-MCIS-001
+category: Unified
+confidence_score: 0.95
+tags: [auto-reinforced, mechanistic-interpretability, steering-vectors, sae, sparse-autoencoders, model-understanding]
+last_reinforced: 2026-05-04
+---
+
+# [[Mechanistic Interpretability & Steering|Mechanistic Interpretability & Steering]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Reverse-engineering intelligence: dissecting the inner network of an AI once treated as a black box to find which neurons handle which concepts (e.g., 'honesty', 'coding'), then adjusting them directly (steering) to change the model's personality or abilities in real time."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Mechanistic interpretability is the field that tries to understand a model's inner workings at the level of individual neurons; steering is the technique of controlling the model based on that understanding.
+
+1. **SAE (Sparse Autoencoders)**:
+    * **Principle**: Disentangles the concepts that are intertwined across the model's hundreds of millions of neurons and extracts them as single, human-interpretable concepts (features).
+    * **Significance**: Makes it possible to draw a concrete map such as "this group of neurons responds to the 'Golden Gate Bridge'". (An Anthropic research example)
+2. **Steering Vectors**:
+    * **Concept**: Extract the network activation pattern associated with a specific concept (e.g., 'harmlessness', 'logical reasoning') and turn it into a vector.
+    * **Use**: At inference time, inject this vector into an intermediate layer of the model to nudge it toward more honest answers or toward a particular topic.
+3. **Superposition**:
+    * The phenomenon in which a single neuron participates in several concepts at once so that the model can store vast knowledge with a limited number of neurons. Resolving this superposition is a main goal of interpretability research.
+
+## ⚖️ Trade-offs & Caveats
+* **Performance degradation**: Steering too strongly on a concept can break the model's general language ability or make its answers unnatural.
+* **Complexity**: Fully interpreting every concept in a large model is still at an early stage and requires enormous computation.
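One common way to build a steering vector is the difference of mean activations between prompts that show a concept and prompts that do not. The sketch below illustrates that idea on plain Python lists. The activation values are made-up numbers, the function names are hypothetical, and no real model or hooking API is involved; in practice the vectors would be read from and added to a chosen layer's hidden states.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def steering_vector(with_concept: List[Vector], without_concept: List[Vector]) -> Vector:
    """Difference of means: the direction that separates the two activation sets."""
    positive = mean_vector(with_concept)
    negative = mean_vector(without_concept)
    return [p - n for p, n in zip(positive, negative)]


def steer(hidden: Vector, direction: Vector, strength: float) -> Vector:
    """Add the scaled direction to a hidden state (the 'injection' step)."""
    return [h + strength * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    # Hypothetical 2-dimensional activations collected at one layer.
    honest = [[1.0, 2.0], [3.0, 4.0]]
    neutral = [[0.0, 0.0], [2.0, 2.0]]
    direction = steering_vector(honest, neutral)
    print(direction)
    print(steer([0.0, 0.0], direction, strength=2.0))
```

The `strength` parameter is where the trade-off above shows up: larger values push the model harder toward the concept but move the hidden state further from what the later layers were trained on.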
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concepts**: [[AI Safety & Constitutional AI|AI Safety & Constitutional AI]], [[Deep Learning Theory|Deep Learning Theory]]
+* **Related research**: Anthropic (Golden Gate Claude), OpenAI (Microscope)
+* **Related techniques**: [[Fine-Tuning & Alignment|Fine-Tuning & Alignment]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Mixture of Experts (MoE) & Sparse Architectures.md b/10_Wiki/Topics/AI_and_ML/Mixture of Experts (MoE) & Sparse Architectures.md
new file mode 100644
index 00000000..f381afed
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Mixture of Experts (MoE) & Sparse Architectures.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-MOES-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, moe, mixture-of-experts, sparse-architecture, routing, compute-efficiency]
+last_reinforced: 2026-05-04
+---
+
+# [[Mixture of Experts (MoE) & Sparse Architectures|Mixture of Experts (MoE) & Sparse Architectures]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Division of labor for intelligence: an economical design that places many experts holding vast knowledge inside the model and activates only the few needed at each moment, growing model size while keeping compute cost low."
+
+## 📖 Structured Knowledge (Synthesized Content)
+MoE (Mixture of Experts) is a sparse model design in which only a fraction of the model's total parameters take part in each computation.
+
+1. **Core principles**:
+    * **Experts**: The model's FFN layers are split into multiple independent "expert" networks.
+    * **Router**: For each input token, selects the best-suited experts (usually the top 1-2) and dispatches the computation to them.
+    * **Shared Experts**: Some models (e.g., DeepSeek) keep "shared experts" that every token passes through, providing a common base of knowledge.
+2. **Key advantages**:
+    * **Compute efficiency**: Even if the model has 1 trillion (1T) total parameters, only several billion are used per token at inference, so it runs fast.
+    * **Scalability**: With the same compute budget, a model holding much more knowledge can be built.
+3. **Representative models**:
+    * GPT-4 (reportedly an MoE architecture), Mixtral 8x7B, DeepSeek-V3.
+
+## ⚖️ Trade-offs & Caveats
+* **VRAM footprint**: Inference uses little compute, but every expert's weights must be held in memory, so the required VRAM is as large as the full model.
+* **Expert Collapse**: The router may send work only to certain experts, leaving the rest untrained. Load-balancing techniques are essential to prevent this.
+* **Deployment complexity**: Distributing experts across multiple GPUs and keeping them synchronized is considerably harder than for ordinary (dense) models.
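The router and experts described above can be sketched for a single token. This is a minimal illustration that assumes softmax gating over the selected top-k router logits, which is one common design; the experts here are hypothetical scalar functions rather than FFN layers, and load balancing and shared experts are left out.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(values: Sequence[float]) -> List[float]:
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and normalize their gate weights."""
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_forward(x: float, router_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = 2) -> float:
    """Only the k selected experts run; the rest cost no compute for this token."""
    return sum(gate * experts[index](x) for index, gate in top_k_route(router_logits, k))


if __name__ == "__main__":
    # Hypothetical experts: expert i multiplies its input by (i + 1).
    experts = [lambda x, scale=i + 1: scale * x for i in range(4)]
    logits = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token
    print(top_k_route(logits, k=2))
    print(moe_forward(1.0, logits, experts, k=2))
```

Note that all four experts still have to exist in memory even though only two of them run, which matches the VRAM footprint caveat above.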
+
+## 🔗 Knowledge Links (Graph)
+* **Underlying architecture**: [[Transformer Architecture|Transformer Architecture]]
+* **Related techniques**: [[Routing Mechanism|Routing Mechanism]], [[Sparse Attention|Sparse Attention]]
+* **Competing architecture**: Dense Models (Llama 3, etc.)
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Model Compression & Quantization.md b/10_Wiki/Topics/AI_and_ML/Model Compression & Quantization.md
new file mode 100644
index 00000000..1276c4db
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Model Compression & Quantization.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-MCOQ-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, quantization, compression, fp8, int4, awq, gptq, gguf]
+last_reinforced: 2026-05-04
+---
+
+# [[Model Compression & Quantization|Model Compression & Quantization]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Concentrating intelligence: a high-end optimization method that lowers the numeric precision used to represent model weights (FP16 -> INT4), dramatically improving memory usage and compute speed while minimizing the loss in quality."
+
+## 📖 Structured Knowledge (Synthesized Content)
+A core set of techniques for shrinking a model so that large models can run on ordinary hardware or so that inference becomes more efficient.
+
+1. **Quantization**:
+    * **Definition**: The process of reducing the number of bits used to represent weights. (e.g., 16-bit floating point $\rightarrow$ 4-bit integer)
+    * **Effect**: Memory usage shrinks by about 4x, letting larger models fit on smaller GPUs.
+2. **Main precision formats**:
+    * **FP8**: Supported on recent GPUs such as H100/B200; offers a good balance of speed and accuracy.
+    * **INT4/INT8**: The traditional quantization formats, widely used on mobile and edge devices as well.
+    * **NF4 (NormalFloat 4)**: A special format used in QLoRA, providing quantization tuned to the distribution of the weights.
+3. **Representative algorithms & formats**:
+    * **AWQ / GPTQ**: Data-aware quantization methods that aim for both inference speed and accuracy.
+    * **GGUF / EXL2**: Formats widely used to run LLMs on CPUs or in local environments, for example with llama.cpp.
+
+## ⚖️ Trade-offs & Caveats
+* **Precision Loss**: Cutting the bit width too far can weaken the model's reasoning or increase hallucinations. (Especially noticeable at 3 bits and below)
+* **Hardware compatibility**: Newer formats such as FP8 may give little speedup, or not work at all, on older GPUs (RTX 30 series and earlier).
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concept**: [[LLM Inference Optimization|LLM Inference Optimization]]
+* **Related techniques**: [[PEFT & LoRA|PEFT & LoRA]] (QLoRA), [[Deployment Frameworks|Deployment Frameworks]]
+* **Main tools**: bitsandbytes, AutoAWQ, llama.cpp
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Model Context Protocol (MCP).md b/10_Wiki/Topics/AI_and_ML/Model Context Protocol (MCP).md
new file mode 100644
index 00000000..f255022e
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Model Context Protocol (MCP).md
@@ -0,0 +1,39 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-MCPR-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, mcp, model-context-protocol, anthropic, standardization, tool-integration]
+last_reinforced: 2026-05-04
+---
+
+# [[Model Context Protocol (MCP)|Model Context Protocol (MCP)]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The USB standard of the AI era: a bridge for the ecosystem that, by offering a single specification for connecting countless fragmented apps and data sources to models, lets any tool be integrated into an agent immediately without complex custom development."
+
+## 📖 Structured Knowledge (Synthesized Content)
+The Model Context Protocol (MCP) is an open standard protocol that lets AI agents communicate with a variety of external data sources and tools.
+
+1. **Background**:
+    * Previously, separate API integration code had to be written for each app (Slack, Google Drive, GitHub, etc.).
+    * To resolve this fragmentation, MCP provides a standard through which every tool can expose its capabilities to a model in the same way.
+2. **Core architecture**:
+    * **MCP Server**: A server that exposes a data source or tool according to the MCP specification.
+    * **MCP Client**: The agent side (e.g., Claude Desktop, Antigravity Astra), which connects to servers and uses their tools.
+    * **Standardization**: Like the USB-C standard, an MCP server built once can be used immediately from any MCP-capable client.
+3. **Key benefits**:
+    * **Developer productivity**: No complex integration code is needed; connecting a standard server is enough.
+    * **Security**: Rather than accessing data directly, information is exchanged in a controlled manner through the standard protocol.
+    * **Extensibility**: As an open standard (donated to the Linux Foundation), it is rapidly drawing many third-party tools into the MCP ecosystem.
+
+## ⚖️ Trade-offs & Caveats
+* **Initial overhead**: Existing legacy systems need a server written to wrap them in the MCP specification.
+* **Latency**: One more protocol layer is added, so a very small amount of latency may be introduced.
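To make the client/server exchange more tangible, the sketch below builds the kind of messages an MCP client sends. It assumes, based on the published MCP specification as best understood here, that messages are JSON-RPC 2.0 and that tools are discovered with `tools/list` and invoked with `tools/call`. The tool name `search_notes` and its arguments are hypothetical, and the transport (stdio or HTTP) and the initialization handshake are omitted, so the specification should be checked before relying on any detail.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Call one tool by name; "search_notes" and its arguments are made up for this example.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_notes", "arguments": {"query": "rotary position embedding"}},
)

if __name__ == "__main__":
    print(json.dumps(list_tools, indent=2))
    print(json.dumps(call_tool, indent=2))
```

Because every server accepts the same request shapes, a client written once against this message format can talk to any MCP server, which is the point of the USB-C comparison above.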
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concepts**: [[Autonomous Agents & Workflows|Autonomous Agents & Workflows]], [[Tool Use & Function Calling|Tool Use & Function Calling]]
+* **Related metric**: [[MCP-Atlas|MCP-Atlas]] (an MCP performance benchmark)
+* **Related model**: Claude (Anthropic first proposed MCP, and Claude was an early adopter)
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/PEFT & LoRA.md b/10_Wiki/Topics/AI_and_ML/PEFT & LoRA.md
new file mode 100644
index 00000000..030a376c
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/PEFT & LoRA.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-PELR-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, peft, lora, qlora, fine-tuning-optimization, vram-efficiency]
+last_reinforced: 2026-05-04
+---
+
+# [[PEFT & LoRA|PEFT & LoRA]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Maximum effect from minimal change: the height of efficiency, where instead of touching all of a large model's billions of parameters, only a tiny adapter is trained, making it possible to tune the latest AI even on a personal PC."
+
+## 📖 Structured Knowledge (Synthesized Content)
+PEFT (Parameter-Efficient Fine-Tuning) is the umbrella term for fine-tuning techniques that train only a very small subset of parameters instead of updating all of the model's weights.
+
+1. **LoRA (Low-Rank Adaptation)**:
+    * **Principle**: The model's weight matrix ($W$) is kept frozen; only the update ($\Delta W$), expressed as the product of two small low-rank matrices ($A, B$), is trained.
+    * **Advantage**: It can cut the number of trainable parameters by as much as roughly 10,000x (the figure reported for GPT-3 175B) while performing on par with full fine-tuning. After training, the adapter can easily be merged into the base model.
+2. **QLoRA (Quantized LoRA)**:
+    * **Principle**: The base model is quantized to 4 bits and loaded into VRAM, and LoRA is applied on top of it.
+    * **Significance**: A breakthrough that made it possible to fine-tune very large models on a single GPU. The QLoRA paper reports fine-tuning a 65B (65 billion parameter) model on one 48GB GPU; a single 24GB card (RTX 3090/4090) is generally limited to smaller models, on the order of 33B.
+3. **Other PEFT techniques**:
+    * **Prefix Tuning**: Prepends trainable virtual tokens (a prefix) to the input.
+    * **Prompt Tuning**: Makes part of the prompt's embedding space trainable.
+    * **Adapter Tuning**: Inserts small bottleneck layers between the existing Transformer layers.
+
+## ⚖️ Trade-offs & Caveats
+* **Inference latency**: Adapter-style methods need extra computation at inference time, which can slow things down slightly (LoRA avoids this by merging).
+* **Limits on complex tasks**: When very large or complex new knowledge has to be injected, performance can fall somewhat short of full fine-tuning.
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concept**: [[Fine-Tuning & Alignment|Fine-Tuning & Alignment]]
+* **Related techniques**: [[Quantization|Quantization]], [[LLM Architecture|LLM Architecture]]
+* **Main libraries**: Hugging Face PEFT, Unsloth, Axolotl
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Personal Knowledge Management (PKM) & AI.md b/10_Wiki/Topics/AI_and_ML/Personal Knowledge Management (PKM) & AI.md
new file mode 100644
index 00000000..fc4824f9
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Personal Knowledge Management (PKM) & AI.md
@@ -0,0 +1,37 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-PKMA-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, pkm, second-brain, obsidian, knowledge-management, ai-assistant]
+last_reinforced: 2026-05-04
+---
+
+# [[Personal Knowledge Management (PKM) & AI|Personal Knowledge Management (PKM) & AI]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "An amplifier for knowledge: the technical practice of cultivating one's fragmented thoughts and notes into a digital garden (Obsidian, etc.) that AI can understand, then connecting it to a local LLM to build a 'second brain' that knows me better than I know myself."
+
+## 📖 Structured Knowledge (Synthesized Content)
+In the AI era, PKM (personal knowledge management) is evolving from simple record keeping into the core context store for AI agents.
+
+1. **Obsidian**:
+    * **Characteristics**: A Markdown-based, local-first knowledge management tool. It is one of the most suitable platforms for connecting with AI while preserving data sovereignty.
+    * **Strength**: Its [[Knowledge Graph|Knowledge Graph]] feature visualizes the links between notes, so relationships among pieces of information can be seen at a glance.
+2. **Combining with AI (Local RAG)**:
+    * **Principle**: Index all of the user's notes into a [[Vector Database|Vector Database]] and connect a local LLM such as [[Ollama]].
+    * **Benefit**: Summaries, answers, and creative inspiration grounded in your own notes, without sensitive personal knowledge being sent to the cloud.
+3. **Andrej Karpathy's "LLM Wiki" pattern**:
+    * A structure in which human and AI co-evolve while maintaining the knowledge base.
+    * Directories are kept separate: `raw/` (originals), `wiki/` (refined entities), and `SCHEMA.md` (rules for maintaining the knowledge).
+
+## ⚖️ Trade-offs & Caveats
+* **Initial setup barrier**: Local LLM integration, plugin configuration, and the like pose a technical barrier to entry for non-developers.
+* **Hardware requirements**: Indexing tens of thousands of notes and running an LLM locally calls for a capable GPU (RTX 3060 or better) or Apple Silicon (M2/M3).
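The retrieval half of a local RAG setup can be sketched without any external service. To stay self-contained, the example below uses word-count vectors and cosine similarity as a stand-in for a real embedding model and vector database, so it only matches shared words rather than meaning. The note names and contents are made up, and nothing here reflects how Obsidian or Ollama work internally.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system would use an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, notes: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return the names of the notes most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(notes, key=lambda name: cosine(query_vec, embed(notes[name])), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical vault contents.
    vault = {
        "RoPE.md": "rotary position embedding rotates query and key vectors",
        "MoE.md": "mixture of experts routes tokens to a few experts",
        "LoRA.md": "low rank adapters for fine tuning",
    }
    print(retrieve("how does rotary position embedding work", vault, top_k=1))
```

The retrieved notes would then be pasted into the prompt of a local LLM, which keeps both the index and the generation step on the user's own machine.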
+
+## 🔗 Knowledge Links (Graph)
+* **Core tools**: [[Obsidian|Obsidian]], [[Ollama|Ollama]], [[Dataview|Dataview]]
+* **Underlying technologies**: [[Retrieval-Augmented Generation (RAG)|Local RAG]], [[Knowledge Graph|Knowledge Graph]]
+* **Standard protocol**: [[Model Context Protocol (MCP)|Model Context Protocol (MCP)]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Positional Embeddings (RoPE & Variants).md b/10_Wiki/Topics/AI_and_ML/Positional Embeddings (RoPE & Variants).md
new file mode 100644
index 00000000..676f9de5
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Positional Embeddings (RoPE & Variants).md
@@ -0,0 +1,37 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-ROPE-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, rope, positional-embedding, yarn, longrope, context-extension]
+last_reinforced: 2026-05-04
+---
+
+# [[Positional Embeddings (RoPE & Variants)|Positional Embeddings (RoPE & Variants)]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "A compass for intelligence: a revolution in positional information that expresses the relative distance between words through the mathematical device of rotation, letting the model track word order and relationships accurately even in texts far longer than the range it was trained on."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Positional encoding is the technique that supplies token position information to Transformer models, which otherwise have no notion of order.
+
+1. **RoPE (Rotary Position Embedding)**:
+    * **Principle**: Each token's position is converted into a rotation angle in the complex plane, and that rotation is applied to the query and key vectors.
+    * **Characteristics**: It naturally captures relative distance rather than absolute position and degrades little on long contexts, so it is used as the standard in most recent models, including Llama and PaLM.
+2. **Context extension techniques (Variants)**:
+    * **Linear Interpolation**: Positions beyond the trained range are linearly compressed back into the original range so the model can recognize them.
+    * **YaRN (Yet another RoPE extension method)**: Adjusts the components of different frequencies differently, extending the context window by tens of times with little loss of accuracy.
+    * **LongRoPE**: Uses an evolutionary algorithm to find rotation parameters that can handle contexts of millions of tokens.
+3. **iRoPE (Interleaved RoPE)**:
+    * A technique for multimodal or long-context models that injects positional information differently at specific layers to optimize performance.
+
+## ⚖️ Trade-offs & Caveats
+* **Limits of extrapolation**: Fully capturing relationships between tokens at distances far beyond anything seen during training remains a hard problem mathematically.
+* **Fine-tuning required**: Simply applying a RoPE extension is not enough; the model needs additional training (fine-tuning) on a small amount of data at the extended context length to perform well.
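The rotation idea and the relative-distance property can be checked numerically. The sketch below is a minimal pure-Python RoPE that rotates consecutive pairs of vector components. The `scale` parameter is an added illustration of linear position interpolation (positions are divided by it); the base of 10000 is the commonly used default, and the example vectors are arbitrary.

```python
import math
from typing import List, Sequence


def rope(vec: Sequence[float], position: float, base: float = 10000.0, scale: float = 1.0) -> List[float]:
    """Rotate each (even, odd) component pair by an angle proportional to the position.

    scale > 1 squeezes positions back toward the trained range (linear interpolation).
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    pos = position / scale
    out: List[float] = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)  # lower-index pairs rotate faster
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5]
    k = [1.1, 0.4, -0.7, 0.9]
    # Same offset (2) at different absolute positions gives the same attention score.
    print(dot(rope(q, 3), rope(k, 1)))
    print(dot(rope(q, 10), rope(k, 8)))
    # Linear interpolation: position 8 with scale 2 behaves like position 4.
    print(rope(q, 8, scale=2.0) == rope(q, 4))
```

The first two printed scores agree up to floating-point error, which is the sense in which RoPE encodes relative rather than absolute position.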
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concept**: [[Transformer Architecture|Transformer Architecture]]
+* **Sub-techniques**: [[Attention Mechanisms|Attention Mechanisms]]
+* **Problem addressed**: [[Context Window & Long-Context LLMs|Context Window & Long-Context LLMs]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Reasoning Models.md b/10_Wiki/Topics/AI_and_ML/Reasoning Models.md
new file mode 100644
index 00000000..e5ddfdb5
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Reasoning Models.md
@@ -0,0 +1,37 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-RSNM-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, reasoning-models, deepseek-r1, cot, lrm, inference-time-compute]
+last_reinforced: 2026-05-04
+---
+
+# [[Reasoning Models|Reasoning Models]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Deliberating intelligence: a 'System 2' style of AI that goes beyond the instinctive reaction of blurting out an answer the moment a question arrives, internally generating a step-by-step chain of thought (CoT), checking its own logical flaws, and searching for the best solution."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Reasoning models are designed to solve complex math, coding, and logic puzzles; their defining trait is taking ample "thinking time" (inference-time compute) before producing an answer.
+
+1. **DeepSeek-R1 & LRM (Large Reasoning Models)**:
+    * **Core idea**: Reinforcement learning (RL) is used to induce the model to generate an explicit chain of thought.
+    * **Decomposing thought types**: The model's thinking process mainly consists of three logical stages: [Reasoning], [Execution], and [Transition].
+2. **How it works**:
+    * **Inference-time Compute**: More compute is allocated to the inference stage to raise answer accuracy. (OpenAI o1, DeepSeek-R1, etc.)
+    * **Self-Correction**: When the model notices its own error while thinking, it corrects itself ("Wait, let me re-check...") and continues its reasoning.
+3. **Results**:
+    * On benchmarks that demand high-level intellectual ability, such as math (AIME) and coding (Codeforces), they substantially outperform ordinary LLMs.
+
+## ⚖️ Trade-offs & Caveats
+* **Latency**: Thousands to tens of thousands of tokens of internal thinking precede the final answer, so responses are much slower than with ordinary models.
+* **VRAM blow-up**: Long chains of thought rapidly consume the [[KV Cache|KV Cache]] and can exhaust GPU memory. Specialized cache management techniques such as [[ThinKV|ThinKV]] are needed to prevent this.
+* **Overthinking**: The model can waste resources by running a heavy reasoning process even for a simple greeting or a basic lookup.
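The VRAM pressure from long chains of thought follows from simple arithmetic: the KV cache stores one key and one value vector per token, per layer, per KV head. The sketch below applies that formula to a hypothetical model configuration (32 layers, 8 KV heads, head dimension 128, 16-bit values); these numbers are assumptions chosen for illustration and do not describe any particular model.

```python
def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_value: int, batch_size: int = 1) -> int:
    """Memory for keys and values across all layers: 2 * L * H_kv * d * bytes * tokens."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * num_tokens * batch_size


if __name__ == "__main__":
    # Hypothetical configuration, for illustration only.
    config = dict(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_value=2)
    for tokens in (1_000, 32_000):
        size = kv_cache_bytes(tokens, **config)
        print(f"{tokens:>6} tokens -> {size / 2**30:.2f} GiB")
```

The cache grows linearly with the number of generated tokens, so a reasoning trace of tens of thousands of tokens costs tens of times more memory per request than a short direct answer under the same configuration.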
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concepts**: [[LLM Capabilities|LLM Capabilities]], [[Artificial General Intelligence (AGI)|AGI]]
+* **Foundations**: [[Chain-of-Thought (CoT)|Chain-of-Thought (CoT)]], [[Reinforcement Learning (RL)|RL]]
+* **Mitigation techniques**: [[KV Cache Compression|KV Cache Compression]], [[ThinKV|ThinKV]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Reranking & Hybrid Search.md b/10_Wiki/Topics/AI_and_ML/Reranking & Hybrid Search.md
new file mode 100644
index 00000000..ee007e75
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Reranking & Hybrid Search.md
@@ -0,0 +1,37 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-RRHS-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, reranking, hybrid-search, semantic-search, lexical-search, bm25]
+last_reinforced: 2026-05-04
+---
+
+# [[Reranking & Hybrid Search|Reranking & Hybrid Search]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Filtering and recombining search: a two-stage verification system that combines semantic similarity (dense) with exact keyword matching (sparse), then re-examines the candidates closely so the model receives the strongest available evidence."
+
+## 📖 Structured Knowledge (Synthesized Content)
+A technique that combines two or more retrieval methods and then reorders the results to improve retrieval accuracy in RAG systems.
+
+1. **Hybrid Search**:
+    * **Dense Retrieval (embedding search)**: Finds similar information by capturing context and meaning. (e.g., matching "financial crisis" with "economic depression")
+    * **Sparse Retrieval (keyword search)**: Performs exact term matching with BM25 or similar. (e.g., product names, proper nouns)
+    * **Reciprocal Rank Fusion (RRF)**: Merges the two ranked lists into the final candidate set; each document scores the sum of 1/(k + rank) over the lists it appears in.
+2. **Reranking**:
+    * **Why it is needed**: First-stage retrieval (vector search) is optimized for pulling candidates quickly out of millions of items, so its precision can be somewhat low.
+    * **How it works**: For the few dozen candidates returned by the first stage, a much heavier and more precise cross-encoder model recomputes relevance to the query and reorders them.
+3. **Effect**:
+    * Raises both the probability that the true answer appears in the top-K results (recall) and the share of top-K results that are actually relevant (precision).
+
+## ⚖️ Trade-offs & Caveats
+* **Latency**: The reranking stage needs additional model inference, so overall response time can increase by hundreds of milliseconds or more.
+* **Cost**: A high-performance reranker increases API call costs or GPU resource consumption.
+
+## 🔗 Knowledge Links (Graph)
+* **Parent system**: [[Retrieval-Augmented Generation (RAG)|Retrieval-Augmented Generation (RAG)]]
+* **Related techniques**: [[Vector Databases & Search|Vector Databases & Search]], [[Embedding Models & MRL|Embedding Models & MRL]]
+* **Key tools**: Cohere Rerank, BGE-Reranker, Voyage Rerank
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Tokenization & Subword Processing.md b/10_Wiki/Topics/AI_and_ML/Tokenization & Subword Processing.md
new file mode 100644
index 00000000..519df5f9
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Tokenization & Subword Processing.md
@@ -0,0 +1,36 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-TKNP-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, tokenization, bpe, wordpiece, subword-tokenizer, nlp-preprocessing]
+last_reinforced: 2026-05-04
+---
+
+# [[Tokenization & Subword Processing|Tokenization & Subword Processing]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Atomizing language: the process of breaking human sentences into numeric pieces (tokens) that a model can understand; the efficiency of that decomposition shapes the model's intelligence, speed, and operating cost, which makes it the first gateway of AI."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Tokenization is the process of splitting text into tokens, the smallest units a model can process.
+
+1. **Main approaches**:
+    * **BPE (Byte-Pair Encoding)**: Builds the token vocabulary by repeatedly merging the most frequent pair of adjacent symbols. (Standard in GPT, Llama, and others)
+    * **WordPiece**: Similar to BPE, but each merge is chosen by how much it increases the language-model likelihood. (BERT family)
+    * **SentencePiece**: Treats raw text as a single stream with no language-specific pre-tokenization step (whitespace is handled as an ordinary symbol), which makes it strong on multilingual text and out-of-vocabulary (OOV) words.
+2. **Semantic units**:
+    * Modern tokenizers operate on subwords rather than whole words. For example, "unhappiness" can be split into "un", "happi", and "ness", so meaning can be composed from the parts.
+3. **Vocabulary size (Vocab Size)**:
+    * If the vocabulary is too small, sentences split into too many tokens and compute efficiency drops; if it is too large, parameters are wasted on rarely used tokens. Sizes are typically chosen between 32k and 128k.
+
+## ⚖️ Trade-offs & Caveats
+* **Multilingual imbalance**: English needs few tokens per word, but Korean and other languages often split into far more tokens for the same meaning, which raises cost and can degrade performance.
+* **Sensitivity to tokenizer details**: Small differences in the tokenizer can strongly affect a model's arithmetic ability or its handling of special characters.
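The BPE description above (repeatedly merge the most frequent adjacent pair) can be sketched in a few lines. This is a toy, character-level illustration under simplifying assumptions: it works on a small in-memory word list, ignores word-boundary markers and byte-level handling, and breaks ties by first occurrence. The name `train_bpe` and the sample corpus are hypothetical; production tokenizers differ in many details.

```python
from collections import Counter


def train_bpe(words, num_merges):
    """Learn up to num_merges BPE merges from a list of words (toy version)."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties resolve to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab


corpus = ["low", "low", "lower", "lowest", "newest", "newest"]
merges, vocab = train_bpe(corpus, num_merges=3)
print(merges)
print(dict(vocab))
```

After a few merges the frequent word "low" collapses into a single token while rarer words stay split into pieces, which is the subword behavior described in the note.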
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concepts**: [[Natural Language Processing (NLP)|NLP]], [[Transformer Architecture|Transformer Architecture]]
+* **Sub-system**: [[Tokenization Economics|Tokenization Economics]]
+* **Related physical constraints**: [[Context Window & Long-Context LLMs|Context Window]], [[KV Cache|KV Cache]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Tokenization Economics.md b/10_Wiki/Topics/AI_and_ML/Tokenization Economics.md
new file mode 100644
index 00000000..4a12e746
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Tokenization Economics.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-TKNE-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, token-economics, cost-optimization, inference-efficiency, throughput]
+last_reinforced: 2026-05-04
+---
+
+# [[Tokenization Economics|Tokenization Economics]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Tokens are money: a model's compute, VRAM usage, API cost, and response latency all grow with the number of tokens, so optimizing token efficiency is the core economics of a sustainable AI service."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Token economics is an engineering strategy that optimizes token usage at the system level to balance efficiency against cost.
+
+1. **The tokenizer trade-off triangle**:
+    * **Cost**: More tokens mean higher API costs and infrastructure upkeep.
+    * **Performance**: More tokens increase generation latency and reduce throughput.
+    * **Quality**: Compressing or cutting tokens too aggressively reduces the model's comprehension and the precision of its output.
+2. **Optimization strategies**:
+    * **Dynamic Allocation**: Instead of allocating a fixed length, adjust the sequence length to the actual input to reduce memory waste (up to 45% savings).
+    * **Predictive Tokenization**: Predict the complexity of a task and allocate an appropriate token budget.
+    * **Prefix Caching**: Cache the processed form of repeated system prompts or large documents (in most serving stacks, the KV-cache entries for the shared prefix) and reuse it instead of reprocessing it on every request.
+3. **Data entropy optimization**:
+    * Remove unnecessary whitespace, duplicated formatting, and noisy text during preprocessing to minimize the number of tokens per unit of meaning.
+
+## ⚖️ Trade-offs & Caveats
+* **Multilingual overhead**: Some languages (e.g., Telugu) can consume more than 7x as many tokens as English, so a global service risks unexpected cost blow-ups.
+* **The redundancy trap**: Excessive chunk overlap in RAG tokenizes the same information multiple times and wastes VRAM.
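The redundancy trap above is easy to quantify. The following is a minimal sketch, assuming plain sliding-window chunking over a document measured in tokens; `chunk_spans` and `overlap_overhead` are hypothetical helper names, and the document length and chunk size are illustrative values only.

```python
def chunk_spans(n_tokens, chunk_size, overlap):
    """Return (start, end) token spans for sliding-window chunking."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # the last chunk reached the end of the document
        start += step
    return spans


def overlap_overhead(n_tokens, chunk_size, overlap):
    """Ratio of tokens stored across all chunks to tokens in the source."""
    total = sum(end - start for start, end in chunk_spans(n_tokens, chunk_size, overlap))
    return total / n_tokens


for overlap in (0, 64, 256):
    ratio = overlap_overhead(10_000, 512, overlap)
    print(f"overlap={overlap:>3}: {ratio:.2f}x tokens stored")
```

With a 50% overlap the same 10,000-token document is stored, embedded, and potentially fed to the model almost twice over, which is the cost the note warns about.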
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concept**: [[Tokenization & Subword Processing|Tokenization & Subword Processing]]
+* **Related techniques**: [[Prefix Caching|Prefix Caching]], [[KV Cache Management|KV Cache Management]]
+* **Problem addressed**: [[LLM Inference Optimization|LLM Inference Optimization]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Tool Use & Function Calling.md b/10_Wiki/Topics/AI_and_ML/Tool Use & Function Calling.md
new file mode 100644
index 00000000..3310dcbe
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Tool Use & Function Calling.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-TULC-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, tool-use, function-calling, api-integration, agent-action]
+last_reinforced: 2026-05-04
+---
+
+# [[Tool Use & Function Calling|Tool Use & Function Calling]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The hands and feet of intelligence: the agent's core interface for going beyond text generation, calling external APIs or executing code to act directly on the real world and to fetch accurate external data."
+
+## 📖 Structured Knowledge (Synthesized Content)
+Tool use, or function calling, is a technique in which the model delegates tasks it cannot solve on its own to external systems.
+
+1. **How it works**:
+    * **Tool definition (Definition)**: The model is told in advance the name, description, and parameter schema of each available tool.
+    * **Call decision (Selection)**: When the model judges that a specific tool is needed to answer the user's question, it emits a JSON-formatted instruction to call that tool instead of an answer.
+    * **Result integration (Integration)**: The result produced by the external system (e.g., weather data, a DB query result) is fed back to the model, which then generates the final answer.
+2. **Main use cases**:
+    * **Search**: Use a web search tool for up-to-date information.
+    * **Calculator/Python**: Use a code executor for exact numeric computation or data analysis.
+    * **Database**: Generate and run SQL queries to look up internal company data.
+3. **Progress**:
+    * Recent models are strong at calling several tools at once (parallel tool use) and at chaining tools in complex sequences.
+
+## ⚖️ Trade-offs & Caveats
+* **Security risk**: The model may generate malicious commands that damage the system or leak sensitive data, so strict isolation of the execution environment (sandbox) is essential.
+* **Hallucination**: The model may call tools that do not exist or generate incorrect tool parameters.
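The definition, selection, and integration steps above, together with the hallucination caveat, can be sketched as a small dispatcher that validates a tool call before running it. This is an illustrative sketch only: the `{"name": ..., "arguments": ...}` call shape, the `TOOLS` registry, and the `get_weather` stub are assumptions made for the example, not the wire format of any particular provider.

```python
import json


def get_weather(city: str) -> dict:
    # Hypothetical stand-in for a real API call; returns canned data.
    return {"city": city, "temp_c": 21}


# Hypothetical registry: tool name -> (callable, required parameters and types).
TOOLS = {
    "get_weather": (get_weather, {"city": str}),
}


def dispatch(model_output: str) -> dict:
    """Validate a JSON tool call emitted by the model, then execute it."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        return {"error": f"not valid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"error": "tool call must be a JSON object"}

    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        # Guards against the model inventing a tool that was never defined.
        return {"error": f"unknown tool: {name!r}"}

    func, schema = TOOLS[name]
    if not isinstance(args, dict) or set(args) != set(schema):
        return {"error": f"expected parameters {sorted(schema)}"}
    for key, expected_type in schema.items():
        if not isinstance(args[key], expected_type):
            return {"error": f"parameter {key!r} must be {expected_type.__name__}"}

    # The returned dict is what would be fed back to the model.
    return {"result": func(**args)}


print(dispatch('{"name": "get_weather", "arguments": {"city": "Seoul"}}'))
print(dispatch('{"name": "delete_all", "arguments": {}}'))
```

Returning a structured error instead of raising lets the calling loop pass the message back to the model so it can retry with a corrected call. Validation of this kind does not replace sandboxing; it only rejects malformed or unknown calls.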
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concept**: [[Autonomous Agents & Workflows|Autonomous Agents & Workflows]]
+* **Related techniques**: [[Model Context Protocol (MCP)|Model Context Protocol (MCP)]], [[API Design|API Design]]
+* **Mitigation techniques**: [[Execution Environment (Sandbox)|Execution Environment (Sandbox)]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Transformer Architecture.md b/10_Wiki/Topics/AI_and_ML/Transformer Architecture.md
new file mode 100644
index 00000000..800624a9
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Transformer Architecture.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-TRFA-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, transformer, attention-mechanism, mha, mla, self-attention, deep-learning]
+last_reinforced: 2026-05-04
+---
+
+# [[Transformer Architecture|Transformer Architecture]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The standard blueprint of modern AI: the foundational architecture of large language models, built around the attention mechanism, which captures the relationships among all inputs at once, breaking away from sequential sentence processing and opening the era of parallel computation."
+
+## 📖 Structured Knowledge (Synthesized Content)
+The Transformer is a neural network architecture proposed in the 2017 paper "Attention Is All You Need" that has since become the basis of nearly all modern generative AI.
+
+1. **Core components**:
+    * **Self-Attention**: Each token in a sentence computes its relevance to every other token in order to capture context.
+    * **Multi-Head Attention (MHA)**: Several attention operations run in parallel, so the model learns different kinds of relationships between words (syntactic, semantic, and so on) at the same time.
+    * **Feed-Forward Network (FFN)**: Applies a non-linear transformation to the attention output to extract features.
+    * **Positional Encoding**: Injects position information into attention, which otherwise has no notion of order. (Uses [[Positional Embeddings (RoPE & Variants)|RoPE]] and similar.)
+2. **Evolved attention - MLA (Multi-Head Latent Attention)**:
+    * **Key feature**: Keys and values are projected into a compressed latent space, which sharply reduces [[KV Cache|KV Cache]] memory usage.
+    * **Significance**: Makes very long contexts far more practical while keeping quality loss small. (Used in recent models such as DeepSeek.)
+3. **Benefits of parallel computation**:
+    * Unlike earlier RNN approaches, a Transformer can process a whole sentence at once, which makes it well suited to large-scale training on GPUs.
+
+## ⚖️ Trade-offs & Caveats
+* **Memory blow-up**: Attention scales quadratically, $O(n^2)$, with input length $n$; a naive implementation also materializes an $n \times n$ score matrix in memory.
+* **MLA distortion**: Compression techniques such as MLA save memory, but at extremely long contexts subtle information distortion can occur and degrade multi-item retrieval performance.
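The self-attention step and the quadratic caveat above can be seen in a few lines of plain Python. This is a deliberately simplified sketch: a single head with no learned query, key, or value projections (each input vector serves as all three), no masking, and no batching. The function names and the toy inputs are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x is a list of n vectors of size d. Simplification: queries, keys,
    and values are all x itself (no learned projection matrices).
    """
    d = len(x[0])
    scale = math.sqrt(d)
    weights = []
    for q in x:
        # One row of the n x n score matrix: q compared with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        weights.append(softmax(scores))
    # Each output vector is a weighted average of all value vectors.
    output = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return output, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens)
for row in attn:
    print([round(w, 3) for w in row])
```

The `weights` list holds n rows of n entries, so doubling the sequence length quadruples the number of scores, which is where the $O(n^2)$ cost comes from.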
+
+## 🔗 Knowledge Links (Graph)
+* **Parent concepts**: [[Deep Learning|Deep Learning]], [[Natural Language Processing (NLP)|NLP]]
+* **Optimization techniques**: [[Attention Mechanisms|Attention Mechanisms]], [[Flash Attention|Flash Attention]], [[Mixture of Experts (MoE) & Sparse Architectures|MoE]]
+* **Positional information**: [[Positional Embeddings (RoPE & Variants)|Positional Embeddings]]
+
+---
+*Last updated: 2026-05-04*
diff --git a/10_Wiki/Topics/AI_and_ML/Vector Databases & Search.md b/10_Wiki/Topics/AI_and_ML/Vector Databases & Search.md
new file mode 100644
index 00000000..856feb33
--- /dev/null
+++ b/10_Wiki/Topics/AI_and_ML/Vector Databases & Search.md
@@ -0,0 +1,38 @@
+---
+id: [[P-Reinforce|P-Reinforce]]-AUTO-VDBS-001
+category: Unified
+confidence_score: 1.00
+tags: [auto-reinforced, vector-database, hnsw, indexing, semantic-search, similarity-search]
+last_reinforced: 2026-05-04
+---
+
+# [[Vector Databases & Search|Vector Databases & Search]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "A library of meaning: the vast knowledge store of modern AI, which converts unstructured data such as text, images, and audio into mathematical coordinates (vectors) and finds the most semantically similar items among hundreds of millions in a few milliseconds."
+
+## 📖 Structured Knowledge (Synthesized Content)
+A vector database is a system that indexes data as high-dimensional vectors to enable fast similarity search.
+
+1. **Core mechanics**:
+    * **Embedding**: Data is converted by [[Embedding Models|Embedding Models]] into vectors with hundreds to thousands of dimensions.
+    * **Indexing**: The data is organized into a structure that speeds up search. (e.g., [[HNSW]], IVF, PQ)
+    * **Similarity computation**: Cosine similarity, Euclidean distance, or a similar metric is used to find the vectors closest to the query.
+2. **Key indexing algorithm - HNSW**:
+    * **Hierarchical graph**: Data points are linked in a layered graph, and the "small world" network principle is used to reach the target quickly.
+    * **Characteristics**: Memory-hungry, but search speed and accuracy are very good, so it has become the default index in most commercial vector DBs.
+3. **Representative solutions**:
+    * **Cloud/managed**: Pinecone; Weaviate and Qdrant (both open source, with managed cloud offerings).
+    * **Open source/self-hosted**: Milvus, ChromaDB, FAISS (a similarity-search library rather than a full database).
+
+## ⚖️ Trade-offs & Caveats
+* **Cost and resources**: Vectors and indexes generally need to sit in memory (RAM) to perform well, so infrastructure costs are high.
+* **Accuracy vs. speed**: Because approximate nearest neighbor (ANN) search is used instead of exact search, 100% recall is not guaranteed.
+
+## 🔗 Knowledge Links (Graph)
+* **Foundations**: [[Embedding Models & MRL|Embedding Models & MRL]], [[Chunking & Pre-processing|Chunking & Pre-processing]]
+* **Applications**: [[Retrieval-Augmented Generation (RAG)|RAG]], [[Agent Memory Systems|Agent Memory Systems]]
+* **Related techniques**: [[Hybrid Search|Hybrid Search]], [[Quantization|Quantization]]
+
+---
+*Last updated: 2026-05-04*
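For the similarity-computation step and the exact-versus-approximate trade-off in the Vector Databases note, it may help to see the exact baseline that ANN indexes such as HNSW approximate. This is a minimal brute-force sketch in plain Python; the `store` contents and the function names are hypothetical, and a real system would use a vector library or database rather than Python lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force search: score every stored vector, return the best k."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings, for illustration only.
store = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.1],
}
print(exact_top_k([1.0, 0.0, 0.0], store, k=2))
```

This scan always returns the true nearest neighbors but touches every vector on every query. ANN indexes trade a small amount of recall to avoid that linear cost.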