From 2eb44231a546f6001f2ebcbfb2d5593a07431357 Mon Sep 17 00:00:00 2001
From: yesung
Date: Mon, 20 Apr 2026 17:05:12 +0900
Subject: [PATCH] [P-Reinforce] Substantial content added to DQN/Dijkstra/Differentiable/Dense (Batch 05)

---
 10_Wiki/Topics/AI/DQN.md                      | 27 +++++++++++++++++++
 10_Wiki/Topics/AI/Data Cleaning Algorithms.md | 27 +++++++++++++++++++
 .../AI/Dense vs Sparse Neural Networks.md     | 27 +++++++++++++++++++
 .../Topics/AI/Differentiable Programming.md   | 27 +++++++++++++++++++
 10_Wiki/Topics/AI/Dijkstra's Algorithm.md     | 27 +++++++++++++++++++
 .../AI/Distributed Reinforcement Learning.md  | 27 +++++++++++++++++++
 .../Topics/AI/Dynamic-Environment-Handling.md | 27 +++++++++++++++++++
 7 files changed, 189 insertions(+)
 create mode 100644 10_Wiki/Topics/AI/DQN.md
 create mode 100644 10_Wiki/Topics/AI/Data Cleaning Algorithms.md
 create mode 100644 10_Wiki/Topics/AI/Dense vs Sparse Neural Networks.md
 create mode 100644 10_Wiki/Topics/AI/Differentiable Programming.md
 create mode 100644 10_Wiki/Topics/AI/Dijkstra's Algorithm.md
 create mode 100644 10_Wiki/Topics/AI/Distributed Reinforcement Learning.md
 create mode 100644 10_Wiki/Topics/AI/Dynamic-Environment-Handling.md

diff --git a/10_Wiki/Topics/AI/DQN.md b/10_Wiki/Topics/AI/DQN.md
new file mode 100644
index 00000000..b8376c88
--- /dev/null
+++ b/10_Wiki/Topics/AI/DQN.md
@@ -0,0 +1,27 @@
+---
+id: P-REINFORCE-AI-DQN
+category: "[[10_Wiki/💡 Topics/AI]]"
+confidence_score: 0.99
+tags: [DQN, Deep Q-Networks, Reinforcement Learning, AI]
+last_reinforced: 2026-04-20
+---
+
+# [[DQN]] (Deep Q-Network)
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Deep learning became the eyes of reinforcement learning." By going beyond the limits of tabular methods, DQN let an agent choose its best action by looking directly at raw screen images (pixels). It was a milestone in AI history.
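A minimal sketch of the two DQN stabilizers this note describes, experience replay and a periodically synced ("fixed") target network. To stay self-contained, it uses a hypothetical tabular stand-in (Python lists of Q-values) in place of the neural networks a real DQN trains. It also pre-fills the buffer with a fixed toy dataset instead of having an acting agent feed it. `ReplayBuffer`, `td_targets`, `train` and the 2-state toy MDP are illustrative names, not any library's API.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)   # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.memory), batch_size)


def td_targets(batch, target_q, gamma):
    """y = r + gamma * max_a' Q_target(s', a'), or just r for terminal transitions."""
    return [r if done else r + gamma * max(target_q[s_next])
            for _, _, r, s_next, done in batch]


def train(transitions, n_states, n_actions, steps=2000, batch_size=8,
          gamma=0.9, lr=0.1, sync_every=50, seed=0):
    # Hypothetical stand-in: tables of Q-values play the role of the online
    # and target networks; a real DQN uses neural networks trained by SGD.
    online = [[0.0] * n_actions for _ in range(n_states)]
    target = [row[:] for row in online]
    buffer = ReplayBuffer(capacity=10_000, seed=seed)
    for t in transitions:
        buffer.push(t)
    for step in range(1, steps + 1):
        batch = buffer.sample(batch_size)
        for (s, a, *_), y in zip(batch, td_targets(batch, target, gamma)):
            online[s][a] += lr * (y - online[s][a])    # step Q(s, a) toward its TD target
        if step % sync_every == 0:
            target = [row[:] for row in online]         # periodic hard copy = "fixed" target
    return online


# Toy 2-state MDP (hypothetical): in state 0, action 0 moves to state 1 and
# action 1 ends the episode with reward 0.5; in state 1, action 0 ends with
# reward 1.0 and action 1 returns to state 0.
toy = [(0, 0, 0.0, 1, False), (0, 1, 0.5, 0, True),
       (1, 0, 1.0, 0, True), (1, 1, 0.0, 0, False)] * 5
q = train(toy, n_states=2, n_actions=2)
print([[round(v, 2) for v in row] for row in q])   # [[0.9, 0.5], [1.0, 0.81]]
```

The targets only move every `sync_every` steps. Between syncs, the online table therefore chases a stationary goal, which is the stabilizing effect the "Fixed Q-Targets" idea is after.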
+
+## 📖 Structured Knowledge (Synthesized Content)
+- **Experience Replay**:
+  - Past transitions are stored in a memory buffer and sampled at random for training. This breaks the correlation between consecutive samples and improves training stability.
+- **Fixed Q-Targets**:
+  - A target that keeps shifting makes training unstable. To prevent this, a separate target network computes the training targets and is synchronized with the online network only at fixed intervals.
+- **Application**:
+  - Applied to decision-making in uncertain environments, from mastering Atari games to robot control and stock trading.
+
+## ⚠️ Contradictions & Updates (RL Update)
+- DQN only works when the action space is discrete, because it selects actions by taking an argmax over one Q-value per action. Tasks that need continuous control, such as autonomous driving or robot-arm manipulation, more often use successor algorithms like `DDPG` or `PPO`.
+
+## 🔗 Knowledge Links (Graph)
+- Related: [[Reinforcement Learning]] , [[Bellman-Equation]]
+- Foundation: [[Information Theory]]
diff --git a/10_Wiki/Topics/AI/Data Cleaning Algorithms.md b/10_Wiki/Topics/AI/Data Cleaning Algorithms.md
new file mode 100644
index 00000000..f354b2c6
--- /dev/null
+++ b/10_Wiki/Topics/AI/Data Cleaning Algorithms.md
@@ -0,0 +1,27 @@
+---
+id: P-REINFORCE-AI-DATA-CLEAN
+category: "[[10_Wiki/💡 Topics/AI]]"
+confidence_score: 0.97
+tags: [Data Cleaning, Machine Learning, Data Quality, Preprocessing]
+last_reinforced: 2026-04-20
+---
+
+# [[Data-Cleaning-Algorithms]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Quality beats raw model power." Removing noise and duplicates from the data often improves AI performance far more dramatically than changing the model architecture (the Data-centric AI view).
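A minimal sketch of the cleaning steps listed in this note, written in plain Python: exact-duplicate removal, IQR (Tukey fence) outlier detection, mean imputation, and min-max scaling to [0, 1]. The function names are hypothetical. As a design choice in the spirit of the edge-case caveat in this note, outliers are not deleted. They are set aside in a `flagged` list for later analysis and treated as missing, so they do not skew the imputation mean.

```python
from statistics import mean, quantiles


def drop_duplicates(records):
    """Remove exact duplicate records, keeping the first occurrence and order."""
    seen, unique = set(), []
    for rec in records:
        if rec not in seen:          # records must be hashable, e.g. tuples
            seen.add(rec)
            unique.append(rec)
    return unique


def iqr_bounds(values, k=1.5):
    """Tukey fences: anything outside [Q1 - k*IQR, Q3 + k*IQR] counts as an outlier."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_column(values):
    """Flag IQR outliers, mean-impute gaps (None), then min-max scale to [0, 1]."""
    observed = [v for v in values if v is not None]
    lo, hi = iqr_bounds(observed)
    flagged = [v for v in observed if v < lo or v > hi]   # kept for later analysis
    # Treat outliers as missing so they do not drag the imputation mean.
    masked = [None if v is None or v < lo or v > hi else v for v in values]
    fill = mean(v for v in masked if v is not None)
    imputed = [fill if v is None else v for v in masked]
    vmin, vmax = min(imputed), max(imputed)
    span = vmax - vmin
    scaled = [0.0 if span == 0 else (v - vmin) / span for v in imputed]
    return scaled, flagged


records = [("a", 1), ("b", 2), ("a", 1)]
print(drop_duplicates(records))   # [('a', 1), ('b', 2)]

scaled, flagged = clean_column([10.0, 12.0, None, 11.0, 13.0, 95.0, 12.0])
print(flagged)                    # [95.0]
```

Order matters here. Imputing before removing the outlier would have filled the gap with a mean pulled upward by `95.0`.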
+
+## 📖 Structured Knowledge (Synthesized Content)
+- **Outlier Detection**:
+  - Detect and handle outliers that fall outside the normal range, using statistical techniques (Z-score, IQR) or machine learning (Isolation Forest).
+- **Handling Missing Values**:
+  - Decide on an imputation strategy: fill empty values with the mean, or infer them with a predictive model.
+- **Normalization & Scaling**:
+  - When feature values sit on wildly different scales, training becomes unstable. Rescaling them to a common range (for example, 0 to 1) is therefore an essential step for gradient-based models.
+
+## ⚠️ Contradictions & Updates (RL Update)
+- Deleting data indiscriminately can also throw away information about important edge cases. In safety-critical fields such as security and autonomous driving, anomalous data should not simply be discarded. Analyzing why it occurred should be part of the cleaning process.
+
+## 🔗 Knowledge Links (Graph)
+- Related: [[Information Theory]] , [[Reliability_Safety_First]]
+- Foundation: [[Computational Thinking]]
diff --git a/10_Wiki/Topics/AI/Dense vs Sparse Neural Networks.md b/10_Wiki/Topics/AI/Dense vs Sparse Neural Networks.md
new file mode 100644
index 00000000..d86eac42
--- /dev/null
+++ b/10_Wiki/Topics/AI/Dense vs Sparse Neural Networks.md
@@ -0,0 +1,27 @@
+---
+id: P-REINFORCE-AI-DENSE-SPARSE
+category: "[[10_Wiki/💡 Topics/AI]]"
+confidence_score: 0.98
+tags: [Neural Networks, Dense, Sparse, MoE, Efficiency]
+last_reinforced: 2026-04-20
+---
+
+# [[Dense-vs-Sparse-Neural-Networks]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "Wake everyone up, or only the ones you need?" The brain does not fire all of its neurons at once. In the same way, sparse networks activate only the parts a given input needs, so a very large model stays cheap to run.
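A toy sketch of the two sparsity mechanisms in this note, in plain Python. The first is unstructured magnitude pruning, which zeroes the smallest-|w| fraction of a weight matrix. The second is a top-k Mixture-of-Experts layer for a single token. Several simplifications are assumptions: the scalar "experts" are hypothetical lambdas rather than feed-forward sub-networks, and the gate scores are passed in directly instead of coming from a learned router. The function names are illustrative.

```python
import math


def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning: zero the `sparsity` fraction of smallest |w|."""
    ranked = sorted((abs(w), i, j) for i, row in enumerate(weights)
                    for j, w in enumerate(row))
    n_prune = int(len(ranked) * sparsity)
    pruned = [row[:] for row in weights]
    for _, i, j in ranked[:n_prune]:
        pruned[i][j] = 0.0
    return pruned


def softmax(xs):
    m = max(xs)                                # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_scores, k=2):
    """Top-k MoE for one token: only the k highest-scoring experts are evaluated."""
    chosen = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    mix = softmax([gate_scores[i] for i in chosen])   # renormalize over the chosen experts
    return sum(w * experts[i](x) for w, i in zip(mix, chosen)), chosen


# Four hypothetical toy "experts"; real experts are feed-forward sub-networks.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
W = [[0.5, -0.1, 0.9], [0.05, -0.7, 0.2]]
print(magnitude_prune(W, 0.5))     # [[0.5, 0.0, 0.9], [0.0, -0.7, 0.0]]
out, chosen = moe_forward(3.0, experts, [0.1, 2.0, -1.0, 1.0], k=2)
print(chosen)                      # [1, 3]
```

The pruned matrix is still stored densely here, which mirrors this note's caveat. Zeros save work only if the kernel or hardware can actually skip them.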
+
+## 📖 Structured Knowledge (Synthesized Content)
+- **Dense Neural Networks**:
+  - Every input is connected to every output. These networks are compute-heavy but easy to implement, and they suit small models.
+- **Sparse Neural Networks (Pruning)**:
+  - Pruning sets unimportant weights (connections with little influence) to zero to reduce the amount of computation.
+- **Mixture of Experts (MoE)**:
+  - A core technique of recent large models (widely reported, though not officially confirmed, for GPT-4). The model contains many "expert" sub-networks, and a learned router activates only the few experts best suited to each input token. This raises model capacity while keeping the compute cost per token low.
+
+## ⚠️ Contradictions & Updates (RL Update)
+- Hardware accelerators such as GPUs are optimized for dense matrix math, so sparse matrix operations can be hard for them to run efficiently. Balancing software-level "sparsification" against hardware "acceleration efficiency" is therefore one of the central problems in modern AI engineering.
+
+## 🔗 Knowledge Links (Graph)
+- Related: [[Differentiable-Programming]] , [[Deep-Reinforcement-Learning]]
+- Foundation: [[Information Theory]]
diff --git a/10_Wiki/Topics/AI/Differentiable Programming.md b/10_Wiki/Topics/AI/Differentiable Programming.md
new file mode 100644
index 00000000..1aa32506
--- /dev/null
+++ b/10_Wiki/Topics/AI/Differentiable Programming.md
@@ -0,0 +1,27 @@
+---
+id: P-REINFORCE-AI-DIF-PROG
+category: "[[10_Wiki/💡 Topics/AI]]"
+confidence_score: 0.98
+tags: [Differentiable Programming, AI, JAX, PyTorch, Optimization]
+last_reinforced: 2026-04-20
+---
+
+# [[Differentiable-Programming]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The program itself is what gets trained." In this revolutionary paradigm, programs built from conditionals, loops and library functions are treated as differentiable functions of their tunable parameters and optimized with gradient descent.
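A minimal sketch of the idea that an ordinary program with a loop and a branch can be differentiated end to end and tuned by gradient descent. The sketch uses forward-mode automatic differentiation with dual numbers, chosen only because it fits in a few lines. PyTorch's autograd works in reverse mode instead, and JAX supports both modes. `Dual`, `program` and `loss_and_grad` are hypothetical names. The gradient is exact along the branch actually taken but undefined at the branch boundary, which is the discrete-choice caveat this note raises.

```python
class Dual:
    """Forward-mode autodiff value: `val` and its derivative `dot` w.r.t. one parameter."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)   # constants have zero derivative

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)


def program(theta, x, steps=3):
    """An ordinary program with a loop and a branch, parameterized by theta."""
    y = x
    for _ in range(steps):
        if y.val > 1.0:        # branch chosen on the value; only the taken path is differentiated
            y = y * theta
        else:
            y = y + theta
    return y


def loss_and_grad(theta_value, x, target):
    theta = Dual(theta_value, 1.0)            # seed: d(theta)/d(theta) = 1
    err = program(theta, Dual(x)) - target
    loss = err * err                          # squared error
    return loss.val, loss.dot                 # loss and d(loss)/d(theta)


theta = 1.2
for _ in range(200):
    _, grad = loss_and_grad(theta, x=2.0, target=16.0)
    theta -= 1e-3 * grad                      # plain gradient descent
print(round(theta, 6))                        # 2.0  (since 2 * 2**3 == 16)
```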
+
+## 📖 Structured Knowledge (Synthesized Content)
+- **End-to-End Optimization**:
+  - Every operation from input to output is connected. When an error (loss) occurs, the gradient is propagated back through the entire program, so the program corrects itself.
+- **Software 2.0**:
+  - A concept proposed by Andrej Karpathy. It describes a shift from Software 1.0, where people write the logic by hand, to Software 2.0, where the logic (neural network weights) is learned from data.
+- **Frameworks**:
+  - Frameworks with automatic differentiation (autograd), such as `JAX` and `PyTorch`, form the backbone of this paradigm.
+
+## ⚠️ Contradictions & Updates (RL Update)
+- Making all logic differentiable is hard, especially discontinuous, discrete choices. Active research works around this with tricks such as the `REINFORCE` (score-function) gradient estimator or the `Gumbel-Softmax` relaxation. Both connect discrete choices to gradient-based training through random sampling.
+
+## 🔗 Knowledge Links (Graph)
+- Related: [[Deep-Reinforcement-Learning]] , [[Complexity-Theory]]
+- Foundation: [[Computational Theory & Math/Information Theory]]
diff --git a/10_Wiki/Topics/AI/Dijkstra's Algorithm.md b/10_Wiki/Topics/AI/Dijkstra's Algorithm.md
new file mode 100644
index 00000000..2e3f4e4c
--- /dev/null
+++ b/10_Wiki/Topics/AI/Dijkstra's Algorithm.md
@@ -0,0 +1,27 @@
+---
+id: P-REINFORCE-AI-DIJKSTRA
+category: "[[10_Wiki/💡 Topics/Programming & Language]]"
+confidence_score: 0.99
+tags: [Dijkstra, Algorithm, Pathfinding, Graph Theory]
+last_reinforced: 2026-04-20
+---
+
+# [[Dijkstra's-Algorithm]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The smartest pathfinding a greedy algorithm can do." It is the classic, textbook algorithm that efficiently settles the shortest distance from a source to every other vertex, one vertex at a time.
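A minimal sketch of Dijkstra's algorithm with a binary heap, using Python's standard `heapq` module. It assumes non-negative edge weights and an adjacency-list graph of `(neighbor, weight)` pairs. Instead of a decrease-key operation, it pushes a new heap entry and skips stale ones when they are popped, which is a common way to use `heapq`. The function names and the example graph are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from `source`; `graph` maps vertex -> [(neighbor, weight), ...].

    Assumes non-negative weights. Returns (dist, prev); `prev` links each
    reached vertex to its predecessor on a shortest path.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]                      # (tentative distance, vertex)
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:                      # stale entry: u was already settled with a smaller d
            continue
        settled.add(u)                        # greedy step: d is now final for u
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))  # lazy "decrease-key"
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the path by walking `prev` back from target (target must be reachable)."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 5)], "B": [("D", 1)]}
dist, prev = dijkstra(graph, "A")
print(dist["D"], path_to(prev, "A", "D"))   # 4 ['A', 'C', 'B', 'D']
```

Each edge can push at most one heap entry, so the heap holds O(E) items. This gives the O((V + E) log V) running time of the binary-heap version.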
+
+## 📖 Structured Knowledge (Synthesized Content)
+- **Shortest Path Tree**:
+  - The algorithm always visits the unvisited vertex with the smallest tentative distance first. With non-negative edge weights, this guarantees that a distance, once settled, never needs to be recomputed.
+- **Priority Queue Usage**:
+  - A priority queue (heap) finds the next vertex to visit quickly. This improves the running time from O(V²) with a linear scan to O((V + E) log V) with a binary heap.
+- **Application**:
+  - Used wherever there is a connected network: map navigation (e.g., Google Maps), game pathfinding and network routing. For example, OSPF computes its shortest-path tree with Dijkstra's algorithm.
+
+## ⚠️ Contradictions & Updates (RL Update)
+- Dijkstra's algorithm does not work correctly with negative edge weights, such as negative distances. In that case, use Bellman-Ford instead. On very large maps the search area also grows too wide. In practice, the `A* (A-Star) algorithm` is often preferred there, because it adds a heuristic estimate of the remaining distance that steers the search toward the goal.
+
+## 🔗 Knowledge Links (Graph)
+- Related: [[Autonomous-Vehicle-Path-Planning]] , [[Combinatorial-Optimization]]
+- Foundation: [[Computational Theory & Math/Information Theory]]
diff --git a/10_Wiki/Topics/AI/Distributed Reinforcement Learning.md b/10_Wiki/Topics/AI/Distributed Reinforcement Learning.md
new file mode 100644
index 00000000..637cf120
--- /dev/null
+++ b/10_Wiki/Topics/AI/Distributed Reinforcement Learning.md
@@ -0,0 +1,27 @@
+---
+id: P-REINFORCE-AI-DIST-RL
+category: "[[10_Wiki/💡 Topics/AI]]"
+confidence_score: 0.98
+tags: [Distributed RL, Scalability, AI, Apex, Impala]
+last_reinforced: 2026-04-20
+---
+
+# [[Distributed-Reinforcement-Learning]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "A year alone, an hour together." Many agents are released into simulated environments to gather experience at the same time, and everything they learn is pooled into a single brain. The result is dramatically faster training.
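A toy sketch of the synchronous actor-learner pattern this note describes, written in plain Python. Several actors each collect experience with a frozen copy of the learner's parameters, and a central learner then pools all of it. Several pieces are hypothetical: the 3-armed bandit environment, the round sizes, and the names `run_actor` and `train_distributed`. Threads stand in for the separate processes or machines a real system would use. The per-actor exploration rates follow the Ape-X style of schedule, in which each actor explores at a different rate.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat

ARM_MEANS = [0.2, 0.5, 0.8]   # hypothetical toy environment: a 3-armed bandit


def actor_epsilon(i, n_actors, base=0.4, alpha=7.0):
    """Ape-X-style schedule: actor i explores with eps_i = base ** (1 + alpha * i / (N - 1))."""
    if n_actors == 1:
        return base
    return base ** (1 + alpha * i / (n_actors - 1))


def run_actor(actor_id, q_snapshot, n_actors, steps, round_id):
    """One actor: act epsilon-greedily on a frozen copy of the learner's values."""
    rng = random.Random(round_id * 1000 + actor_id)    # independent, reproducible stream
    eps = actor_epsilon(actor_id, n_actors)
    experience = []
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(len(ARM_MEANS))                            # explore
        else:
            arm = max(range(len(q_snapshot)), key=q_snapshot.__getitem__)  # exploit
        experience.append((arm, ARM_MEANS[arm] + rng.gauss(0.0, 0.1)))
    return experience


def train_distributed(n_actors=4, rounds=20, steps=25):
    """Synchronous loop: every actor finishes a round before the learner updates."""
    q = [0.0] * len(ARM_MEANS)
    counts = [0] * len(ARM_MEANS)
    with ThreadPoolExecutor(max_workers=n_actors) as pool:
        for round_id in range(rounds):
            snapshot = q[:]                           # "broadcast" current parameters
            results = pool.map(run_actor, range(n_actors), repeat(snapshot),
                               repeat(n_actors), repeat(steps), repeat(round_id))
            for experience in results:                # central learner pools all data
                for arm, reward in experience:
                    counts[arm] += 1
                    q[arm] += (reward - q[arm]) / counts[arm]   # incremental mean
    return q, counts


q, counts = train_distributed()
print([round(v, 2) for v in q], counts)
```

An asynchronous design (A3C- or IMPALA-like) would instead let the learner consume each actor's data as soon as it arrives, without waiting for the whole round.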
+
+## 📖 Structured Knowledge (Synthesized Content)
+- **Parallel Data Collection**:
+  - Independent agents (actors) running in hundreds to thousands of CPU/GPU environments collect data and send it to a central server (the learner).
+- **Asynchronous vs Synchronous**:
+  - Architectures differ in whether the agents stay in lockstep (Sync) or each sends updates as soon as its data is ready (Async). Examples include A3C and IMPALA.
+- **Efficiency Boost**:
+  - Many actors exploring in parallel cover more diverse environment scenarios in a short time, which keeps exploration from being lost. Ape-X, for instance, gives each actor a different exploration rate.
+
+## ⚠️ Contradictions & Updates (RL Update)
+- Distributed training consumes enormous amounts of compute. To improve resource efficiency, algorithms that reuse off-policy data more effectively, such as `R2D2` and `MuZero`, have drawn attention.
+
+## 🔗 Knowledge Links (Graph)
+- Related: [[DQN]] , [[Collective-Intelligence]]
+- Foundation: [[Distributed-Systems-Engineering]]
diff --git a/10_Wiki/Topics/AI/Dynamic-Environment-Handling.md b/10_Wiki/Topics/AI/Dynamic-Environment-Handling.md
new file mode 100644
index 00000000..3ae20227
--- /dev/null
+++ b/10_Wiki/Topics/AI/Dynamic-Environment-Handling.md
@@ -0,0 +1,27 @@
+---
+id: P-REINFORCE-AI-DYNAMIC-ENV
+category: "[[10_Wiki/💡 Topics/AI]]"
+confidence_score: 0.96
+tags: [Dynamic Environment, Autonomous Driving, Adaptation, AI]
+last_reinforced: 2026-04-20
+---
+
+# [[Dynamic-Environment-Handling]]
+
+## 📌 One-Line Insight (The Karpathy Summary)
+> "The world does not stand still." This is an AI's resilience: its ability to adapt in real time to the whims of a constantly changing world, such as rain, snow, fog, or a child suddenly darting into the road.
+
+## 📖 Structured Knowledge (Synthesized Content)
+- **Robust Perception**:
+  - A robust vision system that recognizes objects accurately even under sensor noise or bad weather.
+- **Real-time Path Planning**:
+  - Computing a new safe path within a few milliseconds whenever an obstacle appears.
+- **Domain Adaptation**:
+  - A transfer-learning technique that bridges the gap between simulation (Sim) and real road conditions (Real), so that knowledge learned in the virtual world stays valid in reality.
+
+## ⚠️ Contradictions & Updates (RL Update)
+- Training on every possible scenario in advance is impossible. A prominent current line of research uses "world models" to teach an AI how its environment, including its physics, behaves. The goal is for the AI to respond with common sense even to unexpected situations it has never seen.
+
+## 🔗 Knowledge Links (Graph)
+- Related: [[Autonomous-Vehicle-Path-Planning]] , [[Reliability_Safety_First]]
+- Foundation: [[Computational Thinking]]
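A minimal sketch of the replanning loop behind real-time path planning. An agent follows a plan one cell per tick and recomputes the route from its current position whenever a newly appeared obstacle blocks the rest of the plan. The sketch rests on several assumptions. It uses a hypothetical grid world, and obstacles appear at known ticks as a stand-in for perception output. The planner is breadth-first search, which is what Dijkstra's algorithm reduces to when every step costs 1. A real planner would also account for vehicle dynamics, timing and uncertainty. `bfs_path` and `drive` are illustrative names.

```python
from collections import deque


def bfs_path(grid, start, goal):
    """Shortest 4-connected path over '.' cells (blocked cells are '#'); None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:          # walk predecessors back to the start
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == "." and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None


def drive(grid, start, goal, events):
    """Move one cell per tick; replan when a new obstacle lands on the remaining plan.

    `events` maps a tick to a cell that becomes blocked at that tick
    (a hypothetical stand-in for what the perception stack would report).
    """
    grid = [list(row) for row in grid]       # mutable copy of the map
    pos, trace, replans, tick = start, [start], 0, 0
    plan = bfs_path(grid, pos, goal)
    while pos != goal:
        if plan is None:
            raise RuntimeError("goal is unreachable")
        tick += 1
        if tick in events:
            r, c = events[tick]
            grid[r][c] = "#"
            if (r, c) in plan[plan.index(pos):]:   # only the part still ahead matters
                plan = bfs_path(grid, pos, goal)
                replans += 1
                if plan is None:
                    raise RuntimeError("goal is unreachable")
        pos = plan[plan.index(pos) + 1]
        trace.append(pos)
    return trace, replans


GRID = [".....",
        ".....",
        "....."]
trace, replans = drive(GRID, (0, 0), (0, 4), events={2: (0, 3)})
print(replans, len(trace) - 1)   # 1 6
```

The obstacle at `(0, 3)` appears after the first move, so the agent replans once from `(0, 1)` and detours through row 1. The trip takes 6 moves instead of 4.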