--- id: [[P-Reinforce|P-Reinforce]]-AUTO-ACLE-001 category: Unified confidence_score: 0.98 tags: [auto-reinforced, active-learning, machine-learning, [[Optimization|Optimization]], data-[[Efficiency|Efficiency]], human-in-the-loop] last_reinforced: 2026-04-20 --- # [[Active Learning|Active Learning]] ## ๐Ÿ“Œ ํ•œ ์ค„ ํ†ต์ฐฐ (The Karpathy Summary) > "๋˜‘๋˜‘ํ•˜๊ฒŒ ์งˆ๋ฌธํ•ด์„œ ๋ฐฐ์šฐ๊ธฐ: ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ๋งน๋ชฉ์ ์œผ๋กœ ํ•™์Šตํ•˜๋Š” ๋Œ€์‹ , ์ •๋‹ต์„ ์•Œ์•˜์„ ๋•Œ ๋ชจ๋ธ์˜ ์ง€๋Šฅ์ด ๊ฐ€์žฅ ํฌ๊ฒŒ ์ƒ์Šนํ•  ๊ฒƒ ๊ฐ™์€ 'ํ•ต์‹ฌ ์งˆ๋ฌธ(๋ฐ์ดํ„ฐ)'๋งŒ ๊ณจ๋ผ ์ธ๊ฐ„์—๊ฒŒ ์ •๋‹ต์„ ์š”์ฒญํ•˜๋Š” ๊ณ ํšจ์œจ ํ•™์Šต ์ „๋žต." ## ๐Ÿ“– ๊ตฌ์กฐํ™”๋œ ์ง€์‹ (Synthesized Content) ๋Šฅ๋™ ํ•™์Šต(Active Learning)์€ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์ด ์Šค์Šค๋กœ ํ•™์Šต ๊ณผ์ •์— ์ฐธ์—ฌํ•˜์—ฌ, ๋ ˆ์ด๋ธ”(Label)๋˜์ง€ ์•Š์€ ๋ฐ์ดํ„ฐ ์ค‘ ํ•™์Šต์— ๊ฐ€์žฅ ๋„์›€์ด ๋  ๋ฐ์ดํ„ฐ๋ฅผ ์„ ๋ณ„ํ•˜๊ณ  ์ „๋ฌธ๊ฐ€์—๊ฒŒ ๋ ˆ์ด๋ธ”๋ง์„ ์š”์ฒญํ•˜๋Š” ๊ธฐ๋ฒ•์ž…๋‹ˆ๋‹ค. 1. **๋™์ž‘ ์›๋ฆฌ (Query [[Strategy|Strategy]])**: * **Uncertainty Sampling**: ๋ชจ๋ธ์ด ์ •๋‹ต์„ ๊ฐ€์žฅ ํ™•์‹ ํ•˜์ง€ ๋ชปํ•˜๋Š”(Entropy๊ฐ€ ๋†’์€) ๋ฐ์ดํ„ฐ๋ฅผ ๊ณ ๋ฆ„. * **Query-by-Committee**: ์—ฌ๋Ÿฌ ๋ชจ๋ธ์˜ ์˜๊ฒฌ์ด ๊ฐ€์žฅ ์ผ์น˜ํ•˜์ง€ ์•Š๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”์ถœ. * **Representativeness**: ์ „์ฒด ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ๊ฐ€์žฅ ์ž˜ ๋Œ€ํ‘œํ•˜๋Š” ํ‘œ๋ณธ์„ ์„ ํƒ. 2. **์™œ ํ•„์š”ํ•œ๊ฐ€?**: * ๋ฐ์ดํ„ฐ๋Š” ๋งŽ์ง€๋งŒ '์ •๋‹ต'์„ ๋‹ค๋Š” ๋น„์šฉ(์ธ๊ฐ„ ์ „๋ฌธ๊ฐ€์˜ ์‹œ๊ฐ„)์ด ๋น„์Œ€ ๋•Œ ์œ ์šฉ. (์˜ˆ: ์˜๋ฃŒ ์˜์ƒ ๋ถ„์„, ์ž์œจ์ฃผํ–‰ ๋ฐ์ดํ„ฐ ๋ ˆ์ด๋ธ”๋ง) 3. **๊ธฐ๋Œ€ ํšจ๊ณผ**: * ์ „์ฒด ๋ฐ์ดํ„ฐ์˜ ์ผ๋ถ€(10-20%)๋งŒ ํ•™์Šตํ•˜๊ณ ๋„ ์ „์ฒด๋ฅผ ํ•™์Šตํ•œ ๊ฒƒ๊ณผ ๋น„์Šทํ•œ ์„ฑ๋Šฅ ๋‹ฌ์„ฑ ๊ฐ€๋Šฅ (Data Efficiency). ## โš ๏ธ ๋ชจ์ˆœ ๋ฐ ์—…๋ฐ์ดํŠธ (Contradictions & RL Update) - **๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์™€์˜ ์ถฉ๋Œ**: ๊ณผ๊ฑฐ์—๋Š” ๋‹จ์ˆœํžˆ '์–‘์งˆ์˜ ํฐ ๋ฐ์ดํ„ฐ์…‹' ์ •์ฑ…์— ์˜์กดํ–ˆ์œผ๋‚˜, ํ˜„๋Œ€ AI ์ธํ”„๋ผ ์ •์ฑ…์€ ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ๋น„์šฉ์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ์‹œ์ž‘๋ถ€ํ„ฐ ๋ชจ๋ธ์ด ๊ฐœ์ž…ํ•˜๋Š” 'AI-driven Labeling ์ •์ฑ…'์„ ํ•ต์‹ฌ ์ธํ”„๋ผ๋กœ ๊ตฌ์ถ•ํ•จ(RL Update). - **์ •์ฑ… ๋ณ€ํ™”(RL Update)**: ์ธ๊ฐ„๊ณผ์˜ ์ƒํ˜ธ์ž‘์šฉ ํ”ผ๋กœ๋„๋ฅผ ๋‚ฎ์ถ”๊ธฐ ์œ„ํ•ด, "๊ผญ ํ•„์š”ํ•œ ์งˆ๋ฌธ๋งŒ ๋˜์ง€๋Š”" ์—์ด์ „ํŠธ์˜ ์˜ˆ์ ˆ ๋ฐ ํšจ์œจ์„ฑ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ตœ์ ํ™”ํ•˜๋Š” ์ •์ฑ…์ด RAG ๋ถ€๋ฌธ ๋ฐ ๋„๋ฉ”์ธ ํŠนํ™” ๋ชจ๋ธ ๊ฐœ๋ฐœ์˜ ํ‘œ์ค€์ด ๋จ. ## ๐Ÿ”— ์ง€์‹ ์—ฐ๊ฒฐ (Graph) - [[SFT (Supervised Fine-Tuning)|SFT (Supervised [[Fine-tuning]])]], [[RLHF (แ„‹แ…ตแ†ซแ„€แ…กแ†ซ แ„‘แ…ตแ„ƒแ…ณแ„‡แ…ขแ†จ แ„€แ…ตแ„‡แ…กแ†ซ แ„€แ…กแ†ผแ„’แ…ช แ„’แ…กแ†จแ„‰แ…ณแ†ธ)|RLHF (์ธ๊ฐ„ ํ”ผ๋“œ๋ฐฑ ๊ธฐ๋ฐ˜ ๊ฐ•ํ™” ํ•™์Šต)]], [[Resource-Management|Resource-Management]], [[Decision Theory|Decision Theory]], [[Scientific Communication|Scientific Communication]] - **Modern Tech/Tools**: Prodigy (Labeling tool), ModAL (Python framework for Active Learning). ---