---
id: P-REINFORCE-AI-RLAIF
category: "10_Wiki/💡 Topics/AI"
confidence_score: 0.95
tags: [Alignment, RLAIF, AISafety, Scalability]
last_reinforced: 2026-04-20
---

# [[RLAIF (AI 피드백 기반 강화학습)|RLAIF (Reinforcement Learning from AI Feedback)]]

## 📌 One-Line Insight (The Karpathy Summary)

> "An alignment accelerator in which a smarter AI takes the human's place."

RLAIF is a technique for aligning and training a model with feedback generated by a high-capability AI model, rather than with human feedback as in RLHF.

## 📖 Structured Knowledge (Synthesized Content)

- **The Concept**:
  - RLHF depends on human labor, so it is expensive and slow.
  - RLAIF has a "teacher AI" evaluate and score the student model's answers according to a constitution (a set of rules).
- **Workflow**:
  - Model A generates two answers -> Model B (the evaluator) ranks them against the rules -> Model A is trained with reinforcement learning on the resulting labeled data, typically by first fitting a reward model to those labels.
- **Significance**: It enables scaling alignment, so that alignment quality can be maintained even on datasets too large for humans to review item by item.

## ⚠️ Contradictions and Updates (RL Update)

- Because "AI teaches AI", there is a risk of bias amplification and model collapse. To guard against this, a hybrid approach is recommended: a human overseer stays in the loop and periodically checks whether the AI's evaluation criteria are sound.

## 🔗 Knowledge Links (Graph)

- Related: [[Constitutional AI (헌법 AI)|Constitutional AI]], RLHF (Reinforcement Learning from Human Feedback)
- Risk: Model Collapse
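
## 🧪 Workflow Sketch (Illustrative Example)

The workflow described in this note (Model A produces two answers, Model B ranks them by the rules, and the labeled pairs drive Model A's training) can be sketched roughly as follows. This is a minimal sketch under loud assumptions: `model_a_generate`, `model_b_score`, `CONSTITUTION`, and `sample_for_human_audit` are hypothetical names invented here, and the keyword rules are a toy stand-in for a real LLM judge so the example runs offline. The reinforcement learning step itself is omitted; the sketch stops at the AI-labeled preference data and the human spot-check.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy "constitution": (description, scoring function) pairs.
# Assumption: a real setup would prompt a strong LLM with written principles.
Rule = Tuple[str, Callable[[str], float]]

CONSTITUTION: List[Rule] = [
    ("Do not insult the user", lambda r: -1.0 if "stupid" in r.lower() else 0.0),
    ("Explain the reasoning", lambda r: 1.0 if "because" in r.lower() else 0.0),
]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def model_a_generate(prompt: str) -> Tuple[str, str]:
    """Hypothetical policy (Model A): returns two candidate answers."""
    return (
        f"That is a stupid question about {prompt}.",
        f"{prompt} matters because it affects the outcome.",
    )


def model_b_score(response: str) -> float:
    """Hypothetical evaluator (Model B): sums the rule scores for one answer."""
    return sum(score(response) for _, score in CONSTITUTION)


def label_pair(prompt: str) -> PreferencePair:
    """Sample two answers from Model A and let Model B decide the ranking."""
    first, second = model_a_generate(prompt)
    if model_b_score(first) >= model_b_score(second):
        return PreferencePair(prompt, chosen=first, rejected=second)
    return PreferencePair(prompt, chosen=second, rejected=first)


def build_preference_dataset(prompts: List[str]) -> List[PreferencePair]:
    """AI-labeled preference data that would feed Model A's RL step."""
    return [label_pair(p) for p in prompts]


def sample_for_human_audit(
    pairs: List[PreferencePair], every_n: int = 2
) -> List[PreferencePair]:
    """Hybrid oversight: route every n-th AI label to a human reviewer."""
    return pairs[::every_n]


if __name__ == "__main__":
    dataset = build_preference_dataset(["Regularization", "Caching"])
    for pair in dataset:
        print("chosen:", pair.chosen)
    print("for human audit:", len(sample_for_human_audit(dataset)))
```

In a fuller pipeline, the `PreferencePair` records would train a reward model or feed a preference-optimization step for Model A, and the audited sample would be used to check that Model B's rankings still match human judgment.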