---
id: BON-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [ai-inference, llm, sampling-strategy, post-processing]
last_reinforced: 2026-04-26
---

# [[Best-of-N Sampling (최적 샘플링)|Best-of-N Sampling]]

## 📌 One-Line Insight (The Karpathy Summary)

> "Draw many, then pick the best." An inference-time optimization technique that generates N responses from a model, then uses a separate reward model (RM) or scoring rubric to select the single highest-quality answer.

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** Separate the generation and verification stages so that the risk of hallucination or a low-quality response, which any single generation carries, is suppressed statistically.
- **Details:**
  - **Generate N:** For the same prompt, sample with a tuned temperature to obtain a pool of N independent candidate responses.
  - **Reward Model (RM):** Evaluate each candidate for logical coherence, safety, and accuracy, and assign it a score.
  - **Rejection Sampling:** Discard the low-scoring candidates and select only the top-scoring response as the final output.
  - **Compute cost:** Inference consumes roughly N times the generation compute (plus the cost of scoring each candidate), in exchange for a substantial gain in output reliability.

## ⚠️ Contradictions & RL Update

- **Conflict with earlier data:** A shift from simply choosing the next token by probability to "verification-based inference", which evaluates the quality of the complete response after the fact.
- **Policy change:** Adopted mainly in code-generation and data-extraction agents, where accuracy is critical, rather than in chatbots, where real-time response matters.

## 🔗 Knowledge Links (Graph)

- **Parent:** 10_Wiki/💡 Topics/AI
- **Related:** Chain-of-Thought, Self-Consistency, Reward-Modeling
- **Raw Source:** 00_Raw/2026-04-20/Best-of-N Sampling.md
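
## 💻 Illustrative Sketch (Python)

The generate, score, select loop described under Synthesized Content can be sketched as below. This is a minimal illustration under stated assumptions, not a reference implementation: `best_of_n`, `make_toy_generator`, `toy_score`, and `p_at_least_one_good` are hypothetical names introduced here. The toy generator stands in for an LLM sampler and the toy scorer stands in for a reward model. The `Generator` and `Scorer` signatures are likewise assumed.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interfaces: any sampler and any scorer with these shapes will do.
Generator = Callable[[str, float], str]  # (prompt, temperature) -> response
Scorer = Callable[[str, str], float]     # (prompt, response) -> score, higher is better


def best_of_n(prompt: str, generate: Generator, score: Scorer,
              n: int = 4, temperature: float = 0.8) -> Tuple[str, float, List[str]]:
    """Draw n candidates, score each one, and return the top-scoring response."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Generation stage: n independent samples for the same prompt.
    candidates = [generate(prompt, temperature) for _ in range(n)]
    # Verification stage: one score per candidate.
    scores = [score(prompt, c) for c in candidates]
    # Selection stage: keep only the highest-scoring candidate.
    best_index = max(range(n), key=scores.__getitem__)
    return candidates[best_index], scores[best_index], candidates


def make_toy_generator(seed: int = 0) -> Generator:
    """Stand-in for an LLM: answers '17 + 25' with temperature-scaled noise."""
    rng = random.Random(seed)

    def generate(prompt: str, temperature: float) -> str:
        noise = round(rng.gauss(0.0, temperature))
        return str(42 + noise)

    return generate


def toy_score(prompt: str, response: str) -> float:
    """Stand-in for a reward model: closer to the true answer scores higher."""
    return -abs(int(response) - 42)


def p_at_least_one_good(p: float, n: int) -> float:
    """Chance that at least one of n independent samples is acceptable."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    answer, best_score, pool = best_of_n(
        "What is 17 + 25?", make_toy_generator(), toy_score, n=8
    )
    print("candidates:", pool)
    print("selected:", answer, "score:", best_score)
    print("P(at least one good), p=0.5, n=8:", p_at_least_one_good(0.5, 8))
```

`p_at_least_one_good` makes the "statistical suppression" claim concrete. If each sample is acceptable with probability p, independently of the others, then at least one of N is acceptable with probability 1 - (1 - p)^N. This is an upper bound on what Best-of-N can deliver, because it assumes the scorer always recognizes an acceptable answer, and a real reward model will sometimes rank a worse candidate first. The cost side is visible in `best_of_n` itself: N calls to `generate` plus N calls to `score` per prompt.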