---
id: P-REINFORCE-AI-TEST-TIME-COMPUTE
category: "10_Wiki/💡 Topics/AI"
confidence_score: 0.97
tags: [LLM, Inference, Scale, OpenAI-o1]
last_reinforced: 2026-04-20
---

# [[Test-Time Compute Scaling (추론 시간 계산 스케일링)|Test-Time Compute Scaling]]

## 📌 One-Line Insight (The Karpathy Summary)

> "A model does not have to be big; let it think longer and it gets smarter."

Beyond scaling at the training stage, this is a new paradigm that raises accuracy by spending more compute (more reasoning steps) at inference time.

## 📖 Structured Knowledge (Synthesized Content)

- **The Concept**:
  - Model size (parameter count) was long believed to determine intelligence, but recent models such as OpenAI o1 showed that simply extending 'Self-Correction' and the reasoning process before answering can be enough to outperform much larger models.
- **Methods**:
  - **Chain-of-Thought (CoT)**: Generate a long sequence of intermediate reasoning steps.
  - **Search (MCTS)**: Explore and evaluate several alternative answers, then select the best path.
  - **Verification**: Self-check the generated result and retry if it is wrong.
- **Inference Law**: Even when training resources are limited, increasing inference-time compute can push performance past its previous ceiling.

## ⚠️ Contradictions and Updates (RL Update)

- As inference-time compute grows, cost (latency) rises steeply. This can make the approach unsuitable for real-time chat, so the key challenge is efficiency: distinguishing 'fast intuition (System 1)' from 'deliberate thinking (System 2)' and allocating compute according to task difficulty.

## 🔗 Knowledge Links (Graph)

- Related: [[Chain-of-Thought (CoT 사고 사슬)|Chain-of-Thought (CoT)]], Monte Carlo Tree Search (MCTS)
- Origin: OpenAI-o1
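
The Verification method and the latency concern noted above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a description of how OpenAI o1 works: `noisy_solver` is a hypothetical stand-in for one sampled model answer (correct with an assumed probability of 0.6), `verify` is an assumed-perfect checker, and `max_attempts` plays the role of the test-time compute budget. All names and numbers here are illustrative.

```python
import random


def noisy_solver(a, b, rng, p_correct=0.6):
    """Hypothetical stand-in for one sampled model answer to a + b."""
    if rng.random() < p_correct:
        return a + b
    # Otherwise return a plausible-looking but wrong answer.
    return a + b + rng.choice([-2, -1, 1, 2])


def verify(a, b, candidate):
    """Toy verifier: check the sum by undoing it (assumed to be reliable)."""
    return candidate - b == a


def solve_with_budget(a, b, max_attempts, rng):
    """Sample answers until one verifies or the attempt budget runs out."""
    candidate = None
    for attempt in range(1, max_attempts + 1):
        candidate = noisy_solver(a, b, rng)
        if verify(a, b, candidate):
            return candidate, attempt  # early exit: no extra compute spent
    return candidate, max_attempts


def evaluate(max_attempts, trials=2000, seed=0):
    """Return (accuracy, mean attempts used) for a given attempt budget."""
    rng = random.Random(seed)
    hits = 0
    used = 0
    for _ in range(trials):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        answer, attempts = solve_with_budget(a, b, max_attempts, rng)
        hits += answer == a + b
        used += attempts
    return hits / trials, used / trials


if __name__ == "__main__":
    for budget in (1, 2, 4, 8):
        acc, cost = evaluate(budget)
        print(f"budget={budget}  accuracy={acc:.3f}  mean_attempts={cost:.2f}")
```

Under these assumptions the expected accuracy for a budget of k attempts is 1 - 0.4^k, while the mean number of attempts (a rough proxy for latency) also grows with the budget. Exiting as soon as an answer verifies is one simple way to spend less compute on cases that turn out to be easy.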