---
id: SYS-SCALE-AI-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [ai, infrastructure, scalability, distributed-systems, load-balancing, microservices, mlops]
last_reinforced: 2026-04-26
---

# Scalability in AI Systems

## 📌 One-Line Insight (The Karpathy Summary)
> "Build a modular architecture that scales linearly, and dismantle bottlenecks preemptively, so the system does not collapse under surging traffic and data." Scalability is the ability of AI infrastructure to absorb growth in users or data volume by adding resources, without performance degradation.

## 📖 Synthesized Content
- **Extracted pattern:** "Horizontal Elasticity and Resource Decoupling". Instead of raising the performance of a single server (vertical scaling), connect many inexpensive servers in parallel (horizontal scaling), and separate compute (GPU) from storage (DB) so that each can grow or shrink flexibly with load.
- **Core scaling strategies:**
  - **Load Balancing:** distribute traffic evenly across multiple inference servers.
  - **Model Parallelism:** split a very large model across multiple GPUs.
  - **Asynchronous Processing:** handle heavy jobs asynchronously through a queue.
  - **Microservices:** break functionality into pieces that can be scaled independently.
- **Significance:** the essential engineering foundation for turning a lab-grade AI model into a large-scale commercial service used by hundreds of millions of people (e.g., ChatGPT).

## ⚠️ Contradictions & RL Update
- **Conflict with past data:** The era in which the answer was simply to throw more resources at the problem has passed. "Green AI" infrastructure, which achieves cost efficiency and scalability at the same time through serverless inference and intelligent auto-scaling, is now drawing attention.
- **Policy change:** To prepare for growth in the number of concurrent agent users, the Antigravity project adopts by default a microservice structure that scales flexibly in a container environment based on Docker and Kubernetes.
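
Three of the strategies named in this note (load balancing, queue-based asynchronous processing, and auto-scaling) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the names `RoundRobinBalancer`, `desired_replicas`, `worker`, and `run_jobs`, the replica labels, and the capacity numbers are hypothetical. It does not describe the Antigravity project's actual implementation or any Kubernetes API. The "inference call" is replaced by a no-op `asyncio.sleep(0)`.

```python
import asyncio
import itertools
import math


class RoundRobinBalancer:
    """Hands out replica names in a fixed rotation (hypothetical helper)."""

    def __init__(self, replicas):
        replicas = list(replicas)
        if not replicas:
            raise ValueError("at least one replica is required")
        self._cycle = itertools.cycle(replicas)

    def pick(self):
        # Each call returns the next replica, wrapping around at the end.
        return next(self._cycle)


def desired_replicas(queue_depth, per_replica_capacity,
                     min_replicas=1, max_replicas=10):
    """Illustrative scaling rule: enough replicas to cover the backlog,
    clamped to [min_replicas, max_replicas]."""
    needed = math.ceil(queue_depth / per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))


async def worker(name, queue, results):
    """Pulls jobs from the shared queue until the task is cancelled."""
    while True:
        job = await queue.get()
        try:
            await asyncio.sleep(0)  # stand-in for a heavy inference call
            results.append((name, job))
        finally:
            queue.task_done()


async def run_jobs(jobs, n_workers):
    """Enqueues all jobs, then lets n_workers drain the queue concurrently."""
    queue = asyncio.Queue()
    results = []
    for job in jobs:
        queue.put_nowait(job)
    workers = [
        asyncio.create_task(worker(f"worker-{i}", queue, results))
        for i in range(n_workers)
    ]
    await queue.join()          # wait until every job has been processed
    for task in workers:
        task.cancel()           # idle workers are blocked on queue.get()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["gpu-a", "gpu-b", "gpu-c"])
    print([balancer.pick() for _ in range(5)])
    print(desired_replicas(queue_depth=25, per_replica_capacity=10))
    print(asyncio.run(run_jobs(list(range(5)), n_workers=2)))
```

In this sketch, raising `n_workers` corresponds to horizontal scaling: producers only touch the queue, so consumers can be added or removed without changing the code that enqueues jobs. A production system would typically replace the in-process queue with an external message broker and the scaling rule with the platform's own autoscaler.
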
## 🔗 Knowledge Links (Graph)
- System-Design-for-AI-Scale, [[High-Availability-Systems|High-Availability-Systems]], [[Parallel-Computing-in-AI|Parallel-Computing-in-AI]], Cloud-Computing-Foundations
- **Raw Source:** 10_Wiki/Topics/AI/Scalability-in-AI-Systems.md