---
id: SYS-HA-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [infrastructure, high-availability, cloud-computing, reliability, system-architecture]
last_reinforced: 2026-04-26
---

# High Availability Systems

## 📌 One-Line Insight (The Karpathy Summary)

> "Assume that failures will happen, and design an 'unkillable' architecture in which the service keeps running even when any one part of the system goes down." In practice, this means making system components redundant and automatically detecting and recovering from failures, so that the downtime users actually experience is minimized (an availability target of 99.99% or higher).

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** An isolation-and-recovery pattern that eliminates every single point of failure (SPOF) and spreads resources across nodes through load balancing and replication, so that a failure in one component does not propagate to the rest of the system.
- **Key elements:**
  - **Redundancy:** Run every critical server and database as two or more instances, either Active-Active (all instances serve traffic at once) or Active-Standby (a standby instance takes over only when the active one fails).
  - **Load Balancing:** Distribute traffic evenly across multiple nodes so that no single node is overloaded.
  - **Failover:** When a failure occurs, immediately switch service over to a healthy node.
  - **Health Check:** Periodically check each node's state and use the result to decide whether to remove it from, or return it to, the pool of available nodes.
- **Significance:** Ensures business continuity and raises service reliability, providing an essential foundation for operating platforms with a large user base.

## ⚠️ Contradictions and Updates (Contradictions & RL Update)

- **Conflict with earlier data:** The cloud era has shown that connecting many ordinary servers into a coordinated whole delivers far better availability for the cost than relying on a single high-performance server.
- **Policy change:** The Antigravity project's cloud brain infrastructure applies a high-availability design based on multi-region deployment, so that the knowledge search service is not interrupted even by outages on the scale of a natural disaster.
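
The 99.99% target and the Redundancy element are easier to reason about as arithmetic. The following Python sketch converts an availability target into an allowed downtime budget per year, and estimates the availability of N redundant replicas with the standard parallel formula A = 1 - (1 - a)^N. That formula assumes replicas fail independently and that any single healthy replica can serve traffic, which real deployments only approximate (a shared rack, network, or region breaks independence). The function names are illustrative, not from any specific library.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum downtime per year allowed by an availability target in [0, 1]."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(replica_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, assuming independent
    failures and that one healthy copy is enough to serve requests."""
    # The service is down only if every replica is down at the same time.
    return 1.0 - (1.0 - replica_availability) ** replicas


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} -> {downtime_minutes_per_year(target):.1f} min/year")
    print(f"two 99% replicas: {parallel_availability(0.99, 2):.4%}")
```

A 99.99% target leaves roughly 52.6 minutes of downtime per year. Under the independence assumption, two 99% replicas already reach about 99.99%, which is the arithmetic behind preferring several ordinary servers over one powerful one; correlated failures are the reason placement across regions still matters.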
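
A minimal sketch of the Load Balancing element described above: a round-robin balancer that cycles through the nodes and skips any that are marked down, so traffic only reaches healthy nodes. Round-robin is only one possible strategy (least-connections and weighted variants are common alternatives), and the class and method names are hypothetical.

```python
from typing import List, Set


class RoundRobinBalancer:
    """Hypothetical round-robin balancer that only hands out healthy nodes."""

    def __init__(self, nodes: List[str]):
        if not nodes:
            raise ValueError("need at least one node")
        self.nodes = list(nodes)
        self.down: Set[str] = set()
        self._next = 0  # index of the next node to try

    def mark_down(self, node: str) -> None:
        self.down.add(node)

    def mark_up(self, node: str) -> None:
        self.down.discard(node)

    def pick(self) -> str:
        # Try each node at most once, continuing from where the last pick stopped.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node not in self.down:
                return node
        raise RuntimeError("no healthy nodes available")
```

In a full system, `mark_down` and `mark_up` would be driven by health-check results rather than called by hand.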
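
The Failover element in an Active-Standby setup can be sketched as a heartbeat timeout: the active node must report in at least every `timeout` seconds, and if it goes silent the first standby is promoted. This is a single-process illustration with hypothetical names; timestamps are passed in explicitly (in practice they might come from `time.monotonic()`) so that the behavior is deterministic.

```python
from typing import List


class ActiveStandbyPair:
    """Hypothetical Active-Standby failover controller based on heartbeats."""

    def __init__(self, nodes: List[str], timeout: float, started_at: float = 0.0):
        if len(nodes) < 2:
            raise ValueError("Active-Standby needs at least two nodes")
        self.active = nodes[0]
        self.standbys = list(nodes[1:])
        self.failed: List[str] = []  # demoted nodes; re-adding them is left to the operator
        self.timeout = timeout
        self.last_heartbeat = started_at

    def heartbeat(self, node: str, now: float) -> None:
        # Only heartbeats from the current active node count.
        if node == self.active:
            self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Fail over if the active node has gone silent; return the active node."""
        if now - self.last_heartbeat > self.timeout and self.standbys:
            self.failed.append(self.active)
            self.active = self.standbys.pop(0)
            self.last_heartbeat = now  # give the promoted node a fresh deadline
        return self.active
```

Production failover also has to prevent split-brain, where the old active node is still alive and keeps accepting writes; real systems typically handle that with fencing or quorum-based leader election, which this sketch omits.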
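
For the Health Check element, one common approach is to require several consecutive probe results before changing a node's status, so that a single dropped probe does not make a node flap in and out of the pool. The sketch below assumes a caller-supplied `probe` function (for example, an HTTP request to a health endpoint) and hypothetical default thresholds of 3 consecutive failures to remove a node and 2 consecutive successes to restore it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NodeHealth:
    healthy: bool = True
    consecutive_failures: int = 0
    consecutive_successes: int = 0


class HealthChecker:
    """Hypothetical checker: a node leaves the pool after `unhealthy_threshold`
    consecutive failed probes and rejoins after `healthy_threshold` passes."""

    def __init__(self, probe: Callable[[str], bool],
                 unhealthy_threshold: int = 3, healthy_threshold: int = 2):
        self.probe = probe
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.nodes: Dict[str, NodeHealth] = {}

    def add_node(self, node: str) -> None:
        self.nodes[node] = NodeHealth()

    def run_once(self) -> None:
        """Probe every node once; a scheduler would call this periodically."""
        for node, state in self.nodes.items():
            if self.probe(node):
                state.consecutive_failures = 0
                state.consecutive_successes += 1
                if not state.healthy and state.consecutive_successes >= self.healthy_threshold:
                    state.healthy = True
            else:
                state.consecutive_successes = 0
                state.consecutive_failures += 1
                if state.healthy and state.consecutive_failures >= self.unhealthy_threshold:
                    state.healthy = False

    def available_nodes(self) -> List[str]:
        """Nodes currently eligible to receive traffic."""
        return [n for n, s in self.nodes.items() if s.healthy]
```

Asymmetric thresholds like these trade detection speed for stability: higher values avoid flapping but keep a failed node in rotation for longer.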
## 🔗 Knowledge Links (Graph)

- System-Design-for-AI-Scale, [[Distributed-Computing]], [[Hybrid-Cloud-Architectures]], Fault-Tolerance-and-Resilience
- **Raw Source:** 10_Wiki/Topics/AI/High-Availability-Systems.md