"매 component 가 죽으면 매 system 전체가 죽는 의 단일 의존점". 매 reliability engineering 의 가장 기본 anti-pattern — 매 redundancy + replication + failover 로 제거. 매 2020s cloud era 에서도 매 BGP misconfig (Facebook 2021), Cloudflare control plane (2023), AWS us-east-1 (2024 repeats) 가 매 region/provider-level SPOF 의 dramatic 증명.
Key points
Layers of SPOF
Hardware: single PSU, single NIC, single rack, single AZ.
Network: single ISP, single BGP route, single DNS provider.
Software: leader without standby, single DB primary, single secret store.
Human: bus-factor 1 — only one person knows the system.
Vendor: single cloud, single CDN, single auth provider.
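The layers above can be audited mechanically once each dependency is listed with its independent instances. The following is a minimal sketch under that assumption; the `inventory` contents and the `find_spofs` helper are hypothetical names introduced for illustration, not part of any tool.

```python
from typing import Dict, List


def find_spofs(inventory: Dict[str, List[str]]) -> List[str]:
    """Return dependencies backed by fewer than two distinct instances."""
    return sorted(
        name for name, instances in inventory.items()
        if len(set(instances)) < 2
    )


# Hypothetical inventory: one entry per dependency, listing its independent instances.
inventory = {
    "power_supply": ["psu-a", "psu-b"],
    "isp": ["isp-1"],
    "dns_provider": ["provider-a", "provider-b"],
    "db_primary": ["pg-primary"],
    "system_owner": ["alice"],  # human layer: bus factor of 1
}

print(find_spofs(inventory))  # ['db_primary', 'isp', 'system_owner']
```

Counting instances only catches the obvious cases. Two instances that share a rack, an AZ, or a vendor are still one failure domain, so a real audit would also need to record where each instance lives.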
    # Both providers serve the same zone; survive a provider outage (e.g. Dyn 2016)
    PROVIDERS = ["ns1.p01.dynect.net", "ns-2048.awsdns-64.com"]
    # Register both NS records at the registrar; clients auto-fallback
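A delegation only removes the DNS SPOF if its NS records really belong to different providers. The sketch below checks that offline from a list of hostnames. The `PROVIDER_MARKERS` table is an assumption made for this example, based on the two hostnames shown above, and would need extending for other providers.

```python
from typing import Dict, List, Set

# Assumed marker-to-provider table; extend it for the providers you actually use.
PROVIDER_MARKERS: Dict[str, str] = {
    ".dynect.net": "Dyn",
    ".awsdns-": "Route 53",
}


def providers_for(nameservers: List[str]) -> Set[str]:
    """Map each NS hostname to a provider label; unknown hosts keep their own name."""
    found: Set[str] = set()
    for ns in nameservers:
        host = ns.lower().rstrip(".")
        label = next(
            (provider for marker, provider in PROVIDER_MARKERS.items() if marker in host),
            host,
        )
        found.add(label)
    return found


def dns_is_redundant(nameservers: List[str]) -> bool:
    """True when the NS set spans at least two distinct providers."""
    return len(providers_for(nameservers)) >= 2


print(dns_is_redundant(["ns1.p01.dynect.net", "ns-2048.awsdns-64.com"]))  # True
print(dns_is_redundant(["ns-2048.awsdns-64.com", "ns-100.awsdns-12.net"]))  # False
```

Hostnames that match no marker are treated as their own provider, so two unknown nameservers from the same vendor would wrongly count as redundant. Keeping the marker table complete avoids that.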
Chaos test for SPOF discovery
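    # Chaos Mesh: fail one random pod in prod for 60s every hour, then observe the SLO
    apiVersion: chaos-mesh.org/v1alpha1
    kind: PodChaos
    metadata:
      name: spof-pod-failure  # placeholder name, not in the original snippet
    spec:
      action: pod-failure
      mode: one
      duration: "60s"
      selector:
        namespaces: [prod]
      # Inline scheduling as in Chaos Mesh 1.x; 2.x is assumed to need a separate Schedule resource
      scheduler:
        cron: "@every 1h"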
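"Observe the SLO" can be made concrete by comparing probe results from the chaos window against an availability target. This is a minimal sketch, assuming you already collect timestamped probe outcomes. `Sample`, `availability`, `breaches_slo`, and the 99.9% target are hypothetical and not part of Chaos Mesh.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    t: float   # seconds since the experiment started
    ok: bool   # True if the probe request succeeded


def availability(samples: List[Sample], start: float, end: float) -> float:
    """Fraction of successful probes with start <= t < end."""
    window = [s for s in samples if start <= s.t < end]
    if not window:
        # No data is not the same as healthy, so refuse to report a number.
        raise ValueError("no probe samples inside the chaos window")
    return sum(s.ok for s in window) / len(window)


def breaches_slo(samples: List[Sample], start: float, end: float,
                 target: float = 0.999) -> bool:
    """True when availability in the window falls below the assumed target."""
    return availability(samples, start, end) < target


# One probe per second for 60 s; probes at t = 10, 11, 12 fail during the pod failure.
samples = [Sample(t=float(i), ok=i not in (10, 11, 12)) for i in range(60)]
print(breaches_slo(samples, 0.0, 60.0))  # True: 57 of 60 probes succeeded
```

A breach while a single pod is down suggests that pod was a SPOF for the probed path, which is the finding the experiment is meant to surface.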
CRDT for leaderless replication (Yjs)
    import * as Y from 'yjs'
    import { WebsocketProvider } from 'y-websocket'

    const doc = new Y.Doc()
    // Multiple providers: no single broker SPOF
    new WebsocketProvider('wss://ws1.app', 'room', doc)
    new WebsocketProvider('wss://ws2.app', 'room', doc)
    const map = doc.getMap('state')
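To show why a CRDT needs no leader, here is a minimal grow-only counter (G-Counter) in Python. It is a much simpler CRDT than the one Yjs implements and is used here only as an illustration: every replica accepts writes locally, and merging in any order converges to the same value. The `GCounter` class is a hypothetical name for this sketch.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot, so no coordination is needed.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # max() is commutative, associative, and idempotent, so merge order does not matter.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)   # write accepted on replica a
b.increment(3)   # concurrent write accepted on replica b
a.merge(b)
b.merge(a)
a.merge(b)       # a repeated merge changes nothing
print(a.value, b.value)  # 5 5
```

Because either replica can take writes and state can flow through any path, losing one replica or one relay delays synchronization but does not stop the system.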
When to use: architecture review for SPOF spotting, postmortem analysis, runbook generation, dependency graph summarization.
When not to use: real-time failover decisions. Use deterministic health checks and orchestrators for those.
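As an illustration of what "deterministic" means here, the sketch below selects the active node from a fixed priority list using a consecutive-failure threshold, so the same check history always yields the same decision. `FailoverSelector` and `FAILURE_THRESHOLD` are hypothetical names, and the threshold of 3 is an assumed value. A real deployment would normally rely on an orchestrator's built-in health checks rather than code like this.

```python
from typing import Dict, List, Optional

FAILURE_THRESHOLD = 3  # assumed: consecutive failed checks before a node counts as unhealthy


class FailoverSelector:
    """Pick the first node in a fixed priority order that is still healthy."""

    def __init__(self, priority: List[str]) -> None:
        self.priority = priority
        self.failures: Dict[str, int] = {node: 0 for node in priority}

    def record(self, node: str, ok: bool) -> None:
        # A success resets the streak; a failure extends it.
        self.failures[node] = 0 if ok else self.failures[node] + 1

    def active(self) -> Optional[str]:
        for node in self.priority:
            if self.failures[node] < FAILURE_THRESHOLD:
                return node
        return None  # every node is unhealthy


selector = FailoverSelector(["primary", "standby"])
for _ in range(3):
    selector.record("primary", ok=False)
print(selector.active())  # standby
```

This version fails back to the primary as soon as one check succeeds. Whether automatic failback is desirable depends on the system, so treat that behavior as a design choice rather than a recommendation.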