---
id: wiki-2026-0508-distributed-systems
title: Distributed Systems
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [distributed systems, microservices, consensus, raft, paxos, sharding, replication]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [distributed-systems, scalability, microservices, consensus, replication, sharding, fault-tolerance]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: distributed systems
  framework: K8s / Kafka / Cassandra / Redis / etcd
---

# Distributed Systems

## One-line summary

> **"N machines that appear as one system."** A trade-off among fault tolerance, scale, and latency. CAP and PACELC are the fundamental constraints. Modern default: K8s + service mesh + eventual consistency. Trend: edge and multi-region deployment.

## Core challenges

### 8 fallacies of distributed computing (Deutsch / Gosling)

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

→ All of them are false.

### CAP / PACELC

- See [[CAP-Theorem]].

### Core problems

- **Consistency**: do different nodes see the same view?
- **Coordination**: leader election and consensus.
- **Failure**: partial failure (some nodes fail while others keep running).
- **Time**: clock skew (Lamport clocks, vector clocks).
- **Network**: partitions.

## Core patterns

### Replication

- The same data is stored on multiple nodes.
- Sync vs async.
- Leader-follower vs multi-leader.

### Sharding / Partitioning

- Split the data into N pieces.
- Partition by hash, range, or geography.

### Consensus

- **Raft** (modern, simpler): etcd, Consul, CockroachDB.
- **Paxos**: the classic.
- **Multi-Paxos / EPaxos**.
- **PBFT** (Byzantine).

### Eventual consistency

- Replicas converge after some time once updates stop.
- CRDTs merge concurrent updates without conflicts.

### Service patterns

- **API gateway**.
- **Service mesh** (Istio, Linkerd).
- **Sidecar**.
- **Circuit breaker**.
- **Bulkhead**.
- **Saga** (distributed transaction).
- **Outbox** (reliable messaging).

### Messaging

- **Kafka**: high-throughput log.
- **RabbitMQ**: traditional queue.
- **NATS**: simple, fast.
- **Pulsar**: modern Kafka alternative.
- **Redis Streams**.

### Observability

- Distributed tracing (OpenTelemetry).
- Structured logs.
- Metrics.
- Chaos engineering.

### Applications

1. **Web app at scale**.
2. **Cloud database** (Spanner, CockroachDB).
3. **ML training** (data + model parallel).
4. **Blockchain** (BFT + permissionless).
5. **Edge computing**.
6. **CDN**.

## 💻 Patterns

### Raft (etcd / consul)

```python
# Simplified sketch: no persistence, election timers, or RPC layer.
class RaftNode:
    def __init__(self):
        self.state = 'follower'
        self.term = 0
        self.voted_for = None
        self.log = []          # assumed format: list of (term, command) tuples
        self.commit_index = 0

    def is_log_up_to_date(self, last_log_index, last_log_term):
        # The candidate's log must be at least as up to date as ours:
        # compare last terms first, then log length (1-based index, 0 = empty).
        my_last_term = self.log[-1][0] if self.log else 0
        return (last_log_term, last_log_index) >= (my_last_term, len(self.log))

    def request_vote(self, term, candidate_id, last_log_index, last_log_term):
        if term < self.term:
            return False               # stale candidate
        if term > self.term:
            self.term = term           # newer term: step down and clear the old vote
            self.state = 'follower'
            self.voted_for = None
        if self.voted_for in (None, candidate_id) and \
                self.is_log_up_to_date(last_log_index, last_log_term):
            self.voted_for = candidate_id
            return True
        return False

    def append_entries(self, term, leader_id, entries):
        if term < self.term:
            return False               # reject a stale leader
        if term > self.term:
            self.term = term
            self.voted_for = None
        self.state = 'follower'        # a valid leader exists for this term
        # Real Raft also checks prev_log_index / prev_log_term and
        # truncates conflicting entries before appending.
        self.log.extend(entries)
        return True
```

### Sharding (consistent hashing)

```python
import hashlib
from sortedcontainers import SortedList  # third-party: pip install sortedcontainers

class ConsistentHash:
    def __init__(self, nodes, virtual_nodes=150):
        self.ring = SortedList()   # sorted hash positions on the ring
        self.node_map = {}         # ring position -> physical node
        for node in nodes:
            # Many virtual nodes per physical node even out the key distribution.
            for i in range(virtual_nodes):
                key = self._hash(f'{node}#{i}')
                self.ring.add(key)
                self.node_map[key] = node

    def _hash(self, s):
        # MD5 is used only for distribution here, not for security.
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get_node(self, key):
        if not self.ring:
            return None
        h = self._hash(key)
        # The first ring position clockwise from the key's hash owns the key.
        idx = self.ring.bisect_right(h)
        if idx == len(self.ring):
            idx = 0                # wrap around the ring
        return self.node_map[self.ring[idx]]
```

### Saga pattern (distributed transaction)

```python
class OrderSaga:
    """Each step has a matching compensating action.

    The step methods (reserve_inventory, charge_payment, create_shipment)
    and their undo_<step> counterparts are assumed to be defined elsewhere.
    """

    async def execute(self, order):
        completed = []
        try:
            await self.reserve_inventory(order)
            completed.append('inventory')
            await self.charge_payment(order)
            completed.append('payment')
            await self.create_shipment(order)
            completed.append('shipment')
            return 'success'
        except Exception as e:
            # Compensate the completed steps in reverse order.
            for step in reversed(completed):
                await getattr(self, f'undo_{step}')(order)
            return f'failed: {e}'
```

### Outbox pattern (reliable messaging)

```sql
-- Insert the outbox row in the same transaction as the business write,
-- so the event exists if and only if the order does.
BEGIN;
INSERT INTO orders (...) VALUES (...);
INSERT INTO outbox (event_type, payload, status) VALUES ('OrderCreated', '{...}', 'pending');
COMMIT;

-- Separate worker. The row locks taken by FOR UPDATE SKIP LOCKED are held
-- only inside a transaction, so claim, publish, and mark in one transaction.
BEGIN;
SELECT * FROM outbox WHERE status = 'pending' ORDER BY id LIMIT 100 FOR UPDATE SKIP LOCKED;
-- publish to Kafka (at-least-once: consumers must be idempotent)
UPDATE outbox SET status = 'published' WHERE id = $1;
COMMIT;
```

### Circuit breaker

```ts
class ServiceUnavailable extends Error {}

class CircuitBreaker {
  state: 'closed' | 'open' | 'half-open' = 'closed';
  failures = 0;
  lastFailure = 0;

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      // After a 30 s cool-down, let one trial request through.
      if (Date.now() - this.lastFailure > 30_000) this.state = 'half-open';
      else throw new ServiceUnavailable();
    }
    try {
      const r = await fn();
      // Success closes the circuit and resets the failure count.
      this.state = 'closed';
      this.failures = 0;
      return r;
    } catch (e) {
      this.failures++;
      this.lastFailure = Date.now();
      // Five failures open the circuit; a failed half-open trial reopens it
      // because the count was not reset.
      if (this.failures >= 5) this.state = 'open';
      throw e;
    }
  }
}
```

### Vector clock (causal ordering)

```python
class VectorClock:
    def __init__(self, node_id, n_nodes):
        self.node_id = node_id        # index of this node, 0 <= node_id < n_nodes
        self.clock = [0] * n_nodes

    def tick(self):
        # Local event: advance our own component.
        self.clock[self.node_id] += 1

    def update(self, other_clock):
        # On message receive: element-wise max with the sender's clock, then tick.
        self.clock = [max(a, b) for a, b in zip(self.clock, other_clock)]
        self.tick()

    def happens_before(self, other):
        # self -> other iff every component is <= and at least one is strictly <.
        return all(a <= b for a, b in zip(self.clock, other.clock)) and \
               any(a < b for a, b in zip(self.clock, other.clock))
```

### CRDT (G-Counter)

```python
class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}   # node id -> that node's local increment count

    def increment(self):
        # A node only ever increments its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-node max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order.
        for nid, cnt in other.counts.items():
            self.counts[nid] = max(self.counts.get(nid, 0), cnt)
```

### Service mesh (Istio sidecar)

```yaml
# Each Pod runs an Envoy sidecar that enforces this routing.
# Subsets v1 / v2 must be defined in a matching DestinationRule.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata: { name: orders }
spec:
  hosts: [orders]
  http:
  - route:
    # weight is a sibling of destination, not a field inside it
    - destination: { host: orders, subset: v1 }
      weight: 90
    - destination: { host: orders, subset: v2 }
      weight: 10
    fault:
      delay:
        percentage: { value: 0.1 }  # chaos: delay 0.1% of requests
        fixedDelay: 5s
```

### Distributed tracing

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# validate, charge, and ship are application functions assumed to exist.
@tracer.start_as_current_span('process_order')
def process(order):
    # Each nested span becomes a child of 'process_order'.
    with tracer.start_as_current_span('validate'):
        validate(order)
    with tracer.start_as_current_span('charge'):
        charge(order)
    with tracer.start_as_current_span('ship'):
        ship(order)
```

### Chaos engineering

```python
# Chaos Monkey style
import logging
import random

logger = logging.getLogger(__name__)

class ChaosMonkey:
    def maybe_kill(self, instance, p=0.001):
        # With probability p, terminate the instance to exercise recovery paths.
        # instance is assumed to expose .id and .terminate().
        if random.random() < p:
            logger.warning('CHAOS: killing %s', instance.id)
            instance.terminate()
```

## Decision criteria

| Situation | Pattern |
|---|---|
| Strong consistency | Raft / Paxos (etcd, CockroachDB) |
| High availability | Eventual + CRDT (Cassandra, DynamoDB) |
| Distributed transaction | Saga + Outbox |
| Service-to-service | Service mesh |
| High-throughput messaging | Kafka |
| Real-time low-latency | NATS / Redis |
| Multi-region read | CDN / Edge cache |
| Cross-region write | Spanner / FoundationDB |

**Default**: K8s + service mesh + Raft for state + Kafka for events + tracing.

## 🔗 Graph

- Parent: [[Software-Architecture]] · [[System-Design]]
- Variants: [[CAP-Theorem]] · [[PACELC]] · [[Microservices]] · [[Service Mesh]] · [[CRDT]]
- Applications: [[Raft]] · [[Paxos]] · [[Saga]] · [[Outbox]] · [[Circuit-Breaker]]
- Tools: [[Kafka]] · [[Cassandra]] · [[etcd]] · [[Spanner]] · [[Kubernetes]]
- Adjacent: [[Availability-and-Persistence]] · [[Software Architecture Styles]] · [[Bottlenecks]] · [[Antifragility]]

## 🤖 LLM usage

**When**: system design, scalability planning, reliability engineering, multi-region deployment.
**When not**: single-machine apps, prototypes.

## ❌ Anti-patterns

- **Ignoring the 8 fallacies**.
- **Distributed monolith** (synchronous call chain).
- **Synchronous everything** (nothing event-driven).
- **No idempotency** (retries corrupt state).
- **No observability**.
- **Premature microservices**.
- **No circuit breaker** (cascading failure).

## 🧪 Verification / duplicates

- Verified (Kleppmann "DDIA", Raft paper, Paxos paper, Google papers).
- Trust level A.
- Related: [[CAP-Theorem]] · [[Availability-and-Persistence]] · [[Software Architecture Styles]] · [[Bottlenecks]] · [[Bounded Contexts (DDD)]] · [[Antifragility]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: 8 fallacies + patterns + Raft / sharding / Saga / Outbox / circuit breaker / CRDT code |