2nd/10_Wiki/Topics/AI_and_ML/Distributed-Systems.md
2026-05-10 22:08:15 +09:00

9.7 KiB

id: wiki-2026-0508-distributed-systems
title: Distributed Systems
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: distributed systems, microservices, consensus, raft, paxos, sharding, replication
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: distributed-systems, scalability, microservices, consensus, replication, sharding, fault-tolerance
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: distributed systems; framework: K8s / Kafka / Cassandra / Redis / etcd

Distributed Systems

In one line

"N machines appearing as one system." Every design is a trade-off among fault tolerance, scale, and latency. CAP and PACELC are the fundamental framing. The modern default is K8s plus a service mesh plus eventual consistency. The current trend is toward edge and multi-region deployments.

Core challenges

8 fallacies of distributed computing (Deutsch / Gosling)

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology does not change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.

→ All of these are false.
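
Because the first two fallacies fail in practice, remote calls are usually wrapped in bounded retries. The following is a minimal sketch, not a definitive implementation: `call_with_retry` and its parameters are hypothetical names, and it assumes the wrapped call signals failure with `ConnectionError` or `TimeoutError` and is safe to repeat.

```python
import random
import time

def call_with_retry(fn, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call fn(); on ConnectionError/TimeoutError retry with capped exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2^attempt)]
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

The jitter spreads retries from many clients over time so that they do not hit a recovering service in lockstep. Retrying is only safe when the operation is idempotent.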

CAP / PACELC

Core problems

  • Consistency: do different nodes see the same view?
  • Coordination: leader election and consensus.
  • Failure: partial failures.
  • Time: clock skew (Lamport clocks, vector clocks).
  • Network: partitions.
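
The "Time" item above mentions Lamport clocks. A minimal sketch of one follows; the class and method names are illustrative assumptions. The counter replaces wall-clock time, so ordering does not depend on synchronized clocks.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the counter and return the timestamp."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Message receive: move past both the local time and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.tick()           # a stamps an outgoing message
t_recv = b.receive(t_send)  # b's timestamp is guaranteed to be larger than t_send
```

If event x happened before event y, then x's timestamp is smaller than y's. The converse does not hold, which is why vector clocks are needed to detect concurrent events.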

Core patterns

Replication

  • The same data is stored on multiple nodes.
  • Synchronous vs. asynchronous.
  • Leader-follower vs. multi-leader.
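
The difference between synchronous and asynchronous leader-follower replication can be sketched in memory. This is a toy model under stated assumptions: `Leader`, `Follower`, and `flush` are hypothetical names, there is no network, and followers never fail.

```python
class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class Leader:
    def __init__(self, followers):
        self.data = {}
        self.followers = followers
        self.pending = []  # async writes not yet shipped to followers

    def write(self, key, value, sync=True):
        self.data[key] = value
        if sync:
            # Synchronous: drain older async writes first to keep order,
            # then return only after every follower has applied this write.
            self.flush()
            for f in self.followers:
                f.apply(key, value)
        else:
            # Asynchronous: return immediately; followers lag until flush() runs.
            self.pending.append((key, value))

    def flush(self):
        """Ship queued async writes to all followers in write order."""
        for key, value in self.pending:
            for f in self.followers:
                f.apply(key, value)
        self.pending.clear()
```

Synchronous writes pay follower latency on every write but never lose acknowledged data when the leader fails. Asynchronous writes are faster but can lose whatever is still in `pending`.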

Sharding / Partitioning

  • Split the data into N pieces.
  • Partition by hash, by range, or by geography.
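
Hash and range partitioning can be contrasted in a few lines. The sketch below uses hypothetical names (`hash_shard`, `RangePartitioner`) and assumes string keys and a fixed shard count.

```python
import bisect
import hashlib

def hash_shard(key, n_shards):
    """Hash partitioning: spreads keys evenly, but range scans touch every shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], 'big') % n_shards

class RangePartitioner:
    """Range partitioning: keeps adjacent keys together, at the risk of hot ranges."""

    def __init__(self, boundaries, shards):
        # Shard i holds keys with boundaries[i-1] <= key < boundaries[i];
        # the first and last shards are open-ended.
        assert len(shards) == len(boundaries) + 1
        assert list(boundaries) == sorted(boundaries)
        self.boundaries = list(boundaries)
        self.shards = list(shards)

    def shard_for(self, key):
        return self.shards[bisect.bisect_right(self.boundaries, key)]

p = RangePartitioner(['h', 'p'], ['shard-0', 'shard-1', 'shard-2'])
```

Note that `hash_shard` remaps most keys when `n_shards` changes, which is the problem consistent hashing addresses.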

Consensus

  • Raft (modern, simpler): used by etcd, Consul, CockroachDB.
  • Paxos: the classic algorithm.
  • Multi-Paxos / EPaxos.
  • PBFT (Byzantine).
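
The protocols above differ in how many replicas must agree. As a rough sketch with hypothetical function names: crash-fault protocols such as Raft and Paxos need a simple majority, while Byzantine protocols such as PBFT need at least 3f + 1 replicas to tolerate f faulty ones.

```python
def crash_fault_quorum(n):
    """Majority quorum for n replicas: returns (quorum size, tolerated crash failures)."""
    quorum = n // 2 + 1
    return quorum, n - quorum

def byzantine_quorum(n):
    """Byzantine quorum for n replicas: returns (quorum size, tolerated Byzantine faults f)."""
    f = (n - 1) // 3
    # Any two quorums must overlap in more than f replicas;
    # this equals 2f + 1 when n = 3f + 1.
    quorum = (n + f) // 2 + 1
    return quorum, f
```

This is why clusters are usually sized at 3 or 5 nodes for Raft: a 4-node cluster still tolerates only one crash, the same as a 3-node cluster.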

Eventual consistency

  • Replicas converge after some time once writes stop.
  • CRDTs merge concurrent updates without conflicts.
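
One of the simplest convergent data types is a last-writer-wins register. The sketch below is illustrative only: `LWWRegister` and its fields are hypothetical names, timestamps are plain integers, and it assumes a node never issues two writes with the same timestamp.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    value: object = None
    timestamp: int = 0
    node_id: str = ''

    def write(self, value, timestamp, node_id):
        """Return the register state after a local write."""
        return self.merge(LWWRegister(value, timestamp, node_id))

    def merge(self, other):
        # The highest (timestamp, node_id) wins; node_id breaks ties,
        # so every replica picks the same winner.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id))

a = LWWRegister().write('x', 1, 'a')
b = LWWRegister().write('y', 2, 'b')
```

Because `merge` is commutative, associative, and idempotent, replicas can exchange state in any order and any number of times and still converge. The cost is that the losing concurrent write is silently discarded.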

Service patterns

  • API gateway.
  • Service mesh (Istio, Linkerd).
  • Sidecar.
  • Circuit breaker.
  • Bulkhead.
  • Saga (distributed transaction).
  • Outbox (reliable messaging).
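
Of the patterns listed above, the bulkhead is the easiest to show in a few lines: cap the concurrent calls to one dependency so that a slow dependency cannot exhaust the whole service. A minimal asyncio sketch, where `Bulkhead` and `BulkheadFull` are hypothetical names and rejecting instead of queueing is a design assumption:

```python
import asyncio

class BulkheadFull(Exception):
    """Raised when every slot for the dependency is in use."""

class Bulkhead:
    def __init__(self, max_concurrent):
        self._slots = asyncio.Semaphore(max_concurrent)

    async def run(self, coro_fn, *args):
        # Fail fast rather than queue, so callers are not held up behind a slow dependency.
        if self._slots.locked():
            raise BulkheadFull('no free slot')
        async with self._slots:
            return await coro_fn(*args)
```

A typical setup would create one `Bulkhead` per downstream dependency, inside the running event loop, and size it from the latency and throughput that dependency can sustain.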

Messaging

  • Kafka: high-throughput log.
  • RabbitMQ: traditional queue.
  • NATS: simple and fast.
  • Pulsar: a modern Kafka alternative.
  • Redis Streams.

Observability

  • Distributed tracing (OpenTelemetry).
  • Structured logs.
  • Metrics.
  • Chaos engineering.
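
Structured logs are what make log lines from many services searchable and joinable with traces. A minimal sketch using only the standard library follows; `JsonFormatter` and the `trace_id` field are assumptions for illustration, not a specific library's convention.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
            # trace_id is a hypothetical field supplied by the caller through `extra=`
            'trace_id': getattr(record, 'trace_id', None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger('orders')
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emits one JSON object per line, carrying the trace id alongside the message
logger.info('order accepted', extra={'trace_id': 'abc123'})
```

Carrying the same trace id in logs and spans lets a single request be followed across services.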

Applications

  1. Web app at scale.
  2. Cloud database (Spanner, CockroachDB).
  3. ML training (data + model parallel).
  4. Blockchain (BFT + permissionless).
  5. Edge computing.
  6. CDN.

💻 Patterns

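Raft (etcd / Consul)

# Simplified: no persistence, election timers, or RPC transport
class RaftNode:
    def __init__(self):
        self.state = 'follower'
        self.term = 0
        self.voted_for = None
        self.log = []            # list of (term, command) tuples
        self.commit_index = 0

    def is_log_up_to_date(self, last_log_index, last_log_term):
        # The candidate's log must be at least as up to date as ours (1-based index)
        my_last_term = self.log[-1][0] if self.log else 0
        if last_log_term != my_last_term:
            return last_log_term > my_last_term
        return last_log_index >= len(self.log)

    def request_vote(self, term, candidate_id, last_log_index, last_log_term):
        if term < self.term:
            return False                 # stale candidate
        if term > self.term:
            self.term = term
            self.state = 'follower'
            self.voted_for = None        # new term: no vote cast yet
        if self.voted_for in (None, candidate_id) and self.is_log_up_to_date(last_log_index, last_log_term):
            self.voted_for = candidate_id
            return True
        return False

    def append_entries(self, term, leader_id, entries):
        if term < self.term:
            return False                 # stale leader
        if term > self.term:
            self.term = term
            self.voted_for = None
        self.state = 'follower'          # a valid leader exists for this term
        # Simplification: real Raft also checks prev_log_index / prev_log_term
        # and truncates conflicting entries before appending.
        self.log.extend(entries)
        return True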

Sharding (consistent hashing)

import hashlib
from sortedcontainers import SortedList

class ConsistentHash:
    def __init__(self, nodes, virtual_nodes=150):
        self.ring = SortedList()
        self.node_map = {}
        for node in nodes:
            for i in range(virtual_nodes):
                key = self._hash(f'{node}#{i}')
                self.ring.add(key)
                self.node_map[key] = node
    
    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)
    
    def get_node(self, key):
        if not self.ring: return None
        h = self._hash(key)
        idx = self.ring.bisect_right(h)
        if idx == len(self.ring): idx = 0
        return self.node_map[self.ring[idx]]

Saga pattern (distributed transaction)

class OrderSaga:
    """Each step is paired with a compensating action."""
    
    async def execute(self, order):
        completed = []
        try:
            await self.reserve_inventory(order); completed.append('inventory')
            await self.charge_payment(order); completed.append('payment')
            await self.create_shipment(order); completed.append('shipment')
            return 'success'
        except Exception as e:
            # Compensate completed steps in reverse order
            for step in reversed(completed):
                await getattr(self, f'undo_{step}')(order)
            return f'failed: {e}'

Outbox pattern (reliable messaging)

-- Insert the outbox row in the same transaction as the business write
BEGIN;
INSERT INTO orders (...) VALUES (...);
INSERT INTO outbox (event_type, payload, status) 
VALUES ('OrderCreated', '{...}', 'pending');
COMMIT;

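-- Separate relay worker: claim, publish, and mark rows inside one transaction
BEGIN;
SELECT * FROM outbox WHERE status = 'pending' ORDER BY id LIMIT 100 FOR UPDATE SKIP LOCKED;
-- publish to Kafka (delivery is at-least-once, so consumers must be idempotent)
UPDATE outbox SET status = 'published' WHERE id = $1;
COMMIT;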

Circuit breaker

class ServiceUnavailable extends Error {}

class CircuitBreaker {
  state: 'closed' | 'open' | 'half-open' = 'closed';
  failures = 0;
  lastFailure = 0;
  
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > 30_000) this.state = 'half-open';
      else throw new ServiceUnavailable();
    }
    try {
      const r = await fn();
      this.state = 'closed';
      this.failures = 0;
      return r;
    } catch (e) {
      this.failures++;
      this.lastFailure = Date.now();
      if (this.failures >= 5) this.state = 'open';
      throw e;
    }
  }
}

Vector clock (causal ordering)

class VectorClock:
    def __init__(self, node_id, n_nodes):
        self.node_id = node_id
        self.clock = [0] * n_nodes
    
    def tick(self):
        self.clock[self.node_id] += 1
    
    def update(self, other_clock):
        self.clock = [max(a, b) for a, b in zip(self.clock, other_clock)]
        self.tick()
    
    def happens_before(self, other):
        return all(a <= b for a, b in zip(self.clock, other.clock)) and \
               any(a < b for a, b in zip(self.clock, other.clock))

CRDT (G-Counter)

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}
    
    def increment(self):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1
    
    def value(self):
        return sum(self.counts.values())
    
    def merge(self, other):
        for nid, cnt in other.counts.items():
            self.counts[nid] = max(self.counts.get(nid, 0), cnt)

Service mesh (Istio sidecar)

# Each Pod runs an Envoy sidecar
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata: { name: orders }
spec:
  hosts: [orders]
  http:
  - route:
    - destination: { host: orders, subset: v1 }
      weight: 90
    - destination: { host: orders, subset: v2 }
      weight: 10
    fault:
      delay: { percentage: { value: 0.1 }, fixedDelay: 5s }  # chaos: delay 0.1% of requests by 5s

Distributed tracing

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

@tracer.start_as_current_span('process_order')
def process(order):
    with tracer.start_as_current_span('validate'):
        validate(order)
    with tracer.start_as_current_span('charge'):
        charge(order)
    with tracer.start_as_current_span('ship'):
        ship(order)

Chaos engineering

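# Chaos Monkey style
import logging
import random

log = logging.getLogger(__name__)

class ChaosMonkey:
    def maybe_kill(self, instance, p=0.001):
        # Terminate the instance with probability p
        if random.random() < p:
            log.warning('CHAOS: killing %s', instance.id)
            instance.terminate()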

Decision criteria

| Situation | Pattern |
| --- | --- |
| Strong consistency | Raft / Paxos (etcd, CockroachDB) |
| High availability | Eventual + CRDT (Cassandra, DynamoDB) |
| Distributed transaction | Saga + Outbox |
| Service-to-service | Service mesh |
| High-throughput messaging | Kafka |
| Real-time, low latency | NATS / Redis |
| Multi-region read | CDN / Edge cache |
| Cross-region write | Spanner / FoundationDB |

Default: K8s + service mesh + Raft for state + Kafka for events + tracing.

🔗 Graph

🤖 LLM usage

When to use: system design, scalability planning, reliability engineering, multi-region deployments. When not to use: single-machine apps, prototypes.

Anti-patterns

  • Ignoring the 8 fallacies.
  • Distributed monolith (sync chain).
  • Synchronous everything (no event-driven).
  • No idempotency (retry corruption).
  • No observability.
  • Premature microservices.
  • No circuit breaker (cascade fail).
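
The "No idempotency" item above is the one most often hit in practice: a client retries after a timeout and the side effect runs twice. A minimal sketch of deduplication by idempotency key follows; `IdempotentHandler` is a hypothetical name, and the in-memory dict stands in for a durable store that a real service would need.

```python
class IdempotentHandler:
    def __init__(self, handler):
        self.handler = handler
        self.results = {}  # idempotency key -> stored result (in-memory for this sketch)

    def handle(self, key, request):
        if key in self.results:
            # Retry of a request already processed: replay the result, skip the side effect
            return self.results[key]
        result = self.handler(request)
        self.results[key] = result
        return result

charges = []

def charge(request):
    charges.append(request['amount'])
    return {'status': 'charged', 'amount': request['amount']}

svc = IdempotentHandler(charge)
first = svc.handle('req-1', {'amount': 50})
retry = svc.handle('req-1', {'amount': 50})  # same key: charge() is not called again
```

The client generates the key once per logical request and reuses it on every retry. This sketch does not handle two concurrent requests with the same key, which would need an atomic insert in the store.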

🧪 Verification / duplicates

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: 8 fallacies + patterns + Raft / sharding / Saga / Outbox / circuit breaker / CRDT code |