---
id: cs-quorum-consensus
title: Quorum / Consensus — Paxos / Raft / Dynamo style
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [cs, consensus, distributed, vibe-coding]
tech_stack: { language: "TS / Go", applicable_to: ["Backend", "CS"] }
applied_in: []
aliases: [quorum, consensus, Paxos, Raft, R+W>N, Dynamo, distributed agreement]
---

# Quorum / Consensus

> Core algorithms of distributed systems: **Paxos (foundational), Raft (modern), Dynamo (eventually consistent)**. A majority quorum means ⌊N/2⌋ + 1 nodes agree.

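## 📖 Core concepts
- N: total number of replicas.
- W: write quorum (a write succeeds once W replicas acknowledge it).
- R: read quorum (a read waits for responses from R replicas).
- R + W > N → every read quorum overlaps every write quorum, so a read contacts at least one replica holding the latest acknowledged write. This is the usual "strong consistency" rule for quorum systems.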

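## 💻 Code patterns

### Quorum rule (R + W > N)
```
N = 5 (5 replicas).

Strong:
- W = 3, R = 3 → R + W = 6 > 5. Read and write quorums overlap.
- Tolerates 2 replicas down (the remaining 3 still form a quorum).

Eventual:
- W = 1, R = 1 → R + W = 2 ≤ 5. A read may miss the latest write.
- Fastest.

Tuning with overlap kept (R + W = 6):
- W = 5, R = 1 (read-heavy): fast reads, but a write fails if any replica is down.
- W = 2, R = 4 (write-heavy): cheaper writes, slower reads.

Note: W = 3, R = 1 or W = 1, R = 3 gives R + W = 4 ≤ 5, so the quorums do not overlap and reads can be stale.
```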
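The arithmetic above can be checked mechanically. The following is a minimal sketch; `analyzeQuorum`, `majority`, and the field names are hypothetical helpers introduced for illustration, not part of any library.

```typescript
// Hypothetical helper types: names and shape are illustrative only.
interface QuorumConfig { n: number; r: number; w: number; }

interface QuorumProperties {
  overlapping: boolean;        // R + W > N: every read set intersects every write set
  writeFaultTolerance: number; // replicas that may be down while writes still succeed
  readFaultTolerance: number;  // replicas that may be down while reads still succeed
}

function analyzeQuorum({ n, r, w }: QuorumConfig): QuorumProperties {
  if (r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("R and W must be between 1 and N");
  }
  return {
    overlapping: r + w > n,
    writeFaultTolerance: n - w,
    readFaultTolerance: n - r,
  };
}

// Majority quorum size: floor(N / 2) + 1.
function majority(n: number): number {
  return Math.floor(n / 2) + 1;
}
```

For N = 5, `analyzeQuorum({ n: 5, r: 3, w: 3 })` reports overlapping quorums that tolerate two failures on each path, while `{ n: 5, r: 1, w: 3 }` reports no overlap.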

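### Dynamo (Cassandra style)
```ts
// The coordinator sends the write to all N replicas
// and reports success once at least W of them acknowledge.
interface Replica {
  write(key: string, value: string): Promise<void>;
}

declare const replicas: Replica[]; // the N replicas, provided elsewhere
declare const W: number;           // write quorum

async function write(key: string, value: string): Promise<void> {
  // Note: allSettled waits for every replica to answer or fail, so one slow
  // replica delays the response; production coordinators return after W acks.
  const responses = await Promise.allSettled(
    replicas.map(r => r.write(key, value))
  );

  const success = responses.filter(r => r.status === 'fulfilled').length;
  if (success < W) throw new Error(`write failed: ${success}/${W} acks`);

  // Replicas that missed the write catch up in the background
  // (hinted handoff, read repair).
}
```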

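### Read repair
```ts
interface Versioned { value: string; version: number; }
interface ReadReplica {
  read(key: string): Promise<Versioned>;
  write(key: string, v: Versioned): Promise<void>;
}
declare const replicas: ReadReplica[];
declare const R: number; // read quorum

async function read(key: string): Promise<string> {
  // Only R replicas are contacted, so a single failure fails the read.
  const contacted = replicas.slice(0, R);
  const responses = await Promise.allSettled(contacted.map(r => r.read(key)));

  // Keep only successful reads, remembering which replica answered.
  const answers: { replica: ReadReplica; data: Versioned }[] = [];
  responses.forEach((res, i) => {
    if (res.status === 'fulfilled') answers.push({ replica: contacted[i], data: res.value });
  });
  if (answers.length < R) throw new Error('read quorum not reached');

  // Pick the newest version (a plain counter here; by vector clock / timestamp in practice).
  const latest = answers.reduce((a, b) => (b.data.version > a.data.version ? b : a)).data;

  // Read repair: push the newest version to stale replicas in the background.
  for (const { replica, data } of answers) {
    if (data.version < latest.version) {
      void replica.write(key, latest).catch(() => { /* best effort */ });
    }
  }
  return latest.value;
}
```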
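Read repair has to decide which version is newer. With vector clocks that decision can also come out as "neither": two versions written concurrently have no latest, and the application must merge them. A minimal sketch of the comparison follows; `VectorClock`, `Ordering`, and `compareClocks` are hypothetical names, and a missing entry is assumed to mean counter 0.

```typescript
// Hypothetical types for illustration: one counter per node id.
type VectorClock = Record<string, number>;
type Ordering = "before" | "after" | "equal" | "concurrent";

// Compares clock `a` against clock `b`.
// "after"      : a has seen everything b has, and more.
// "before"     : b has seen everything a has, and more.
// "concurrent" : each side has an update the other has not seen (a conflict).
function compareClocks(a: VectorClock, b: VectorClock): Ordering {
  let aAhead = false;
  let bAhead = false;
  // Duplicated node ids in this list are harmless; they are just compared twice.
  const nodes = Object.keys(a).concat(Object.keys(b));
  for (const node of nodes) {
    const av = a[node] ?? 0; // assumption: a missing entry counts as 0
    const bv = b[node] ?? 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}
```

A timestamp-based "last write wins" policy never reports `concurrent`; it silently drops one of the two writes instead, which is the trade-off behind choosing vector clocks.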

### Hinted handoff
```
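A write with W = 3.
One of the target replicas is down.
The coordinator stores the write on another node as a temporary "hint".
When the down replica comes back up, the hint is transferred to it.

→ Availability ↑.
(If hints count toward W, as in a sloppy quorum, R + W > N no longer guarantees that a read sees the write.)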
```
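The hint bookkeeping can be sketched with an in-memory store. This is an illustration only: `HintStore`, `Hint`, and the synchronous `deliver` callback are hypothetical, and a real system would persist hints and expire old ones.

```typescript
// Hypothetical in-memory hint store; real systems persist hints to disk.
interface Hint { target: string; key: string; value: string; }

class HintStore {
  private hints: Hint[] = [];

  // Called by the coordinator when `target` did not acknowledge a write.
  add(target: string, key: string, value: string): void {
    this.hints.push({ target, key, value });
  }

  // Called when `target` is seen alive again: replay its hints in write order.
  // A hint is dropped only after `deliver` returns without throwing.
  replay(target: string, deliver: (key: string, value: string) => void): number {
    const pending = this.hints.filter(h => h.target === target);
    for (const h of pending) {
      deliver(h.key, h.value);
      this.hints = this.hints.filter(x => x !== h);
    }
    return pending.length;
  }

  pendingFor(target: string): number {
    return this.hints.filter(h => h.target === target).length;
  }
}
```

Replaying in write order matters when the same key was written more than once while the replica was down; the last hint delivered is then the newest value.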

### Paxos (foundational)
```
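Phase 1 (Prepare / Promise):
- The proposer sends Prepare(n) with a unique proposal number n.
- An acceptor promises if n is greater than any number it has already seen,
  and returns the highest-numbered proposal it has accepted, if any.

Phase 2 (Accept / Accepted):
- After promises from a majority, the proposer sends Accept(n, v), where v is the value
  of the highest-numbered accepted proposal reported, or its own value if none.
- An acceptor accepts unless it has promised a higher number.
- The value is chosen once a majority of acceptors have accepted it.

Learn:
- Learners are told the chosen value.

→ Each instance agrees on a single value. Subtle to implement.
```

→ Multi-Paxos runs a sequence of instances to agree on a log, usually with a stable leader that skips Phase 1.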
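The acceptor side of single-decree Paxos is small enough to sketch. The code below assumes in-memory state and no networking; `Acceptor`, `Proposal`, `PrepareReply`, and `chooseValue` are illustrative names, and a real acceptor must write its state to stable storage before replying.

```typescript
interface Proposal { n: number; value: string; }

type PrepareReply =
  | { ok: true; accepted: Proposal | null }
  | { ok: false };

class Acceptor {
  private promised = -1;                    // highest proposal number promised so far
  private accepted: Proposal | null = null; // highest-numbered proposal accepted so far

  // Phase 1: promise to ignore proposals numbered below n.
  onPrepare(n: number): PrepareReply {
    if (n <= this.promised) return { ok: false };
    this.promised = n;
    return { ok: true, accepted: this.accepted };
  }

  // Phase 2: accept unless a higher-numbered prepare has been promised.
  onAccept(p: Proposal): boolean {
    if (p.n < this.promised) return false;
    this.promised = p.n;
    this.accepted = p;
    return true;
  }
}

// Proposer rule for Phase 2: reuse the value of the highest-numbered accepted
// proposal among the promises, otherwise the proposer is free to use its own.
function chooseValue(replies: PrepareReply[], own: string): string {
  let best: Proposal | null = null;
  for (const r of replies) {
    if (r.ok && r.accepted && (best === null || r.accepted.n > best.n)) {
      best = r.accepted;
    }
  }
  return best ? best.value : own;
}
```

The `chooseValue` rule is what keeps Paxos safe: once a value may have been chosen, every later proposer is forced to propose that same value.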

### Raft (modern, simple)
```
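3 roles:
- Leader: accepts writes and replicates them to followers.
- Follower: appends the leader's entries to its log.
- Candidate: a follower that is running for leader in an election.

An entry is committed once a majority of nodes store it (leader included): ⌊N/2⌋ + 1.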
```

### Raft election
```
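When followers stop receiving heartbeats from the leader:
1. A follower's election timeout fires (randomized, e.g. 150-300 ms).
2. It increments its term and becomes a candidate.
3. It votes for itself and sends RequestVote to the other nodes.
4. If a majority votes for it, it becomes leader.

Term: a monotonically increasing number; each election starts a new term.
Each node casts at most one vote per term, and only for a candidate whose log is at least as up to date as its own.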
```
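The vote-granting rule is the part of the election that is easiest to get wrong. Below is a minimal sketch of how a node might answer RequestVote, plus a randomized timeout; `NodeState`, `VoteRequest`, `handleRequestVote`, and `electionTimeoutMs` are hypothetical names, and persistence and the step-down to follower are left out.

```typescript
interface VoteRequest {
  term: number;
  candidateId: string;
  lastLogIndex: number;
  lastLogTerm: number;
}

interface NodeState {
  currentTerm: number;
  votedFor: string | null;
  lastLogIndex: number;
  lastLogTerm: number;
}

// Returns true if the vote is granted. Mutates `state` like a real node would
// (a real node must also persist the change before replying).
function handleRequestVote(state: NodeState, req: VoteRequest): boolean {
  if (req.term < state.currentTerm) return false; // stale candidate
  if (req.term > state.currentTerm) {
    // Newer term: adopt it and forget the vote cast in the older term.
    state.currentTerm = req.term;
    state.votedFor = null;
  }
  // The candidate's log must be at least as up to date as ours.
  const logOk =
    req.lastLogTerm > state.lastLogTerm ||
    (req.lastLogTerm === state.lastLogTerm && req.lastLogIndex >= state.lastLogIndex);
  // At most one vote per term (re-granting to the same candidate is fine).
  const canVote = state.votedFor === null || state.votedFor === req.candidateId;
  if (logOk && canVote) {
    state.votedFor = req.candidateId;
    return true;
  }
  return false;
}

// Randomized timeout in [150, 300) ms so nodes rarely time out together.
function electionTimeoutMs(random: () => number = Math.random): number {
  return 150 + Math.floor(random() * 150);
}
```

The log check is what guarantees that a new leader already holds every committed entry, so committed data is never rolled back by an election.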

### Raft log replication
```
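Client → Leader → AppendEntries(entries) → Followers.
Followers ack → the leader commits once a majority has the entry.
Leader → sends the new commit index with the next AppendEntries / heartbeat.
Followers apply committed entries to their state machines.
```

→ Every node applies the same sequence of entries.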
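How a leader decides that an entry is committed can be sketched as a calculation over the replication progress of each node. This is an illustration under assumptions: `matchIndex` here includes the leader's own last index, `logTerms[i]` holds the term of the entry at index `i` (index 0 unused), and both function names are hypothetical.

```typescript
// Highest log index that is stored on a majority of nodes.
// `matchIndex` has one entry per node, the leader's own last index included.
function majorityMatchIndex(matchIndex: number[]): number {
  const sorted = [...matchIndex].sort((a, b) => b - a); // descending
  const quorum = Math.floor(matchIndex.length / 2) + 1;
  return sorted[quorum - 1]; // at least `quorum` nodes have reached this index
}

// Raft only commits by counting replicas for entries of the leader's current
// term; older entries become committed indirectly once such an entry commits.
function advanceCommitIndex(
  commitIndex: number,
  matchIndex: number[],
  logTerms: number[],
  currentTerm: number,
): number {
  const candidate = majorityMatchIndex(matchIndex);
  if (candidate > commitIndex && logTerms[candidate] === currentTerm) {
    return candidate;
  }
  return commitIndex;
}
```

With five nodes at indexes `[7, 7, 5, 3, 2]`, three nodes have reached index 5, so 5 is the highest index a majority holds.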

### Raft implementations
```
- etcd (CoreOS, used by Kubernetes)
- Consul (HashiCorp)
- TiKV / CockroachDB / YugabyteDB
- raft-rs (Rust, used by TiKV)
- NuRaft (C++)
```

### When to use it?
```
- Distributed locks (etcd)
- Service discovery (Consul)
- Distributed databases (CockroachDB, TiDB)
- Configuration stores (etcd; ZooKeeper, which uses ZAB rather than Raft)
```

### vs Paxos
```
Paxos: came first; hard to understand and implement.
Raft: equivalent guarantees to Multi-Paxos, designed to be easier to understand.

→ Most newer systems choose Raft.
```

### Byzantine fault tolerance (BFT)
```
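Crash fault: a node stops.
Byzantine fault: a node lies or behaves arbitrarily (possibly malicious).

→ Paxos / Raft do not handle this (crash faults only).
PBFT and Tendermint are BFT protocols; they need 3f + 1 nodes to tolerate f Byzantine nodes.
Blockchains usually need BFT.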
```

### CAP theorem
```
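Consistency vs Availability vs Partition tolerance.
Partitions cannot be ruled out, so the real choice is C or A while a partition lasts.

CP: Consistency + Partition tolerance (commonly cited: HBase, MongoDB).
AP: Availability + Partition tolerance (commonly cited: Cassandra, DynamoDB).
CA: not an option for a distributed system, because networks do partition.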
```

### During a network partition
```
CP: the minority side rejects requests (no reads or writes).
AP: both sides keep serving reads and writes, then reconcile afterwards (CRDTs, etc.).
```

→ Real-world systems mix AP and CP.

### Strong vs eventual
```
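Strong: every read sees the latest completed write (usually means linearizable).
Linearizable: operations appear to take effect atomically, in an order consistent with real time.
Sequential: one global order that preserves each process's program order.
Causal: causally related operations are seen in the same order everywhere.
Eventual: replicas converge if writes stop (no time bound).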
```

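### Read-your-writes
```
A user who reads immediately after their own write sees that write.

Implementation options:
- Sticky sessions (route the user to the same replica).
- Cache the written value once the write is acknowledged.
- Eventually consistent store + tracking the latest version written per user.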
```
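The third option, tracking the latest version per user, can be sketched as a session object that remembers what it wrote and refuses replicas that have not caught up. Everything here is an assumption made for illustration: `Session`, `Versioned`, `readYourWrites`, monotonically increasing integer versions, and replicas modelled as plain `Map`s.

```typescript
interface Versioned { value: string; version: number; }

class Session {
  // Highest version this session has written, per key.
  private lastWritten = new Map<string, number>();

  recordWrite(key: string, version: number): void {
    const prev = this.lastWritten.get(key) ?? 0;
    this.lastWritten.set(key, Math.max(prev, version));
  }

  // True if the replica's copy is at least as new as this session's last write.
  isFreshEnough(key: string, replicaCopy: Versioned | undefined): boolean {
    const required = this.lastWritten.get(key) ?? 0;
    return (replicaCopy?.version ?? 0) >= required;
  }
}

// Tries replicas in order and returns the first copy that already reflects
// the session's own writes; throws if no replica has caught up yet.
function readYourWrites(
  session: Session,
  key: string,
  replicas: Array<Map<string, Versioned>>,
): Versioned | undefined {
  for (const replica of replicas) {
    const copy = replica.get(key);
    if (session.isFreshEnough(key, copy)) return copy;
  }
  throw new Error("no replica has caught up to this session's writes yet");
}
```

Other users are unaffected: a session that never wrote the key accepts any replica, so the stronger guarantee is only paid for by the writer.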

### Quorum pitfalls
```
- W = N (all replicas): one replica down = writes fail. Brittle.
- W = 1: fast writes, but reads can be stale unless R = N.
- W = R = 1, N = 3: fastest, weakest guarantees.
- W = 3, R = 3, N = 5: overlapping quorums + tolerates 2 replicas down.
```

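### Network partitions in practice
```
Split-brain: each side of a partition elects its own leader.
- Raft prevents this: a leader needs votes from a majority, and terms fence off stale leaders.
- Manual recovery is still sometimes needed (e.g. after losing a majority of nodes).
```

→ Production tip for Consul / etcd: 3 or 5 nodes (rarely 7), always an odd count.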

### Single leader vs leaderless
```
Single leader (Raft, Multi-Paxos):
- Simple to reason about.
- Bottleneck (every write goes through the leader).

Leaderless (Dynamo):
- Any node can accept a write.
- Needs conflict resolution.
- High write throughput.

→ Trade-off.
```

### CockroachDB / Spanner
```
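Data is split into ranges (CockroachDB's default was 64 MiB; newer versions use 512 MiB).
Each range has its own consensus group (Raft in CockroachDB, Paxos in Spanner).
1000 ranges = 1000 groups whose leaders are spread across nodes (parallel writes).

→ This is the key to scaling.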
```

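### Distributed lock (Raft style)
```ts
// Sketch assuming the `etcd3` npm package; check its docs for your version.
import { Etcd3 } from 'etcd3';

const client = new Etcd3();

// The lock key is attached to a 10 s lease, so it expires if the holder crashes.
const lock = client.lock('lock').ttl(10);
await lock.acquire(); // rejects if another client holds the lock; retry to wait
try {
  // ...critical section...
} finally {
  await lock.release();
}
```

→ etcd has native support for this.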

### Failure modes
```
- Slow network → timeouts / retries.
- Network partition → the minority side stalls; split-brain if quorum rules are bypassed (rare).
- Node crash → leader re-election.
- Disk full → writes fail.
- Clock skew → breaks leases and timestamp ordering (use HLC).
```
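A hybrid logical clock (HLC) pairs the physical clock with a logical counter, so timestamps stay close to wall time but never go backwards and always respect causality. The following is a minimal sketch of the usual update rules; the class and field names are hypothetical, and the injectable `now` function exists only to make the example deterministic.

```typescript
interface HlcTimestamp { wall: number; logical: number; }

class HybridLogicalClock {
  private wall = 0;
  private logical = 0;
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Local event or message send.
  tick(): HlcTimestamp {
    const prev = this.wall;
    this.wall = Math.max(prev, this.now());
    // Physical clock did not advance: use the counter to stay monotonic.
    this.logical = this.wall === prev ? this.logical + 1 : 0;
    return { wall: this.wall, logical: this.logical };
  }

  // Message receive: merge the sender's timestamp into the local clock.
  receive(remote: HlcTimestamp): HlcTimestamp {
    const prev = this.wall;
    this.wall = Math.max(prev, remote.wall, this.now());
    if (this.wall === prev && this.wall === remote.wall) {
      this.logical = Math.max(this.logical, remote.logical) + 1;
    } else if (this.wall === prev) {
      this.logical += 1;
    } else if (this.wall === remote.wall) {
      this.logical = remote.logical + 1;
    } else {
      this.logical = 0;
    }
    return { wall: this.wall, logical: this.logical };
  }
}

// Negative if a < b, positive if a > b, zero if equal.
function compareHlc(a: HlcTimestamp, b: HlcTimestamp): number {
  return a.wall - b.wall || a.logical - b.logical;
}
```

If a node receives a timestamp from a peer whose clock runs ahead, its own clock jumps forward to match, so its next event is still ordered after the message it just received.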

### Monitoring
```
- Leader changes (frequent changes = a problem).
- Log lag (how far followers trail the leader).
- Quorum health (number of nodes down vs. the failures the cluster can tolerate).
- Apply latency.
```

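### Gossip protocol (a different mechanism)
```
Each node periodically sends its state to a few random peers.
- Used by Cassandra / Consul / Riak.
- Information spreads exponentially: all N nodes are reached in O(log N) rounds.
- Eventually consistent.

→ Used for membership / failure detection.
```

→ Not the same as consensus: gossip spreads information but does not decide on a single value.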
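The exponential spread can be seen in a small simulation. This is a toy model under stated assumptions: push-only gossip, one random peer per informed node per round, no failures, and a hypothetical `gossipRounds` function with an injectable random source.

```typescript
// Simulates push gossip starting from node 0 and returns the number of rounds
// until every node is informed (or `maxRounds` if that never happens).
function gossipRounds(
  nodeCount: number,
  random: () => number = Math.random,
  maxRounds = 1000,
): number {
  if (nodeCount < 1) throw new RangeError("nodeCount must be at least 1");
  const informed = new Array<boolean>(nodeCount).fill(false);
  informed[0] = true;
  let informedCount = 1;
  let rounds = 0;
  while (informedCount < nodeCount && rounds < maxRounds) {
    // Only nodes informed at the start of the round send during this round.
    const sendersThisRound = informedCount;
    for (let s = 0; s < sendersThisRound; s++) {
      const peer = Math.floor(random() * nodeCount); // may hit an informed node
      if (!informed[peer]) {
        informed[peer] = true;
        informedCount++;
      }
    }
    rounds++;
  }
  return rounds;
}
```

Because the informed set can at most double per round, reaching N nodes takes at least log2(N) rounds; the random misses in practice add a small constant factor on top.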

### Two-phase commit (2PC)
```
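Coordinator + N participants.
Phase 1: prepare (each participant locks, logs, and votes).
Phase 2: commit / abort (every participant acks).

→ Cross-database transactions.
"Commit only if every participant votes yes."

Pitfalls:
- Participants stay blocked if the coordinator goes down after prepare.
- Slow (extra round trips while locks are held).
- Often avoided in large systems.
```

→ Sagas are the common modern alternative.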

→ [[Backend_Saga_Choreography_vs_Orchestration]].
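For comparison with sagas, the 2PC coordinator logic fits in a few lines. The sketch below is synchronous and in-memory for clarity; `Participant` and `twoPhaseCommit` are hypothetical names, and a real coordinator would use network calls, timeouts, and a durable decision log.

```typescript
interface Participant {
  prepare(txId: string): boolean; // vote: true = ready to commit
  commit(txId: string): void;
  abort(txId: string): void;
}

function twoPhaseCommit(
  txId: string,
  participants: Participant[],
): "committed" | "aborted" {
  // Phase 1: collect votes. A "no" vote or an error aborts the transaction.
  const prepared: Participant[] = [];
  for (const p of participants) {
    let vote = false;
    try {
      vote = p.prepare(txId);
    } catch {
      vote = false;
    }
    if (!vote) {
      // Roll back everyone who already prepared and is holding locks.
      for (const q of prepared) q.abort(txId);
      return "aborted";
    }
    prepared.push(p);
  }
  // Decision point: a real coordinator durably logs "commit" here first.
  // Phase 2: everyone voted yes, so tell every participant to commit.
  for (const p of participants) p.commit(txId);
  return "committed";
}
```

The blocking problem is visible in the sketch: if the coordinator stops between the two loops, every prepared participant keeps its locks and cannot decide on its own.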

### Real-world
```
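- etcd: the brain of Kubernetes.
- Consul: service discovery / service mesh.
- ZooKeeper: older generation (used by older Kafka versions).
- TiKV / CockroachDB: distributed KV / SQL.
- Apache BookKeeper: replicated log storage.
- Kafka: its own KRaft (replaces ZooKeeper).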
```

## 🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| Strong consistency | Raft (etcd, CockroachDB) |
| Eventually consistent | Dynamo / Cassandra |
| Distributed lock | etcd / Consul |
| Service discovery | Consul / etcd |
| BFT | Tendermint / blockchain |
| Small system | Single-node DB |
| Cross-DB transaction | Saga (NOT 2PC) |

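## ❌ Anti-patterns
- **Even node count (4)**: no extra fault tolerance over 3 nodes (both tolerate 1 failure), and a 2-2 partition leaves neither side with a majority.
- **W = N**: one replica down = writes fail.
- **Assuming synchronized wall clocks across nodes**: use HLC.
- **2PC in large systems**: consider alternatives (sagas).
- **Hand-rolled leader election**: breaks often.
- **No monitoring**: failures go unnoticed.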

## 🤖 LLM usage hints
- Raft is the modern, easier-to-understand counterpart of Paxos.
- Dynamo style = AP (eventual consistency).
- R + W > N is the strong consistency rule for quorums.
- Use an odd node count (3, 5, 7).

## 🔗 Related documents
- [[CS_Distributed_Consensus]]
- [[CS_Eventual_Consistency]]
- [[CS_Vector_Clocks_Lamport]]