2nd/10_Wiki/Topics/Coding/CS_Quorum_Consensus.md
2026-05-10 22:08:15 +09:00

id, title, category, status, source_trust_level, verification_status, created_at, updated_at, tags, tech_stack, applied_in, aliases
cs-quorum-consensus Quorum / Consensus — Paxos / Raft / Dynamo style Coding draft B conceptual 2026-05-09 2026-05-09
cs
consensus
distributed
vibe-coding
language applicable_to
TS / Go
Backend
CS
quorum
consensus
Paxos
Raft
R+W>N
Dynamo
distributed agreement

Quorum / Consensus

The core algorithms of distributed systems: Paxos (foundational), Raft (modern), and Dynamo-style replication (eventually consistent). A majority quorum means agreement from floor(N/2) + 1 of N replicas.

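📖 Core concepts

  • N: total number of replicas.
  • W: write quorum (a write succeeds once W replicas acknowledge it).
  • R: read quorum (a read consults R replicas).
  • R + W > N → every read quorum overlaps every write quorum, so a read contacts at least one replica holding the latest acknowledged write. This is the usual condition for strong consistency in quorum systems.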

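💻 Code patterns

Quorum formula (R + W > N)

N = 5 (5 replicas).

Strong:
- W = 3, R = 3 → R + W = 6 > 5. Strong consistency.
- Tolerates 2 replicas down (the remaining 3 still form a quorum).

Eventual:
- W = 1, R = 1 → R + W = 2 ≤ 5. A read may miss the most recent write.
- Fastest.

Trade-offs:
- W = 3, R = 1 (read-heavy): fast reads. R + W = 4 ≤ 5, so reads can be stale; keeping the overlap with R = 1 requires W = 5.
- W = 1, R = 3 (write-heavy): fast writes. R + W = 4 ≤ 5, so reads can be stale; keeping the overlap with W = 1 requires R = 5.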
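
The arithmetic above can be checked mechanically. The following is a minimal sketch; `quorumsOverlap` and `writeFaultTolerance` are illustrative names, not part of any library.

```typescript
// Hypothetical helpers for reasoning about quorum settings.

/** True when every read quorum intersects every write quorum. */
function quorumsOverlap(n: number, r: number, w: number): boolean {
  return r + w > n;
}

/** Replicas that can be down while a write still collects W acks. */
function writeFaultTolerance(n: number, w: number): number {
  return n - w;
}
```

With N = 5, `quorumsOverlap(5, 3, 3)` holds while `quorumsOverlap(5, 1, 3)` does not, and `writeFaultTolerance(5, 3)` is 2.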

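Dynamo style (as in Cassandra)

// The coordinator sends the write to all 5 replicas
// and reports success once at least W = 3 of them acknowledge.
// `replicas` and `W` are assumed to be defined by the surrounding module.

async function write(key, value) {
  // allSettled waits for every replica; a production coordinator would
  // return as soon as W acks arrive instead of waiting for the slowest one.
  const responses = await Promise.allSettled(
    replicas.map(r => r.write(key, value))
  );

  const success = responses.filter(r => r.status === 'fulfilled').length;
  if (success < W) throw new Error(`write failed: ${success}/${W} acks`);

  // Replicas that missed the write catch up in the background.
}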

Read repair

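async function read(key) {
  // `replicas`, `R`, and `mostRecent` are assumed to be defined elsewhere.
  const responses = await Promise.allSettled(
    replicas.slice(0, R).map(r => r.read(key))
  );

  // Keep only successful replies, remembering which replica each came from.
  const replies = responses
    .map((res, i) => ({ res, i }))
    .filter(({ res }) => res.status === 'fulfilled');
  if (replies.length < R) throw new Error('read failed');

  // Compare the versions returned by the replicas and pick the newest
  // (by vector clock or timestamp). Assumes mostRecent returns one of its inputs.
  const latest = mostRecent(replies.map(({ res }) => res.value));

  // Read repair: update any replica that returned an older version.
  for (const { res, i } of replies) {
    if (res.value !== latest) replicas[i].write(key, latest, { background: true });
  }

  return latest;
}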
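
The `mostRecent` helper above is left undefined. One possible shape, assuming each reply carries a numeric timestamp (last-write-wins), is sketched below. The `Versioned` type and `staleIndexes` are assumptions for illustration; a system that must detect concurrent writes would compare vector clocks instead.

```typescript
// Hypothetical versioned value: last-write-wins by timestamp.
interface Versioned<T> {
  value: T;
  timestamp: number; // e.g. milliseconds assigned by the coordinator
}

/** Returns the entry with the highest timestamp, or undefined for an empty list. */
function mostRecent<T>(versions: Versioned<T>[]): Versioned<T> | undefined {
  let latest: Versioned<T> | undefined;
  for (const v of versions) {
    if (latest === undefined || v.timestamp > latest.timestamp) latest = v;
  }
  return latest;
}

/** Positions of replies that are older than the newest one and need repair. */
function staleIndexes<T>(versions: Versioned<T>[]): number[] {
  const latest = mostRecent(versions);
  if (latest === undefined) return [];
  const newest = latest.timestamp;
  const stale: number[] = [];
  versions.forEach((v, i) => {
    if (v.timestamp < newest) stale.push(i);
  });
  return stale;
}
```

Ties keep the first entry seen, which is an arbitrary choice in this sketch.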

Hinted handoff

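A write with W = 3 is in flight.
One of the target replicas is down.
The coordinator has another node store the write temporarily as a "hint".
When the down replica comes back up, the hint is transferred to it.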

→ Availability ↑.
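
The steps above can be modelled with a small in-memory structure. This is a sketch under simplifying assumptions: `Hint` and `HintStore` are hypothetical names, hints never expire, and nothing is persisted.

```typescript
// Hypothetical in-memory model of hinted handoff.
interface Hint {
  target: string; // id of the replica that was down
  key: string;
  value: string;
}

class HintStore {
  private hints: Hint[] = [];

  /** Called by the coordinator when `hint.target` is unreachable. */
  add(hint: Hint): void {
    this.hints.push(hint);
  }

  /** Called when `target` comes back: returns its hints and forgets them. */
  drain(target: string): Hint[] {
    const ready = this.hints.filter(h => h.target === target);
    this.hints = this.hints.filter(h => h.target !== target);
    return ready;
  }

  /** Number of hints still waiting for their target. */
  pending(): number {
    return this.hints.length;
  }
}
```

The node holding the store would replay each drained hint as a normal write against the recovered replica.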

Paxos (foundational)

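Phase 1 (Prepare):
- The proposer sends prepare(n) with a proposal number n.
- An acceptor replies OK if n is greater than the highest number it has promised, promises to ignore lower-numbered proposals, and reports any value it has already accepted.

Phase 2 (Accept):
- With promises from a majority, the proposer sends accept(n, value). If any acceptor reported an accepted value, the proposer must use the one with the highest proposal number instead of its own.
- The value is chosen once a majority of acceptors accept it.

Phase 3 (Learn):
- Learners are told the chosen value.

→ Each instance decides a single value, and the protocol is intricate.

→ Multi-Paxos runs a series of instances to agree on a sequence of values.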
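
The acceptor's side of phases 1 and 2 can be sketched as a small state machine. This is an illustrative, in-memory version: the `Acceptor` class and `PrepareReply` type are hypothetical, values are plain strings, and a real acceptor must persist its state before replying.

```typescript
// Hypothetical single-decree Paxos acceptor (no persistence, no networking).
type PrepareReply =
  | { ok: true; acceptedN: number | null; acceptedValue: string | null }
  | { ok: false };

class Acceptor {
  private promisedN = -1; // highest proposal number promised so far
  private acceptedN: number | null = null;
  private acceptedValue: string | null = null;

  /** Phase 1: promise to ignore proposals numbered below n. */
  prepare(n: number): PrepareReply {
    if (n <= this.promisedN) return { ok: false };
    this.promisedN = n;
    return { ok: true, acceptedN: this.acceptedN, acceptedValue: this.acceptedValue };
  }

  /** Phase 2: accept unless a higher-numbered promise has been made. */
  accept(n: number, value: string): boolean {
    if (n < this.promisedN) return false;
    this.promisedN = n;
    this.acceptedN = n;
    this.acceptedValue = value;
    return true;
  }
}
```

A proposer would call `prepare` on every acceptor, wait for a majority of `ok` replies, and only then call `accept`.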

Raft (modern, simple)

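Three roles:
- Leader: receives writes and replicates them to followers.
- Follower: appends the leader's entries to its log.
- Candidate: a follower that is standing for election.

An entry is committed once a majority of the cluster (floor(N/2) + 1 nodes, counting the leader) has stored it.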

Raft election

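If the leader stops sending heartbeats:
1. A follower's election timeout fires (randomized, 150-300 ms).
2. It becomes a candidate, increments its term, and votes for itself.
3. It sends RequestVote to the other nodes.
4. If a majority grants its vote, it becomes leader.

Term: a number that increases with every election.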
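
The election rules above can be sketched as three small functions. This is a simplified illustration with hypothetical names; in particular, `handleRequestVote` omits the check that the candidate's log is at least as up to date as the voter's, which real Raft requires.

```typescript
// Hypothetical helpers for the Raft election steps described above.

/** Randomized election timeout in the 150-300 ms range. */
function electionTimeoutMs(random: () => number = Math.random): number {
  return 150 + Math.floor(random() * 151);
}

/** A candidate wins with votes from a strict majority of the cluster. */
function wonElection(votesGranted: number, clusterSize: number): boolean {
  return votesGranted > clusterSize / 2;
}

interface VoterState {
  currentTerm: number;
  votedFor: string | null;
}

/** Grants at most one vote per term. Log comparison is omitted in this sketch. */
function handleRequestVote(state: VoterState, term: number, candidateId: string): boolean {
  if (term < state.currentTerm) return false; // stale candidate
  if (term > state.currentTerm) {
    state.currentTerm = term; // newer term: forget the old vote
    state.votedFor = null;
  }
  if (state.votedFor === null || state.votedFor === candidateId) {
    state.votedFor = candidateId;
    return true;
  }
  return false;
}
```

Randomizing the timeout makes it unlikely that several followers become candidates at the same moment and split the vote.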

Raft log replication

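Client → Leader → AppendEntries(log) → Followers.
Followers ack → the leader commits once a majority has stored the entry.
Leader → sends the new commit index with the next AppendEntries (or heartbeat).
Followers apply the committed entries.

→ Every node applies the same sequence.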
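
How the leader decides what is committed can be sketched as follows. The function name is hypothetical, and the sketch leaves out Raft's extra rule that a leader only commits entries from its own term by counting replicas.

```typescript
/**
 * Highest log index stored on a majority of nodes.
 * `matchIndexes` holds the last replicated index for every node,
 * including the leader's own last index.
 */
function majorityMatchIndex(matchIndexes: number[]): number {
  if (matchIndexes.length === 0) {
    throw new Error("cluster must have at least one node");
  }
  // Sort descending; the value at position floor(n / 2) is present
  // on at least floor(n / 2) + 1 nodes.
  const sorted = matchIndexes.slice().sort((a, b) => b - a);
  return sorted[Math.floor(sorted.length / 2)] ?? 0;
}
```

For example, with indexes 5, 5, 4, 3, 2 on five nodes, three nodes hold entry 4, so 4 is the highest index that can be committed.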

Raft implementations

- etcd (CoreOS / Kubernetes)
- Consul (HashiCorp)
- TiKV / CockroachDB / YugabyteDB
- raft-rs (Rust, the library behind TiKV)
- NuRaft (C++)

When to use it?

- Distributed lock (etcd)
- Service discovery (Consul)
- Distributed DB (CockroachDB, TiDB)
- Configuration store (ZooKeeper, etcd)

vs Paxos

Paxos: the original protocol; hard to understand and implement.
Raft: equivalent guarantees, designed to be easier to understand.

→ Most modern systems choose Raft.

Byzantine fault tolerance (BFT)

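Crash fault: a node stops (crashes).
Byzantine fault: a node lies or behaves arbitrarily (possibly malicious).

→ Paxos and Raft do not handle this (they assume crash faults only).
PBFT and Tendermint are BFT protocols.
Blockchains usually need BFT.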

CAP theorem

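Consistency vs Availability vs Partition tolerance.
Only two of the three can be guaranteed at once.

CP: Consistency + Partition tolerance (e.g. HBase, MongoDB).
AP: Availability + Partition tolerance (e.g. Cassandra, DynamoDB).
CA: not an option in practice, because networks do partition. The real choice is between C and A while a partition lasts.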

During a network partition

CP: the minority partition rejects requests (no reads or writes).
AP: both partitions keep serving reads and writes, then reconcile afterwards (CRDTs, etc.).

→ Real-world systems mix AP and CP.

Strong vs eventual

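Strong: every read sees the latest completed write.
Linearizable: operations appear to take effect in real-time order.
Sequential: all processes see one total order that preserves each process's own order.
Causal: causally related operations are seen in the same order everywhere.
Eventual: replicas converge eventually (no time bound).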

Read-your-write

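After a user writes, an immediate read by the same user sees that write.

Implementation options:
- Sticky sessions (route the user to the same replica).
- Cache the written value once the write is acknowledged.
- Eventually consistent store plus per-user tracking of the latest written version.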
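
The third option, per-user tracking, can be sketched as a session object that remembers the highest version the user has written and only reads from replicas that have caught up. `Session` and `ReplicaView` are hypothetical names, and versions are assumed to be monotonically increasing numbers.

```typescript
// Hypothetical session guard for read-your-writes.
interface ReplicaView {
  id: string;
  appliedVersion: number; // highest version this replica has applied
}

class Session {
  private lastWritten = 0;

  /** Remember the version returned by an acknowledged write. */
  recordWrite(version: number): void {
    this.lastWritten = Math.max(this.lastWritten, version);
  }

  /** A replica may serve this user only if it has applied the user's writes. */
  canServe(replica: ReplicaView): boolean {
    return replica.appliedVersion >= this.lastWritten;
  }

  /** First replica that is fresh enough, or undefined if none is. */
  pickReplica(replicas: ReplicaView[]): ReplicaView | undefined {
    return replicas.find(r => this.canServe(r));
  }
}
```

When `pickReplica` returns `undefined`, the caller can fall back to the leader or wait for a replica to catch up.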

Quorum pitfalls

- W = N (all replicas): one replica down means writes fail. Brittle.
- W = 1: reads may be stale.
- W = R = 1, N = 3: fastest, weakest.
- W = 3, R = 3, N = 5: strong, and tolerates 2 replicas down.

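Network partitions in practice

Split-brain: two partitions each elect their own leader.
- Raft prevents this (terms plus majority votes): only the side holding a majority can elect a leader or commit.
- Manual recovery is still needed occasionally.

→ Production tip for Consul / etcd: an odd node count, typically 3 or 5 and rarely more than 7.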
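
Why odd counts are preferred follows from the majority arithmetic. The helpers below are an illustrative sketch with hypothetical names.

```typescript
/** Smallest majority of n voters: floor(n / 2) + 1. */
function majority(n: number): number {
  return Math.floor(n / 2) + 1;
}

/** Voters that can fail while a majority remains reachable. */
function faultTolerance(n: number): number {
  return n - majority(n);
}

/** One row per cluster size, for comparing odd and even counts. */
function toleranceTable(
  sizes: number[]
): { nodes: number; majority: number; tolerated: number }[] {
  return sizes.map(n => ({
    nodes: n,
    majority: majority(n),
    tolerated: faultTolerance(n),
  }));
}
```

Calling `toleranceTable([3, 4, 5, 6, 7])` gives tolerated failures of 1, 1, 2, 2, 3: going from 3 to 4 nodes (or 5 to 6) adds replication cost without tolerating one more failure.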

Single leader vs leaderless

Single leader (Raft, Multi-Paxos):
- Simple to reason about.
- Bottleneck (every write goes through the leader).

Leaderless (Dynamo):
- Any node can accept a write.
- Requires conflict resolution.
- High throughput.

→ Trade-off.

CockroachDB / Spanner

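Data is split into ranges (64 MB here; newer CockroachDB releases default to 512 MiB).
Each range has its own Raft group (Spanner uses a Paxos group per split instead).
1000 ranges = 1000 leaders (parallel writes).

→ This is the key to scaling.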

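Distributed lock (Raft style)

// etcd
// Pseudocode: generic etcd client; method names are illustrative.
// Lease with a 10 s TTL: the lock expires if the holder dies.
const lease = await client.lease.grant(10);
// A real lock must create the key only if absent (a transaction on its
// create revision); a plain put would overwrite another holder.
await client.kv.put('lock', 'value', { lease });

// Other clients watch the key and retry once it is deleted.
await client.watch('lock');

→ etcd supports this natively (leases, transactions, watches).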

Failure modes

- Slow network → timeouts and retries.
- Network partition → split-brain (rare).
- Node crash → leader re-election.
- Disk full → writes fail.
- Clock skew → timestamp ordering and leases become unreliable (use a hybrid logical clock, HLC).
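
A hybrid logical clock pairs the wall clock with a logical counter, so timestamps keep increasing even when the physical clock stalls or jumps backwards. The sketch below follows the usual HLC update rules; the class and field names are hypothetical, and the physical clock is injected so the behaviour can be tested.

```typescript
// Hypothetical hybrid logical clock (HLC).
interface HlcTimestamp {
  wall: number;    // largest physical time seen so far
  logical: number; // counter that breaks ties within the same wall value
}

class HybridLogicalClock {
  private wall = 0;
  private logical = 0;
  private readonly now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  /** Timestamp for a local event or an outgoing message. */
  tick(): HlcTimestamp {
    const physical = this.now();
    if (physical > this.wall) {
      this.wall = physical;
      this.logical = 0;
    } else {
      this.logical += 1; // clock stalled or went backwards
    }
    return { wall: this.wall, logical: this.logical };
  }

  /** Merge the timestamp carried by an incoming message. */
  receive(remote: HlcTimestamp): HlcTimestamp {
    const newWall = Math.max(this.wall, remote.wall, this.now());
    if (newWall === this.wall && newWall === remote.wall) {
      this.logical = Math.max(this.logical, remote.logical) + 1;
    } else if (newWall === this.wall) {
      this.logical += 1;
    } else if (newWall === remote.wall) {
      this.logical = remote.logical + 1;
    } else {
      this.logical = 0;
    }
    this.wall = newWall;
    return { wall: this.wall, logical: this.logical };
  }
}

/** Negative if a is earlier than b, positive if later, 0 if equal. */
function compareHlc(a: HlcTimestamp, b: HlcTimestamp): number {
  return a.wall !== b.wall ? a.wall - b.wall : a.logical - b.logical;
}
```

Timestamps from one node are strictly increasing, and a received timestamp always sorts before the timestamp assigned to the receive event.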

Monitoring

- Leader changes (frequent changes signal a problem).
- Log lag (how far followers trail the leader).
- Quorum health (number of nodes down).
- Apply latency.

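Gossip protocol (a different mechanism)

Each node periodically sends its state to randomly chosen peers.
- Used by Cassandra, Consul, and Riak.
- The number of informed nodes grows exponentially per round, so information reaches N nodes in roughly O(log N) rounds.
- Eventually consistent.

→ Used for membership and failure detection.

→ Not the same as consensus.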
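
The exponential spread can be observed with a toy simulation. This is a sketch under simplifying assumptions: push-only gossip, one random peer per informed node per round, no failures, and a small seeded generator (`makeRng`, hypothetical) so that runs are repeatable.

```typescript
/** Small deterministic generator so the simulation is repeatable. */
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

/**
 * Rounds of push gossip until all nodes are informed, starting from node 0.
 * Stops at maxRounds if the rumor has not reached everyone by then.
 */
function gossipRounds(nodeCount: number, seed: number, maxRounds = 1000): number {
  const rng = makeRng(seed);
  const informed = new Set<number>([0]);
  let rounds = 0;
  while (informed.size < nodeCount && rounds < maxRounds) {
    // Every node informed at the start of the round pushes to one random peer.
    const senders = informed.size;
    for (let i = 0; i < senders; i++) {
      informed.add(Math.floor(rng() * nodeCount));
    }
    rounds += 1;
  }
  return rounds;
}
```

Because the informed set can at most double each round, 64 nodes need at least 6 rounds; the simulated count is usually a small multiple of log2(N).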

Two-phase commit (2PC)

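Coordinator + N participants.
Phase 1: prepare (each participant locks resources, writes its log, and votes).
Phase 2: commit or abort (every participant acknowledges).

→ Used for cross-DB transactions.
"Commit only if every participant votes OK."

Pitfalls:
- Participants stay blocked, holding locks, if the coordinator goes down.
- Very slow.
- Rarely used in large systems.

→ Sagas are the modern alternative.

See Backend_Saga_Choreography_vs_Orchestration.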
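
The two phases can be sketched as a coordinator loop. This is a simplified, synchronous illustration: the `Participant` interface is hypothetical, calls are in-process rather than over a network, and coordinator crashes (the blocking case noted above) are not modelled.

```typescript
// Hypothetical participant in a two-phase commit.
interface Participant {
  prepare(): boolean; // lock resources, write the log, vote yes or no
  commit(): void;
  abort(): void;
}

/** Commits only if every participant votes yes in phase 1. */
function twoPhaseCommit(participants: Participant[]): "committed" | "aborted" {
  const prepared: Participant[] = [];

  // Phase 1: collect votes; an exception counts as a "no".
  for (const p of participants) {
    let vote = false;
    try {
      vote = p.prepare();
    } catch {
      vote = false;
    }
    if (!vote) {
      // Roll back everyone who had already prepared.
      for (const q of prepared) q.abort();
      return "aborted";
    }
    prepared.push(p);
  }

  // Phase 2: all voted yes, so tell everyone to commit.
  for (const p of participants) p.commit();
  return "committed";
}
```

A participant that votes no is assumed to have released its own locks, so only the participants that had already prepared receive `abort`.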

Real-world

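- etcd: the brain of Kubernetes.
- Consul: service discovery and service mesh.
- ZooKeeper: older generation (used by older Kafka versions).
- TiKV / CockroachDB: distributed SQL.
- Apache BookKeeper: replicated log storage.
- Kafka: its own KRaft mode (replaces ZooKeeper).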

🤔 Decision criteria

| Situation | Recommendation |
|---|---|
| Strong consistency | Raft (etcd, CockroachDB) |
| Eventually consistent | Dynamo / Cassandra |
| Distributed lock | etcd / Consul |
| Service discovery | Consul / etcd |
| BFT | Tendermint / blockchain |
| Small system | Single-node DB |
| Cross-DB transaction | Saga (NOT 2PC) |

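Anti-patterns

  • Even node count (4): no more fault tolerance than 3, and a 2/2 partition leaves neither side with a majority.
  • W = N: one replica down means writes fail.
  • Relying on wall-clock time across nodes: use HLC instead.
  • 2PC in a large system: use an alternative (saga).
  • Hand-rolled leader election: breaks often.
  • No monitoring: failures go unnoticed.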

🤖 LLM usage hints

  • Raft is the modern, easier-to-understand counterpart of Paxos.
  • Dynamo style = AP (eventual consistency).
  • R + W > N is the rule for strong consistency.
  • Use an odd node count (3, 5, 7).

🔗 Related documents