[G1-Sync] Manual knowledge update
---
id: cs-distributed-consensus
title: Distributed Consensus — Raft / Paxos / Leader Election
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [cs, distributed, consensus, vibe-coding]
tech_stack: { language: "Concept", applicable_to: ["Backend"] }
applied_in: []
aliases: [Raft, Paxos, leader election, etcd, ZooKeeper, consensus, quorum]
---

# Distributed Consensus

> N nodes agree on the same decision (a leader, a value). **Raft (modern, understandable), Paxos (classic), Zab (ZooKeeper)**. etcd, Consul, and ZooKeeper are implementations. Related: the CAP theorem.

## 📖 Core Concepts

- Consensus: all nodes agree on the same value.
- Quorum: a majority, floor(N/2) + 1 nodes.
- Leader election: the cluster picks one node to order writes.
- Log replication: the leader copies an ordered log of entries to the followers.

## 💻 Code Patterns

### Why consensus

```
In a distributed system:
- Which node is the primary?
- Which value is the latest?
- Does everyone agree on a configuration change?

→ A consensus protocol answers these.
```

### Raft (modern, recommended)

```
Roles:
- Leader: accepts writes
- Follower: replicates from the leader
- Candidate: running for leader

Election:
1. A follower hears no leader heartbeat before its election timeout → becomes a candidate
2. Increments its term and votes for itself
3. Sends RequestVote RPCs to the other nodes
4. Wins a majority of votes → becomes leader
5. Starts sending AppendEntries (also used as the heartbeat)

Log replication:
1. Client → leader
2. Leader appends the entry to its log
3. AppendEntries → followers
4. Majority ack → entry is committed
5. Leader applies the entry, then responds to the client
```

→ Raft was designed as an "understandable" alternative to Paxos.

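A minimal sketch of log replication step 4 ("majority ack → committed"), assuming the leader tracks a `matchIndex` per node as described in the Raft paper. `commitIndexFor` is a hypothetical helper name, not an API of etcd or hashicorp/raft, and the sketch leaves out Raft's extra rule that the entry must belong to the leader's current term.

```typescript
// Hypothetical helper for illustration only.
// matchIndex[i] = highest log index known to be stored on node i
// (the leader includes its own last log index in the array).
function commitIndexFor(matchIndex: number[]): number {
  if (matchIndex.length === 0) {
    throw new Error("cluster must have at least one node");
  }
  const sorted = [...matchIndex].sort((a, b) => b - a); // descending
  const majority = Math.floor(matchIndex.length / 2) + 1;
  // The majority-th largest value is stored on at least `majority` nodes.
  return sorted[majority - 1];
}
```

With five nodes at indexes `[5, 5, 4, 2, 1]`, three nodes hold index 4 or higher, so entries up to 4 can be committed while entry 5 still waits for one more ack.
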
### Raft term

```
Term = monotonically increasing counter.
Every election starts a new term; each term has at most one leader.

Term 0: start
Term 1: leader A
Term 2: leader B (A died)
...
```

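How terms interact with voting can be sketched as follows. This assumes the vote-granting rules from the Raft paper (reject stale terms, adopt newer terms, one vote per term, candidate log at least as up to date). The type and function names are hypothetical, chosen for this note only.

```typescript
interface VoterState {
  currentTerm: number;
  votedFor: string | null; // candidate voted for in currentTerm, if any
  lastLogTerm: number;
  lastLogIndex: number;
}

interface VoteRequest {
  term: number;
  candidateId: string;
  lastLogTerm: number;
  lastLogIndex: number;
}

// Hypothetical handler: returns the updated voter state and the vote decision.
function handleRequestVote(
  state: VoterState,
  req: VoteRequest
): { state: VoterState; granted: boolean } {
  // A candidate from an older term is rejected outright.
  if (req.term < state.currentTerm) {
    return { state, granted: false };
  }
  // A newer term is adopted, and any vote cast in the older term is forgotten.
  let next = state;
  if (req.term > state.currentTerm) {
    next = { ...state, currentTerm: req.term, votedFor: null };
  }
  // The candidate's log must be at least as up to date as the voter's.
  const logUpToDate =
    req.lastLogTerm > next.lastLogTerm ||
    (req.lastLogTerm === next.lastLogTerm && req.lastLogIndex >= next.lastLogIndex);
  const canVote = next.votedFor === null || next.votedFor === req.candidateId;
  if (canVote && logUpToDate) {
    return { state: { ...next, votedFor: req.candidateId }, granted: true };
  }
  return { state: next, granted: false };
}
```

Because a node grants at most one vote per term, two candidates cannot both collect a majority in the same term.
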
### Quorum

```
N = 5 nodes.
Majority = 3.

Write quorum: 3 nodes must persist the entry before it is committed
Read: leader-local read (fast, may be stale if the leader was deposed) or quorum-confirmed read (linearizable)

→ During a network partition, the minority side cannot make progress.
```

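A small sketch of the partition rule above, assuming a simple majority quorum. `quorumSize` and `sideWithQuorum` are hypothetical names for illustration.

```typescript
// Majority quorum for a cluster of the given size.
function quorumSize(clusterSize: number): number {
  return Math.floor(clusterSize / 2) + 1;
}

// Given the groups of nodes that can still reach each other, return the
// group that can keep committing writes, or null if no group has a majority.
function sideWithQuorum(sides: string[][]): string[] | null {
  const clusterSize = sides.reduce((sum, side) => sum + side.length, 0);
  const needed = quorumSize(clusterSize);
  return sides.find((side) => side.length >= needed) ?? null;
}
```

For a 5-node cluster split 3 / 2, only the 3-node side keeps working. A 4-node cluster split 2 / 2 has no side with a majority, so the whole cluster stops accepting writes.
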
### CAP theorem

```
Consistency: every read sees the latest write (all nodes show the same value).
Availability: every request to a live node gets a response.
Partition tolerance: the system keeps working despite a network partition.

→ During a network partition you must choose C or A.
```

```
CP: ZooKeeper, etcd, MongoDB (default settings).
AP: Cassandra, DynamoDB (default eventually consistent reads).
CA: single node (no partition possible).
```

### etcd (Raft, the backing store for K8s)

```bash
# Node 1 of a 3-node cluster (run the equivalent on node2 and node3)
etcd \
  --name node1 \
  --listen-peer-urls http://10.0.0.1:2380 \
  --listen-client-urls http://10.0.0.1:2379 \
  --advertise-client-urls http://10.0.0.1:2379 \
  --initial-advertise-peer-urls http://10.0.0.1:2380 \
  --initial-cluster node1=http://10.0.0.1:2380,node2=http://10.0.0.2:2380,node3=http://10.0.0.3:2380 \
  --initial-cluster-state new
```

```ts
import { Etcd3 } from 'etcd3';

const client = new Etcd3({ hosts: ['10.0.0.1:2379', '10.0.0.2:2379', '10.0.0.3:2379'] });

// Put
await client.put('/config/feature-x').value('enabled');

// Get (resolves to null if the key does not exist)
const value = await client.get('/config/feature-x').string();

// Watch for changes to a single key
client.watch().key('/config/feature-x').create().then(watcher => {
  watcher.on('put', (v) => console.log('Changed:', v.value.toString()));
});

// Lease (TTL)
const lease = client.lease(60); // 60s; lease() returns the Lease object directly
await lease.put('/services/my-app/instance-1').value('healthy');
// The key is deleted automatically once keepalives stop and the 60s TTL elapses
```

→ Holds K8s cluster state. Also used for service discovery.

### Consul

```ts
import Consul from 'consul';

const consul = new Consul(); // defaults to the local agent

// KV
await consul.kv.set('config/feature-x', 'enabled');
const value = await consul.kv.get('config/feature-x');

// Service registration
await consul.agent.service.register({
  name: 'my-app',
  id: 'my-app-1',
  address: '10.0.0.1',
  port: 3000,
  check: {
    http: 'http://10.0.0.1:3000/health',
    interval: '10s',
  },
});

// Find service
const services = await consul.health.service('my-app');
```

→ Service discovery + KV. Multi-DC.

### ZooKeeper (Zab)

```bash
# 3-node ZK ensemble.
# Java-based (older).
# The parent znode must exist before a child is created.

zkCli.sh
> create /myapp ""
> create /myapp/config "value"
> get /myapp/config
> ls /myapp
```

→ Cluster coordination for Kafka (before KRaft), HBase, and Hadoop.

### Leader election (Raft / etcd)

```ts
import { Etcd3 } from 'etcd3';

const client = new Etcd3();
const election = client.election('my-leader');
const campaign = election.campaign('node-1'); // value identifying this candidate

campaign.on('elected', () => {
  console.log('I am leader');
  startLeaderWork();
});

campaign.on('error', (err) => {
  // Leadership may be lost here: stop leader-only work before retrying.
  console.error(err);
});
```

→ Only one node is the leader. The rest are followers.

### Use case: distributed cron

```
A cron job on N nodes that must run only once:

1. Leader election
2. Only the leader schedules the cron job
3. If the leader dies → a new election

→ ZooKeeper / etcd / Redis lock.
```

```ts
// Assumes `election = client.election('cron-leader')` on the etcd3 client.
// A campaign is an event emitter, so wrap its events in a promise.
function becomeLeader(nodeId: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const campaign = election.campaign(nodeId);
    campaign.on('elected', () => resolve());
    campaign.on('error', reject);
  });
}

await becomeLeader('node-1'); // resolves only once this node is elected
scheduleCron();
```

### Distributed lock (etcd / Redis)

```ts
// etcd lock primitive (etcd3 client)
const lock = client.lock('my-resource');
await lock.acquire();
try {
  await doWork();
} finally {
  await lock.release();
}
```

```ts
// Redis (Redlock)
import Redlock from 'redlock';

// redisA, redisB, redisC: clients for independent Redis instances
const redlock = new Redlock([redisA, redisB, redisC]);
const lock = await redlock.acquire(['locks:my-resource'], 30_000); // 30s TTL
try {
  await doWork(); // must finish before the TTL expires
} finally {
  await lock.release();
}
```

→ [[DB_Distributed_Locks]].

### Linearizability vs eventual consistency

```
Linearizable: to an outside observer the system behaves like a single node.
- etcd, ZooKeeper (linearizable writes; plain ZooKeeper reads may lag unless preceded by sync)
- Spanner

Eventual: replicas converge to the same value eventually.
- Cassandra
- DynamoDB

→ Trade-off. CP vs AP.
```

### Two Generals / Byzantine

```
Two Generals: the network can lose messages, so agreement cannot be guaranteed.
Byzantine: nodes may lie, which is harder still.

Solutions:
- Raft / Paxos: assume honest (crash-fault) nodes.
- BFT (Byzantine Fault Tolerance): tolerates adversarial nodes; used by blockchains such as Bitcoin / Ethereum.
- HotStuff, Tendermint: modern BFT protocols.
```

### Bitcoin consensus (PoW)

```
Bitcoin = Byzantine consensus on an open network:
- Voting power is proportional to hash power (proof of work), not one vote per identity.
- The chain with the most accumulated work wins (usually the longest).
- Probabilistic finality (6 confirmations by convention).

Energy cost is high, so Ethereum moved to PoS (the Merge, 2022).
```

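"Probabilistic finality" can be made concrete with the attacker-success calculation from the Bitcoin whitepaper (section 11), ported here to TypeScript as a sketch. `q` is the attacker's share of hash power and `z` the number of confirmations; the function name is chosen for this note.

```typescript
// Probability that an attacker with hash-power share q ever overtakes the
// honest chain after the recipient has waited for z confirmations.
// Port of the calculation given in the Bitcoin whitepaper.
function attackerSuccessProbability(q: number, z: number): number {
  if (q >= 0.5) return 1; // a majority attacker always catches up eventually
  const p = 1 - q;
  const lambda = z * (q / p); // expected attacker progress during z blocks
  let sum = 1;
  for (let k = 0; k <= z; k++) {
    // Poisson probability that the attacker mined exactly k blocks.
    let poisson = Math.exp(-lambda);
    for (let i = 1; i <= k; i++) {
      poisson *= lambda / i;
    }
    // Subtract the cases where the attacker fails to close the remaining gap.
    sum -= poisson * (1 - Math.pow(q / p, z - k));
  }
  return sum;
}
```

For an attacker with 10% of the hash power, the probability drops from about 0.20 after one confirmation to about 0.0002 after six, which is why six confirmations became the common convention.
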
### etcd vs Consul vs ZooKeeper

```
etcd:
+ K8s native
+ HTTP / gRPC
+ Modern
- Small scope (single purpose)

Consul:
+ Strong service discovery
+ Multi-DC
+ Health checks
- Heavier dependency

ZooKeeper:
+ Mature (Hadoop / Kafka)
+ Very stable
- Java
- Less modern API
```

### Cluster size

```
N = 2: 0 failures tolerated (quorum is 2, so losing either node halts writes).
N = 3: 1 failure OK.
N = 5: 2 failures OK (recommended for larger clusters).
N = 7: 3 failures OK.

Avoid even N (2, 4, 6): same fault tolerance as N - 1, but a larger quorum.

→ Usually 3 or 5.
```

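The numbers above follow from simple arithmetic, sketched here under the assumption of a majority quorum. `toleratedFailures` and `sizingTable` are hypothetical helper names.

```typescript
// Nodes that may fail while a majority quorum is still reachable.
function toleratedFailures(clusterSize: number): number {
  return Math.floor((clusterSize - 1) / 2);
}

// One row per cluster size: [N, quorum, tolerated failures].
function sizingTable(maxSize: number): Array<[number, number, number]> {
  const rows: Array<[number, number, number]> = [];
  for (let n = 1; n <= maxSize; n++) {
    rows.push([n, Math.floor(n / 2) + 1, toleratedFailures(n)]);
  }
  return rows;
}
```

`sizingTable(7)` shows that N = 4 tolerates one failure like N = 3 but needs a quorum of 3 instead of 2, which is the reason odd sizes are preferred.
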
### Multi-region (cross-DC)

```
ZooKeeper / etcd are latency-sensitive (every write goes through consensus).
Cross-region RTT = 100ms+, so writes become very slow.

Solutions:
- Keep the quorum within a single region
- Other regions = read replicas (eventually consistent)
```

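A rough model of why quorum placement matters: the leader can commit once enough followers have acknowledged, so commit latency is about the round-trip time to the slowest follower it still needs. This sketch assumes a majority quorum and ignores disk fsync and processing time; `estimateCommitLatencyMs` is a hypothetical name.

```typescript
// followerRttMs: round-trip time from the leader to each follower.
// The leader counts itself, so it needs floor(N/2) follower acks.
function estimateCommitLatencyMs(followerRttMs: number[]): number {
  const clusterSize = followerRttMs.length + 1;
  const acksNeeded = Math.floor(clusterSize / 2);
  if (acksNeeded === 0) return 0; // single-node cluster
  const sorted = [...followerRttMs].sort((a, b) => a - b); // fastest first
  return sorted[acksNeeded - 1];
}
```

With five nodes where two followers share the leader's region (`[1, 1, 100, 100]`), the estimate is 1 ms because the quorum is local. If only one follower is local (`[1, 100, 100, 100]`), every write waits for a 100 ms cross-region round trip.
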
### Operations

```
- Backup (regular snapshots)
- Disaster recovery (restore config and data from a snapshot)
- Monitoring (leader changes, replication lag)
- Upgrade (rolling restart)
- Compaction (clean up old log entries)
```

### Failure scenarios

```
1. Leader dies → new election (on the order of the election timeout: 1s by default in etcd, several seconds with conservative tuning)
2. Network partition → the minority side cannot make progress
3. A majority of nodes die → cluster unavailable
4. Disk full → writes fail
5. Clock skew / drift → election timing and lease issues
```

### Real-world apps

```
K8s: etcd
Consul: service mesh / discovery
ZK: Kafka (pre-KRaft), Hadoop, HBase
Apache Kafka: its own Raft (KRaft; production-ready since 3.3, ZooKeeper removed in 4.0)
CockroachDB: its own Raft
TiDB: TiKV replicates with Raft; PD embeds etcd
```

### Implementing Raft (for learning)

```
Raft paper: https://raft.github.io
Visualization: https://thesecretlivesofdata.com/raft/

Implement it yourself to learn; do not run your own implementation in production.
hashicorp/raft (Go), MIT 6.824 lab.
```

### When NOT to use

```
- A single node is enough (small app)
- Stateless service (no consensus needed)
- Only a simple leader is needed: a Redis lock is enough
- Strong consistency is not needed: eventual consistency is OK
```

### Saga (an alternative to consensus)

```
Distributed transaction:
- 2PC: blocking, slow
- Saga: compensating actions, fast

→ [[Backend_Saga_Patterns]].
```

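What "compensating" means can be sketched as follows. This is a simplified, synchronous illustration with hypothetical names (`SagaStep`, `runSaga`); real sagas run asynchronous steps across services and must also handle a compensation that fails.

```typescript
interface SagaStep {
  name: string;
  action: () => void;     // forward operation; throws on failure
  compensate: () => void; // undoes the action if a later step fails
}

// Runs steps in order. On failure, undoes the completed steps in reverse.
function runSaga(steps: SagaStep[]): { ok: boolean; log: string[] } {
  const log: string[] = [];
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      done.push(step);
      log.push(`done:${step.name}`);
    } catch {
      log.push(`failed:${step.name}`);
      for (const prev of done.reverse()) {
        prev.compensate();
        log.push(`undo:${prev.name}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}
```

Unlike 2PC, no participant holds locks while waiting for a coordinator: each step commits locally, and a later failure is repaired by running the compensations.
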
### Modern: KRaft (Kafka)

```
Kafka used to depend on ZooKeeper → KRaft (its own Raft; production-ready since Kafka 3.3, ZooKeeper removed in 4.0).
No separate ZooKeeper ensemble to run. Simpler ops.
```

### Time

```
Leader election: on the order of the election timeout (etcd default 1s; several seconds with conservative tuning).
Write commit: 1-10ms (single DC).
Cross-DC: 100ms+.

→ Fast = same DC.
```

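Election time depends on the election timeout, which Raft randomizes per node so that followers rarely time out together and split the vote. A minimal sketch, assuming the 150-300 ms range suggested in the Raft paper; `electionTimeoutMs` is a hypothetical helper and the `random` parameter exists only to make it testable.

```typescript
// Picks a timeout in [minMs, maxMs). Each node picks its own value,
// so one node usually times out first and wins the election.
function electionTimeoutMs(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random
): number {
  if (!(minMs > 0 && maxMs > minMs)) {
    throw new Error("require 0 < minMs < maxMs");
  }
  return minMs + Math.floor(random() * (maxMs - minMs));
}
```

Usage: `electionTimeoutMs(150, 300)` on each node at startup and after every heartbeat reset. Production systems often use larger values to ride out network jitter, at the cost of slower failover.
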
### Use cases

```
✅ Service discovery
✅ Configuration store
✅ Leader election (distributed cron)
✅ Distributed lock
✅ Coordination (cluster size)
✅ K8s state

❌ High-throughput data (use Cassandra)
❌ Big files (use S3)
❌ Cache (use Redis)
```

### Failure tolerance

```
3-node etcd: 1 failure OK.
2 failures = quorum lost (no writes); all 3 failing = risk of data loss.

→ 3+ nodes recommended. 5 is more robust.
```

### Learning resources

```
- Raft paper (raft.github.io)
- "The Secret Lives of Data" (visual)
- Designing Data-Intensive Applications (book)
- Distributed Systems by van Steen and Tanenbaum
- etcd / Consul docs
```

## 🤔 Decision Criteria

| Task | Recommendation |
|---|---|
| K8s | etcd (built-in) |
| Service discovery | Consul |
| Java ecosystem | ZooKeeper |
| Distributed lock | etcd / Redis Redlock |
| Cluster state | etcd / Consul |
| Small + simple | Redis lock |

## ❌ Anti-patterns

- **2-node consensus**: quorum is 2, so a single failure halts the cluster.
- **Even N**: same fault tolerance as N - 1, but a larger quorum.
- **Single quorum spanning regions**: writes are very slow.
- **No disk-full monitoring**: the leader gets stuck when the disk fills.
- **No backups**: if the snapshot is lost, the cluster state is lost.
- **Everything in etcd**: not suited to high-throughput data.

## 🤖 LLM Usage Hints

- Use 3 or 5 nodes.
- Raft is the modern default.
- etcd / Consul = standard choices.
- Cross-region = single-region quorum + read replicas.

## 🔗 Related Documents

- [[CS_Eventual_Consistency]]
- [[DB_Distributed_Locks]]
- [[Backend_Service_Discovery]]