---
id: wiki-2026-0508-append-only-log
title: Append-only Log
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Commit log, WAL, Event log, Immutable log]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [log, kafka, event-sourcing, wal, storage]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: java
  framework: Kafka, Pulsar, Postgres WAL
---
# Append-only Log
## One-line summary
> **"A sequence of immutable events: write once, read many."** Lineage: database WAL (1980s) → distributed commit log (LinkedIn Kafka, 2011) → the backbone of event sourcing. The 2026 stack: Kafka 3.7 (KRaft mode, no ZooKeeper) / Redpanda (Raft, C++) / Pulsar 3.x (BookKeeper) / Postgres logical replication / WarpStream (S3-backed, Kafka-compatible).
## Core concepts
### Properties
- **Append-only**: existing records are never mutated. Corrections are made by appending a new compensating event.
- **Ordered**: monotonic offset/sequence per partition.
- **Durable**: flushed to disk (fsync) and replicated (typically RF=3 with `acks=all`).
- **Replayable**: consumers re-read from any offset.
- **Retention**: time-based, size-based, or compaction (key-based latest).
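
The properties above can be illustrated with a toy single-partition log. This is a minimal in-memory sketch, not how Kafka or a database WAL is implemented; the class name `AppendOnlyLog` and the event names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class AppendOnlyLog:
    """Toy single-partition log: records are appended, never modified."""
    _records: list = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record and return its offset (monotonically increasing)."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> Iterator[tuple]:
        """Replay records from any offset; reading does not consume them."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


log = AppendOnlyLog()
log.append({"type": "Deposited", "amount": 100})
log.append({"type": "Deposited", "amount": 50})
# The mistaken deposit at offset 1 is corrected by a compensating event,
# not by editing the record in place.
log.append({"type": "DepositReversed", "amount": 50})

balance = 0
for _, e in log.read_from(0):
    balance += e["amount"] if e["type"] == "Deposited" else -e["amount"]
```

Replaying from offset 0 yields a balance of 100, and any consumer can start from a later offset without affecting other readers.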
### Use cases
- **Database WAL** — Postgres pg_wal, MySQL binlog. Crash recovery.
- **Event sourcing** — domain events as source of truth, projections rebuild state.
- **CDC** — Debezium reads DB log → Kafka → consumers.
- **Stream processing** — Flink/Kafka Streams stateful aggregations.
- **Audit log** — tamper-evident with hash chain.
### Applications
1. **Kafka topic** — 7-day retention, multi-consumer fan-out.
2. **Event-sourced aggregate** — order state from order_events.
3. **Outbox pattern** — DB transaction + log entry → reliable event publish.
4. **Time-travel debugging** — replay from offset N.
## 💻 Patterns
### Kafka producer (idempotent + transactional)
```java
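import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.serialization.StringSerializer;
// orderId, json, and audit are Strings supplied by the caller
Properties p = new Properties();
p.put("bootstrap.servers", "broker:9092");
p.put("enable.idempotence", "true");  // broker de-duplicates retried sends
p.put("acks", "all");                 // wait for all in-sync replicas
p.put("transactional.id", "orders-producer-1");
KafkaProducer<String, String> prod =
    new KafkaProducer<>(p, new StringSerializer(), new StringSerializer());
prod.initTransactions();
try {
    prod.beginTransaction();
    prod.send(new ProducerRecord<>("orders", orderId, json));
    prod.send(new ProducerRecord<>("audit", orderId, audit));
    prod.commitTransaction();         // both records become visible atomically
} catch (KafkaException e) {
    prod.abortTransaction();          // fatal errors (e.g. ProducerFencedException) require close() instead
}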
```
### Consumer (offset commit after process)
```java
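import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
// props must set bootstrap.servers, group.id, deserializers, and enable.auto.commit=false
KafkaConsumer<String, String> c = new KafkaConsumer<>(props);
c.subscribe(List.of("orders"));
while (true) {
    ConsumerRecords<String, String> recs = c.poll(Duration.ofSeconds(1));
    for (var r : recs) processOrder(r.value());
    c.commitSync(); // commit only after processing: at-least-once delivery
}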
```
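
Committing after processing means a crash between the two steps causes redelivery, so handlers should tolerate duplicates. The following is a minimal simulation of that behaviour, assuming an in-memory list as the log and a hypothetical `seen` set as the idempotency store; it does not use the Kafka client.

```python
from typing import Optional

log = ["order-1", "order-2", "order-3"]  # records at offsets 0, 1, 2
offsets = {"committed": 0}                # next offset to read after a restart
deliveries: list = []                     # every delivery, duplicates included
applied: list = []                        # effects after de-duplication
seen: set = set()                         # hypothetical idempotency store


def handle(record: str) -> None:
    """Idempotent handler: a redelivered record is recognised and skipped."""
    deliveries.append(record)
    if record not in seen:
        seen.add(record)
        applied.append(record)


def run_consumer(crash_after_processing: Optional[int] = None) -> None:
    """Process from the committed offset and commit only after processing."""
    offset = offsets["committed"]
    while offset < len(log):
        handle(log[offset])
        if offset == crash_after_processing:
            return                        # simulated crash before the commit
        offsets["committed"] = offset + 1 # commit after successful processing
        offset += 1


run_consumer(crash_after_processing=1)    # order-2 is processed but never committed
run_consumer()                            # restart resumes at offset 1
```

After the restart, `order-2` appears twice in `deliveries` but only once in `applied`, which is the combination of at-least-once delivery and an idempotent handler.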
### Event sourcing aggregate
```typescript
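// Item and Order are illustrative shapes; adapt them to the real domain model
type Item = { sku: string; qty: number };
type Order = { id: string; items: Item[]; status: "pending" | "paid" | "shipped"; paidAmount?: number; tracking?: string };
type OrderEvent =
  | { type: "Created", id: string, items: Item[] }
  | { type: "Paid", amount: number }
  | { type: "Shipped", trackingId: string };
// Pure function: the next state depends only on the previous state and the event
function applyEvent(state: Order, e: OrderEvent): Order {
  switch (e.type) {
    case "Created": return { ...state, id: e.id, items: e.items, status: "pending" };
    case "Paid": return { ...state, status: "paid", paidAmount: e.amount };
    case "Shipped": return { ...state, status: "shipped", tracking: e.trackingId };
  }
}
// events: OrderEvent[] read from the log in order
const state = events.reduce(applyEvent, {} as Order);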
```
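
Because state is a fold over events, the state at any earlier point can be rebuilt by folding a prefix of the log, which is what makes time-travel debugging possible. A minimal Python sketch of the same aggregate follows; the dict-based event shape and the helper `state_at` are assumptions for illustration.

```python
from functools import reduce


def apply_event(state: dict, e: dict) -> dict:
    """Return the next state; the previous state is never mutated."""
    if e["type"] == "Created":
        return {**state, "id": e["id"], "items": e["items"], "status": "pending"}
    if e["type"] == "Paid":
        return {**state, "status": "paid", "paid_amount": e["amount"]}
    if e["type"] == "Shipped":
        return {**state, "status": "shipped", "tracking": e["tracking_id"]}
    return state  # unknown event types are ignored in this sketch


def state_at(events: list, n: int) -> dict:
    """State after the first n events have been applied."""
    return reduce(apply_event, events[:n], {})


events = [
    {"type": "Created", "id": "abc", "items": ["book"]},
    {"type": "Paid", "amount": 25},
    {"type": "Shipped", "tracking_id": "TRK-1"},
]
paid_state = state_at(events, 2)       # as of the second event
final_state = state_at(events, len(events))
```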
### Outbox pattern (Postgres + Debezium)
```sql
BEGIN;
INSERT INTO orders(id, status) VALUES ('abc', 'pending');
INSERT INTO outbox(aggregate_id, event_type, payload)
  VALUES ('abc', 'OrderCreated', '{"id":"abc",...}'::jsonb);
COMMIT;
-- Debezium tails pg_wal → publishes outbox row → Kafka 'orders' topic
```
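
The key property of the outbox is that the state change and the event row commit or roll back together. The sketch below shows this with SQLite from the Python standard library and a polling relay in place of Debezium; the `published` column, the `relay` function, and the `publish` callback are hypothetical names, not part of the original schema.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        aggregate_id TEXT NOT NULL,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def create_order(order_id: str) -> None:
    """Write the state change and its event in one transaction."""
    with db:  # commits on success, rolls back if an exception is raised
        db.execute("INSERT INTO orders(id, status) VALUES (?, 'pending')", (order_id,))
        db.execute(
            "INSERT INTO outbox(aggregate_id, event_type, payload) VALUES (?, ?, ?)",
            (order_id, "OrderCreated", json.dumps({"id": order_id})),
        )


def relay(publish) -> int:
    """Publish unpublished outbox rows in order, then mark them published."""
    rows = db.execute(
        "SELECT seq, event_type, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # may repeat after a crash
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


sent = []
create_order("abc")
relay(lambda event_type, payload: sent.append((event_type, payload)))
```

A relay that crashes between `publish` and the `UPDATE` will publish the same row again, so downstream consumers still need to be idempotent.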
### Log compaction (Kafka)
```bash
# Topic config: cleanup.policy=compact
# Same key keeps only latest value → materialize current state
kafka-configs.sh --bootstrap-server broker:9092 --alter \
  --entity-type topics --entity-name user-profiles \
  --add-config cleanup.policy=compact,min.cleanable.dirty.ratio=0.1
```
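
The effect of compaction can be sketched as "keep the last record per key". This is a simplified model that rewrites a list in memory; the real broker compacts in the background and keeps the original offsets of surviving records. The function name `compact` is hypothetical.

```python
def compact(log: list) -> list:
    """Keep only the latest record per key, preserving the survivors' order."""
    last_offset = {key: offset for offset, (key, _) in enumerate(log)}
    return [rec for offset, rec in enumerate(log) if last_offset[rec[0]] == offset]


log = [
    ("u1", {"name": "Ann"}),
    ("u2", {"name": "Bob"}),
    ("u1", {"name": "Anna"}),   # supersedes the first u1 record
]
compacted = compact(log)
current_state = dict(compacted)  # materialized view: key -> latest value
```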
### Hash-chained audit log
```python
import hashlib, json, time

def append(prev_hash: str, event: dict) -> tuple[str, dict]:
    # Each record embeds the previous record's hash, forming a chain
    record = {"prev": prev_hash, "event": event, "ts": time.time()}
    h = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    return h, {**record, "hash": h}

# Tamper-evident: any modification breaks chain
```
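
A chain is only tamper-evident if someone verifies it. The sketch below recomputes each hash and checks each `prev` link; the `GENESIS` sentinel for the first record and the function names are assumptions, and fixed timestamps are used so the example is deterministic.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed sentinel used as the first record's prev hash


def record_hash(record: dict) -> str:
    """Hash the same fields that were hashed when the record was appended."""
    body = {k: record[k] for k in ("prev", "event", "ts")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def verify(chain: list) -> bool:
    """Return True only if every hash and every prev link is intact."""
    prev = GENESIS
    for record in chain:
        if record["prev"] != prev or record["hash"] != record_hash(record):
            return False
        prev = record["hash"]
    return True


# Build a two-record chain.
chain = []
prev = GENESIS
for i, event in enumerate([{"action": "login"}, {"action": "delete"}]):
    record = {"prev": prev, "event": event, "ts": float(i)}
    record["hash"] = record_hash(record)
    chain.append(record)
    prev = record["hash"]

intact = verify(chain)
chain[0]["event"]["action"] = "logout"   # tamper with an old record
tampered = verify(chain)
```

Here `intact` is `True` and `tampered` is `False`. Hashing alone does not stop an attacker who can rewrite the whole chain, which is why the decision table pairs the hash chain with signatures.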
### Postgres logical replication slot
```sql
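-- Requires wal_level = logical.
-- test_decoding emits text, so it can be read with the SQL function below;
-- pgoutput emits binary and is normally consumed over the replication protocol (e.g. by Debezium).
SELECT pg_create_logical_replication_slot('app_slot', 'test_decoding');
-- Fetch and consume pending WAL changes (CDC); pg_logical_slot_peek_changes reads without consuming
SELECT * FROM pg_logical_slot_get_changes('app_slot', NULL, NULL);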
```
### Snapshot + tail (event sourcing optimization)
```typescript
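// snapStore and eventStore are application-provided stores (not defined here)
async function loadAggregate(id: string): Promise<Order> {
  const snap = await snapStore.get(id);                          // latest periodic snapshot, if any
  const events = await eventStore.read(id, snap?.version ?? 0);  // only events after the snapshot
  return events.reduce(applyEvent, snap?.state ?? ({} as Order));
}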
```
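
The saving from a snapshot is the number of events that no longer need to be replayed. A minimal sketch with a toy running-total aggregate follows; the snapshot layout (`version`, `state`) and the `load` helper are assumptions for illustration.

```python
def apply_event(state: dict, e: dict) -> dict:
    """Toy aggregate: a running total of event amounts."""
    return {"total": state.get("total", 0) + e["amount"]}


def load(events: list, snapshot=None) -> tuple:
    """Rebuild state from the snapshot plus the events recorded after it.

    Returns (state, number_of_events_replayed).
    """
    version = snapshot["version"] if snapshot else 0
    state = dict(snapshot["state"]) if snapshot else {}
    replayed = 0
    for e in events[version:]:   # the tail: events after the snapshot version
        state = apply_event(state, e)
        replayed += 1
    return state, replayed


events = [{"amount": a} for a in (10, 20, 30, 40)]   # versions 1..4
snapshot = {"version": 2, "state": {"total": 30}}     # taken after event 2

full_state, full_replayed = load(events)
fast_state, fast_replayed = load(events, snapshot)
```

Both paths produce the same state; the snapshot path replays two events instead of four.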
## Decision criteria
| Situation | System |
|---|---|
| High-throughput streaming, multi-consumer | Kafka / Redpanda |
| Geo-replicated, tiered storage | Pulsar / WarpStream |
| Event-sourced single service | EventStoreDB / Postgres + outbox |
| Database CDC | Debezium → Kafka |
| Tamper-evident audit | Hash-chain + signed |

**Default**: Kafka (or Redpanda for operational simplicity) for a distributed log; Postgres WAL + outbox for a single service.
## 🔗 Graph
- Parent: [[Distributed Systems]]
- Variants: [[Kafka]] · [[WAL]] · [[Event Store]]
- Applications: [[Event Sourcing]] · [[CDC]] · [[CQRS]]
- Adjacent: [[Stream-Processing-Architectures|Stream Processing]] · [[Idempotency]]
## 🤖 LLM usage
**When to use**: audit or replay requirements, multiple consumers or projections, temporal queries, reliable event publishing.
**When not to use**: simple CRUD without history, workloads that only need a strongly consistent current snapshot, storage-cost-sensitive systems (logs grow).
## ❌ Anti-patterns
- **Mutating past events**: violates the core invariant. Emit a compensating event instead.
- **Unbounded retention without compaction**: storage grows without limit.
- **Synchronous replay on every read**: adds latency. Use snapshot + tail.
- **Single-partition Kafka topic**: caps throughput. Partition by key.
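
Partitioning by key spreads load while keeping every record for one key in the same partition, so per-key ordering survives. The sketch below uses CRC32 as the stable hash purely for illustration (Kafka's default partitioner uses murmur2 for keyed records); `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash, so the mapping never changes between runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]
keys = ["order-1", "order-2", "order-1", "order-3", "order-2"]
for seq, key in enumerate(keys):
    # seq stands in for the time at which the record was produced
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, seq))
```

Ordering is guaranteed only within a partition, so events that must be processed in order should share a key.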
## 🧪 Verification / duplicates
- Verified (Jay Kreps "The Log" 2013, Kafka docs, Postgres WAL docs, Greg Young event sourcing).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full content (Kafka, event sourcing, WAL, outbox) |