---
id: wiki-2026-0508-append-only-log
title: Append-only Log
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Commit log, WAL, Event log, Immutable log]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [log, kafka, event-sourcing, wal, storage]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: java
  framework: Kafka, Pulsar, Postgres WAL
---

# Append-only Log

## In One Line

> **"A sequence of immutable events: write once, read many."** Lineage: database WAL (1980s) → distributed commit log (LinkedIn Kafka, 2011) → backbone of event sourcing. A typical 2026 stack: Kafka 3.7+ in KRaft mode (no ZooKeeper) / Redpanda (Raft, C++) / Pulsar 3.x (BookKeeper) / Postgres logical replication / WarpStream (S3-backed, Kafka-compatible).

## Core

### Properties

- **Append-only**: existing records are never mutated. Corrections are made by appending a new compensating event.
- **Ordered**: monotonic offset/sequence per partition.
- **Durable**: persisted to disk and replicated (typically RF=3 with `acks=all`).
- **Replayable**: consumers can re-read from any offset.
- **Retention**: time-based, size-based, or compaction (keep the latest value per key).

### Use cases

- **Database WAL**: Postgres `pg_wal`, MySQL binlog. Used for crash recovery.
- **Event sourcing**: domain events are the source of truth; projections rebuild state from them.
- **CDC**: Debezium reads the DB log → Kafka → consumers.
- **Stream processing**: stateful aggregations in Flink / Kafka Streams.
- **Audit log**: tamper-evident when records are hash-chained.

### Applications

1. **Kafka topic**: 7-day retention, multi-consumer fan-out.
2. **Event-sourced aggregate**: order state derived from `order_events`.
3. **Outbox pattern**: DB transaction + log entry → reliable event publishing.
4. **Time-travel debugging**: replay from offset N.
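The properties listed above can be illustrated with a minimal in-memory sketch. It is not how Kafka or a WAL is implemented: `AppendOnlyLog`, `Record`, and `compacted` are hypothetical names chosen for illustration, and the sketch assumes a single partition with no durability or replication.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(frozen=True)
class Record:
    offset: int
    key: str
    value: Any


@dataclass
class AppendOnlyLog:
    """Hypothetical single-partition, in-memory log (no fsync, no replicas)."""
    _records: list[Record] = field(default_factory=list)

    def append(self, key: str, value: Any) -> int:
        # Offsets are assigned monotonically; existing records are never touched.
        offset = len(self._records)
        self._records.append(Record(offset, key, value))
        return offset

    def read(self, from_offset: int = 0) -> Iterator[Record]:
        # Any consumer can replay from any offset, as often as it likes.
        yield from self._records[from_offset:]

    def compacted(self) -> list[Record]:
        # Key-based compaction view: keep only the latest record per key,
        # returned in offset order.
        latest: dict[str, Record] = {}
        for rec in self._records:
            latest[rec.key] = rec
        return sorted(latest.values(), key=lambda r: r.offset)


log = AppendOnlyLog()
log.append("order-1", {"status": "pending"})
log.append("order-2", {"status": "pending"})
log.append("order-1", {"status": "paid"})

# Replay from offset 1, then materialize current state via compaction.
replayed = [r.offset for r in log.read(from_offset=1)]
current = {r.key: r.value["status"] for r in log.compacted()}
```

Here `replayed` covers offsets 1 and 2, and `current` holds only the latest status per order, which is the same idea as a compacted Kafka topic materializing current state.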
## 💻 Patterns

### Kafka producer (idempotent + transactional)

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties p = new Properties();
p.put("bootstrap.servers", "broker:9092");
p.put("enable.idempotence", "true");
p.put("acks", "all");
p.put("transactional.id", "orders-producer-1");

KafkaProducer<String, String> prod =
    new KafkaProducer<>(p, new StringSerializer(), new StringSerializer());
prod.initTransactions();
prod.beginTransaction();
prod.send(new ProducerRecord<>("orders", orderId, json));
prod.send(new ProducerRecord<>("audit", orderId, audit));
prod.commitTransaction();  // both records become visible atomically
```

### Consumer (offset commit after process)

```java
import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// props must set enable.auto.commit=false so offsets advance only via commitSync()
KafkaConsumer<String, String> c = new KafkaConsumer<>(props);
c.subscribe(List.of("orders"));
while (true) {
    ConsumerRecords<String, String> recs = c.poll(Duration.ofSeconds(1));
    for (var r : recs) processOrder(r.value());
    c.commitSync();  // at-least-once: commit only after processing
}
```

### Event sourcing aggregate

```typescript
// Item and Order are assumed to be defined elsewhere.
type OrderEvent =
  | { type: "Created", id: string, items: Item[] }
  | { type: "Paid", amount: number }
  | { type: "Shipped", trackingId: string };

function applyEvent(state: Order, e: OrderEvent): Order {
  switch (e.type) {
    case "Created": return { ...state, id: e.id, items: e.items, status: "pending" };
    case "Paid":    return { ...state, status: "paid", paidAmount: e.amount };
    case "Shipped": return { ...state, status: "shipped", tracking: e.trackingId };
  }
}

// Current state is a left fold over the event history.
const state = events.reduce(applyEvent, {} as Order);
```

### Outbox pattern (Postgres + Debezium)

```sql
BEGIN;
INSERT INTO orders(id, status) VALUES ('abc', 'pending');
INSERT INTO outbox(aggregate_id, event_type, payload)
  VALUES ('abc', 'OrderCreated', '{"id":"abc",...}'::jsonb);
COMMIT;
-- Debezium tails pg_wal → publishes outbox row → Kafka 'orders' topic
```

### Log compaction (Kafka)

```bash
# Topic config: cleanup.policy=compact
# Same key keeps only latest value → materialize current state
kafka-configs.sh --bootstrap-server broker:9092 \
  --alter --entity-type topics --entity-name user-profiles \
  --add-config cleanup.policy=compact,min.cleanable.dirty.ratio=0.1
```

### Hash-chained audit log

```python
import hashlib
import json
import time

def append(prev_hash: str, event: dict) -> tuple[str, dict]:
    record = {"prev": prev_hash, "event": event, "ts": time.time()}
    # Hash the canonical (sorted-key) JSON form so the digest is reproducible.
    h = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    return h, {**record, "hash": h}
# Tamper-evident: any modification breaks the chain
```

### Postgres logical replication slot

```sql
-- test_decoding emits text; pgoutput is binary and is read over the
-- replication protocol (e.g. by Debezium), not by the SQL function below.
SELECT pg_create_logical_replication_slot('app_slot', 'test_decoding');
-- Fetch (and consume) WAL changes accumulated on the slot (CDC)
SELECT * FROM pg_logical_slot_get_changes('app_slot', NULL, NULL);
```

### Snapshot + tail (event sourcing optimization)

```typescript
async function loadAggregate(id: string): Promise<Order> {
  const snap = await snapStore.get(id);  // periodic snapshot, if any
  // Read only the events recorded after the snapshot version.
  const events = await eventStore.read(id, snap?.version ?? 0);
  return events.reduce(applyEvent, snap?.state ?? ({} as Order));
}
```

## Decision Criteria

| Situation | System |
|---|---|
| High-throughput streaming, multi-consumer | Kafka / Redpanda |
| Geo-replicated, tiered storage | Pulsar / WarpStream |
| Event-sourced single service | EventStoreDB / Postgres + outbox |
| Database CDC | Debezium → Kafka |
| Tamper-evident audit | Hash chain + signatures |

**Default**: Kafka (or Redpanda for operational simplicity) for a distributed log; Postgres WAL + outbox for a single service.

## 🔗 Graph

- Parent: [[Distributed Systems]]
- Variants: [[Kafka]] · [[WAL]] · [[Event Store]]
- Applications: [[Event Sourcing]] · [[CDC]] · [[CQRS]]
- Adjacent: [[Stream-Processing-Architectures|Stream Processing]] · [[Idempotency]]

## 🤖 LLM Usage

**When to use**: audit or replay requirements, multiple consumers or projections, temporal queries, reliable event publishing.

**When not to use**: simple CRUD without history, cases that only need a strongly consistent current snapshot, storage-cost-sensitive workloads (logs grow).

## ❌ Anti-patterns

- **Mutating past events**: violates the log's core invariant. Emit a compensating event instead.
- **Unbounded retention without compaction**: storage grows without limit.
- **Synchronous replay on every read**: adds latency. Use snapshot + tail.
- **Single-partition Kafka topic**: caps throughput. Partition by key.
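As an illustration of the first anti-pattern, the sketch below corrects a mistake by appending a compensating event rather than editing history. The event names (`Charged`, `ChargeReversed`), the `LedgerEvent` type, and the `balance` function are hypothetical and not taken from any particular event store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEvent:
    kind: str          # "Charged" or "ChargeReversed" (hypothetical event names)
    charge_id: str
    amount_cents: int


def balance(events: list[LedgerEvent]) -> int:
    # State is always derived by folding over the full, unmodified history.
    total = 0
    for e in events:
        if e.kind == "Charged":
            total += e.amount_cents
        elif e.kind == "ChargeReversed":
            total -= e.amount_cents
    return total


history = [
    LedgerEvent("Charged", "c1", 5000),
    LedgerEvent("Charged", "c2", 1200),  # recorded in error
]

# Wrong: overwriting or deleting history[1] would rewrite the past.
# Right: append a compensating event that cancels the erroneous one.
history.append(LedgerEvent("ChargeReversed", "c2", 1200))

corrected_balance = balance(history)
```

Consumers replaying the log see both the mistake and its correction, so the audit trail stays intact while the derived balance ends up correct.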
## 🧪 Verification / Duplicates

- Verified (Jay Kreps "The Log" 2013, Kafka docs, Postgres WAL docs, Greg Young event sourcing).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (Kafka, event sourcing, WAL, outbox) |