id: wiki-2026-0508-append-only-log
title: Append-only Log
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Commit log, WAL, Event log, Immutable log
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: log, kafka, event-sourcing, wal, storage
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: java
  framework: Kafka, Pulsar, Postgres WAL

Append-only Log

One-line summary

"A sequence of immutable events: write once, read many." The idea runs from the database WAL (1980s) to distributed logs (LinkedIn Kafka, 2011) and is the backbone of event sourcing. The 2026 stack: Kafka 3.7 (KRaft mode, no ZooKeeper) / Redpanda (Raft, C++) / Pulsar 3.x (BookKeeper) / Postgres logical replication / WarpStream (S3-backed, Kafka-compatible).

Core concepts

Properties

  • Append-only: mutation is forbidden. Corrections are made by appending a new compensating event.
  • Ordered: monotonic offset/sequence per partition.
  • Durable: fsync, replicated (typical RF=3, acks=all).
  • Replayable: consumers re-read from any offset.
  • Retention: time-based, size-based, or compaction (key-based latest).
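
The properties above can be illustrated with a minimal in-memory sketch. This is not how Kafka or a WAL is implemented (there is no durability, replication, or retention here); the `AppendOnlyLog` and `Record` names and the deposit events are hypothetical, chosen only to show ordered offsets, replay from an offset, and correction by compensating event.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(frozen=True)
class Record:
    offset: int   # monotonically increasing position in the log
    value: Any


@dataclass
class AppendOnlyLog:
    _records: list[Record] = field(default_factory=list)

    def append(self, value: Any) -> int:
        """Add a record at the tail and return its offset."""
        offset = len(self._records)
        self._records.append(Record(offset, value))
        return offset

    def read_from(self, offset: int = 0) -> Iterator[Record]:
        """Replay records starting at the given offset."""
        yield from self._records[offset:]


log = AppendOnlyLog()
log.append({"type": "Deposited", "amount": 100})
log.append({"type": "Deposited", "amount": 50})
# A correction is a new compensating event, never an edit of offset 1.
log.append({"type": "DepositReversed", "amount": 50})

# Rebuild state by replaying the whole log from offset 0.
balance = 0
for rec in log.read_from(0):
    sign = -1 if rec.value["type"] == "DepositReversed" else 1
    balance += sign * rec.value["amount"]
```

The class exposes no update or delete method, so the only way to change the derived `balance` is to append another event.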

Use cases

  • Database WAL: Postgres pg_wal and the InnoDB redo log for crash recovery; the MySQL binlog for replication and point-in-time recovery.
  • Event sourcing: domain events as source of truth, projections rebuild state.
  • CDC: Debezium reads DB log → Kafka → consumers.
  • Stream processing: Flink/Kafka Streams stateful aggregations.
  • Audit log: tamper-evident with hash chain.

Applications

  1. Kafka topic — 7-day retention, multi-consumer fan-out.
  2. Event-sourced aggregate — order state from order_events.
  3. Outbox pattern — DB transaction + log entry → reliable event publish.
  4. Time-travel debugging — replay from offset N.

💻 Patterns

Kafka producer (idempotent + transactional)

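import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.serialization.StringSerializer;

Properties p = new Properties();
p.put("bootstrap.servers", "broker:9092");
p.put("enable.idempotence", "true");   // broker de-duplicates retried sends
p.put("acks", "all");
p.put("transactional.id", "orders-producer-1");
KafkaProducer<String,String> prod = new KafkaProducer<>(p, new StringSerializer(), new StringSerializer());
prod.initTransactions();

try {
  prod.beginTransaction();
  prod.send(new ProducerRecord<>("orders", orderId, json));
  prod.send(new ProducerRecord<>("audit", orderId, audit));
  prod.commitTransaction();   // both records become visible atomically
} catch (KafkaException e) {
  // Fatal errors such as ProducerFencedException require prod.close() instead of an abort.
  prod.abortTransaction();
}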

Consumer (offset commit after process)

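// props must set group.id and enable.auto.commit=false so offsets are committed manually
KafkaConsumer<String,String> c = new KafkaConsumer<>(props);
c.subscribe(List.of("orders"));
while (true) {
  ConsumerRecords<String,String> recs = c.poll(Duration.ofSeconds(1));
  for (var r : recs) processOrder(r.value());
  c.commitSync();  // commit after processing: at-least-once, so processOrder may see duplicates
}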
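
Because offsets are committed only after processing, a crash between processing and commit causes the same records to be delivered again. One common way to make that safe is an idempotent handler. The sketch below assumes each event carries a unique `event_id` (a hypothetical field, not something Kafka provides) and uses an in-memory set where a real system would need a durable store updated together with the side effect.

```python
processed_ids: set[str] = set()   # assumption: durable and transactional in a real system
totals: dict[str, int] = {}


def process_order(event: dict) -> bool:
    """Apply the event once; return False if it was a duplicate delivery."""
    if event["event_id"] in processed_ids:
        return False
    totals[event["order_id"]] = totals.get(event["order_id"], 0) + event["amount"]
    processed_ids.add(event["event_id"])
    return True


# Simulated redelivery: the consumer crashed after processing e2 but before committing.
deliveries = [
    {"event_id": "e1", "order_id": "abc", "amount": 10},
    {"event_id": "e2", "order_id": "abc", "amount": 5},
    {"event_id": "e2", "order_id": "abc", "amount": 5},  # redelivered
]
results = [process_order(e) for e in deliveries]
```

The redelivered `e2` is recognized and skipped, so the total for order `abc` reflects each event exactly once even though delivery was at-least-once.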

Event sourcing aggregate

type OrderEvent =
  | { type: "Created", id: string, items: Item[] }
  | { type: "Paid",    amount: number }
  | { type: "Shipped", trackingId: string };

function applyEvent(state: Order, e: OrderEvent): Order {
  switch (e.type) {
    case "Created": return { ...state, id: e.id, items: e.items, status: "pending" };
    case "Paid":    return { ...state, status: "paid", paidAmount: e.amount };
    case "Shipped": return { ...state, status: "shipped", tracking: e.trackingId };
  }
}

const state = events.reduce(applyEvent, {} as Order);

Outbox pattern (Postgres + Debezium)

BEGIN;
INSERT INTO orders(id, status) VALUES ('abc', 'pending');
INSERT INTO outbox(aggregate_id, event_type, payload)
  VALUES ('abc', 'OrderCreated', '{"id":"abc",...}'::jsonb);
COMMIT;
-- Debezium tails pg_wal → publishes outbox row → Kafka 'orders' topic
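
The same idea can be tried locally without Postgres or Debezium. The sketch below swaps in SQLite and a polling relay in place of log-based CDC, so it illustrates the transactional guarantee rather than the Debezium setup above; the `published` column, `create_order`, and `relay` are hypothetical names introduced for this example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox(
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        aggregate_id TEXT NOT NULL,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def create_order(order_id: str) -> None:
    # One transaction: the state change and its event commit or roll back together.
    with conn:
        conn.execute("INSERT INTO orders(id, status) VALUES (?, 'pending')", (order_id,))
        conn.execute(
            "INSERT INTO outbox(aggregate_id, event_type, payload) VALUES (?, ?, ?)",
            (order_id, "OrderCreated", json.dumps({"id": order_id})),
        )


def relay(publish) -> int:
    """Publish unpublished outbox rows in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))   # may repeat after a crash: at-least-once
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


sent: list[tuple[str, dict]] = []
create_order("abc")
first = relay(lambda t, p: sent.append((t, p)))
second = relay(lambda t, p: sent.append((t, p)))   # nothing left to publish
```

A polling relay is simpler to run than CDC but adds query load and latency; reading the database log avoids both, which is the trade-off the Debezium variant makes.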

Log compaction (Kafka)

# Topic config: cleanup.policy=compact
# Each key keeps only its latest value, so the topic materializes current state
kafka-configs.sh --bootstrap-server broker:9092 --alter \
  --entity-type topics --entity-name user-profiles \
  --add-config cleanup.policy=compact,min.cleanable.dirty.ratio=0.1
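
To make the effect of compaction concrete, here is a small simulation of its end result: only the newest value per key survives, and a record with a null value (a tombstone) removes the key. It is a sketch of the semantics only; the real log cleaner works segment by segment in the background, so a live topic can still hold older values for a while. The `compact` function and the sample records are hypothetical.

```python
from typing import Optional


def compact(records: list[tuple[str, Optional[str]]]) -> list[tuple[str, str]]:
    """Keep only the latest value per key; a None value is a tombstone (delete)."""
    latest: dict[str, Optional[str]] = {}
    for key, value in records:
        latest.pop(key, None)      # re-insert so the key takes its newest position
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]


profile_log = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example.com"),
    ("user-1", "alice@new.example"),
    ("user-3", "carol@example.com"),
    ("user-3", None),              # tombstone: user-3 deleted
]
compacted = compact(profile_log)
```

Surviving records keep their relative log order, which is why a consumer reading a compacted topic from the beginning ends up with the current state per key.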

Hash-chained audit log

import hashlib, json, time

def append(prev_hash: str, event: dict) -> tuple[str, dict]:
    record = {"prev": prev_hash, "event": event, "ts": time.time()}
    h = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    return h, {**record, "hash": h}

# Tamper-evident: any modification breaks chain
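
Detection only happens if something re-walks the chain. The sketch below adds a hypothetical `verify` function and restates the append step with an explicit `ts` argument so the example is deterministic and runs on its own; the all-zero `GENESIS` value for the first record is an assumption.

```python
import hashlib
import json

GENESIS = "0" * 64   # assumed sentinel used as the first record's prev hash


def record_hash(prev: str, event: dict, ts: float) -> str:
    body = {"prev": prev, "event": event, "ts": ts}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append(chain: list[dict], event: dict, ts: float) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "event": event, "ts": ts,
                  "hash": record_hash(prev, event, ts)})


def verify(chain: list[dict]) -> bool:
    """Recompute every hash and check each record links to its predecessor."""
    prev = GENESIS
    for rec in chain:
        if rec["prev"] != prev:
            return False
        if rec["hash"] != record_hash(rec["prev"], rec["event"], rec["ts"]):
            return False
        prev = rec["hash"]
    return True


chain: list[dict] = []
append(chain, {"action": "login", "user": "alice"}, ts=1.0)
append(chain, {"action": "delete", "user": "alice"}, ts=2.0)
ok_before = verify(chain)

chain[0]["event"]["user"] = "mallory"   # tamper with history
ok_after = verify(chain)
```

A chain on its own only protects against edits by someone who cannot rewrite every later hash. Signing the latest hash or publishing it somewhere outside the log closes that gap.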

Postgres logical replication slot

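-- test_decoding emits text, so the slot can be read from SQL. pgoutput is binary and is
-- consumed over the streaming replication protocol (for example by Debezium).
SELECT pg_create_logical_replication_slot('app_slot', 'test_decoding');
-- Consume WAL changes (CDC). get_changes advances the slot; peek_changes does not.
SELECT * FROM pg_logical_slot_get_changes('app_slot', NULL, NULL);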

Snapshot + tail (event sourcing optimization)

async function loadAggregate(id: string): Promise<Order> {
  const snap = await snapStore.get(id);  // periodic snapshot, or undefined if none exists yet
  // Read only the events recorded after the snapshot's version.
  const events = await eventStore.read(id, snap?.version ?? 0);
  return events.reduce(applyEvent, snap?.state ?? ({} as Order));
}

Decision criteria

| Situation | System |
|---|---|
| High-throughput streaming, multi-consumer | Kafka / Redpanda |
| Geo-replicated, tiered storage | Pulsar / WarpStream |
| Event-sourced single service | EventStoreDB / Postgres + outbox |
| Database CDC | Debezium → Kafka |
| Tamper-evident audit | Hash-chain + signed |

Default: Kafka (or Redpanda for operational simplicity) for a distributed log; Postgres WAL + outbox for a single service.

🔗 Graph

🤖 LLM usage

When to use: audit or replay requirements, multiple consumers or projections, temporal queries, reliable event publishing.
When not to use: simple CRUD without history, cases that need only a strongly consistent current snapshot, storage-cost-sensitive workloads (logs grow).

Anti-patterns

  • Mutating past events: violates the immutability invariant. Emit a compensating event instead.
  • Unbounded retention without compaction: storage grows without limit.
  • Synchronous replay on every read: high latency. Use snapshot + tail.
  • Single-partition Kafka topic: caps throughput at one partition. Partition by key.
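
Partitioning by key spreads load across partitions while keeping all events for one key in order on a single partition. The sketch below shows the principle with a stable hash. It uses CRC32 from the standard library purely for illustration (Kafka's default partitioner hashes keys with murmur2), and `partition_for` and the sample order events are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Stable key -> partition mapping (CRC32 here, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions: dict[int, list[tuple[str, str]]] = {i: [] for i in range(4)}
events = [("order-1", "Created"), ("order-2", "Created"),
          ("order-1", "Paid"), ("order-1", "Shipped"), ("order-2", "Paid")]
for key, event in events:
    partitions[partition_for(key, 4)].append((key, event))

# Every event for order-1 lands on the same partition, in the order it was produced.
order_1_partition = partition_for("order-1", 4)
order_1_events = [e for k, e in partitions[order_1_partition] if k == "order-1"]
```

Ordering is guaranteed only within a partition, so the key should be the entity whose events must stay in sequence, such as the order ID here.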

🧪 Verification / Duplicates

  • Verified (Jay Kreps "The Log" 2013, Kafka docs, Postgres WAL docs, Greg Young event sourcing).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (Kafka, event sourcing, WAL, outbox) |