id: wiki-2026-0508-stream-processing-architectures
title: Stream Processing Architectures
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Stream Processing, Streaming Systems, Real-time Data Processing
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: stream-processing, kafka, flink, architecture
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: java, framework: kafka-streams-flink
Stream Processing Architectures
In one line
"Continuous computation over unbounded data." Unlike batch processing, which works on finite data, stream processing handles an infinite flow of events at sub-second latency. The standard stack in 2026 is Kafka + Flink + Iceberg for lakehouse streaming.
Key concepts
Stream vs Batch
Batch: bounded input, high throughput, latency measured in hours (Spark, Hadoop).
Stream: unbounded input, lower throughput, millisecond-to-second latency (Flink, Kafka Streams).
Unified: a single API for both batch and stream (Flink Table API, Beam).
Processing semantics
At-most-once: records are dropped on failure (low latency, lossy).
At-least-once: records are retried on failure (duplicates possible).
Exactly-once: idempotent writes plus transactions (Kafka EOS, Flink checkpoints).
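Time semantics
Event time: the time at which the sensor emitted the event (correct, but events can arrive late).
Processing time: the system clock time at which the event is processed (fast, but wrong when the source lags).
Watermark: a progress marker for event time; it sets the cutoff after which an event counts as late.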
Applications
Real-time fraud detection (sub-100ms decision).
Trading / market data aggregation.
CDC pipelines (Debezium → Kafka → Flink → warehouse).
IoT telemetry (sensor → MQTT → stream processor).
💻 Patterns
Kafka Streams — windowed aggregation
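The windowed aggregation named above can be sketched without any library. This is a minimal plain-Java model of a tumbling-window count, not the Kafka Streams API; the class name `TumblingWindowCounter` and the key names are hypothetical. In Kafka Streams itself, the equivalent shape is `groupByKey().windowedBy(TimeWindows.ofSizeWithNoGrace(...)).count()`.

```java
import java.util.Map;
import java.util.TreeMap;

/** Library-free model of a tumbling-window count (hypothetical class, not Kafka Streams). */
final class TumblingWindowCounter {
    private final long windowSizeMs;
    // windowStart -> (key -> count); TreeMap keeps windows in time order
    private final Map<Long, Map<String, Long>> windows = new TreeMap<>();

    TumblingWindowCounter(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    /** Assigns the event to the window containing its timestamp and increments the key's count. */
    void add(String key, long timestampMs) {
        long windowStart = timestampMs - Math.floorMod(timestampMs, windowSizeMs);
        windows.computeIfAbsent(windowStart, w -> new TreeMap<>())
               .merge(key, 1L, Long::sum);
    }

    /** Returns the count for a key in the window starting at windowStart (0 if absent). */
    long count(String key, long windowStart) {
        Map<String, Long> empty = Map.of();
        return windows.getOrDefault(windowStart, empty).getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        TumblingWindowCounter counter = new TumblingWindowCounter(60_000);
        counter.add("user-1", 5_000);
        counter.add("user-1", 59_999);
        counter.add("user-1", 60_000); // falls into the next one-minute window
        System.out.println(counter.count("user-1", 0));      // prints 2
        System.out.println(counter.count("user-1", 60_000)); // prints 1
    }
}
```

Tumbling windows are aligned to multiples of the window size, so each event belongs to exactly one window. A real engine would also expire old windows; this sketch keeps them all.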
Flink — event-time + watermark
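To make the event-time and watermark idea concrete, here is a minimal plain-Java sketch of a bounded-out-of-orderness watermark. It is a model of the technique, not Flink code; the class name `BoundedOutOfOrdernessWatermark` is hypothetical. In Flink, the corresponding built-in is `WatermarkStrategy.forBoundedOutOfOrderness(Duration)`.

```java
/** Library-free model of a bounded-out-of-orderness watermark (hypothetical class, not Flink). */
final class BoundedOutOfOrdernessWatermark {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen;

    BoundedOutOfOrdernessWatermark(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        // Chosen so the initial watermark is Long.MIN_VALUE without arithmetic underflow.
        this.maxTimestampSeen = Long.MIN_VALUE + maxOutOfOrdernessMs + 1;
    }

    /** A watermark t asserts that no further events with timestamp <= t are expected. */
    long currentWatermark() {
        return maxTimestampSeen - maxOutOfOrdernessMs - 1;
    }

    /** Records the event and returns true if it is on time, false if it is late. */
    boolean observe(long eventTimeMs) {
        boolean onTime = eventTimeMs > currentWatermark();
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimeMs);
        return onTime;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessWatermark wm = new BoundedOutOfOrdernessWatermark(2_000);
        System.out.println(wm.observe(10_000));    // prints true
        System.out.println(wm.currentWatermark()); // prints 7999
        System.out.println(wm.observe(8_000));     // prints true (within the 2 s bound)
        System.out.println(wm.observe(7_999));     // prints false (at or behind the watermark)
    }
}
```

The out-of-orderness bound is the trade-off knob: a larger bound admits more late events but delays window results by the same amount.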
Flink SQL — streaming join
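A streaming join has to buffer both sides, because a match may arrive on either stream first. The sketch below models a time-bounded (interval) join in plain Java rather than in Flink SQL; the class `IntervalJoin`, the `Event` type, and the order/payment payloads are all hypothetical. It assumes a right-side event matches when its time lies in [left time, left time + bound].

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Library-free model of a keyed interval join (hypothetical class, not Flink SQL). */
final class IntervalJoin {
    static final class Event {
        final String key;
        final long timeMs;
        final String payload;

        Event(String key, long timeMs, String payload) {
            this.key = key;
            this.timeMs = timeMs;
            this.payload = payload;
        }
    }

    private final long boundMs;
    private final Map<String, List<Event>> leftBuffer = new HashMap<>();
    private final Map<String, List<Event>> rightBuffer = new HashMap<>();

    IntervalJoin(long boundMs) {
        this.boundMs = boundMs;
    }

    /** Buffers a left event and emits joins against right events already buffered for the key. */
    List<String> onLeft(Event left) {
        leftBuffer.computeIfAbsent(left.key, k -> new ArrayList<>()).add(left);
        List<String> out = new ArrayList<>();
        for (Event right : rightBuffer.getOrDefault(left.key, List.of())) {
            if (matches(left, right)) out.add(left.payload + "+" + right.payload);
        }
        return out;
    }

    /** Buffers a right event and emits joins against left events already buffered for the key. */
    List<String> onRight(Event right) {
        rightBuffer.computeIfAbsent(right.key, k -> new ArrayList<>()).add(right);
        List<String> out = new ArrayList<>();
        for (Event left : leftBuffer.getOrDefault(right.key, List.of())) {
            if (matches(left, right)) out.add(left.payload + "+" + right.payload);
        }
        return out;
    }

    private boolean matches(Event left, Event right) {
        return right.timeMs >= left.timeMs && right.timeMs <= left.timeMs + boundMs;
    }

    public static void main(String[] args) {
        IntervalJoin join = new IntervalJoin(1_000);
        System.out.println(join.onLeft(new Event("k", 100, "order")));      // prints []
        System.out.println(join.onRight(new Event("k", 600, "payment")));   // prints [order+payment]
        System.out.println(join.onRight(new Event("k", 2_000, "payment2"))); // prints []
    }
}
```

This sketch never evicts buffered events. A real engine would drop buffered rows once the watermark passes the end of their join interval, which is what keeps interval-join state bounded.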
Stateful processing — Flink ProcessFunction
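Stateful processing combines per-key state with timers that fire as event time advances. The following plain-Java sketch mirrors the general shape of a Flink `KeyedProcessFunction` (an element callback plus a timer callback) without using Flink; the class `InactivityCounter` and its method names are hypothetical. It counts events per key and emits the count once a key has been inactive for a timeout, clearing the state at the same time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Library-free model of keyed state plus event-time timers (hypothetical class, not Flink). */
final class InactivityCounter {
    private final long timeoutMs;
    private final Map<String, Long> countByKey = new TreeMap<>();
    private final Map<String, Long> timerByKey = new TreeMap<>();

    InactivityCounter(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Element callback: update the key's state and push its timer out to eventTime + timeout. */
    void processElement(String key, long eventTimeMs) {
        countByKey.merge(key, 1L, Long::sum);
        timerByKey.merge(key, eventTimeMs + timeoutMs, Long::max);
    }

    /** Timer callback: fire every timer at or before the watermark, emit, and clear that key's state. */
    List<String> onWatermark(long watermarkMs) {
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = timerByKey.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> timer = it.next();
            if (timer.getValue() <= watermarkMs) {
                out.add(timer.getKey() + "=" + countByKey.remove(timer.getKey()));
                it.remove();
            }
        }
        return out;
    }

    /** Number of keys currently holding state. */
    int liveKeys() {
        return countByKey.size();
    }

    public static void main(String[] args) {
        InactivityCounter counter = new InactivityCounter(1_000);
        counter.processElement("a", 0);
        counter.processElement("a", 500);
        counter.processElement("b", 2_000);
        System.out.println(counter.onWatermark(1_499)); // prints []
        System.out.println(counter.onWatermark(1_500)); // prints [a=2]
        System.out.println(counter.liveKeys());         // prints 1
    }
}
```

Clearing state in the timer callback is what keeps per-key state from growing without bound.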
Exactly-once with Kafka transactions
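The core of exactly-once in a consume-transform-produce loop is that the output records and the consumed offset become visible atomically. The sketch below models that in plain Java with an in-memory "transaction"; it is not the Kafka client API, and the class `TransactionalPipeline` and its `failAtOffset` crash switch are hypothetical. In Kafka, the same effect comes from a transactional producer that commits consumer offsets inside the transaction (`sendOffsetsToTransaction`), with downstream consumers reading at `isolation.level=read_committed`.

```java
import java.util.ArrayList;
import java.util.List;

/** Library-free model of atomic output-plus-offset commits (hypothetical class, not Kafka). */
final class TransactionalPipeline {
    private final List<String> committedOutput = new ArrayList<>();
    private int committedOffset = 0; // next input offset to read after a restart

    /**
     * Processes input from the last committed offset in batches.
     * failAtOffset simulates a crash before commit at that offset; pass -1 for no crash.
     */
    void run(List<String> input, int batchSize, int failAtOffset) {
        while (committedOffset < input.size()) {
            List<String> staged = new ArrayList<>(); // begin transaction
            int offset = committedOffset;
            int end = Math.min(input.size(), offset + batchSize);
            for (; offset < end; offset++) {
                if (offset == failAtOffset) {
                    return; // crash: staged writes are discarded and the offset does not advance
                }
                staged.add(input.get(offset).toUpperCase());
            }
            // Commit: output and offset become visible together.
            committedOutput.addAll(staged);
            committedOffset = offset;
        }
    }

    List<String> output() {
        return List.copyOf(committedOutput);
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c", "d", "e");
        TransactionalPipeline pipeline = new TransactionalPipeline();
        pipeline.run(input, 2, 3);              // crashes mid-way through the second batch
        System.out.println(pipeline.output());  // prints [A, B]
        pipeline.run(input, 2, -1);             // restart resumes from the committed offset
        System.out.println(pipeline.output());  // prints [A, B, C, D, E]
    }
}
```

Record "c" is processed twice across the crash and the restart, yet appears once in the output, because the first attempt was never committed.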
Backpressure — Flink credit-based flow control
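Credit-based flow control means the sender may only transmit as many buffers as the receiver has announced it can hold. A minimal plain-Java sketch of that handshake follows; it models the idea only and is not Flink's network stack. The class `CreditBasedChannel` and its method names are hypothetical.

```java
import java.util.ArrayDeque;

/** Library-free model of credit-based flow control (hypothetical class, not Flink internals). */
final class CreditBasedChannel {
    private final ArrayDeque<String> receiverBuffers = new ArrayDeque<>();
    private int credits; // free receiver buffers announced to the sender

    CreditBasedChannel(int receiverBufferCount) {
        this.credits = receiverBufferCount;
    }

    /** Sender side: spends one credit per record; returns false (backpressure) when none remain. */
    boolean trySend(String record) {
        if (credits == 0) {
            return false;
        }
        credits--;
        receiverBuffers.addLast(record);
        return true;
    }

    /** Receiver side: consumes one record and grants a credit back; returns null if empty. */
    String consume() {
        String record = receiverBuffers.pollFirst();
        if (record != null) {
            credits++;
        }
        return record;
    }

    public static void main(String[] args) {
        CreditBasedChannel channel = new CreditBasedChannel(2);
        System.out.println(channel.trySend("r1")); // prints true
        System.out.println(channel.trySend("r2")); // prints true
        System.out.println(channel.trySend("r3")); // prints false (sender is backpressured)
        System.out.println(channel.consume());     // prints r1
        System.out.println(channel.trySend("r3")); // prints true
    }
}
```

Because the sender stops as soon as credits run out, a slow consumer slows the producer instead of overflowing its buffers.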
Lakehouse streaming sink — Iceberg
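A streaming sink into a snapshot-based table format typically publishes data only at commit points, so readers never see partial writes. The sketch below is a deliberately simplified plain-Java model of that commit-on-checkpoint behaviour, not the Iceberg or Flink API. The class `SnapshotTable` is hypothetical, and it copies rows into each snapshot where a real table format would reference data files.

```java
import java.util.ArrayList;
import java.util.List;

/** Library-free model of a snapshot-committing sink (hypothetical class, not Iceberg). */
final class SnapshotTable {
    private final List<List<String>> snapshots = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    /** Stages a row; it stays invisible to readers until the next commit. */
    void write(String row) {
        pending.add(row);
    }

    /** Called when a checkpoint completes: publishes staged rows as a new snapshot, returns its id. */
    int commit() {
        List<String> next = new ArrayList<>(currentRows());
        next.addAll(pending);
        pending.clear();
        snapshots.add(List.copyOf(next));
        return snapshots.size();
    }

    /** Called on failure before commit: staged rows are dropped and would be replayed from the source. */
    void abort() {
        pending.clear();
    }

    /** Rows visible to readers, taken from the latest committed snapshot. */
    List<String> currentRows() {
        return snapshots.isEmpty() ? List.of() : snapshots.get(snapshots.size() - 1);
    }

    public static void main(String[] args) {
        SnapshotTable table = new SnapshotTable();
        table.write("r1");
        System.out.println(table.currentRows()); // prints []
        table.commit();
        System.out.println(table.currentRows()); // prints [r1]
        table.write("r2");
        table.abort();
        table.commit();
        System.out.println(table.currentRows()); // prints [r1]
    }
}
```

Tying commits to checkpoints means end-to-end latency into the table is bounded below by the checkpoint interval.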
Decision criteria
Situation: Approach
Simple ETL, Kafka-native: Kafka Streams
Complex CEP, large state: Flink
Unified batch+stream: Flink / Beam
SQL-only team: Flink SQL / ksqlDB
Tiny scale: Single consumer + handler
Default: Kafka + Flink, for production-grade exactly-once streaming.
🤖 LLM usage
When to use: continuous unbounded data, sub-second latency, stateful aggregation.
When not to use: hourly/daily batch (use Spark), tiny volumes (use cron).
❌ Anti-patterns
Processing-time on lagged sources: use event time with watermarks instead.
Unbounded state: set a state TTL; state that grows without limit ends in OOM.
Single-partition hot key: rebalance partitions to spread the skew.
Sync external call in operator: use Flink's Async I/O instead.
🧪 Verification / duplicates
Verified (Apache Flink docs, Kafka Streams Developer Guide 2026).
Trust level: A.
🕓 Changelog
Date: Change
2026-05-08: Phase 1
2026-05-10: Manual cleanup (Kafka Streams + Flink patterns, EOS, watermarks)