--- id: wiki-2026-0508-event-stream-processing title: Event Stream Processing category: 10_Wiki/Topics status: verified canonical_id: self aliases: [Stream Processing, ESP, Streaming Analytics] duplicate_of: none source_trust_level: A confidence_score: 0.93 verification_status: applied tags: [streaming, kafka, flink, real-time, data-engineering] raw_sources: [] last_reinforced: 2026-05-10 github_commit: pending tech_stack: language: polyglot framework: kafka-flink --- # Event Stream Processing ## 매 한 줄 > **"매 unbounded data 의 매 record-by-record 의 transform"**. 매 batch 의 ETL 의 opposite — 매 event 의 arrive 시점 에 immediately compute. 매 2026 의 매 dominant stack 의 Kafka + Flink (or Kafka Streams), 매 emerging 의 RisingWave / Materialize (streaming SQL DB), 매 cloud-native 의 Confluent Cloud / AWS Kinesis / GCP Dataflow. ## 매 핵심 ### 매 batch vs streaming - **Batch**: 매 hourly/daily, 매 high latency, 매 reprocess easy. - **Streaming**: 매 sub-second, 매 always-on, 매 stateful. - **Lambda architecture**: batch + streaming hybrid (매 deprecated 2026). - **Kappa architecture**: streaming-only, 매 replay 의 reprocess. ### 매 핵심 개념 - **Event time vs processing time** — 매 event 의 produce 시점 vs broker 의 receive. - **Watermark** — 매 "event time T 까지의 모든 event 의 도착 의 expect" 의 signal. - **Window** — tumbling / sliding / session. - **Exactly-once semantics** — Kafka transactions + Flink checkpoints. - **Stateful operator** — 매 RocksDB / state backend. ### 매 frameworks (2026) - **Apache Flink 1.20** — 매 most powerful, 매 unified batch+streaming, 매 exactly-once. - **Kafka Streams 3.7** — 매 JVM 만, 매 library (no cluster). - **Apache Beam** — 매 portable runner (Flink/Dataflow/Spark). - **RisingWave / Materialize** — 매 streaming SQL DB, 매 incremental view maintenance. - **Bytewax** — 매 Python-native, 매 Rust core. ### 매 응용 1. Real-time fraud detection. 2. IoT telemetry aggregation. 3. Live dashboards (clickstream). 4. CDC → search index sync. 5. AI feature stores (real-time). ## 💻 패턴 ### Pattern 1: Kafka producer (TS) ```typescript import { Kafka } from "kafkajs"; const kafka = new Kafka({ brokers: ["broker:9092"] }); const producer = kafka.producer({ idempotent: true, maxInFlightRequests: 5 }); await producer.connect(); await producer.send({ topic: "orders", messages: [{ key: order.id, value: JSON.stringify(order), headers: { "event-time": Date.now().toString() }, }], }); ``` ### Pattern 2: Flink DataStream (Java) ```java DataStream orders = env.fromSource( KafkaSource.builder() .setBootstrapServers("broker:9092") .setTopics("orders") .setDeserializer(new OrderDeserializer()) .build(), WatermarkStrategy .forBoundedOutOfOrderness(Duration.ofSeconds(5)) .withTimestampAssigner((o, ts) -> o.eventTime), "kafka-source"); orders .keyBy(o -> o.userId) .window(TumblingEventTimeWindows.of(Time.minutes(1))) .aggregate(new OrderCountAgg()) .sinkTo(new MetricsSink()); ``` ### Pattern 3: Kafka Streams aggregation ```java StreamsBuilder builder = new StreamsBuilder(); KStream orders = builder.stream("orders"); orders .groupBy((k, v) -> v.userId) .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5))) .count(Materialized.as("user-order-count")) .toStream() .to("user-order-counts"); ``` ### Pattern 4: Streaming SQL (Flink SQL / RisingWave) ```sql -- 매 RisingWave / Flink SQL 동일 CREATE SOURCE orders ( order_id VARCHAR, user_id VARCHAR, amount DOUBLE, event_time TIMESTAMP, WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND ) WITH (connector='kafka', topic='orders', properties.bootstrap.servers='broker:9092') FORMAT JSON; CREATE MATERIALIZED VIEW user_revenue_5m AS SELECT user_id, window_start, window_end, SUM(amount) AS revenue FROM TUMBLE(orders, event_time, INTERVAL '5' MINUTES) GROUP BY user_id, window_start, window_end; ``` ### Pattern 5: Bytewax (Python streaming) ```python from bytewax.dataflow import Dataflow from bytewax.connectors.kafka import KafkaSource, KafkaSink from bytewax import operators as op flow = Dataflow("fraud-detect") src = op.input("kafka-in", flow, KafkaSource(brokers=["broker:9092"], topics=["tx"])) parsed = op.map("parse", src, lambda kv: json.loads(kv.value)) flagged = op.filter("suspicious", parsed, lambda tx: tx["amount"] > 10_000) op.output("kafka-out", flagged, KafkaSink(brokers=["broker:9092"], topic="alerts")) ``` ### Pattern 6: CDC stream (Debezium → Kafka → Flink) ```yaml # Debezium connector config connector.class: io.debezium.connector.postgresql.PostgresConnector database.hostname: pg database.dbname: app plugin.name: pgoutput table.include.list: public.orders topic.prefix: cdc ``` ### Pattern 7: Exactly-once sink (Flink) ```java KafkaSink sink = KafkaSink.builder() .setBootstrapServers("broker:9092") .setRecordSerializer(KafkaRecordSerializationSchema.builder() .setTopic("processed-orders") .setValueSerializationSchema(new OrderSerializer()) .build()) .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE) .setTransactionalIdPrefix("orders-") .build(); ``` ## 매 결정 기준 | 상황 | Approach | |---|---| | Sub-second latency, JVM team | Flink | | Lightweight, library-only | Kafka Streams | | Python team, simple jobs | Bytewax | | SQL-only, fast iteration | RisingWave / Materialize | | Multi-runtime portability | Apache Beam | | Cloud-managed | Confluent / Dataflow / Kinesis | **기본값**: 매 Kafka + Flink (전통 stack), 매 SQL-only 시 RisingWave. ## 🔗 Graph - 부모: [[Data Engineering]] · [[Distributed Systems]] - 응용: [[Feature Store]] - Adjacent: [[Apache Flink]] · [[Event Sourcing]] · [[CQRS]] ## 🤖 LLM 활용 **언제**: 매 real-time pipeline 설계, 매 streaming SQL 의 generate. **언제 X**: 매 daily batch 의 충분, 매 small data 의 cron job. ## ❌ 안티패턴 - **No watermark**: 매 late event 의 silently drop or wrong window. - **At-least-once + idempotent missing**: 매 duplicate 의 downstream impact. - **Unbounded state**: 매 keyed state 의 TTL 의 missing — 매 OOM. - **Reorder reliance**: 매 partition 별 only 의 ordering — 매 cross-partition 의 X. - **Synchronous external call inside operator**: 매 backpressure / slow. ## 🧪 검증 / 중복 - Verified (Flink docs, Kafka docs, "Streaming Systems" Akidau et al., RisingWave docs). - 신뢰도 A. ## 🕓 Changelog | 날짜 | 변경 | |---|---| | 2026-05-08 | Phase 1 | | 2026-05-10 | Manual cleanup — Flink/Kafka Streams/RisingWave + windowing |