[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -1,67 +1,183 @@
 ---
 id: wiki-2026-0508-real-time-operation
-title: Real time Operation
+title: Real-time Operation
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [P-Reinforce-AUTO-REOP-001]
+aliases: [Real-time Systems, RTOS, Real-time Inference]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 0.94
-tags: [auto-reinforced, real-time-Operation, latency, deterministic, responsiveness, extreme-performance]
+confidence_score: 0.9
+verification_status: applied
+tags: [real-time, latency, rtos, streaming, inference]
 raw_sources: []
-last_reinforced: 2026-04-20
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
+tech_stack:
+  language: python
+  framework: vllm
 ---

-# [[Real-time-Operation|Real-time-Operation]]
+# Real-time Operation

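-## 📌 One-line insight (The Karpathy Summary)
-> "A world where lateness is not tolerated: an extreme environment in which the time from command to result (latency) must be controlled down to an instant, and a basic survival condition for autonomous driving and industrial robots, where a 1 ms error can decide life or death."
+## One-line summary
+> **"A missed deadline is a failure."** Real-time does not mean fast; it means staying within a predictable latency budget. In hard RT (RTOS, avionics) a missed deadline is catastrophic; in soft RT (video, LLM streaming) it degrades the UX.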

-## 📖 Structured knowledge (Synthesized Content)
-Real-time operation is a mode of operation that guarantees a system's response time falls fully within the physical time constraints of the real world.
+## Core concepts

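-1.  **Hard Real-time**: A critical situation in which failing to respond within the deadline counts as failure (crash) of the whole system (autonomous driving, surgical robots, etc.).
-2.  **Soft Real-time**: Delay lowers quality, but the system does not stop (video streaming, etc.).
-3.  **Key technical elements**:
-    *   **Deterministic Scheduling**: When the next task will run must be 100% predictable.
-    *   **Interrupt Handling**: On an urgent event, immediately suspend the current task and respond. (Linked to [[Fault-Tolerance|Fault-Tolerance]])
-4.  **Why it matters**:
-    *   For AI to step into the physical world and collaborate with humans (Physical Intelligence), it must react faster than humans to secure safety and trust.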
+### Classification
+- **Hard RT**: the deadline is absolute (pacemaker, ABS brake). Runs on an RTOS such as VxWorks, QNX, or Zephyr.
+- **Firm RT**: an occasional miss is tolerable, but a result is useless after its deadline (live video frame).
+- **Soft RT**: best-effort; a miss degrades quality (LLM token stream, web UI).

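-## ⚠️ Contradictions & Updates
 **Conflict with past data**: Real-time processing was once considered infeasible because of limited compute; today it is achieved through specialized hardware (ASIC, FPGA) and real-time operating systems (RTOS) (RL Update).
 **Policy change (RL Update)**: To eliminate cloud round-trip latency, on-device real-time inference, which runs the AI on the device itself, has become central to next-generation AI operation. (Linked to [[Quantization|Quantization]])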
+### Latency budgets
+- **HFT**: <10 μs.
+- **Game frame (60 fps)**: 16.7 ms.
+- **VR frame (90 fps)**: 11.1 ms (motion-to-photon <20 ms).
+- **Web UI response**: <200 ms is perceived as instant.
+- **LLM TTFT**: <500 ms (Claude Opus 4.7 streaming).
+- **LLM inter-token**: <50 ms (20 tok/s, the minimum for comfortable reading).
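
The per-cycle budgets above follow directly from the target rate (budget = 1000 ms / rate). A minimal sketch of that arithmetic, where the helper names `period_ms` and `within_budget` are illustrative and not from any library:

```python
def period_ms(rate_hz: float) -> float:
    """Time budget per cycle, in milliseconds, for a loop running at rate_hz."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


def within_budget(elapsed_ms: float, rate_hz: float) -> bool:
    """True if one cycle's measured time fits inside the budget for rate_hz."""
    return elapsed_ms <= period_ms(rate_hz)


# Rates taken from the list above: 60 fps, 90 fps, 20 tokens/s.
for name, rate in [("game frame", 60), ("VR frame", 90), ("LLM token", 20)]:
    print(f"{name}: {period_ms(rate):.1f} ms per cycle")
```

A 12 ms frame fits the 60 fps budget but misses the 90 fps one, which is why the same workload can be fine for a game and unacceptable in VR.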

-## 🔗 Knowledge links (Graph)
- [[Physical-Intelligence|Physical-Intelligence]], [[Fault-Tolerance|Fault-Tolerance]], [[Efficiency|Efficiency]], [[Hardware|Hardware]], [[Quantization|Quantization]]
- **Modern Tech/Tools**: RTOS (FreeRTOS, QNX), EtherCAT, Edge AI (NVIDIA Jetson).
---
+### Web real-time
+- **SSE**: server push over HTTP/1.1 or HTTP/2; simple. The default for LLM streaming.
+- **WebSocket**: bidirectional, supports binary frames. Chat, multiplayer.
+- **WebRTC**: peer-to-peer, sub-100 ms voice/video.
+- **HTTP/3 + WebTransport**: emerging as of 2026; UDP-based (QUIC), multiplexed.

-## 🤖 LLM usage hints (How to Use This Knowledge)
+### AI real-time inference
+- **vLLM**: PagedAttention; up to 24x higher throughput than a naive baseline (HuggingFace Transformers in vLLM's own benchmarks).
+- **MLX (Apple Silicon)**: unified memory on M3/M4 allows local real-time inference with Llama 3.x 70B.
+- **Speculative decoding**: a small draft model yields a 2-3x speedup.
+- **KV cache**: prefix sharing lets a common system prompt be cached and reused across requests.
+- **Prompt caching (Anthropic)**: up to 90% lower input cost on cache hits, plus lower TTFT.
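
To check whether any of these techniques actually meets the TTFT and inter-token budgets, the two numbers have to be measured on the stream itself. A minimal sketch, assuming the stream is any Python iterable of tokens; `measure_stream` and its injectable `clock` parameter are hypothetical names introduced here for illustration:

```python
import time
from typing import Callable, Iterable, List, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, List[float]]:
    """Return (ttft_seconds, inter_token_gaps_seconds) for a token stream."""
    start = clock()
    ttft = None
    gaps: List[float] = []
    prev = start
    for _ in tokens:
        now = clock()
        if ttft is None:
            ttft = now - start        # time to first token
        else:
            gaps.append(now - prev)   # gap between consecutive tokens
        prev = now
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, gaps
```

Injecting the clock keeps the function deterministic under test; in production the default `time.perf_counter` is a reasonable choice because it is monotonic.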

-**When to use this knowledge:**
- *(TODO)*
+### Applications
+1. Token-by-token streaming in LLM chat.
+2. Video conferencing (WebRTC).
+3. Trading systems (kdb+, FPGA).
+4. Robotics control loop (ROS 2 + Zephyr).
+5. Live captioning (Whisper streaming).

-**When not to use it:**
 *(TODO)*
+## 💻 Patterns

-## 🧪 Validation
+### LLM streaming with prompt cache
+```python
+from anthropic import Anthropic

 **Information status:** needs_review
 **Source trust level:** A
 **Review reason:** *(P-Reinforce Phase 1 automatic normalization. Body needs verification.)*
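+client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
+
+# Placeholder: in practice a long, stable prompt (10k+ tokens) worth caching
+LARGE_SYSTEM_PROMPT = "..."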

-## 🧬 Duplicate Check
+with client.messages.stream(
+    model="claude-opus-4-7",
+    max_tokens=2048,
+    system=[{
+        "type": "text",
+        "text": LARGE_SYSTEM_PROMPT,  # 10k+ tokens
+        "cache_control": {"type": "ephemeral"},
+    }],
+    messages=[{"role": "user", "content": "..."}],
+) as stream:
+    for text in stream.text_stream:
+        print(text, end="", flush=True)
+```

 **Existing similar documents:** *(TODO: see the indexer cluster report)*
 **Handling:** UPDATE (automatic normalization)
 **Reason:** Phase 1 normalization: old template and missing fields filled in.
+### SSE in FastAPI
+```python
+from fastapi import FastAPI
+from fastapi.responses import StreamingResponse
+import asyncio

-## 🕓 Changelog
+app = FastAPI()

-| Date | Change | Handling | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
+async def event_stream():
+    for i in range(100):
+        yield f"data: token {i}\n\n"
+        await asyncio.sleep(0.05)
+
+@app.get("/stream")
+async def stream():
+    return StreamingResponse(event_stream(), media_type="text/event-stream")
+```
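
On the receiving side, each event arrives as `data:` lines terminated by a blank line. A minimal client-side sketch of that framing, assuming the transport already yields decoded text lines (for example from an HTTP client's line iterator); `parse_sse` is a hypothetical helper that handles only the `data` field and comments:

```python
from typing import Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event from an iterable of SSE lines."""
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                 # a blank line terminates an event
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):     # comment line, often used as keep-alive
            continue
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):  # one leading space is part of the framing
                value = value[1:]
            data.append(value)
        # other fields (event, id, retry) are ignored in this sketch


events = list(parse_sse(["data: token 0", "", ": ping", "data: a", "data: b", ""]))
print(events)
```

Multiple `data:` lines within one event are joined with newlines, and an event that is never closed by a blank line is dropped rather than emitted partially.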
+
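+### vLLM batched inference (offline API)
+```python
+from vllm import LLM, SamplingParams
+
+# Model ID and GPU count are deployment-specific; adjust to your hardware.
+llm = LLM(model="meta-llama/Llama-3.3-70B-Instruct", tensor_parallel_size=4,
+          enable_prefix_caching=True)
+params = SamplingParams(max_tokens=512, temperature=0.7)
+
+prompts = ["Explain hard vs. soft real-time in one sentence."]  # example input
+
+# The engine schedules prompts with continuous batching; an online server
+# (`vllm serve`) additionally lets new requests join a running batch.
+outputs = llm.generate(prompts, params)
+for out in outputs:
+    print(out.outputs[0].text)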
+```
+
+### Game loop (fixed timestep)
+```rust
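+use std::time::Instant;
+
+const DT: f32 = 1.0 / 60.0; // fixed physics step (60 Hz)
+
+// physics_step and render are application-defined.
+fn run() {
+    let mut acc = 0.0_f32;
+    let mut last = Instant::now();
+
+    loop {
+        let now = Instant::now();
+        acc += (now - last).as_secs_f32();
+        last = now;
+        // Consume the elapsed wall-clock time in fixed-size steps.
+        while acc >= DT {
+            physics_step(DT);
+            acc -= DT;
+        }
+        render(acc / DT); // interpolation factor in [0, 1)
+    }
+}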
+```
+
+### RTOS task (Zephyr)
+```c
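+#include <zephyr/kernel.h>
+
+K_TIMER_DEFINE(ctrl_timer, NULL, NULL);
+
+/* read_sensors, compute_pid and actuate are application-defined. */
+void control_loop(void *p1, void *p2, void *p3) {
+    /* A periodic timer keeps a fixed 100 Hz period; k_sleep() after the
+     * work would add the work's own run time to every cycle and drift. */
+    k_timer_start(&ctrl_timer, K_MSEC(10), K_MSEC(10));
+    while (1) {
+        k_timer_status_sync(&ctrl_timer);  /* block until the next 10 ms tick */
+        read_sensors();
+        compute_pid();
+        actuate();
+    }
+}
+
+K_THREAD_DEFINE(ctrl_tid, 1024, control_loop, NULL, NULL, NULL,
+                K_PRIO_PREEMPT(2), 0, 0);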
+```
+
+## Decision criteria
+| Situation | Approach |
+|---|---|
+| Safety-critical (medical, auto) | Hard RT: RTOS, formal verification |
+| LLM chat | SSE streaming + prompt cache |
+| Multiplayer game | UDP + WebRTC / custom protocol |
+| Voice/video call | WebRTC |
+| HFT | Kernel bypass (DPDK), FPGA |
+| Robotics | ROS 2 + Zephyr/PREEMPT_RT Linux |
+
+**Default**: SSE + Anthropic streaming for LLM, WebSocket for bidirectional chat.
+
+## 🔗 Graph
+- Parent: [[Distributed-Systems]] · [[Operating-Systems]]
+- Variants: [[Hard-Real-Time]] · [[Soft-Real-Time]] · [[Streaming]]
+- Applications: [[LLM-Streaming]] · [[WebRTC]] · [[Game-Loop]]
+- Adjacent: [[Latency-Optimization]] · [[Prompt-Caching]] · [[vLLM]]
+
+## 🤖 LLM usage
+**When to use**: user-facing chat (TTFT < 500 ms), long outputs (token-streaming UX), tool-use loops.
+**When not to use**: batch processing (use the Batch API, 50% cheaper), embeddings (single-shot), latency-insensitive analytics.
+
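+## ❌ Anti-patterns
+- **Blocking on the full response**: the user watches a spinner for 30 s. Stream instead.
+- **Claiming hard guarantees on a soft RT stack**: general-purpose Linux plus a garbage-collected runtime cannot provide hard RT guarantees.
+- **No timeout**: hung connections leak. Set one explicitly, e.g. `httpx.Timeout(30.0, connect=5.0)`.
+- **No backpressure**: a producer that outpaces its consumer grows buffers until OOM.
+- **Synchronous calls in the event loop**: `time.sleep` blocks asyncio; use `await asyncio.sleep` instead.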
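
The backpressure and event-loop points above can be illustrated with a bounded `asyncio.Queue`: `put` suspends the producer whenever the consumer falls behind, so memory stays capped. This is a minimal sketch under assumed values (queue size 4, 20 items, a 1 ms simulated consumer delay), not a tuned implementation:

```python
import asyncio


async def producer(queue: asyncio.Queue, n: int) -> None:
    for i in range(n):
        await queue.put(i)   # suspends while the queue is full: backpressure
    await queue.put(None)    # sentinel: no more items


async def consumer(queue: asyncio.Queue, seen_sizes: list) -> list:
    items = []
    while True:
        item = await queue.get()
        if item is None:
            return items
        seen_sizes.append(queue.qsize())
        await asyncio.sleep(0.001)  # slow consumer; time.sleep would block the loop
        items.append(item)


async def main(n: int = 20, maxsize: int = 4):
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    sizes: list = []
    # The overall timeout guards against a hung pipeline.
    _, items = await asyncio.wait_for(
        asyncio.gather(producer(queue, n), consumer(queue, sizes)),
        timeout=5.0,
    )
    return items, max(sizes)


items, peak = asyncio.run(main())
print(len(items), "items delivered; peak queue depth", peak)
```

With an unbounded queue the producer would finish immediately and the backlog would grow with the input size; the bound trades producer speed for a fixed memory ceiling.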
+
+## 🧪 Validation / duplicates
+- Verified (vLLM docs, Anthropic streaming API, WebRTC RFC, Zephyr docs).
+- Trust level A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup: RT systems, web streaming, and LLM inference unified |