---
id: wiki-2026-0508-real-time-operation
title: Real-time Operation
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Real-time Systems, RTOS, Real-time Inference]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [real-time, latency, rtos, streaming, inference]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: vllm
---

# Real-time Operation

## In one line

> **"A missed deadline is a failure."** Real-time does not mean fast; it means staying within a predictable latency budget. In hard RT (RTOS, avionics) a missed deadline is catastrophic; in soft RT (video, LLM streaming) it degrades the UX.

## Core concepts

### Classification

- **Hard RT**: the deadline is absolute (pacemaker, ABS brake). Typically built on an RTOS such as VxWorks, QNX, or Zephyr.
- **Firm RT**: an occasional miss is tolerable, but a result delivered after its deadline is useless (live video frame).
- **Soft RT**: best effort; a miss degrades quality (LLM token stream, web UI).

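
The three classes differ only in what a result is worth once its deadline has passed. A minimal sketch of that idea follows. The `result_value` helper and the linear decay used for the soft case are illustrative assumptions, not part of any library.

```python
from enum import Enum


class RTClass(Enum):
    HARD = "hard"
    FIRM = "firm"
    SOFT = "soft"


def result_value(rt_class: RTClass, elapsed_ms: float, deadline_ms: float) -> float:
    """Return the usefulness of a result in [0, 1]; raise on a hard miss.

    Hypothetical helper: the linear decay used for SOFT is an assumption.
    """
    if elapsed_ms <= deadline_ms:
        return 1.0  # on time: full value in every class
    if rt_class is RTClass.HARD:
        # A hard miss is a system failure, not a degraded result.
        raise RuntimeError(f"hard deadline missed by {elapsed_ms - deadline_ms:.1f} ms")
    if rt_class is RTClass.FIRM:
        return 0.0  # a late result is discarded (e.g. a stale video frame)
    # SOFT: value decays with lateness and reaches 0 at twice the deadline.
    overrun = (elapsed_ms - deadline_ms) / deadline_ms
    return max(0.0, 1.0 - overrun)


print(result_value(RTClass.FIRM, elapsed_ms=20.0, deadline_ms=16.0))
print(result_value(RTClass.SOFT, elapsed_ms=75.0, deadline_ms=50.0))
```

In practice the hard case is handled by design (worst-case execution time analysis, watchdogs) rather than by catching an exception; the `raise` only marks that a late result has no meaningful value.
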
### Latency budgets

- **HFT**: <10 μs.
- **Game frame (60 fps)**: 16.7 ms.
- **VR frame (90 fps)**: 11.1 ms (motion-to-photon <20 ms).
- **Web TTI**: <200 ms is perceived as instant.
- **LLM TTFT**: <500 ms (Claude Opus 4.7 streaming).
- **LLM inter-token**: <50 ms (20 tok/s, the minimum for comfortable reading).

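
The frame and token figures above are simple reciprocals. A short sketch of the arithmetic follows; the function names are illustrative, and the 200-token response length is an assumed example value.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def tokens_per_second(inter_token_ms: float) -> float:
    """Streaming rate implied by a given gap between tokens."""
    return 1000.0 / inter_token_ms


def streaming_time_ms(ttft_ms: float, inter_token_ms: float, n_tokens: int) -> float:
    """Wall time to stream n_tokens: the first token, then n - 1 gaps."""
    return ttft_ms + inter_token_ms * (n_tokens - 1)


print(f"60 fps -> {frame_budget_ms(60):.2f} ms per frame")
print(f"90 fps -> {frame_budget_ms(90):.2f} ms per frame")
print(f"50 ms gap -> {tokens_per_second(50):.0f} tok/s")
# Assumed example: a 200-token answer delivered exactly at the budget limits.
print(f"200 tokens -> {streaming_time_ms(500, 50, 200) / 1000:.2f} s")
```

The last line shows why streaming matters: even at the budget limits, a 200-token answer takes about ten seconds to complete, but the user sees text after half a second.
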
### Web real-time

- **SSE**: server push over HTTP/1.1 and HTTP/2; simple. The default for LLM streaming.
- **WebSocket**: bidirectional, binary OK. Chat, multiplayer.
- **WebRTC**: P2P, sub-100 ms voice/video.
- **HTTP/3 + WebTransport**: emerging as of 2026; UDP-based, multiplexed.

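
Part of why SSE is the simple option is its wire format: plain text, one `field: value` per line, and a blank line to end each event. The parser below is a minimal sketch of that format. It is a hypothetical helper that handles only `event`, `id`, and multi-line `data`; it ignores `retry` and does no incremental buffering.

```python
def parse_sse(raw: str) -> list[dict[str, str]]:
    """Split a complete SSE body into events (minimal, non-incremental sketch)."""
    events = []
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        fields: dict[str, str] = {}
        data_lines = []
        for line in block.split("\n"):
            if not line or line.startswith(":"):
                continue  # blank line or comment (often used as keep-alive)
            name, _, value = line.partition(":")
            value = value.removeprefix(" ")  # one optional leading space
            if name == "data":
                data_lines.append(value)
            elif name in ("event", "id"):
                fields[name] = value
        if data_lines:  # events without data are not dispatched
            fields["data"] = "\n".join(data_lines)
            events.append(fields)
    return events


wire = "event: token\ndata: Hello\n\n: keep-alive\n\ndata: line 1\ndata: line 2\n\n"
for event in parse_sse(wire):
    print(event)
```

A real client reads the response incrementally and keeps a buffer across chunks; browsers do this for you through `EventSource`.
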
### AI real-time inference

- **vLLM**: PagedAttention; up to 24x the throughput of a naive Hugging Face Transformers baseline, as reported by the vLLM project.
- **MLX (Apple Silicon)**: unified memory on M3/M4 allows local real-time inference with Llama 3.x 70B.
- **Speculative decoding**: a small draft model gives a 2-3x speedup.
- **KV cache**: prefix sharing; the system prompt is cached once and reused.
- **Prompt caching (Anthropic)**: up to 90% lower cost on cached input tokens, and lower TTFT.

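
Prefix sharing works because the KV entries for a token depend only on the tokens before it, so two requests that start with the same system prompt can reuse the same cached blocks. The toy model below illustrates only the lookup logic. `PrefixCache`, the block size of 4, and the use of full prefixes as keys are assumptions for illustration; real engines hash blocks and store tensors, not token tuples.

```python
BLOCK = 4  # tokens per cache block (illustrative; real engines use larger blocks)


class PrefixCache:
    """Toy model of prefix sharing: a block is reusable only if every block
    before it matches too, so each key covers the whole prefix so far."""

    def __init__(self) -> None:
        self._blocks: set[tuple[int, ...]] = set()

    def lookup_and_insert(self, tokens: list[int]) -> int:
        """Return how many leading tokens were already cached, then cache
        every full block of this request."""
        hit_tokens = 0
        still_matching = True
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            key = tuple(tokens[:end])  # prefix up to and including this block
            if still_matching and key in self._blocks:
                hit_tokens = end
            else:
                still_matching = False
                self._blocks.add(key)
        return hit_tokens


cache = PrefixCache()
system_prompt = list(range(100, 112))  # 12 shared tokens = 3 blocks
first = cache.lookup_and_insert(system_prompt + [1, 2, 3, 4])
second = cache.lookup_and_insert(system_prompt + [9, 8, 7, 6])
print(first, second)
```

The second request skips prefill for the 12 shared tokens, which is where the TTFT reduction comes from: only the new suffix has to be computed before the first output token.
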
### Applications

1. Token-by-token streaming for LLM chat.
2. Video conferencing (WebRTC).
3. Trading systems (kdb+, FPGA).
4. Robotics control loop (ROS 2 + Zephyr).
5. Live captioning (Whisper streaming).

## 💻 Patterns

### LLM streaming with prompt cache

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LARGE_SYSTEM_PROMPT = "..."  # placeholder: a long, stable system prompt

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=2048,
    system=[{
        "type": "text",
        "text": LARGE_SYSTEM_PROMPT,  # 10k+ tokens
        # Mark the prefix as cacheable so later requests can reuse it.
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "..."}],
) as stream:
    # Print each text delta as soon as it arrives.
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

### SSE in FastAPI

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import asyncio

app = FastAPI()

async def event_stream():
    for i in range(100):
        # One SSE event: a "data:" line terminated by a blank line.
        yield f"data: token {i}\n\n"
        await asyncio.sleep(0.05)  # non-blocking pause, about 20 events/s

@app.get("/stream")
async def stream():
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

### vLLM batched inference server

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.3-70B-Instruct", tensor_parallel_size=4,
          enable_prefix_caching=True)  # reuse KV blocks across shared prefixes
params = SamplingParams(max_tokens=512, temperature=0.7)

prompts = ["..."]  # placeholder: the batch of prompt strings to run

# Continuous batching: the scheduler admits waiting sequences into the
# running batch as others finish, instead of waiting for the whole batch.
outputs = llm.generate(prompts, params)
```

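### Game loop (fixed timestep)

```rust
use std::time::Instant;

const DT: f32 = 1.0 / 60.0; // fixed physics step: 60 Hz

// Placeholder hooks; a real game supplies its own simulation and renderer.
fn physics_step(_dt: f32) {}
fn render(_alpha: f32) {}

fn main() {
    let mut acc: f32 = 0.0; // unsimulated time carried between frames
    let mut last = Instant::now();

    loop {
        let now = Instant::now();
        acc += (now - last).as_secs_f32();
        last = now;
        // Run as many fixed steps as the elapsed wall time covers.
        while acc >= DT {
            physics_step(DT);
            acc -= DT;
        }
        render(acc / DT); // interpolate between the last two physics states
    }
}
```
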
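### RTOS task (Zephyr)

```c
#include <zephyr/kernel.h>

/* Application hooks, assumed to be defined elsewhere. */
void read_sensors(void);
void compute_pid(void);
void actuate(void);

void control_loop(void *p1, void *p2, void *p3) {
    while (1) {
        read_sensors();
        compute_pid();
        actuate();
        /* Sleeps 10 ms after the work, so the period is 10 ms plus execution
         * time (about 100 Hz). For a drift-free period, use a k_timer. */
        k_sleep(K_MSEC(10));
    }
}

/* 1024-byte stack, preemptible priority 2, starts at boot with no delay. */
K_THREAD_DEFINE(ctrl_tid, 1024, control_loop, NULL, NULL, NULL,
                K_PRIO_PREEMPT(2), 0, 0);
```
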
## Decision criteria

| Situation | Approach |
|---|---|
| Safety-critical (medical, automotive) | Hard RT: RTOS, formal verification |
| LLM chat | SSE streaming + prompt cache |
| Multiplayer game | UDP + WebRTC / custom protocol |
| Voice/video call | WebRTC |
| HFT | Kernel bypass (DPDK), FPGA |
| Robotics | ROS 2 + Zephyr/PREEMPT_RT Linux |

**Default**: SSE + Anthropic streaming for LLM, WebSocket for bidirectional chat.

## 🔗 Graph

- Parent: [[Distributed-Systems]]
- Variant: [[Streaming]]
- Applications: [[WebRTC]] · [[Game-Loop]]
- Adjacent: [[Latency-Optimization]] · [[LLM_Optimization_and_Deployment_Strategies|vLLM]]

## 🤖 LLM usage

**When**: user-facing chat (TTFT < 500 ms), long outputs (token-streaming UX), tool-use loops.

**When not**: batch processing (use the Batch API, which is 50% cheaper), embeddings (single-shot), latency-insensitive analytics.

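
Whether a chat deployment meets the TTFT and inter-token targets is easy to check empirically. The sketch below times any iterator of text chunks. `measure_stream` and `fake_stream` are hypothetical helpers, and the sleep durations are made-up stand-ins for a real model.

```python
import time
from typing import Iterable, Iterator


def measure_stream(chunks: Iterable[str]) -> dict[str, float]:
    """Consume a stream and report TTFT and the worst inter-chunk gap, in ms."""
    start = time.perf_counter()
    arrivals = []
    for _ in chunks:
        arrivals.append(time.perf_counter())
    if not arrivals:
        raise ValueError("stream produced no chunks")
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    return {
        "ttft_ms": (arrivals[0] - start) * 1000,
        "max_gap_ms": max(gaps, default=0.0) * 1000,
    }


def fake_stream(n: int, first_delay: float, gap: float) -> Iterator[str]:
    """Stand-in for a real model stream (hypothetical timings)."""
    time.sleep(first_delay)  # simulated prefill before the first token
    for i in range(n):
        if i:
            time.sleep(gap)  # simulated decode step
        yield f"tok{i}"


stats = measure_stream(fake_stream(5, first_delay=0.05, gap=0.01))
within_budget = stats["ttft_ms"] < 500 and stats["max_gap_ms"] < 50
print(stats, within_budget)
```

Any iterator of strings works as input, so a real SDK text stream can be passed in place of `fake_stream`. Note that chunks are not always single tokens, so the gap figure is per chunk.
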
## ❌ Anti-patterns

- **Block on full response**: the user watches a spinner for 30 s; stream instead.
- **Claiming hard guarantees for soft RT**: Linux plus a garbage-collected runtime cannot deliver hard RT.
- **No timeout**: hung connections leak; set one, e.g. `httpx.Timeout(30.0, connect=5.0)`.
- **No backpressure**: a producer that outpaces its consumer leads to OOM.
- **Synchronous calls in the event loop**: `time.sleep` blocks asyncio; use `await asyncio.sleep` instead.

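
Three of these anti-patterns (no backpressure, no timeout, blocking the event loop) can be avoided with standard `asyncio` primitives. The sketch below is one way to do it, not the only one; the queue size of 8, the 5 s timeout, and the 1 ms simulated work are arbitrary illustrative values.

```python
import asyncio


async def producer(queue: asyncio.Queue, n: int) -> None:
    for i in range(n):
        # put() suspends while the queue is full: this is the backpressure.
        await queue.put(i)
    await queue.put(None)  # sentinel: no more items


async def consumer(queue: asyncio.Queue) -> tuple[list[int], int]:
    seen: list[int] = []
    peak = 0  # largest number of items ever waiting in the queue
    while True:
        peak = max(peak, queue.qsize())
        # Bound every wait so a stalled producer cannot hang the consumer forever.
        item = await asyncio.wait_for(queue.get(), timeout=5.0)
        if item is None:
            return seen, peak
        await asyncio.sleep(0.001)  # non-blocking stand-in for slow work
        seen.append(item)


async def main() -> tuple[list[int], int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=8)  # the memory bound
    _, result = await asyncio.gather(producer(queue, 100), consumer(queue))
    return result


seen, peak = asyncio.run(main())
print(len(seen), "items consumed, peak queued:", peak)
```

With an unbounded queue the fast producer would enqueue all 100 items immediately; with `maxsize=8` it is paused whenever the consumer falls behind, so memory use stays bounded regardless of how many items are produced.
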
## 🧪 Verification / Duplicates

- Verified (vLLM docs, Anthropic streaming API, WebRTC RFC, Zephyr docs).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: RT systems + web streaming + LLM inference unified |