"The core of edge computing: compute moves to the data, for latency, bandwidth, and privacy." Timeline: CDN edge in the 2010s → function edge in the 2020s (Cloudflare Workers, Deno Deploy, AWS Lambda@Edge), edge AI inference in 2022, 5G MEC in 2024. As of 2026, a three-tier edge stack is becoming the norm: phone (Apple Intelligence, Pixel Gemini Nano) → CDN edge (Workers AI) → micro-DC.
Key points
Edge spectrum
Device edge: phone, sensor, MCU, robot.
Near edge / On-prem: factory floor, retail-store K8s.
Far edge / CDN: Cloudflare, Fastly, Akamai PoPs.
MEC (Multi-access Edge Computing, formerly Mobile Edge Computing): co-located with 5G base stations.
Drivers
Latency: budgets under 10 ms, which a cloud round trip cannot meet.
Bandwidth: the cost of shipping video / sensor streams to the cloud.
Privacy / sovereignty: data does not leave the region or the device.
Resilience: offline operation.
Cost: avoiding egress fees.
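To make the latency driver concrete, the following is a rough sketch of the propagation floor for a network round trip. It assumes signals travel through fiber at about 200,000 km/s (roughly two thirds of the speed of light) and ignores routing, queuing, and processing time, so real numbers will be higher. The function names and sample distances are illustrative, not taken from any library.

```python
# Rough lower bound on network round-trip time from distance alone.
# Assumption: signal speed in fiber is ~200,000 km/s (about 2/3 of c);
# real paths add routing, queuing, and server processing on top.
FIBER_KM_PER_MS = 200.0  # 200,000 km/s expressed per millisecond


def min_rtt_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_MS


def fits_budget(distance_km: float, budget_ms: float, processing_ms: float = 0.0) -> bool:
    """True if propagation plus processing stays within the latency budget."""
    return min_rtt_ms(distance_km) + processing_ms <= budget_ms


if __name__ == "__main__":
    # Hypothetical distances for three tiers of the edge spectrum.
    for label, km in [("on-prem edge", 1), ("metro PoP", 50), ("distant region", 2000)]:
        print(f"{label}: >= {min_rtt_ms(km):.2f} ms round trip")
```

Under these assumptions, a data center 2,000 km away already costs 20 ms in propagation alone, which is why a sub-10 ms budget pushes compute toward the near or device edge.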
Applications
Real-time vision (autonomous driving, AR).
Voice assistants (Siri on-device wake-word).
Industrial control (PLC + edge AI).
Game streaming, low-latency RTC.
Edge AI inference (Llama 3.2 on phone, Whisper on Pi).
💻 Patterns
Cloudflare Workers (function edge)
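interface Env {
  DB: D1Database; // assumed D1 binding; the name DB comes from the Worker's config
}

export default {
  async fetch(req: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const cache = caches.default;
    const cacheable = req.method === "GET"; // the Cache API only stores GET requests
    if (cacheable) {
      const cached = await cache.match(req);
      if (cached) return cached;
    }

    const data = await env.DB.prepare("SELECT * FROM posts LIMIT 10").all();
    const res = new Response(JSON.stringify(data), {
      headers: { "content-type": "application/json", "cache-control": "max-age=60" },
    });
    // Populate the cache in the background so the response is not delayed.
    if (cacheable) ctx.waitUntil(cache.put(req, res.clone()));
    return res;
  },
};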
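MQTT edge gateway
# Edge gateway aggregates sensors, forwards summaries to cloud
# Assumes paho-mqtt 2.x; aggregate() is a placeholder for site-specific logic.
import paho.mqtt.client as mqtt

def aggregate(payload: bytes) -> bytes:
    return payload  # placeholder: replace with a windowed mean, min/max, etc.

# Connect the cloud client first so on_local never runs before it exists.
cloud = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
cloud.connect("cloud.broker.com")
cloud.loop_start()

def on_local(client, userdata, msg):
    cloud.publish("plant/summary", aggregate(msg.payload))

local = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
local.on_message = on_local
local.connect("localhost")
local.subscribe("sensors/#")
local.loop_forever()  # block here; loop_start() alone would let the script exit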
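The gateway pattern above depends on an aggregation step that the note leaves undefined. One plausible shape, sketched here under assumptions, is to collapse a window of numeric readings into a small JSON summary so that only a few bytes go upstream. The function name `summarize_window` and the summary fields are hypothetical.

```python
import json
from statistics import mean


def summarize_window(readings: list[float]) -> bytes:
    """Collapse a window of numeric sensor readings into one small JSON summary."""
    if not readings:
        # Nothing arrived in this window; still report that fact upstream.
        return json.dumps({"count": 0}).encode()
    summary = {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": round(mean(readings), 3),
    }
    return json.dumps(summary).encode()


if __name__ == "__main__":
    window = [20.1, 20.4, 19.8, 21.0]  # illustrative temperature samples
    print(summarize_window(window))
```

The summary size stays constant regardless of how many readings fall into the window, which is what makes the bandwidth and egress savings predictable.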
CRDT-based offline-first sync
import * as Y from "yjs";
import { WebsocketProvider } from "y-websocket";

const doc = new Y.Doc();
// Works offline, merges automatically when reconnected
new WebsocketProvider("wss://sync.edge.local", "doc1", doc);
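Yjs hides the merge logic behind its document type. To show why CRDT replicas converge after an offline period, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. It is not the algorithm Yjs uses, and the replica IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot."""
    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


if __name__ == "__main__":
    a, b = GCounter("edge-a"), GCounter("edge-b")
    a.increment(2)  # edits made on replica a while partitioned
    b.increment(3)  # edits made on replica b while partitioned
    a.merge(b)
    b.merge(a)
    print(a.value, b.value)  # both replicas report 5
```

Because merging is idempotent, a replica can safely re-send its full state after every reconnect, which keeps the sync protocol simple at the cost of larger messages.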
Edge LLM with quantization (mobile)
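# Quantize Llama 3.2 3B to 4-bit for on-device use.
# This path uses MLX (Apple silicon) and writes MLX-format weights, not GGUF;
# for GGUF output, use llama.cpp's own conversion and quantization tools instead.
from mlx_lm import convert

convert(
    "meta-llama/Llama-3.2-3B-Instruct",
    mlx_path="llama-3b-q4",
    quantize=True,
    q_bits=4,
)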
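The reason for quantizing before deploying to a phone is mostly arithmetic: weight memory is roughly parameters × bits per weight / 8. The sketch below uses a nominal 3 billion parameters as a stand-in for a "3B" model (the exact count differs) and ignores quantization scale overhead, the KV cache, and activations, so treat the results as a lower bound.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in bytes."""
    return n_params * bits_per_weight / 8


def to_gb(n_bytes: float) -> float:
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


if __name__ == "__main__":
    n_params = 3e9  # nominal parameter count, an assumption for illustration
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{to_gb(weight_bytes(n_params, bits)):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the footprint from about 6 GB to about 1.5 GB, which is the difference between not fitting and fitting alongside other apps on a typical phone.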
Decision criteria

| Situation | Approach |
| --- | --- |
| Static asset / API cache | CDN (Cloudflare, Fastly) |
| Low-latency API | Workers / Deno Deploy |
| Stateful edge service | K3s on near-edge |
| Sensor / IoT | MQTT + edge gateway |
| Edge AI inference | ONNX Runtime / TFLite / Core ML |
| Cross-platform portable | Wasm (WasmEdge, Spin) |
| Personal AI | On-device LLM (MLX, llama.cpp) |
Default: serve latency-sensitive reads from the CDN edge; aggregate sensor data at an edge gateway.
When: edge deployment scaffolding (K3s manifests, Workers code), quantization workflows, latency-budget reasoning.
When not: hard real-time (<1 ms). Do not rely on an LLM here; use an RTOS for deterministic timing.
❌ Anti-patterns
Edge as cloud-with-extra-steps: using the edge without need. Without a clear latency or privacy driver, stay in the cloud.
No degradation strategy: the edge goes offline → everything fails.
State at every edge: a consistency nightmare. Use CRDTs or eventual consistency.
Same model size as cloud: device OOM. Quantize and distill.
Ignoring network partitions: split-brain → corrupted state.
Pushing all logic to client: trust boundary violation.
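As one possible answer to the "no degradation strategy" anti-pattern, the sketch below serves the last good value when the upstream call fails, as long as that value is not too old. The class name `StaleOnErrorCache`, the `fetch_fresh` callback, and the 300-second staleness limit are all hypothetical; a real deployment would also need per-key storage and metrics.

```python
import time
from typing import Callable, Optional


class StaleOnErrorCache:
    """Serve the last good value when the upstream call fails."""

    def __init__(self, max_stale_s: float = 300.0):
        self.max_stale_s = max_stale_s
        self._value: Optional[str] = None
        self._stored_at: float = 0.0

    def get(self, fetch_fresh: Callable[[], str], now: Optional[float] = None) -> str:
        # `now` is injectable so the staleness rule can be tested without sleeping.
        now = time.monotonic() if now is None else now
        try:
            self._value = fetch_fresh()
            self._stored_at = now
            return self._value
        except OSError:
            # Covers ConnectionError and TimeoutError: upstream is unreachable.
            # Degrade to the cached copy only if it is recent enough.
            if self._value is not None and now - self._stored_at <= self.max_stale_s:
                return self._value
            raise
```

Re-raising once the cached copy is too old is a deliberate choice: beyond that point, an explicit failure is usually safer than silently serving outdated data.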
🧪 Verification / duplicates
Verified (Cloudflare Workers docs 2025, K3s docs, ETSI MEC spec, ONNX Runtime mobile docs, Llama 3.2 release notes 2024).