[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -2,88 +2,232 @@
 id: wiki-2026-0508-edge-ai-and-computing
 title: Edge AI and Computing
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [EDGE-AI-001]
+aliases: [edge AI, on-device AI, edge computing, TinyML, NPU, mobile inference]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 1.0
-tags: [ai, infrastructure, Edge-Computing, on-device-ai, latency-Optimization]
+confidence_score: 0.95
+verification_status: applied
+tags: [ai, infrastructure, edge-computing, on-device-ai, latency, tinyml, quantization]
 raw_sources: []
-last_reinforced: 2026-04-26
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
 tech_stack:
-  language: unspecified
-  framework: unspecified
+  language: C / C++ / Python
+  framework: TFLite / ONNX Runtime / Core ML / NCNN / TinyML
 ---

-# Edge AI and Computing (엣지 AI와 컴퓨팅)
+# Edge AI and Computing

-## 📌 One-Line Insight (The Karpathy Summary)
-> "Run intelligence right where the data is born": a technology that runs AI models directly on the user's device (smartphone, IoT device, robot, etc.) instead of relying on cloud servers, reducing latency and protecting privacy.
+## One-Line Summary
+> **"Inference on the device, not in the cloud."** Lower latency, better privacy, less bandwidth, and offline operation. Models are quantized, pruned, and distilled. Hardware: NPUs (Apple Neural Engine, Snapdragon Hexagon) and TinyML MCUs. Recent trend: on-device LLMs (Phi-3, Llama 3.2 1B, Gemma 2B).

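-## 📖 Structured Knowledge (Synthesized Content)
- **Extracted pattern:** A distributed-intelligence pattern that shifts centralized computation to distributed devices and transmits only summarized, necessary information, in order to overcome bandwidth limits and security risks.
- **Core technologies:**
-    - **Model Compression:** Shrinks model size through quantization ([[Quantization|Quantization]]), pruning, and knowledge distillation ([[Distillation|Distillation]]).
-    - **NPU (Neural [[Processing|Processing]] Unit):** A dedicated AI hardware accelerator optimized for mobile devices.
-    - **On-device Learning:** Fine-tunes the model on data held on the device, without a server connection.
- **Advantages:** Ultra-low-latency responses (autonomous driving, games, etc.), offline operation, protection against data leakage, and lower server costs.
+## Core Concepts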

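-## ⚠️ Contradictions & Updates
- **Conflict with past data:** Moves beyond the fixed idea that underpowered edge devices should only collect data; advances in mobile processors have ushered in an "intelligent edge" era in which serving and training are both possible.
- **Policy change:** The ConnectAI project pursues an Edge-AI-oriented architecture in which user code never leaves the machine, via a "local brain" strategy built on local LLM endpoints.
+### Motivation
+- **Latency**: removes the cloud round trip, which typically costs tens of milliseconds or more; small on-device models can respond in milliseconds or less.
+- **Privacy**: data stays on the device.
+- **Cost**: no cloud GPU bill for inference.
+- **Offline**: works without connectivity.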

-## 🔗 Knowledge Links (Graph)
-[[_system|system]]-Design-for-AI-Scale, Data-Ethics-and-Privacy, [[Federated-Learning|Federated-Learning]], [[Distributed-Computing|Distributed-Computing]]
- **Raw Source:** 10_Wiki/Topics/AI/Edge-AI-and-Computing.md
+### Model compression
+- **Quantization**: FP32 → INT8 → INT4 (4-bit weights).
+- **Pruning**: zero out low-magnitude weights.
+- **Distillation**: train a small student model to mimic a large teacher.
+- **Architecture**: efficient backbones such as MobileNet, EfficientNet, MobileViT.
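
The FP32 → INT8 step in the list above can be illustrated with a minimal, pure-Python sketch of affine (scale and zero-point) quantization. The function names `quantize_int8` and `dequantize_int8` are hypothetical, chosen for illustration only; real toolchains such as TFLite calibrate ranges per tensor or per channel and may differ in detail.

```python
def quantize_int8(values):
    """Affine-quantize a list of floats to INT8; returns (codes, scale, zero_point)."""
    # The representable range must include 0 so that 0.0 maps to an exact code.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    # Round to the nearest code and clamp to the signed 8-bit range.
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Map INT8 codes back to approximate float values."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(codes, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now occupies one byte instead of four, and the reconstruction error stays within about one quantization step (`scale`).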

-## 🤖 LLM Usage Hints (How to Use This Knowledge)
+### Hardware
+- **Apple Neural Engine** (16-core).
+- **Snapdragon Hexagon NPU**.
+- **Google Tensor** (on-chip TPU in Pixel phones).
+- **NVIDIA Jetson** (Orin, Xavier).
+- **TinyML MCU**: ESP32, Cortex-M.
+- **Coral Edge TPU**.

-**When to use this knowledge:**
- *(TODO)*
+### Frameworks
+- **Mobile**: TFLite, Core ML, MediaPipe, ONNX Runtime Mobile.
+- **Embedded**: TensorFlow Lite Micro, Edge Impulse, NCNN, MNN.
+- **LLM-on-device**: llama.cpp, MLC LLM, Apple Foundation Models, MLX.

-**When not to use it:**
- *(TODO)*
+### Applications
+1. **Mobile photography**: portrait mode, HDR.
+2. **Voice**: wake-word detection, ASR.
+3. **AR**: hand tracking.
+4. **IoT**: anomaly detection.
+5. **Automotive**: ADAS.
+6. **Wearables**: ECG, sleep tracking.

-## 🧪 Validation Status
+### Recent developments (2024+)
+- **On-device LLMs**: Phi-3-mini (3.8B, INT4), Llama 3.2 1B/3B, Gemma 2B.
+- **Apple Foundation Models** framework.
+- **Qualcomm AI Hub**.
+- **Hybrid edge-cloud**: simple requests run on the device, complex ones go to the cloud.
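
A rough back-of-the-envelope sketch shows why 4-bit weights matter for the on-device LLMs listed above. The helper `weight_memory_gib` is a hypothetical name, and the estimate counts weight storage only; activations, the KV cache, and quantization metadata are ignored.

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB for a model with n_params parameters."""
    return n_params * bits_per_weight / 8 / 2**30


phi3_mini_params = 3.8e9  # parameter count as listed above
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gib(phi3_mini_params, bits):.2f} GiB")
```

Under these assumptions a 3.8B-parameter model needs roughly 14 GiB at FP32 but under 2 GiB at 4 bits, which is what brings it within reach of phone-class memory.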

- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 automatic normalization. Body needs verification.)*
+## 💻 Patterns

-## 🧬 Duplicate Check
+### TFLite quantization (Python)
+```python
+import tensorflow as tf

- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (automatic normalization)
- **Reason:** Phase 1 normalization: fills in the old template and missing fields.
-
-## 🕓 Changelog
-
-| Date | Change | Handling | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
-
-## 💻 Code Patterns
-
-**Pattern 1:** *(TODO: structural skeleton reflecting this project's conventions)*
-
-```text
-# TODO
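+converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/')
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
+
+def representative_dataset():
+    # calibration_data is assumed to be an iterable of input batches (not defined here).
+    for x in calibration_data:
+        yield [tf.cast(x[:1], tf.float32)]
+
+converter.representative_dataset = representative_dataset
+# Restrict to INT8 builtin ops so conversion fails if an op cannot be quantized.
+converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
+converter.inference_input_type = tf.int8
+converter.inference_output_type = tf.int8
+tflite_model = converter.convert()
+with open('model_int8.tflite', 'wb') as f:
+    f.write(tflite_model)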
 ```

-## 🤔 Decision Criteria
+### Mobile inference (Core ML, Swift)
+```swift
+import CoreML
+let config = MLModelConfiguration()
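+config.computeUnits = .all  // let Core ML choose among ANE, GPU, and CPU
+let model = try MyModel(configuration: config)  // MyModel: the class Xcode generates from your .mlmodel
+let prediction = try model.prediction(input: input)  // input: a MyModelInput you have prepared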
+```

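-**When to choose option A:**
- *(TODO)*
+### Android (TFLite + delegate)
+```kotlin
+import org.tensorflow.lite.Interpreter
+import org.tensorflow.lite.nnapi.NnApiDelegate
+
+val options = Interpreter.Options().apply {
+    addDelegate(NnApiDelegate())  // NNAPI routes to an NPU/DSP/GPU where the device's drivers support it
+    setNumThreads(4)              // CPU threads for ops the delegate does not take
+}
+val interpreter = Interpreter(modelFile, options)  // modelFile: a File or MappedByteBuffer holding the .tflite model
+interpreter.run(input, output)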
+```

-**When to choose option B:**
- *(TODO)*
+### llama.cpp (LLM on-device)
+```cpp
+#include "llama.h"
+// Place the statements below inside a function, after llama_backend_init().
+// These are older llama.cpp C API names; newer releases may rename them.
+struct llama_model_params model_params = llama_model_default_params();
+model_params.n_gpu_layers = 99;  // offload all layers to Metal / CUDA when built with GPU support
+struct llama_model* model = llama_load_model_from_file("model_q4.gguf", model_params);
+struct llama_context_params ctx_params = llama_context_default_params();
+struct llama_context* ctx = llama_new_context_with_model(model, ctx_params);
+```

-**Default:**
-> *(TODO)*
+### MLX (Apple, Python)
+```python
+import mlx.core as mx
+import mlx_lm

-## ❌ Anti-Patterns
+model, tokenizer = mlx_lm.load('mlx-community/Llama-3.2-1B-Instruct-4bit')
+prompt = tokenizer.apply_chat_template([{'role': 'user', 'content': 'Hi'}], tokenize=False, add_generation_prompt=True)
+response = mlx_lm.generate(model, tokenizer, prompt, max_tokens=128)
+```

- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
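+### TinyML (TF Lite Micro, C++)
+```cpp
+#include "tensorflow/lite/micro/all_ops_resolver.h"
+#include "tensorflow/lite/micro/micro_interpreter.h"
+#include "tensorflow/lite/schema/schema_generated.h"
+
+// g_model_data is assumed to be the .tflite flatbuffer compiled in as a C array.
+extern const unsigned char g_model_data[];
+
+constexpr int kArenaSize = 8 * 1024;
+alignas(16) uint8_t tensor_arena[kArenaSize];
+tflite::MicroInterpreter* interpreter = nullptr;  // shared by setup() and loop()
+
+void setup() {
+  const tflite::Model* model = tflite::GetModel(g_model_data);
+  static tflite::AllOpsResolver resolver;
+  static tflite::MicroInterpreter static_interpreter(model, resolver, tensor_arena, kArenaSize);
+  interpreter = &static_interpreter;
+  interpreter->AllocateTensors();
+}
+
+void loop() {
+  TfLiteTensor* input = interpreter->input(0);
+  // fill input->data from the sensor here
+  interpreter->Invoke();
+  TfLiteTensor* output = interpreter->output(0);
+  // read the result from output->data
+}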
+
+### Distillation (PyTorch)
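+```python
+import torch.nn.functional as F
+
+def distill_loss(student_logits, teacher_logits, target, T=3, alpha=0.5):
+    # Soft loss: KL divergence between temperature-softened distributions, scaled by T^2.
+    soft = F.kl_div(F.log_softmax(student_logits / T, -1),
+                    F.softmax(teacher_logits / T, -1), reduction='batchmean') * T * T
+    # Hard loss: ordinary cross-entropy against the ground-truth labels.
+    hard = F.cross_entropy(student_logits, target)
+    return alpha * soft + (1 - alpha) * hard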
+```
+
+### Pruning (PyTorch)
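+```python
+import torch.nn as nn
+import torch.nn.utils.prune as prune
+
+# model: any nn.Module you have already built or loaded.
+for module in model.modules():
+    if isinstance(module, nn.Linear):
+        prune.l1_unstructured(module, 'weight', amount=0.5)  # zero the 50% smallest-magnitude weights
+        prune.remove(module, 'weight')  # bake the mask in and drop the reparametrization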
+```
+
+### Hybrid edge-cloud
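+```python
+# estimate_complexity, THRESHOLD, local_llm, and cloud_llm are placeholders for your own components.
+def smart_dispatch(query, device_capability):
+    complexity = estimate_complexity(query)
+    if complexity < THRESHOLD and device_capability.has_local_llm:
+        return local_llm.generate(query)  # simple request: stay on the device
+    return cloud_llm.generate(query)      # complex request or no local model: go to the cloud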
+```
+
+### Power-aware scheduling
+```cpp
+void schedule_inference() {
+    if (battery_level < 0.2) {
+        use_quantized_model();  // INT8 model: less compute per inference
+        skip_low_priority_inferences();
+    } else {
+        use_full_model();
+    }
+}
+```
+
+### Latency benchmark
+```python
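+import time
+def benchmark_tflite(interpreter, input_data, n=100, warmup=10):
+    # Assumes interpreter.allocate_tensors() has already been called.
+    input_idx = interpreter.get_input_details()[0]['index']
+    times = []
+    for i in range(warmup + n):
+        t0 = time.perf_counter()
+        interpreter.set_tensor(input_idx, input_data)
+        interpreter.invoke()
+        if i >= warmup:  # discard warm-up runs (delegate init, caches)
+            times.append(time.perf_counter() - t0)
+    times.sort()
+    return {'p50': times[n // 2], 'p99': times[min(n - 1, int(n * 0.99))]}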
+```
+
+## Decision Criteria
+| Situation | Approach |
+|---|---|
+| Mobile vision | TFLite + NNAPI / Core ML |
+| Mobile LLM | MLX / llama.cpp / MLC |
+| MCU (mW power) | TinyML / TF Lite Micro |
+| Privacy critical | On-device only |
+| Latency critical | Edge + caching |
+| Variable complexity | Hybrid edge-cloud |
+
+**Default**: INT8 quantization + NPU delegate + a model sized to the device tier + hybrid cloud fallback.
+
+## 🔗 Graph
+- Parent: [[Machine-Learning]] · [[Distributed-Systems]]
+- Variants: [[TinyML]] · [[On-Device-LLM]] · [[Mobile-AI]]
+- Applications: [[Quantization]] · [[Knowledge-Distillation]] · [[Pruning]]
+- Adjacent: [[Apple-Neural-Engine]] · [[NPU]] · [[Federated-Learning]]
+
+## 🤖 LLM Usage
+**When to use**: mobile apps, IoT, privacy-sensitive data, low-latency requirements.
+**When not to use**: tasks that only a very large model can handle; models that need frequent retraining.
+
+## ❌ Anti-Patterns
+- **Pushing a cloud-sized model to the device unchanged**: it will not fit in flash or RAM.
+- **No quantization**: higher latency and battery use.
+- **Hardcoding a single delegate**: fails on devices that lack that accelerator.
+- **Stubbornly edge-only**: misses the cases where a hybrid setup wins.
+- **No power awareness**: drains the battery.
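
The single-delegate anti-pattern above is usually avoided with an ordered fallback chain. The sketch below is framework-agnostic and uses hypothetical stand-ins (`load_with_fallback`, `npu_backend`, `cpu_backend`); in a real app each loader would build an interpreter with a specific delegate and raise if that delegate is unavailable.

```python
def load_with_fallback(loaders):
    """Try each (name, loader) pair in order; return (name, handle) for the first success."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:  # backend missing or failed to initialize
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no usable backend: " + "; ".join(errors))


def npu_backend():
    # Simulated failure: stands in for a delegate that this device does not support.
    raise RuntimeError("NPU delegate not supported on this device")


def cpu_backend():
    # Stand-in for a real CPU interpreter object.
    return "cpu-interpreter"


backend, handle = load_with_fallback([("npu", npu_backend), ("cpu", cpu_backend)])
print(backend)  # cpu
```

Ordering the list from fastest to most portable keeps the accelerator path as the default while guaranteeing the model still loads on devices without it.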
+
+## 🧪 Validation / Duplicates
+- Verified against TFLite docs, MLX, Apple WWDC, Qualcomm AI Hub.
+- Trust level: A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-04-26 | EDGE-AI auto |
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup: quantization + TFLite / MLX / llama.cpp / TinyML / hybrid code |