2nd/10_Wiki/Topics/Data Privacy & Local Processing.md
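koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)
Replaced [[wikilinks]] that differed only in name (spelling variants) with the canonical title of the target document,
reconnecting 1,200 broken links. Only title/filename normalization matches were applied; alias matching was
excluded because of the over-merging risk (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs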

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00


id: wiki-2026-0508-data-privacy-local-processing
title: Data Privacy & Local Processing
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: On-device AI, Local-First, Privacy-Preserving ML, Edge Privacy
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: privacy, local-first, on-device, gdpr, federated-learning
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: Python/Swift/Rust; framework: MLX / CoreML / ONNX Runtime / Flower

Data Privacy & Local Processing

In one line

"The core of data privacy + local processing: data minimization + on-device inference + cryptographic guarantees." Regulatory pressure from GDPR (2018), CCPA, and the EU AI Act (2024), combined with the arrival of Apple Intelligence (2024), Google Gemini Nano (2024), and on-device LLMs (Llama 3.2 1B/3B, Phi-4 mini, Gemma 3 nano), has made the cloud → device shift a reality as of 2026. The Local-First Software movement has entered the mainstream.

Key points

Privacy primitives

  • Data minimization: collect only what is necessary (GDPR Art. 5(1)(c)).
  • On-device inference: raw data is never sent off the device.
  • Differential Privacy (DP): calibrated noise bounded by a privacy budget ε; Apple and Google use it for telemetry.
  • Federated Learning (FL): the model trains on the device, and only gradients (model updates) are aggregated.
  • Homomorphic Encryption (HE): computation on encrypted data; the latency penalty is large.
  • Secure Enclave (TEE): Apple Secure Enclave, Intel SGX, AWS Nitro Enclaves.
  • Zero-Knowledge Proof (ZKP): prove a statement without revealing the underlying data.
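
To make the ε-noise idea behind differential privacy concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. The function names (`laplace_noise`, `private_count`) are illustrative, not taken from any library, and plain floating-point sampling like this is a teaching aid rather than a hardened implementation.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-DP.

    Adding or removing one record changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(scale=1.0 / epsilon, rng=rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 29]  # made-up sample data
    rng = random.Random()
    print(private_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng))
```

A smaller ε gives a larger noise scale and therefore stronger privacy at the cost of accuracy. Each release spends part of the budget, so repeated queries on the same data need accounting.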

Regulatory landscape (2026)

  • EU AI Act: data governance and transparency requirements for high-risk systems.
  • GDPR: right to erasure, data portability, DPIA.
  • CCPA / CPRA: California's right to opt out of the sale of personal data.
  • HIPAA (US health), PIPEDA (Canada), APPI (Japan), PIPL (China; cross-border data transfer rules are very strict).

Applications

  1. On-device LLM assistant (Apple Intelligence, Pixel Gemini Nano).
  2. Health apps (HealthKit, on-device biometric ML).
  3. Federated keyboard prediction (Gboard, SwiftKey).
  4. Local-first note apps (Obsidian, Anytype, Automerge-based apps).

💻 Patterns

On-device LLM with MLX (Apple Silicon)

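from mlx_lm import load, generate

# Weights are fetched from the Hugging Face Hub once, then cached locally
model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")
out = generate(model, tokenizer, prompt="Summarize: ...",
               max_tokens=200, verbose=False)
# Inference runs locally, so the prompt and output never leave the device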

Differential privacy (Opacus)

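from opacus import PrivacyEngine

# model, optimizer, and loader are an existing PyTorch nn.Module,
# optimizer, and DataLoader (assumed to be defined elsewhere)
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model, optimizer=optimizer, data_loader=loader,
    epochs=10, target_epsilon=3.0, target_delta=1e-5, max_grad_norm=1.0,
)
# After training, report the budget actually spent
epsilon_spent = privacy_engine.get_epsilon(delta=1e-5)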
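
Opacus accounts for ε within a single training run. When several DP releases draw on the same data, the total spend also has to be tracked across them. The following is a minimal sketch of such a ledger, assuming basic sequential composition (the ε values simply add up, which is a conservative upper bound). `PrivacyBudget` is a hypothetical helper, not part of Opacus.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon: float):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        """Record one DP release; refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

    @property
    def remaining(self) -> float:
        return self.total_epsilon - self.spent


if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=3.0)
    budget.charge(1.0)   # e.g. one noisy count
    budget.charge(1.5)   # e.g. one noisy histogram
    print(budget.remaining)
```

Tighter accountants give lower totals for the same releases; simple addition is used here only because it is easy to verify by hand.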

Federated learning (Flower)

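import flwr as fl

# model, local_data, get_weights, set_weights, and train_local are
# placeholders for the application's own model and helpers
class Client(fl.client.NumPyClient):
    def get_parameters(self, config):
        return get_weights(model)

    def fit(self, parameters, config):
        set_weights(model, parameters)   # load the current global model
        train_local(model, local_data)   # train on data that stays on the device
        return get_weights(model), len(local_data), {}

# start_numpy_client is deprecated in recent Flower releases in favor of
# start_client with to_client(); the newest releases move on to ClientApp
fl.client.start_client(server_address="server:8080", client=Client().to_client())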
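
Each client above returns its weights together with the number of local examples. On the server side, federated averaging combines these as an example-weighted mean. The sketch below illustrates that arithmetic on plain Python lists; it is not Flower's own implementation, and `fed_avg` and the `Weights` alias are names chosen for this example.

```python
from typing import List, Sequence, Tuple

Weights = List[List[float]]  # one flat list of floats per layer


def fed_avg(results: Sequence[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its example count."""
    total = sum(n for _, n in results)
    if total == 0:
        raise ValueError("no training examples reported")
    first, _ = results[0]
    averaged = [[0.0] * len(layer) for layer in first]
    for weights, n in results:
        share = n / total  # this client's contribution to the global model
        for li, layer in enumerate(weights):
            for wi, w in enumerate(layer):
                averaged[li][wi] += w * share
    return averaged


if __name__ == "__main__":
    client_a = ([[1.0, 2.0]], 1)   # 1 local example
    client_b = ([[3.0, 4.0]], 3)   # 3 local examples
    print(fed_avg([client_a, client_b]))
```

Only the weights and the example counts reach the server; the raw training data stays on each device.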

CoreML on-device inference (iOS / macOS)

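import CoreML

// MyModel and MyModelInput stand for the classes Xcode generates from the .mlmodel file
let model = try MyModel(configuration: MLModelConfiguration())
let input = try MyModelInput(text: userText)
let output = try model.prediction(input: input)
// Prediction runs on the device; no input data is sent over the network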

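Secure Enclave key generation (iOS)

import Security

// The Secure Enclave supports 256-bit elliptic-curve keys only
let attrs: [String: Any] = [
  kSecAttrKeyType as String: kSecAttrKeyTypeECSECPrimeRandom,
  kSecAttrKeySizeInBits as String: 256,
  kSecAttrTokenID as String: kSecAttrTokenIDSecureEnclave,
]
var error: Unmanaged<CFError>?
// Avoid force-unwrapping: key creation fails on devices without a Secure Enclave
guard let key = SecKeyCreateRandomKey(attrs as CFDictionary, &error) else {
  fatalError("Secure Enclave key generation failed: \(error!.takeRetainedValue())")
}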

Local-first sync (Yjs / Automerge)

import * as Y from "yjs";
import { IndexeddbPersistence } from "y-indexeddb";

const doc = new Y.Doc();
new IndexeddbPersistence("notes", doc);  // Local persistence
// Optional E2E-encrypted relay for sync

Data redaction before LLM API call (defense in depth)

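import re

# Illustrative patterns only: US SSN format and a loose email match
PII = [r"\b\d{3}-\d{2}-\d{4}\b", r"\b[\w.-]+@[\w.-]+\b"]

def redact(text):
    """Replace every match of a PII pattern with a fixed placeholder."""
    for p in PII:
        text = re.sub(p, "[REDACTED]", text)
    return text

# Call redact() on any text before sending it to a remote LLM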

Decision criteria

| Situation | Approach |
| --- | --- |
| Health / financial data | On-device only + TEE |
| Personalized model | Federated learning |
| Aggregate analytics | Differential privacy |
| Multi-party compute | HE / MPC (still slow) |
| Compliance (GDPR / HIPAA) | DPIA + minimization + audit log |
| Personal AI assistant | Local LLM (Llama 3.2 3B 4-bit on phone) |

Default: process user content on-device; use the cloud only with explicit consent and data minimization.
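
The default rule can be written down as a small routing policy. This is a sketch under stated assumptions: `Request`, its `needs_cloud` and `user_consented` fields, and `route` are hypothetical names, and a real app would also minimize or redact the payload before anything is sent to the cloud.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    ON_DEVICE = "on_device"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Request:
    text: str
    needs_cloud: bool      # hypothetical flag: the task exceeds the local model
    user_consented: bool   # explicit consent recorded by the app for this purpose


def route(request: Request) -> Target:
    """Default to on-device; allow the cloud only when needed AND consented."""
    if request.needs_cloud and request.user_consented:
        return Target.CLOUD
    return Target.ON_DEVICE


if __name__ == "__main__":
    print(route(Request("summarize my notes", needs_cloud=False, user_consented=True)))
    print(route(Request("long document analysis", needs_cloud=True, user_consented=False)))
```

Consent alone does not move a request to the cloud, and the absence of consent always keeps it local, which keeps on-device processing as the fallback in every ambiguous case.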

🔗 Graph

🤖 Using LLMs

When to use: drafting privacy impact assessments, scaffolding redaction pipelines, generating GDPR/CCPA compliance checklists. When not to use: never send actual user PII directly to a cloud LLM; run on-device or redact first.

Anti-patterns

  • Plaintext PII to a cloud LLM: a potential GDPR violation.
  • DP without ε accounting: cumulative leakage goes unnoticed.
  • Leaking raw gradients in federated learning: exposed to gradient inversion attacks; secure aggregation is required.
  • Local-first without backups: device loss = data loss.
  • "Anonymized" by removing names only: quasi-identifiers still allow re-identification.
  • Storing the decryption key alongside the ciphertext: obvious, but a common failure.
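
The secure aggregation mentioned for the raw-gradient anti-pattern can be illustrated with pairwise masking: every pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the sum while each individual upload looks random. The toy below runs in a single process and makes strong simplifying assumptions: updates are already quantized to integers, no client drops out, and a seeded `random.Random` stands in for per-pair shared secrets. `mask_updates` and `aggregate` are illustrative names.

```python
import random
from typing import Dict, List

MOD = 2 ** 32  # updates are assumed to be quantized to integers mod 2^32


def mask_updates(updates: Dict[int, List[int]], seed: int = 0) -> Dict[int, List[int]]:
    """Add pairwise masks that cancel in the sum (toy, single-process version)."""
    ids = sorted(updates)
    dim = len(updates[ids[0]])
    masked = {i: list(updates[i]) for i in ids}
    rng = random.Random(seed)  # NOT cryptographically secure; stand-in only
    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            mask = [rng.randrange(MOD) for _ in range(dim)]
            masked[a] = [(x + m) % MOD for x, m in zip(masked[a], mask)]
            masked[b] = [(x - m) % MOD for x, m in zip(masked[b], mask)]
    return masked


def aggregate(masked: Dict[int, List[int]]) -> List[int]:
    """Server-side sum: the masks cancel, leaving the sum of the true updates."""
    dim = len(next(iter(masked.values())))
    return [sum(v[k] for v in masked.values()) % MOD for k in range(dim)]


if __name__ == "__main__":
    updates = {1: [1, 2, 3], 2: [10, 20, 30], 3: [100, 200, 300]}
    print(aggregate(mask_updates(updates)))
```

The server learns the aggregate but cannot read any single client's update from its masked upload. Production protocols derive the masks through key agreement and handle clients that drop out mid-round, which this sketch does not attempt.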

🧪 Verification / duplicates

  • Verified against: GDPR text, NIST Privacy Framework, Apple Differential Privacy white papers, Flower & Opacus docs, EU AI Act 2024.
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: privacy primitives + on-device LLM 2026 |