---
id: wiki-2026-0508-privacy-preserving-ai
title: Privacy Preserving AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Privacy-Preserving Machine Learning, PPML, Confidential AI]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [privacy, security, differential-privacy, federated-learning, cryptography]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Opacus / TF-Federated / TenSEAL / PySyft
---

# Privacy Preserving AI

## In one line

> **"Train and infer on data without exposing it. Four pillars: DP, FL, HE, MPC."** Driven by GDPR (2018) and healthcare/finance regulation, it went mainstream with the 2024 EU AI Act and US executive orders. As of 2026, confidential computing (TEEs: Intel TDX, NVIDIA H100 CC, Apple PCC) is the default for production deployments.

## Core

### The four pillars

1. **Differential Privacy (DP)**: adds calibrated noise to bound how much information leaks about any single record. The privacy budget is epsilon (ε); smaller means stronger privacy.
2. **Federated Learning (FL)**: the model goes to the data, not the data to the model.
3. **Homomorphic Encryption (HE)**: computes directly on ciphertext.
4. **Secure Multi-Party Computation (MPC)**: parties jointly compute a function without revealing their inputs to each other.
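
The MPC pillar can be illustrated with additive secret sharing, one of its simplest building blocks. The sketch below is not taken from any particular framework: `PRIME`, `share`, `reconstruct`, and `secure_sum` are hypothetical names, and all parties are simulated in one process with no networking and no defense against malicious participants.

```python
import secrets

PRIME = 2**61 - 1  # field modulus (a Mersenne prime); an illustrative choice


def share(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % PRIME


def secure_sum(inputs: list[int]) -> int:
    """Each party shares its input; each party adds the shares it holds."""
    n = len(inputs)
    all_shares = [share(x, n) for x in inputs]  # row i: shares of input i
    # Column j is what party j holds; it only ever sees one share per input.
    partial = [sum(col) % PRIME for col in zip(*all_shares)]
    return reconstruct(partial)
```

Calling `secure_sum([3, 5, 10])` recovers the total while no single simulated party ever holds another party's raw input. Production frameworks such as CrypTen or MP-SPDZ add multiplication protocols, communication, and security against misbehaving parties on top of this idea.
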

### Production additions (2024-2026)

- **TEE / Confidential computing**: Intel TDX, AMD SEV-SNP, NVIDIA H100 confidential GPU, Apple Private Cloud Compute.
- **Synthetic data**: GAN- or diffusion-generated; re-identification risk can be near zero, but only if generation is done carefully (for example, with DP training).
- **Machine unlearning**: removing a user's influence from a trained model, for GDPR right-to-be-forgotten compliance.

### Trade-offs

| Method | Privacy | Utility | Compute overhead | Deployed? |
|---|---|---|---|---|
| DP-SGD (ε≈1) | High | 2-5% accuracy drop | 2-5x | Yes (Apple, Google) |
| Federated | Medium | ~same | High communication cost | Yes (Gboard, healthcare) |
| HE (CKKS) | Very high | ~same (approximate arithmetic) | 1000-10000x | Niche |
| MPC | Very high | exact | 100-1000x | Niche |
| TEE | High (hardware trust) | ~same | ~1.1x | Rapidly growing |

## 💻 Patterns

### DP-SGD with Opacus (PyTorch)

```python
import torch.nn as nn
import torch.optim as optim
from opacus import PrivacyEngine

# build_model() and build_loader() are placeholders for your own model and DataLoader.
model = build_model()
optimizer = optim.SGD(model.parameters(), lr=0.1)
loader = build_loader()
criterion = nn.CrossEntropyLoss()  # assumed loss; swap in the one your task needs

privacy_engine = PrivacyEngine()
# Wrap model, optimizer, and loader so training meets the target (epsilon, delta).
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model, optimizer=optimizer, data_loader=loader,
    target_epsilon=1.0, target_delta=1e-5, epochs=10,
    max_grad_norm=1.0,  # per-sample gradient clipping bound
)

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    # Privacy budget spent so far
    print(f"ε={privacy_engine.get_epsilon(delta=1e-5):.2f}")
```
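
To make `max_grad_norm` and the noise calibration concrete, here is a minimal sketch of the per-step aggregation that DP-SGD performs: clip each sample's gradient, sum, add Gaussian noise, average. It is a plain-Python illustration under simplifying assumptions (gradients as flat lists, no privacy accounting), not Opacus's implementation; `clip`, `dp_sgd_gradient`, and `noise_multiplier` are names chosen for this example.

```python
import math
import random


def clip(grad: list[float], max_norm: float) -> list[float]:
    """Scale a per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_sample_grads, max_grad_norm, noise_multiplier, rng=random):
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_grad_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation scales with the clipping bound, because the
    # bound is the most any one sample can contribute to the sum.
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]
```

Clipping bounds each example's influence, and the noise hides whatever influence remains; a larger `noise_multiplier` buys a smaller ε at the cost of accuracy.
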

### Federated averaging (FedAvg)

```python
def fed_avg(global_model, client_updates, client_weights):
    """Apply the weighted average of client deltas to the global model.

    client_updates: list of dicts mapping state_dict keys to delta tensors.
    client_weights: per-client weights, e.g. local example counts.
    """
    global_state = global_model.state_dict()
    total = sum(client_weights)
    new_state = {}
    for k in global_state:
        avg_delta = sum(
            (w / total) * upd[k] for upd, w in zip(client_updates, client_weights)
        )
        # The updates are deltas, so add them to the current global weights.
        new_state[k] = global_state[k] + avg_delta
    global_model.load_state_dict(new_state)
    return global_model

# Each round:
# 1. broadcast global model
# 2. clients train locally (optionally with DP)
# 3. clients send model deltas (encrypted)
# 4. server aggregates via secure aggregation
```

### Homomorphic encryption inference (TenSEAL CKKS)

```python
import tenseal as ts

# CKKS context: approximate arithmetic over encrypted real-valued vectors.
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2**40
ctx.generate_galois_keys()  # needed for the rotations used by dot products

x = ts.ckks_vector(ctx, [0.1, 0.5, -0.3, 0.7])  # encrypted input
W = [[0.2, -0.1, 0.4, 0.05]]  # plaintext weights, one output neuron
b = [0.1]                     # plaintext bias
# Encrypted inference: y = W·x + b (dot product with the single weight row)
y_enc = x.dot(W[0]) + b
y_plain = y_enc.decrypt()  # list holding one approximate float
```

### Secure aggregation (cross-device FL)

```python
# Bonawitz et al. protocol sketch:
# 1. Each pair of the N clients agrees on a shared key via Diffie-Hellman
#    and expands it into a pairwise mask m_ij = m_ji.
# 2. Client i sends update_i + sum_{j>i} m_ij - sum_{j<i} m_ji.
# 3. Server sums all messages -> masks cancel -> only the aggregate is revealed.
# Tolerates dropouts via Shamir secret sharing of the mask seeds.
```
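
The mask cancellation in step 3 can be checked with a small simulation. This is a sketch under strong simplifications: seeds are drawn locally instead of via Diffie-Hellman, `random.Random` stands in for a cryptographic PRG, updates are assumed to be already quantized to integers, and dropout handling is omitted. All function names here are hypothetical.

```python
import random

MOD = 2**32  # updates are assumed to be quantized to integers modulo MOD


def pairwise_mask(seed: int, length: int) -> list[int]:
    """Expand a shared seed into a mask; both clients of a pair derive the same one."""
    rng = random.Random(seed)  # stand-in for a cryptographic PRG
    return [rng.randrange(MOD) for _ in range(length)]


def masked_update(i: int, update: list[int], pair_seeds: dict) -> list[int]:
    """Client i adds masks shared with higher ids and subtracts those shared with lower ids."""
    out = list(update)
    for (a, b), seed in pair_seeds.items():
        if i not in (a, b):
            continue
        sign = 1 if i == a else -1  # keys are built with a < b
        mask = pairwise_mask(seed, len(update))
        out = [(v + sign * m) % MOD for v, m in zip(out, mask)]
    return out


def aggregate(masked: list[list[int]]) -> list[int]:
    """Server-side sum; every mask appears once with + and once with -."""
    return [sum(col) % MOD for col in zip(*masked)]


updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
n = len(updates)
# In the real protocol each seed comes from a Diffie-Hellman key agreement.
pair_seeds = {(a, b): random.getrandbits(64) for a in range(n) for b in range(a + 1, n)}
masked = [masked_update(i, u, pair_seeds) for i, u in enumerate(updates)]
total = aggregate(masked)  # equals the element-wise sum of the raw updates
```

The server only ever sees the entries of `masked`, each of which looks random on its own; the plain sum emerges only after all of them are added.
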

### Confidential GPU inference (NVIDIA H100 CC)

```bash
# Mark the GPU ready to accept work (CC mode itself is enabled on the host beforehand)
nvidia-smi conf-compute -srs 1
# Verify attestation (check `nvidia-smi conf-compute -h` for the flags your driver supports)
nvidia-smi conf-compute -gar
# The application then gets an encrypted CPU-GPU bus and attested code
```

### Machine unlearning (SISA)

```python
# SISA = Sharded, Isolated, Sliced, Aggregated:
# 1. Shard the data into K disjoint parts; train one model per shard.
# 2. Aggregate the K models (vote or average) at inference time.
# 3. To unlearn user u: retrain only the shard containing u.
# Cost: roughly 1/K of a full retrain.
```
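
The sharding idea can be sketched with a toy ensemble in which each "model" is just the mean of its shard. This is an illustration only: it covers sharding and aggregation but not the slicing (checkpointing within a shard) that the full SISA scheme also uses, and `ShardedMeanEnsemble` is a hypothetical class written for this example.

```python
from statistics import mean


class ShardedMeanEnsemble:
    """Toy SISA-style ensemble: one 'model' (a mean) per disjoint shard."""

    def __init__(self, records: dict[str, float], k: int):
        self.k = k
        # Assign each user to exactly one shard (round-robin over sorted ids).
        self.shards = [dict() for _ in range(k)]
        for idx, user in enumerate(sorted(records)):
            self.shards[idx % k][user] = records[user]
        self.models = [self._train(s) for s in self.shards]
        self.retrain_count = 0

    @staticmethod
    def _train(shard: dict[str, float]):
        """Stand-in for real training; returns None for an empty shard."""
        return mean(shard.values()) if shard else None

    def predict(self) -> float:
        """Aggregate by averaging the per-shard models."""
        return mean(m for m in self.models if m is not None)

    def unlearn(self, user: str) -> None:
        """Drop the user's record and retrain only the shard that held it."""
        for i, shard in enumerate(self.shards):
            if user in shard:
                del shard[user]
                self.models[i] = self._train(shard)
                self.retrain_count += 1
                return
        raise KeyError(user)
```

After `unlearn`, the other shards' models are untouched, which is where the roughly 1/K retraining cost comes from. The trade-off is that each shard model sees less data, so ensemble accuracy can drop as K grows.
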

## Decision criteria

| Situation | Approach |
|---|---|
| Single org, sensitive labels | DP-SGD |
| Many phones / hospitals | Federated + secure aggregation + DP |
| Cloud inference, untrusted server | TEE (H100 CC) or HE |
| Two parties, joint model | MPC (CrypTen, MP-SPDZ) |
| GDPR right-to-be-forgotten | SISA / approximate unlearning |
| Need to share data externally | DP synthetic data |

**Default**: TEE (confidential computing) for inference; DP-SGD + federated learning for training across orgs.

## 🔗 Graph

- Parents: [[Privacy]] · [[Practical-Cryptography|Cryptography]] · [[Machine-Learning]]
- Variants: [[Differential-Privacy]] · [[Federated-Learning]] · [[Homomorphic-Encryption]] · [[Secure-Multi-Party-Computation]]
- Applications: [[On-Device-ML]]
- Adjacent: [[Synthetic-Data]]

## 🤖 LLM usage

**When to use**: regulated data (HIPAA, GDPR, PCI), cross-org training, on-device personalization, untrusted-cloud inference.

**When not to use**: public data with no privacy requirement; the overhead is not worth it.

## ❌ Anti-patterns

- **Big epsilon (ε>10)**: effectively no privacy.
- **Federated without DP or secure aggregation**: gradients leak training data.
- **HE for entire training**: 1000x slowdown; only feasible for inference on small models.
- **Anonymization theater**: removing names is not privacy (re-identification attacks are trivial).
- **"Trust me bro" confidential computing**: deploying a TEE without remote attestation.
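
The "big epsilon" warning follows from the standard definition of (ε, δ)-differential privacy, stated here for reference. For a mechanism $M$, any two datasets $D$ and $D'$ differing in one record, and any set of outputs $S$:

$$
\Pr[M(D) \in S] \le e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
$$

So $e^{\varepsilon}$ bounds how much one person's record can shift the odds of any outcome:

$$
e^{1} \approx 2.72, \qquad e^{10} \approx 2.2 \times 10^{4}
$$

At ε = 10 the worst-case bound allows an output to be roughly 22,000 times more likely with a given record than without it, which is why such a budget offers little formal protection.
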

## 🧪 Verification / duplicates

- Verified against Apple PCC (2024), Google FL papers, NIST DP guidance, and NVIDIA H100 CC docs (2024).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: 4 pillars + TEE / unlearning 2026 update |