---
id: wiki-2026-0508-privacy-preserving-ai
title: Privacy Preserving AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Privacy-Preserving Machine Learning, PPML, Confidential AI]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [privacy, security, differential-privacy, federated-learning, cryptography]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Opacus / TF-Federated / TenSEAL / PySyft
---

# Privacy Preserving AI

## One-liner
> **"Train and infer on data without exposing it. 4 pillars: DP, FL, HE, MPC"**. Driven by GDPR (2018) and healthcare/finance regulation; made mainstream by the 2024 EU AI Act and US executive orders. As of 2026, confidential computing (TEE: Intel TDX, NVIDIA H100 CC, Apple PCC) is the default for production deployment.

## Core

### 4 pillars
1. **Differential Privacy (DP)**: add calibrated noise to bound information leakage. The privacy budget is epsilon (ε); smaller ε means stronger privacy.
2. **Federated Learning (FL)**: model goes to data, not data to model.
3. **Homomorphic Encryption (HE)**: compute on ciphertext directly.
4. **Secure Multi-Party Computation (MPC)**: parties jointly compute without revealing inputs.
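Of the four pillars, MPC is the one without a pattern later in this note, so here is a minimal sketch of its simplest building block, additive secret sharing, used to compute a sum without any party seeing another's input. The modulus `PRIME`, the function names, and the salary figures are illustrative assumptions, and the sketch models honest parties in a single process rather than a networked protocol.

```python
import secrets

PRIME = 2**61 - 1  # assumed field modulus; must exceed the true sum of inputs

def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def reconstruct(shares):
    """Recombine all shares; any n-1 of them alone look uniformly random."""
    return sum(shares) % PRIME

def secure_sum(private_inputs):
    """Each party shares its input; party j locally adds the j-th share of every input."""
    n = len(private_inputs)
    all_shares = [share(v, n) for v in private_inputs]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published, never the raw inputs.
    return reconstruct(partial_sums)

salaries = [52_000, 61_500, 48_250]  # hypothetical private inputs
print(secure_sum(salaries))  # 161750
```

Real frameworks such as CrypTen or MP-SPDZ build multiplication and comparison on top of shares like these, which is where most of the compute and communication overhead comes from.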

### Production additions (2024-2026)
- **TEE / Confidential computing**: Intel TDX, AMD SEV-SNP, NVIDIA H100 confidential GPU, Apple Private Cloud Compute.
- **Synthetic data**: GAN/diffusion-generated; re-identification risk is near zero only if done right (e.g. the generator is trained with DP so it cannot memorize individual records).
- **Machine unlearning**: GDPR right-to-be-forgotten compliance.

### Trade-offs
| Method | Privacy | Utility | Compute overhead | Deployed? |
|---|---|---|---|---|
| DP-SGD (ε≈1) | High | -2 to -5% acc | 2-5x | Yes (Apple, Google) |
| Federated | Medium | ~same | High comm | Yes (Gboard, healthcare) |
| HE (CKKS) | Very high | ~same (approximate arithmetic) | 1000-10000x | Niche |
| MPC | Very high | exact | 100-1000x | Niche |
| TEE | High (HW trust) | ~same | ~1.1x | Rapidly growing |

## 💻 Patterns

### DP-SGD with Opacus (PyTorch)
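```python
import torch.nn as nn
import torch.optim as optim
from opacus import PrivacyEngine

# build_model() and build_loader() are placeholders for your own model and DataLoader.
model = build_model()
optimizer = optim.SGD(model.parameters(), lr=0.1)
loader = build_loader()
criterion = nn.CrossEntropyLoss()  # assumes a classification task

EPOCHS, DELTA = 10, 1e-5

privacy_engine = PrivacyEngine()
# Wraps all three objects: per-sample gradient clipping plus calibrated Gaussian noise.
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model, optimizer=optimizer, data_loader=loader,
    target_epsilon=1.0, target_delta=DELTA, epochs=EPOCHS,
    max_grad_norm=1.0,  # per-sample clipping bound
)

model.train()
for epoch in range(EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
# Privacy budget actually spent so far.
print(f"ε={privacy_engine.get_epsilon(delta=DELTA):.2f}")
```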

### Federated averaging (FedAvg)
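```python
def fed_avg(global_model, client_updates, client_weights):
    """Apply the weighted average of client deltas to the global model.

    client_updates: list of dicts keyed like state_dict, each holding
        (local_state - global_state) tensors.
    client_weights: per-client weights, e.g. local sample counts.
    """
    total = float(sum(client_weights))
    new_state = {}
    for k, v in global_model.state_dict().items():
        avg_delta = sum(
            (w / total) * upd[k].float()
            for upd, w in zip(client_updates, client_weights)
        )
        # Deltas are added to the current weights, not loaded in their place.
        # Cast back so integer buffers (e.g. BatchNorm counters) keep their dtype.
        new_state[k] = (v.float() + avg_delta).to(v.dtype)
    global_model.load_state_dict(new_state)
    return global_model

# Each round:
# 1. broadcast global model
# 2. clients train locally (with DP optionally)
# 3. clients send model deltas (encrypted)
# 4. server aggregates via secure aggregation
```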

### Homomorphic encryption inference (TenSEAL CKKS)
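```python
import tenseal as ts

# CKKS context: approximate arithmetic over encrypted real numbers.
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2**40
ctx.generate_galois_keys()  # rotations are needed for the dot product

x = ts.ckks_vector(ctx, [0.1, 0.5, -0.3, 0.7])  # encrypted input
W = [[0.2, -0.1, 0.4, 0.05]]  # plaintext weights, one output neuron
b = [0.1]
# Encrypted inference: y = W·x + b (dot product with the single weight row)
y_enc = x.dot(W[0]) + b
y_plain = y_enc.decrypt()  # approximately [-0.015]; CKKS results carry small error
```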

### Secure aggregation (cross-device FL)
```python
# Bonawitz et al. (2017) protocol sketch:
# 1. Every pair of clients (i, j) agrees on a shared seed s_ij = s_ji via Diffie-Hellman.
# 2. Client i sends update_i + sum_{j>i} PRG(s_ij) - sum_{j<i} PRG(s_ij).
# 3. Server sums all masked updates -> each mask appears once with + and once
#    with -, so masks cancel -> only the aggregate is revealed.
# Tolerates dropouts via Shamir secret sharing of seeds.
```
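The mask cancellation in the sketch above can be checked with a toy, single-process simulation. This is a simplified illustration, not the full protocol: seeds are hard-coded instead of derived via Diffie-Hellman, Python's `random.Random` stands in for a cryptographic PRG, dropout recovery is omitted, and all names (`prg_mask`, `mask_update`, `aggregate`, `MOD`) are hypothetical.

```python
import random

MOD = 2**32  # assumed: updates are quantized to integers mod 2^32

def prg_mask(seed, length):
    """Expand a shared seed into a mask vector (stand-in for a cryptographic PRG)."""
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(length)]

def mask_update(i, update, pair_seeds, n_clients):
    """Client i adds masks shared with higher-indexed peers and subtracts the rest."""
    masked = list(update)
    for j in range(n_clients):
        if j == i:
            continue
        seed = pair_seeds[(min(i, j), max(i, j))]
        sign = 1 if i < j else -1
        mask = prg_mask(seed, len(update))
        masked = [(m + sign * k) % MOD for m, k in zip(masked, mask)]
    return masked

def aggregate(masked_updates):
    """Server sums masked vectors; pairwise masks cancel modulo MOD."""
    return [sum(col) % MOD for col in zip(*masked_updates)]

updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
n = len(updates)
# Hard-coded pairwise seeds; the real protocol derives these via key agreement.
pair_seeds = {(i, j): 1000 * i + j for i in range(n) for j in range(i + 1, n)}
masked = [mask_update(i, u, pair_seeds, n) for i, u in enumerate(updates)]
print(aggregate(masked))  # [111, 222, 333]
```

Each individual masked vector looks random to the server, yet the sum equals the sum of the raw updates.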

### Confidential GPU inference (NVIDIA H100 CC)
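```bash
# Confirm the GPU reports CC mode as enabled (the mode itself is set at the host level)
nvidia-smi conf-compute -f
# After the GPU has been attested with NVIDIA's attestation tooling,
# mark it ready to accept work
nvidia-smi conf-compute -srs 1
# Check the ready state
nvidia-smi conf-compute -grs
# Application gets an encrypted GPU-CPU bus and an attested GPU state
```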

### Machine unlearning (SISA)
```python
# Sharded, Isolated, Sliced, Aggregated:
# 1. Shard data into K disjoint parts; train K models.
# 2. Aggregate (vote/avg) for inference.
# 3. To unlearn user u: retrain only the shard containing u.
# Cost: O(1/K) of full retrain.
```
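The SISA steps above can be sketched with a toy learner. This is a simplified illustration under stated assumptions: the per-shard "model" is a nearest-mean classifier on a single feature, sharding is by `user_id % k`, and the slicing/checkpoint part of SISA is left out. The class and function names are hypothetical.

```python
from collections import Counter

def train_shard(records):
    """Toy model: per-label mean of a 1-D feature (stand-in for a real learner)."""
    sums, counts = {}, Counter()
    for _, x, y in records:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_shard(model, x):
    """Return the label whose mean is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))

class SisaEnsemble:
    def __init__(self, records, k):
        # Sharding by user id keeps each user's data in exactly one shard.
        self.k = k
        self.shards = [[] for _ in range(k)]
        for rec in records:
            self.shards[rec[0] % k].append(rec)
        self.models = [train_shard(s) for s in self.shards]
        self.retrain_count = 0

    def predict(self, x):
        """Majority vote over the shard models."""
        votes = Counter(predict_shard(m, x) for m in self.models if m)
        return votes.most_common(1)[0][0]

    def unlearn(self, user_id):
        """Drop the user's records and retrain only the affected shard."""
        idx = user_id % self.k
        self.shards[idx] = [r for r in self.shards[idx] if r[0] != user_id]
        self.models[idx] = train_shard(self.shards[idx])
        self.retrain_count += 1

# Records are (user_id, feature, label); odd users cluster near 10, even near 0.
data = [(u, float(u % 2) * 10 + u * 0.01, u % 2) for u in range(12)]
ens = SisaEnsemble(data, k=3)
ens.unlearn(4)
print(ens.predict(9.7), ens.retrain_count)  # 1 1
```

Only one of the three shard models is retrained after the deletion request, which is where the roughly 1/K cost comes from.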

## Decision criteria
| Situation | Approach |
|---|---|
| Single org, sensitive labels | DP-SGD |
| Many phones / hospitals | Federated + secure agg + DP |
| Cloud inference, untrusted server | TEE (H100 CC) or HE |
| Two parties, joint model | MPC (CrypTen, MP-SPDZ) |
| GDPR right-to-be-forgotten | SISA / approximate unlearning |
| Need to share data externally | DP synthetic data |

**Default**: TEE (confidential computing) for inference; DP-SGD + federated for training across orgs.

## 🔗 Graph
- Parents: [[Privacy]] · [[Practical-Cryptography|Cryptography]] · [[Machine-Learning]]
- Variants: [[Differential-Privacy]] · [[Federated-Learning]] · [[Homomorphic-Encryption]] · [[Secure-Multi-Party-Computation]]
- Applications: [[On-Device-ML]]
- Adjacent: [[Synthetic-Data]]

## 🤖 LLM usage
**When**: regulated data (HIPAA, GDPR, PCI), cross-org training, on-device personalization, untrusted-cloud inference.
**When not**: public data with no privacy requirement; the overhead is not worth it.

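## ❌ Anti-patterns
- **Big epsilon (ε>10)**: the guarantee bounds likelihood ratios by e^ε, which exceeds 22,000 at ε=10; effectively no privacy.
- **Federated without DP or secure agg**: raw gradients can be inverted to reconstruct training data.
- **HE for entire training**: 1000x or larger slowdown; only feasible for inference of small models.
- **Anonymization theater**: removing names is not privacy; re-identification from quasi-identifiers (e.g. ZIP code plus birth date) is often trivial.
- **Trust me bro confidential**: deploying a TEE without verifying remote attestation.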
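To make the "big epsilon" item concrete, here is a minimal sketch of the Laplace mechanism on a counting query, showing how the noise scale and the e^ε bound move with ε. It assumes a query with sensitivity 1, and the function names are illustrative rather than taken from any library.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: noise scale = sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def max_likelihood_ratio(epsilon):
    """Bound e^epsilon on how much one record can shift any output's probability."""
    return math.exp(epsilon)

rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    scale = 1.0 / eps
    noisy = private_count(100, eps, rng)
    print(f"eps={eps:>4}: noise scale={scale:.2f}, "
          f"ratio bound={max_likelihood_ratio(eps):.1f}, sample={noisy:.2f}")
```

At ε=10 the noise scale is 0.1 on a count of 100, so the released value is almost the true one, which is why such budgets offer little protection in practice.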

## 🧪 Verification / duplicates
- Verified (Apple PCC 2024, Google FL papers, NIST DP guidance, NVIDIA H100 CC docs 2024).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: 4 pillars + TEE / unlearning 2026 update |