2nd/10_Wiki/Topics/AI_and_ML/Privacy-Preserving-AI.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
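Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed links directly at redirect targets, and mapped concepts (~2,400 items)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections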

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-privacy-preserving-ai
title: Privacy Preserving AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Privacy-Preserving Machine Learning, PPML, Confidential AI
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: privacy, security, differential-privacy, federated-learning, cryptography
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: Python; framework: Opacus / TF-Federated / TenSEAL / PySyft

Privacy Preserving AI

One-liner

"Train and infer on data without exposing it. 4 pillars: DP, FL, HE, MPC." Driven by GDPR (2018) and healthcare/finance regulation, and pushed into the mainstream by the 2024 EU AI Act and US executive orders. As of 2026, confidential computing (TEE: Intel TDX, NVIDIA H100 CC, Apple PCC) is the default for production deployment.

Core concepts

4 pillars

  1. Differential Privacy (DP): add calibrated noise to bound what the output leaks about any single record. The privacy budget is epsilon (ε); smaller ε means stronger privacy.
  2. Federated Learning (FL): model goes to data, not data to model.
  3. Homomorphic Encryption (HE): compute on ciphertext directly.
  4. Secure Multi-Party Computation (MPC): parties jointly compute without revealing inputs.
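MPC is the only pillar without a pattern later in this note, so here is a minimal sketch of its simplest building block, additive secret sharing. The modulus, the three-party setup, and the hospital scenario are illustrative assumptions; production frameworks add authentication, multiplication protocols, and protection against malicious parties.

```python
import secrets

MODULUS = 2**61 - 1  # illustrative modulus (assumption of this sketch)

def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares):
    """Recombine all shares; any n-1 of them alone look uniformly random."""
    return sum(shares) % MODULUS

# Two hospitals want the sum of their patient counts without revealing either one.
a_shares = share(120, 3)
b_shares = share(75, 3)

# Each compute party adds the two shares it holds, locally and independently.
sum_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]

joint_total = reconstruct(sum_shares)
print(joint_total)  # 195
```

Addition needs no communication between parties, which is why sums and averages are cheap in MPC while multiplications dominate the cost.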

Production additions (2024-2026)

  • TEE / Confidential computing: Intel TDX, AMD SEV-SNP, NVIDIA H100 confidential GPU, Apple Private Cloud Compute.
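  • Synthetic data: GAN/diffusion-generated; re-identification risk is low only when the generator itself is trained with DP, because generative models can memorize training records.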
  • Machine unlearning: GDPR right-to-be-forgotten compliance.

Trade-offs

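| Method | Privacy | Utility | Compute overhead | Deployed? |
|---|---|---|---|---|
| DP-SGD (ε≈1) | High | -2 to -5% accuracy | 2-5x | Yes (Apple, Google) |
| Federated | Medium | ~same | High communication | Yes (Gboard, healthcare) |
| HE (CKKS) | Very high | Approximate (small numeric error) | 1000-10000x | Niche |
| MPC | Very high | Exact | 100-1000x | Niche |
| TEE | High (HW trust) | ~same | ~1.1x | Rapidly growing |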

💻 Patterns

DP-SGD with Opacus (PyTorch)

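import torch.nn as nn
import torch.optim as optim
from opacus import PrivacyEngine

# build_model() and build_loader() are placeholders for your own model and DataLoader.
model = build_model()
optimizer = optim.SGD(model.parameters(), lr=0.1)
loader = build_loader()
criterion = nn.CrossEntropyLoss()  # assumes a classification task
EPOCHS, DELTA = 10, 1e-5

privacy_engine = PrivacyEngine()
# Wrap model, optimizer, and loader so every step clips per-sample gradients
# and adds Gaussian noise calibrated to reach the target (ε, δ) after EPOCHS epochs.
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model, optimizer=optimizer, data_loader=loader,
    target_epsilon=1.0, target_delta=DELTA, epochs=EPOCHS,
    max_grad_norm=1.0,
)

model.train()
for epoch in range(EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
# Privacy budget actually spent so far
print(f"ε={privacy_engine.get_epsilon(delta=DELTA):.2f}")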
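To make the per-step mechanics visible, the following is a simplified, library-free sketch of a DP-SGD gradient computation: clip each per-sample gradient, sum, add Gaussian noise, then average. The function name `dp_sgd_gradient` and the toy gradients are hypothetical, and privacy accounting (tracking ε across steps) is deliberately left out.

```python
import math
import random

def dp_sgd_gradient(per_sample_grads, max_grad_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for g in per_sample_grads:
        norm = math.sqrt(sum(v * v for v in g))
        # Scale down so the L2 norm is at most max_grad_norm
        scale = min(1.0, max_grad_norm / (norm + 1e-12))
        for i, v in enumerate(g):
            summed[i] += v * scale
    # Noise standard deviation is proportional to the clipping bound
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5

# noise_multiplier=0.0 isolates the clipping effect: the first gradient shrinks to norm 1.
clipped_mean = dp_sgd_gradient(grads, max_grad_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy_mean = dp_sgd_gradient(grads, max_grad_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single example can move the model, and that bound is what lets a fixed noise scale translate into a privacy guarantee.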

Federated averaging (FedAvg)

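def fed_avg(global_model, client_updates, client_weights):
    """Apply the weighted average of client deltas to the global model."""
    total = float(sum(client_weights))
    global_state = global_model.state_dict()
    new_state = {}
    for k, v in global_state.items():
        # client_updates hold deltas (local - global), so add them to the current weights
        avg_delta = sum(
            (w / total) * upd[k].float() for upd, w in zip(client_updates, client_weights)
        )
        new_state[k] = (v.float() + avg_delta).to(v.dtype)
    global_model.load_state_dict(new_state)
    return global_model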

# Each round:
# 1. broadcast global model
# 2. clients train locally (with DP optionally)
# 3. clients send model deltas (encrypted)
# 4. server aggregates via secure aggregation
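The round structure above can be traced end to end on a toy problem. This sketch assumes a one-parameter model y = w * x, plain Python floats instead of tensors, and no encryption or secure aggregation; `local_delta` and `run_round` are hypothetical helper names.

```python
def local_delta(global_w, data, lr=0.1, steps=5):
    """Run a few local gradient steps on y = w * x and return the weight delta."""
    w = global_w
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w - global_w

def run_round(global_w, client_datasets):
    """One FedAvg round: broadcast, local training, weighted aggregation of deltas."""
    deltas = [local_delta(global_w, d) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    total = sum(sizes)
    avg_delta = sum(n / total * dl for dl, n in zip(deltas, sizes))
    return global_w + avg_delta

# Raw data never leaves the clients; only deltas reach the server.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],              # client A, drawn from y = 2x
    [(1.0, 2.0), (3.0, 6.0), (0.5, 1.0)],  # client B, drawn from y = 2x
]

w = 0.0
for _ in range(20):
    w = run_round(w, clients)
print(round(w, 6))
```

Weighting by dataset size mirrors the `client_weights` argument of `fed_avg`; with equal weights, small clients would pull the model as hard as large ones.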

Homomorphic encryption inference (TenSEAL CKKS)

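import tenseal as ts

# CKKS context; it holds the secret key, so in a real deployment it stays on the client.
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2**40
ctx.generate_galois_keys()  # rotations are needed for the dot product

x = ts.ckks_vector(ctx, [0.1, 0.5, -0.3, 0.7])  # encrypted input
W = [[0.2, -0.1, 0.4, 0.05]]  # plaintext weights (one output neuron)
b = [0.1]
# Encrypted inference: y = W·x + b, computed without decrypting x
y_enc = x.dot(W[0]) + b
y_plain = y_enc.decrypt()  # ≈ [-0.015]; CKKS arithmetic is approximate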

Secure aggregation (cross-device FL)

# Bonawitz et al. protocol sketch:
# 1. Each pair of clients (i, j) among the N clients agrees on a shared mask m_ij via Diffie-Hellman.
# 2. Client i sends update_i + sum_{j>i} m_ij - sum_{j<i} m_ij.
# 3. Server sums all masked updates -> masks cancel pairwise -> only the aggregate is revealed.
# Tolerates dropouts via Shamir secret sharing of the mask seeds.
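The mask cancellation in the sketch above can be checked with a few lines of Python. This is a simplified illustration only: `pairwise_mask` derives masks from a fixed seed as a stand-in for a real Diffie-Hellman key agreement, updates are assumed to be quantized to integers, and dropout recovery is omitted.

```python
import random

MOD = 2**32  # illustrative modulus for masked arithmetic

def pairwise_mask(i, j, length):
    """Stand-in for a mask expanded from the secret shared by clients i and j."""
    seed = min(i, j) * 1_000_003 + max(i, j)  # same seed for (i, j) and (j, i)
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(length)]

def masked_update(i, update, n_clients):
    """Add masks shared with higher-indexed peers, subtract those shared with lower ones."""
    out = list(update)
    for j in range(n_clients):
        if j == i:
            continue
        mask = pairwise_mask(i, j, len(update))
        sign = 1 if i < j else -1
        out = [(v + sign * m) % MOD for v, m in zip(out, mask)]
    return out

updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masked = [masked_update(i, u, len(updates)) for i, u in enumerate(updates)]

# The server only sees `masked`; summing makes every pairwise mask cancel.
aggregate = [sum(col) % MOD for col in zip(*masked)]
print(aggregate)  # [111, 222, 333]
```

Each individual masked vector is offset by values drawn from the full modulus range, so on its own it says nothing useful about the client's update.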

Confidential GPU inference (NVIDIA H100 CC)

# Mark the GPU as ready to accept work (CC mode itself is enabled beforehand on the host)
nvidia-smi conf-compute -srs 1
# Verify attestation
nvidia-smi conf-compute -gar
# Application gets encrypted GPU-CPU bus + attested code

Machine unlearning (SISA)

# Sharded, Isolated, Sliced, Aggregated:
# 1. Shard data into K disjoint parts; train K models.
# 2. Aggregate (vote/avg) for inference.
# 3. To unlearn user u: retrain only the shard containing u.
# Cost: O(1/K) of full retrain.
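A minimal sketch of the shard-and-retrain idea follows. The `SisaEnsemble` class is hypothetical, the "model" for each shard is just the mean of its values, integer user IDs are assumed, and the slicing/checkpointing part of SISA is left out.

```python
class SisaEnsemble:
    """Toy SISA: one model per shard, retrain only the affected shard on deletion."""

    def __init__(self, records, n_shards):
        self.shards = [[] for _ in range(n_shards)]
        self.user_to_shard = {}
        for user_id, value in records:
            s = user_id % n_shards  # deterministic shard assignment
            self.shards[s].append((user_id, value))
            self.user_to_shard[user_id] = s
        self.retrain_count = 0
        self.models = [self._train(s) for s in range(n_shards)]

    def _train(self, shard_idx):
        """Stand-in trainer: the shard model is the mean of its values."""
        self.retrain_count += 1
        values = [v for _, v in self.shards[shard_idx]]
        return sum(values) / len(values) if values else None

    def predict(self):
        """Aggregate shard models by averaging."""
        live = [m for m in self.models if m is not None]
        return sum(live) / len(live)

    def unlearn(self, user_id):
        """Drop the user's records and retrain only the shard that held them."""
        s = self.user_to_shard.pop(user_id)
        self.shards[s] = [(u, v) for u, v in self.shards[s] if u != user_id]
        self.models[s] = self._train(s)

records = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0), (4, 5.0), (5, 6.0)]
ensemble = SisaEnsemble(records, n_shards=3)
ensemble.unlearn(4)  # triggers exactly one shard retrain
print(ensemble.predict())  # 3.0
```

Because the other shard models never saw the deleted user's data, they remain valid without retraining.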

Decision criteria

| Situation | Approach |
|---|---|
| Single org, sensitive labels | DP-SGD |
| Many phones / hospitals | Federated + secure agg + DP |
| Cloud inference, untrusted server | TEE (H100 CC) or HE |
| Two parties, joint model | MPC (CrypTen, MP-SPDZ) |
| GDPR right-to-be-forgotten | SISA / approximate unlearning |
| Need to share data externally | DP synthetic data |

Default: TEE (confidential computing) for inference; DP-SGD + federated for training across orgs.

🔗 Graph

🤖 LLM usage

When to use: regulated data (HIPAA, GDPR, PCI), cross-org training, on-device personalization, untrusted-cloud inference. When not to use: public data with no privacy requirement; the overhead is not worth it.

Anti-patterns

  • Big epsilon (ε>10): effectively no privacy.
  • Federated without DP or secure agg: gradients leak training data.
  • HE for entire training: 1000x slowdown — only feasible for inference of small models.
  • Anonymization theater: removing names is not privacy (re-id attacks trivial).
  • Trust me bro confidential: deploy without remote attestation.
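The first anti-pattern can be made concrete with a short calculation. Under pure ε-DP, adding or removing one record can change the probability of any output by a factor of at most e^ε; the sketch below evaluates that bound. It is a simplification, since DP-SGD provides (ε, δ)-DP, and `max_likelihood_ratio` is a hypothetical helper name.

```python
import math

def max_likelihood_ratio(epsilon):
    """Worst-case factor by which one record can shift any output probability under pure ε-DP."""
    return math.exp(epsilon)

for eps in (0.1, 1.0, 10.0):
    print(f"ε={eps}: ratio bound = {max_likelihood_ratio(eps):.1f}")
```

At ε=1 the bound is about 2.7; at ε=10 it is about 22,026, which constrains an attacker so little that the guarantee carries almost no practical meaning.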

🧪 Verification / duplicates

  • Verified (Apple PCC 2024, Google FL papers, NIST DP guidance, NVIDIA H100 CC docs 2024).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: 4 pillars + TEE / unlearning 2026 update |