"매 train and infer on data without exposing it — 4 pillars: DP, FL, HE, MPC". GDPR (2018) 와 healthcare/finance regulation 으로 driven, 2024 EU AI Act 와 US executive orders 로 mainstream. 2026 currently confidential computing (TEE: Intel TDX, NVIDIA H100 CC, Apple PCC) 가 production deployment 의 default.
Key points
4 pillars
Differential Privacy (DP): add calibrated noise to bound how much the output can leak about any single record. The privacy budget is epsilon (ε); smaller ε means stronger privacy.
Federated Learning (FL): model goes to data, not data to model.
Homomorphic Encryption (HE): compute on ciphertext directly.
Secure Multi-Party Computation (MPC): parties jointly compute without revealing inputs.
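To make the DP pillar concrete, here is a minimal sketch of the Laplace mechanism on a counting query. The function names (`laplace_noise`, `dp_count`) and the toy `ages` list are hypothetical, chosen for illustration. It assumes a sensitivity-1 query and pure ε-DP.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-DP via the Laplace mechanism.

    A counting query has sensitivity 1: adding or removing one record
    changes the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
ages = [23, 35, 41, 29, 62, 57, 38, 44]  # hypothetical toy dataset
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
print(f"true count = 4, noisy release = {noisy:.2f}")
```

Lowering `epsilon` widens the noise (scale 1/ε), which is the privacy/utility trade-off the ε knob controls.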
Production additions (2024-2026)
TEE / Confidential computing: Intel TDX, AMD SEV-SNP, NVIDIA H100 confidential GPU, Apple Private Cloud Compute.
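Synthetic data: GAN/diffusion-generated; re-identification risk is near zero only if done right (for example, a generator trained with DP), because generative models can memorize training records.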
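Federated averaging (FedAvg)

    import torch

    def fed_avg(global_model, client_updates, client_weights):
        """Apply the weighted average of client deltas to the global model."""
        total = sum(client_weights)
        new_state = {}
        for k, v in global_model.state_dict().items():
            if not torch.is_floating_point(v):
                # Integer buffers (e.g. BatchNorm step counters) are not averaged.
                new_state[k] = v.clone()
                continue
            avg_delta = sum((w / total) * upd[k]
                            for upd, w in zip(client_updates, client_weights))
            # Updates are deltas, so add them to the current weights
            # instead of overwriting the weights with the average.
            new_state[k] = v + avg_delta
        global_model.load_state_dict(new_state)
        return global_model

    # Each round:
    # 1. broadcast global model
    # 2. clients train locally (optionally with DP)
    # 3. clients send model deltas (encrypted)
    # 4. server aggregates via secure aggregation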
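Step 2 of the round mentions optional DP. One common way to get it is to clip each client delta to a fixed L2 norm and add Gaussian noise. The sketch below works on a flat list of floats; `clip_and_noise`, `clip_norm`, and `noise_multiplier` are illustrative names, not an API from any library. Whether noise is added on each client or by the server on the aggregate depends on the trust model.

```python
import math
import random

def clip_and_noise(delta, clip_norm, noise_multiplier, rng):
    """Clip a flat update to L2 norm <= clip_norm, then add Gaussian noise.

    The noise standard deviation is noise_multiplier * clip_norm,
    following the DP-SGD-style convention.
    """
    norm = math.sqrt(sum(v * v for v in delta))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [v * factor for v in delta]
    sigma = noise_multiplier * clip_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]

rng = random.Random(0)
delta = [3.0, 4.0]  # L2 norm is 5.0
clipped_only = clip_and_noise(delta, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
private = clip_and_noise(delta, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(clipped_only, private)
```

Clipping bounds how much any one client can move the aggregate, which is what makes the noise scale meaningful.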
Homomorphic encryption inference (TenSEAL CKKS)
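
    import tenseal as ts

    # CKKS context; it holds the secret key here, so the same process can decrypt.
    ctx = ts.context(
        ts.SCHEME_TYPE.CKKS,
        poly_modulus_degree=8192,
        coeff_mod_bit_sizes=[60, 40, 40, 60],
    )
    ctx.global_scale = 2**40
    ctx.generate_galois_keys()  # needed for the rotations used by dot products

    x = ts.ckks_vector(ctx, [0.1, 0.5, -0.3, 0.7])  # encrypted input
    W = [[0.2, -0.1, 0.4, 0.05]]  # plaintext weights
    b = [0.1]

    # encrypted inference: y = W*x + b
    # W[0] is a 1-D row, so use dot(); matmul() expects a 2-D matrix.
    y_enc = x.dot(W[0]) + b
    y_plain = y_enc.decrypt()  # approximate result, since CKKS is an approximate scheme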
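For readers without TenSEAL installed, the same "plaintext weights, encrypted input" idea can be sketched with a toy Paillier cryptosystem in pure Python. This is a different, additively homomorphic scheme, not CKKS. The primes are tiny and insecure, all names are illustrative, and inputs are integers (real values would need fixed-point scaling).

```python
import math
import random

# Toy Paillier keypair. The primes are tiny and INSECURE; illustration only.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1

def encrypt(m: int, rng: random.Random) -> int:
    """Enc(m) = (1 + N)^m * r^N mod N^2, with r coprime to N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(1 + N, m % N, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    """Recover m and map the upper half of Z_N back to negative integers."""
    m = ((pow(c, LAM, N2) - 1) // N * MU) % N
    return m - N if m > N // 2 else m

def linear_on_ciphertexts(enc_x, weights, bias):
    """Compute Enc(w . x + b) from Enc(x_i) and plaintext w, b.

    Multiplying ciphertexts adds plaintexts; raising to w_i scales by w_i.
    """
    acc = pow(1 + N, bias % N, N2)  # encoding of the bias with r = 1
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w % N, N2)) % N2
    return acc

rng = random.Random(0)
x = [1, 5, -3, 7]        # client's private integer features
w, b = [2, -1, 4, 1], 1  # server's plaintext weights and bias
enc_x = [encrypt(v, rng) for v in x]
y = decrypt(linear_on_ciphertexts(enc_x, w, b))
print(y)  # -7, i.e. 2*1 - 1*5 + 4*(-3) + 1*7 + 1
```

The server only ever handles `enc_x` and the accumulated ciphertext; decryption needs `LAM` and `MU`, which in a real deployment stay with the client.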
Secure aggregation (cross-device FL)

    # Bonawitz et al. protocol sketch:
    # 1. Each pair of the N clients (i, j) agrees on a shared mask seed via
    #    Diffie-Hellman, so mask_{ij} = mask_{ji}.
    # 2. Client i sends update_i + sum_{j > i} mask_{ij} - sum_{j < i} mask_{ji}.
    # 3. Server sums all masked updates -> masks cancel pairwise -> only the aggregate is revealed.
    # Tolerates dropouts via Shamir secret sharing of the seeds.
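The mask cancellation can be checked with a small simulation. This is a sketch under simplifying assumptions: seeds are handed out directly instead of via Diffie-Hellman, `random.Random` stands in for a cryptographic PRG, updates are integers mod 2^32, and dropout handling is omitted. All names are illustrative.

```python
import random

MOD = 2**32  # updates are quantized to integers mod 2^32 in this sketch

def pairwise_mask(seed: int, length: int):
    """Expand a shared seed into a mask vector (stand-in for a real PRG)."""
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(length)]

def masked_update(i, update, shared_seeds):
    """Client i adds masks for peers j > i and subtracts them for j < i."""
    out = list(update)
    for j, seed in shared_seeds[i].items():
        mask = pairwise_mask(seed, len(update))
        sign = 1 if i < j else -1
        out = [(o + sign * m) % MOD for o, m in zip(out, mask)]
    return out

# Hypothetical setup: 3 clients, with seeds[i][j] == seeds[j][i].
n = 3
seed_rng = random.Random(42)
seeds = {i: {} for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        seeds[i][j] = seeds[j][i] = seed_rng.getrandbits(64)

updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masked = [masked_update(i, u, seeds) for i, u in enumerate(updates)]
aggregate = [sum(col) % MOD for col in zip(*masked)]
print(aggregate)  # [111, 222, 333]
```

Each entry of `masked` looks random on its own; only the column-wise sum recovers the true aggregate.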
Machine unlearning (SISA)

    # Sharded, Isolated, Sliced, Aggregated:
    # 1. Shard data into K disjoint parts; train K models.
    # 2. Aggregate (vote/avg) for inference.
    # 3. To unlearn user u: retrain only the shard containing u.
    # Cost: O(1/K) of full retrain.
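A toy version of the shard-and-retrain steps, where each shard "model" is just the mean label of its shard. `SisaEnsemble` and its methods are hypothetical names. The slicing/checkpoint part of SISA is left out, and a user's records are assumed to live in a single shard chosen by `user_id`.

```python
from statistics import mean

class SisaEnsemble:
    """Toy SISA: each shard 'model' is the mean label of its shard."""

    def __init__(self, records, num_shards):
        # records: list of (user_id, label); the shard is fixed by user_id
        self.num_shards = num_shards
        self.shards = [[] for _ in range(num_shards)]
        for user_id, label in records:
            self.shards[user_id % num_shards].append((user_id, label))
        self.models = [self._train(s) for s in self.shards]
        self.retrain_count = 0

    @staticmethod
    def _train(shard):
        return mean(label for _, label in shard) if shard else None

    def predict(self):
        """Aggregate by averaging the shard models that have data."""
        return mean(m for m in self.models if m is not None)

    def unlearn(self, user_id):
        """Drop the user's records and retrain only the shard that held them."""
        k = user_id % self.num_shards
        self.shards[k] = [(u, y) for u, y in self.shards[k] if u != user_id]
        self.models[k] = self._train(self.shards[k])
        self.retrain_count += 1

records = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 5.0), (4, 4.0), (5, 6.0)]
ens = SisaEnsemble(records, num_shards=3)
before = ens.predict()
ens.unlearn(3)
after = ens.predict()
print(before, after, ens.retrain_count)
```

Only one of the three shard models is retrained for the request; the other two are untouched, which is where the 1/K cost comes from.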
Decision criteria

| Situation | Approach |
| --- | --- |
| Single org, sensitive labels | DP-SGD |
| Many phones / hospitals | Federated + secure agg + DP |
| Cloud inference, untrusted server | TEE (H100 CC) or HE |
| Two parties, joint model | MPC (CrypTen, MP-SPDZ) |
| GDPR right-to-be-forgotten | SISA / approximate unlearning |
| Need to share data externally | DP synthetic data |

Default: TEE (confidential computing) for inference; DP-SGD + federated for training across orgs.
When to use: regulated data (HIPAA, GDPR, PCI), cross-org training, on-device personalization, untrusted-cloud inference.
When not to use: public data with no privacy requirement; the overhead is not worth it.
❌ Anti-patterns
Big epsilon (ε>10): effectively no privacy.
Federated without DP or secure agg: gradients leak training data.
HE for entire training: roughly 1000x slowdown; only feasible for inference of small models.
Anonymization theater: removing names is not privacy (re-identification via linkage on the remaining attributes is often trivial).
"Trust me bro" confidential computing: deploying a TEE without verifying remote attestation.
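Why a big epsilon counts as an anti-pattern: pure ε-DP bounds the ratio of output probabilities between two neighboring datasets by e^ε. The short calculation below (helper name is illustrative) shows how fast that bound loosens.

```python
import math

def max_likelihood_ratio(epsilon: float) -> float:
    """Bound e^epsilon on how much one record can change any output's probability."""
    return math.exp(epsilon)

for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps:>4}: ratio bound <= {max_likelihood_ratio(eps):,.2f}")
```

At ε = 10 the bound is about 22,026, so an observer may distinguish "record present" from "record absent" almost freely, while ε = 0.1 keeps the ratio near 1.1.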
🧪 Verification / duplicates
Verified against Apple PCC (2024), Google FL papers, NIST DP guidance, and NVIDIA H100 CC docs (2024).