Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization

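10_Wiki/Topics large-scale cleanup:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: broken links fixed, redirects linked directly, concepts mapped, ~2,400 items
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections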

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 23:52:15 +09:00

6.1 KiB


id, title, category, status, canonical_id, aliases, duplicate_of, source_trust_level, confidence_score, verification_status, tags, raw_sources, last_reinforced, github_commit, tech_stack

title

Privacy Preserving AI

One-line summary

"Train and infer on data without exposing it. 4 pillars: DP, FL, HE, MPC." Driven by GDPR (2018) and healthcare/finance regulation; made mainstream by the 2024 EU AI Act and US executive orders. As of 2026, confidential computing (TEE: Intel TDX, NVIDIA H100 CC, Apple PCC) is the default for production deployment.

Core concepts

4 pillars

Differential Privacy (DP): adds calibrated noise to bound information leakage about any single record. The privacy budget is epsilon (ε); smaller ε means stronger privacy.
Federated Learning (FL): model goes to data, not data to model.
Homomorphic Encryption (HE): compute on ciphertext directly.
Secure Multi-Party Computation (MPC): parties jointly compute without revealing inputs.
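
To make the MPC bullet concrete, here is a minimal sketch of additive secret sharing, one common building block for joint computation. It assumes three honest-but-curious parties; the modulus `MODULUS` and the helper names `share` and `reconstruct` are hypothetical choices for illustration and do not come from any library.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical modulus chosen for illustration


def share(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recombine shares; any strict subset of them looks uniformly random."""
    return sum(shares) % MODULUS


# Two hospitals want the sum of their patient counts without revealing either count.
a_shares = share(120, 3)
b_shares = share(45, 3)
# Each party adds the two shares it holds locally; no party sees a full input.
sum_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]
joint_sum = reconstruct(sum_shares)
print(joint_sum)  # 165
```

Addition comes for free with this scheme; multiplication needs extra protocol machinery, which is part of why MPC carries the overhead noted in the trade-offs.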

Production additions (2024-2026)

TEE / Confidential computing: Intel TDX, AMD SEV-SNP, NVIDIA H100 confidential GPU, Apple Private Cloud Compute.
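Synthetic data: GAN/diffusion-generated. Re-identification risk is near zero only when the generator itself is trained with privacy protection (for example DP), because generative models can memorize training records.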
Machine unlearning: GDPR right-to-be-forgotten compliance.

Trade-offs

Method	Privacy	Utility impact	Compute overhead	Deployed?
DP-SGD (ε≈1)	High	-2 to -5% accuracy	2-5x	Yes (Apple, Google)
Federated	Medium	~same	High communication	Yes (Gboard, healthcare)
HE (CKKS)	Very high	~same (approximate arithmetic, small numeric error)	1000-10000x	Niche
MPC	Very high	exact	100-1000x	Niche
TEE	High (hardware trust)	~same	~1.1x	Rapidly growing

💻 Patterns

DP-SGD with Opacus (PyTorch)

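import torch.nn as nn
import torch.optim as optim
from opacus import PrivacyEngine

# build_model() and build_loader() are placeholders for your own model and DataLoader.
model = build_model()
optimizer = optim.SGD(model.parameters(), lr=0.1)
loader = build_loader()
criterion = nn.CrossEntropyLoss()

privacy_engine = PrivacyEngine()
# Opacus chooses the noise multiplier needed to reach the target (ε, δ) after `epochs` epochs.
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model, optimizer=optimizer, data_loader=loader,
    target_epsilon=1.0, target_delta=1e-5, epochs=10,
    max_grad_norm=1.0,  # per-sample gradient clipping bound
)

model.train()
for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
print(f"ε={privacy_engine.get_epsilon(delta=1e-5):.2f}")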
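
What the privacy engine does on each step can be sketched without any framework: clip every per-sample gradient to `max_grad_norm`, sum the clipped gradients, add Gaussian noise scaled to the clipping bound, then average. The function below is a plain-Python illustration of that idea under simplifying assumptions (flat weight lists, no privacy accounting); the name `dp_sgd_step` is hypothetical and this is not how Opacus is implemented internally.

```python
import math
import random


def dp_sgd_step(weights, per_sample_grads, lr, max_grad_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each sample's gradient, sum, add Gaussian noise, average."""
    clipped_sum = [0.0] * len(weights)
    for g in per_sample_grads:
        norm = math.sqrt(sum(v * v for v in g))
        # Shrink the gradient only when its norm exceeds the clipping bound.
        scale = min(1.0, max_grad_norm / (norm + 1e-12))
        for i, v in enumerate(g):
            clipped_sum[i] += v * scale
    sigma = noise_multiplier * max_grad_norm  # noise std is tied to the clipping bound
    batch = len(per_sample_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch for s in clipped_sum]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
# noise_multiplier=0 makes the step deterministic so the clipping effect is visible.
new_w = dp_sgd_step([0.0, 0.0], grads, lr=1.0, max_grad_norm=1.0,
                    noise_multiplier=0.0, rng=rng)
print(new_w)  # approximately [-0.45, -0.6]
```

The first gradient is scaled down from norm 5.0 to norm 1.0 while the second passes through unchanged, so no single example can move the weights by more than the clipping bound. That bound is what lets the added noise translate into an ε guarantee.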

Federated averaging (FedAvg)

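import torch

def fed_avg(global_model, client_updates, client_weights):
    """Apply the weighted average of client deltas to the global model.

    client_updates: list of dicts mapping state_dict keys to (local - global) tensors.
    client_weights: per-client weights, typically local example counts.
    """
    total = float(sum(client_weights))
    new_state = {}
    for k, v in global_model.state_dict().items():
        avg_delta = sum(
            (w / total) * upd[k].float() for upd, w in zip(client_updates, client_weights)
        )
        # Add the averaged delta to the current weights; cast back for integer buffers.
        new_state[k] = (v.float() + avg_delta).to(v.dtype)
    global_model.load_state_dict(new_state)
    return global_model

# Each round:
# 1. server broadcasts the global model
# 2. clients train locally (optionally with DP)
# 3. clients send model deltas (masked or encrypted)
# 4. server aggregates via secure aggregation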
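
A small numeric round may help check the weighting. The sketch below uses plain Python lists instead of tensors; `client_delta`, `apply_fed_avg`, and the toy weights are hypothetical illustrations of the round described above, assuming each client is weighted by its number of local examples.

```python
def client_delta(global_w, local_w):
    """Delta a client would upload: local weights minus the broadcast global weights."""
    return [l - g for g, l in zip(global_w, local_w)]


def apply_fed_avg(global_w, deltas, num_examples):
    """Add the example-count-weighted mean of client deltas to the global weights."""
    total = sum(num_examples)
    avg = [
        sum(n / total * d[i] for d, n in zip(deltas, num_examples))
        for i in range(len(global_w))
    ]
    return [g + a for g, a in zip(global_w, avg)]


global_w = [1.0, 1.0]
# Hypothetical local results after one round of training on each client.
local_ws = [[2.0, 1.0], [1.0, 3.0]]
num_examples = [30, 10]  # client 0 holds three times as much data
deltas = [client_delta(global_w, w) for w in local_ws]
new_global = apply_fed_avg(global_w, deltas, num_examples)
print(new_global)  # [1.75, 1.5]
```

Client 0 contributes 75% of the update, so the first coordinate moves most of the way toward its local value while the second moves only a quarter of the way toward client 1's.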

Homomorphic encryption inference (TenSEAL CKKS)

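import tenseal as ts

# CKKS context: approximate arithmetic on encrypted real numbers.
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2**40
ctx.generate_galois_keys()  # rotation keys, needed by dot()

x = ts.ckks_vector(ctx, [0.1, 0.5, -0.3, 0.7])  # encrypted input
W = [[0.2, -0.1, 0.4, 0.05]]  # plaintext weights (one output row)
b = [0.1]                     # plaintext bias
# encrypted inference: y = W*x + b (a dot product for the single row)
y_enc = x.dot(W[0]) + b
y_plain = y_enc.decrypt()  # ≈ [-0.015], up to CKKS approximation error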

Secure aggregation (cross-device FL)

# Bonawitz et al. protocol sketch:
# 1. Pairwise seeds via Diffie-Hellman among N clients; mask_{ij} = PRG(seed_{ij}), shared by i and j.
# 2. Client i sends update_i + sum_{j>i} mask_{ij} - sum_{j<i} mask_{ij}.
# 3. Server sums all -> each mask is added once and subtracted once -> only the aggregate is revealed.
# 4. Tolerates dropouts via Shamir secret sharing of seeds.
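
The cancellation of pairwise masks can be checked with a toy simulation. This is a sketch under strong simplifications: a single seeded `random.Random` stands in for the per-pair seeds that clients would agree on via Diffie-Hellman, updates are small integers, and dropout recovery is omitted. The function name `masked_updates` is hypothetical.

```python
import random


def masked_updates(updates, modulus, seed=0):
    """Add pairwise masks so individual updates are hidden but their sum is unchanged."""
    n = len(updates)
    rng = random.Random(seed)  # stand-in for per-pair PRG seeds agreed via key exchange
    pair_mask = {
        (i, j): rng.randrange(modulus) for i in range(n) for j in range(i + 1, n)
    }
    masked = []
    for i, u in enumerate(updates):
        m = u
        for j in range(n):
            if i < j:
                m += pair_mask[(i, j)]  # lower index adds the shared mask
            elif j < i:
                m -= pair_mask[(j, i)]  # higher index subtracts the same mask
        masked.append(m % modulus)
    return masked


MOD = 2**32
updates = [5, 11, 7, 2]  # integer-quantized client updates (toy values)
masked = masked_updates(updates, MOD)
aggregate = sum(masked) % MOD
print(aggregate)  # 25
```

The server only ever handles the entries of `masked`, each of which looks random on its own. The asymmetry between the lower and higher index of each pair is what makes the masks cancel in the sum.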

Confidential GPU inference (NVIDIA H100 CC)

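# Set the GPU ready state so it accepts work (CC mode itself must already be enabled on the host)
nvidia-smi conf-compute -srs 1
# Attestation check; flag kept as written in the source, confirm with `nvidia-smi conf-compute -h` for your driver
nvidia-smi conf-compute -gar
# Result: encrypted CPU-GPU transfers plus an attested GPU firmware/driver state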

Machine unlearning (SISA)

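# SISA = Sharded, Isolated, Sliced, Aggregated:
# 1. Shard data into K disjoint parts; train K models in isolation.
# 2. Within each shard, train incrementally over slices and checkpoint after each slice.
# 3. Aggregate (vote/avg) the K models for inference.
# 4. To unlearn user u: retrain only the shard containing u, restarting from the checkpoint before u's slice.
# Cost: about 1/K of a full retrain per deletion, or less with slice checkpoints.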
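
The sharding and retraining logic can be sketched with a toy ensemble. The assumptions are deliberately crude: each shard's "model" is just the mean of its values, shard assignment is `user_id % num_shards`, and the slicing/checkpoint step is left out. The class `SisaEnsemble` is a hypothetical illustration, not a reference implementation.

```python
from statistics import mean


class SisaEnsemble:
    """Toy SISA: each shard 'model' is the mean value of the records in that shard."""

    def __init__(self, records, num_shards):
        # records: list of (user_id, value); shard assignment is fixed by user_id.
        self.num_shards = num_shards
        self.shards = [[] for _ in range(num_shards)]
        for user_id, value in records:
            self.shards[user_id % num_shards].append((user_id, value))
        self.models = [self._train(s) for s in self.shards]
        self.retrain_count = 0

    @staticmethod
    def _train(shard):
        return mean(v for _, v in shard) if shard else None

    def predict(self):
        # Aggregate by averaging the per-shard models that have data.
        return mean(m for m in self.models if m is not None)

    def unlearn(self, user_id):
        k = user_id % self.num_shards
        self.shards[k] = [(u, v) for u, v in self.shards[k] if u != user_id]
        self.models[k] = self._train(self.shards[k])  # only this shard is retrained
        self.retrain_count += 1


records = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0), (4, 5.0), (5, 6.0)]
ens = SisaEnsemble(records, num_shards=3)
before = ens.predict()
ens.unlearn(3)
after = ens.predict()
print(before, after)  # 3.5 3.0
```

Removing user 3 touches only shard 0; the other two shard models are reused as-is, which is where the roughly 1/K retraining cost comes from.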

Decision criteria

Situation	Approach
Single org, sensitive labels	DP-SGD
Many phones / hospitals	Federated + secure agg + DP
Cloud inference, untrusted server	TEE (H100 CC) or HE
Two parties, joint model	MPC (CrypTen, MP-SPDZ)
GDPR right-to-be-forgotten	SISA / approximate unlearning
Need to share data externally	DP synthetic data

Default: TEE (confidential computing) for inference; DP-SGD + federated for training across orgs.

🔗 Graph

Parent: Privacy · Practical-Cryptography · Machine-Learning
Variants: Differential-Privacy · Federated-Learning · Homomorphic-Encryption · Secure-Multi-Party-Computation
Applications: On-Device-ML
Adjacent: Synthetic-Data

🤖 LLM usage

When to use: regulated data (HIPAA, GDPR, PCI), cross-org training, on-device personalization, untrusted-cloud inference. When not to use: public data with no privacy requirement; the overhead is not worth it.

❌ Anti-patterns

Big epsilon (ε>10): effectively no privacy.
Federated without DP or secure agg: raw gradients and updates can leak training data.
HE for entire training: 1000x or greater slowdown; only feasible for inference on small models.
Anonymization theater: removing names is not privacy; records can often be re-identified by linking the remaining quasi-identifiers.
"Trust me bro" confidential computing: deploying a TEE without verifying remote attestation.
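
Why a large ε counts as "effectively no privacy" follows from the definition: pure ε-DP bounds the ratio of output probabilities between two datasets differing in one record by e^ε. The sketch below tabulates that bound alongside the Laplace-mechanism noise scale for a sensitivity-1 count query. The helper names are hypothetical, and DP-SGD's (ε, δ) guarantee holds only up to the extra δ term, so treat this as intuition.

```python
import math


def max_likelihood_ratio(epsilon: float) -> float:
    """Pure ε-DP bound on how much one record can change any output's probability."""
    return math.exp(epsilon)


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Laplace mechanism noise scale b = sensitivity / ε."""
    return sensitivity / epsilon


for eps in (0.1, 1.0, 10.0):
    print(
        f"ε={eps:>4}: ratio bound = {max_likelihood_ratio(eps):,.1f}, "
        f"Laplace scale for a count query = {laplace_scale(1.0, eps):.1f}"
    )
```

At ε=10 the bound is about 22,026, meaning an output can be tens of thousands of times more likely with a given record present than without it. The matching noise scale of 0.1 barely perturbs a count.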

🧪 Verification / Duplicates

Verified (Apple PCC 2024, Google FL papers, NIST DP guidance, NVIDIA H100 CC docs 2024).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — 4 pillars + TEE / unlearning 2026 update
