| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-data-ethics-privacy | Data Ethics and Privacy | 10_Wiki/Topics | verified | self | | none | A | 0.9 | applied | | | 2026-05-10 | pending | |
Data Ethics and Privacy
One-line summary
The question is not "can we?" but "should we?". The regulatory baseline is GDPR (EU, 2018), CCPA (California), and the EU AI Act (2024). Modern techniques: differential privacy, federated learning, homomorphic encryption, zero-knowledge proofs. In the LLM era, training data and memorization raise new challenges.
Core principles
Fair Information Practice Principles (FIPPs)
- Notice / Transparency.
- Consent / Choice.
- Access / Participation.
- Integrity / Security.
- Enforcement / Redress.
GDPR (EU, 2018)
- Lawful basis: six options (consent, contract, legal obligation, vital interests, public task, legitimate interests).
- Data minimization.
- Purpose limitation.
- Right to access / rectify / erasure / portability / object.
- DPIA (Data Protection Impact Assessment).
- DPO (Data Protection Officer).
- Fines: up to EUR 20 million or 4% of global annual turnover, whichever is higher.
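The 4% figure is a ceiling, not a fixed rate. A minimal sketch of the arithmetic, assuming the higher fine tier of Art. 83(5) (EUR 20 million or 4% of worldwide annual turnover, whichever is greater); the function name is illustrative.

```python
def gdpr_max_fine_eur(annual_turnover_eur: float) -> float:
    """Upper bound of the higher GDPR fine tier for a given annual turnover."""
    return max(20_000_000.0, 0.04 * annual_turnover_eur)


# A company with EUR 100M turnover is still exposed to the EUR 20M floor,
# while one with EUR 1B turnover is exposed to 4% of turnover.
small_cap = gdpr_max_fine_eur(100_000_000)
large_cap = gdpr_max_fine_eur(1_000_000_000)
```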
CCPA / CPRA (California)
- Right to opt out of the sale of personal information.
- Right to limit the use of sensitive personal information.
EU AI Act (2024)
- Risk tiers (unacceptable / high / limited / minimal).
- High-risk systems: audit and human oversight requirements.
Privacy techniques
Anonymization (irreversible)
- Remove PII and generalize the remaining attributes.
- k-anonymity (Sweeney 2002).
- l-diversity, t-closeness.
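k-anonymity and l-diversity measure different things: k is the smallest group size over the quasi-identifiers, while l is the smallest number of distinct sensitive values inside any group. A minimal stdlib sketch, assuming records are plain dictionaries; the field names and sample rows are made up for illustration.

```python
from collections import defaultdict


def k_and_l(rows, quasi_identifiers, sensitive):
    """Return (k, l): min group size and min distinct sensitive values per group."""
    sizes = defaultdict(int)
    values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        sizes[key] += 1
        values[key].add(row[sensitive])
    return min(sizes.values()), min(len(v) for v in values.values())


rows = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "diagnosis": "asthma"},
]
k, l = k_and_l(rows, ["age", "zip"], "diagnosis")
```

Here the table is 2-anonymous but only 1-diverse: everyone in the first group shares the same diagnosis, so knowing the group reveals the sensitive value.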
Pseudonymization (reversible with key)
- Replace PII with a hash or token.
- Lighter requirements under GDPR, although pseudonymized data still counts as personal data.
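One way to sketch keyed pseudonymization with the standard library is shown below. It assumes HMAC-SHA256 as the token function and an in-memory dictionary as the re-identification vault; both are illustrative choices, and a real deployment would keep the key and the vault in separately secured storage.

```python
import hashlib
import hmac
import secrets


class Pseudonymizer:
    """Replace identifiers with keyed tokens; a separate vault allows re-identification."""

    def __init__(self, key: bytes):
        self._key = key
        self._vault = {}  # token -> original value; keep apart from the pseudonymized data

    def tokenize(self, value: str) -> str:
        # Deterministic: the same value and key always give the same token,
        # so joins across tables keep working.
        token = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        self._vault[token] = value
        return token

    def reidentify(self, token: str) -> str:
        return self._vault[token]


p = Pseudonymizer(key=secrets.token_bytes(32))
token = p.tokenize("alice@example.com")
```

Using a keyed function rather than a bare hash matters: an unkeyed hash of an email address can be reversed by hashing a list of candidate addresses.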
Differential Privacy (Dwork 2006)
- Noise injection.
- Mathematical guarantee (ε).
- Deployed by Apple, Google, and the US Census.
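The guarantee behind ε can be stated formally. A randomized mechanism $\mathcal{M}$ is ε-differentially private if, for any two datasets $D$ and $D'$ that differ in one record and any set of outputs $S$:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

A smaller ε means the two output distributions are harder to tell apart, so privacy is stronger. For a numeric query $f$ with sensitivity $\Delta f = \max_{D, D'} |f(D) - f(D')|$, adding Laplace noise with scale $\Delta f / \varepsilon$ satisfies this definition.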
Federated Learning
- Raw data never leaves the device.
- Only model updates are shared.
- Use cases: mobile, healthcare.
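The server-side step can be sketched as a weighted average of client weights, in the style of federated averaging (FedAvg). This is a simplified illustration that assumes each client reports a flat list of floats plus its local example count; real frameworks handle tensors, sampling, and failures.

```python
def federated_average(client_updates):
    """Average client weight vectors, weighting each by its number of examples.

    client_updates: list of (weights, num_examples) pairs.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total
        for i in range(dim)
    ]


# Two clients: the second has three times as much data, so it pulls the average harder.
updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
global_weights = federated_average(updates)
```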
Homomorphic Encryption
- Compute directly on encrypted data.
- Still expensive.
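To make "compute on encrypted data" concrete, here is a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The scheme is chosen for illustration only, the primes are far too small to be secure, and the function names are hypothetical.

```python
import math
import secrets


def keygen(p: int = 1009, q: int = 1013):
    """Toy key generation from two small primes (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt m (0 <= m < n) under public key n with fresh randomness."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n) * mu % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
result = decrypt(n, private_key, c_sum)
```

The party that adds the ciphertexts never sees 20, 22, or the total; only the key holder can decrypt the result.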
Secure Multi-party Computation (MPC)
- Multiple parties compute a joint result without sharing their inputs.
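A simple building block for this is additive secret sharing: each input is split into random shares that sum to the secret modulo a prime, so any single share reveals nothing. The sketch below computes a joint sum for three parties; the modulus, party count, and input values are illustrative assumptions, and it models honest participants only.

```python
import secrets

PRIME = 2**61 - 1  # share modulus (a Mersenne prime)


def share(secret: int, n_parties: int):
    """Split a secret into n_parties random shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    return sum(shares) % PRIME


# Three organizations each hold a private count and want only the total.
private_inputs = [120, 45, 310]
all_shares = [share(x, 3) for x in private_inputs]
# Party j receives the j-th share of every input and publishes only its partial sum.
partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(3)]
total = reconstruct(partial_sums)
```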
Zero-Knowledge Proof
- Prove a statement without revealing the underlying data.
- ZK-ML is emerging.
LLM-specific privacy
- Training data leak: verbatim memorization.
- Membership inference: was record X part of the training data?
- Model inversion: reconstruct inputs from the model.
- Mitigation: dedup, DP-SGD, scrubbing, prompt safety.
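Two of these mitigations, dedup and scrubbing, can be sketched with the standard library. The example assumes exact-match dedup after whitespace and case normalization, and a single email regex as the scrubber; near-duplicate detection and broader PII coverage are out of scope here.

```python
import hashlib
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub(text: str) -> str:
    """Replace email addresses with a placeholder."""
    return EMAIL_RE.sub("<EMAIL>", text)


def dedup_and_scrub(documents):
    """Drop exact duplicates (ignoring case and whitespace), then scrub what remains."""
    seen = set()
    kept = []
    for doc in documents:
        normalized = " ".join(doc.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(scrub(doc))
    return kept


corpus = [
    "Contact me at jane@example.com for details.",
    "contact me at  jane@example.com for details.",
    "An unrelated sentence.",
]
cleaned = dedup_and_scrub(corpus)
```

Dedup matters for privacy as well as quality: text that appears many times in a corpus is more likely to be memorized verbatim.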
Design patterns
- Privacy by Design (Cavoukian).
- Data minimization.
- Encryption at rest + in transit.
- Access control + audit log.
- Retention policy.
- Right to erasure pipeline.
💻 Patterns
PII detection + redaction
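from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# Both engines are expensive to build, so create them once and reuse them.
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def anonymize(text, language='en'):
    """Detect PII entities in `text` and replace each with its entity-type placeholder."""
    results = analyzer.analyze(text=text, language=language)
    return anonymizer.anonymize(text=text, analyzer_results=results).text

# Example
print(anonymize('John Smith, ssn 123-45-6789, lives at 123 Main St.'))
# Typical output (which entities are detected depends on the installed NLP model):
# <PERSON>, ssn <US_SSN>, lives at <LOCATION>.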
k-Anonymity
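def k_anonymize(df, quasi_identifiers, k=5):
    """Suppress rows whose quasi-identifier group has fewer than k members."""
    grouped = df.groupby(quasi_identifiers).size()
    valid_groups = grouped[grouped >= k].index
    # Keep only rows whose quasi-identifier combination appears at least k times.
    return df[df.set_index(quasi_identifiers).index.isin(valid_groups)]

# Generalize age and zip first so that every group reaches size >= 5.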
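The filter above only suppresses small groups, so the generalization step mentioned in its last comment does most of the work. A minimal sketch of that step, assuming 10-year age bands and 3-digit zip prefixes; both widths are illustrative and would be tuned per dataset.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters of a zip code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


record = {"age": 34, "zip": "02139"}
generalized = {
    "age": generalize_age(record["age"]),
    "zip": generalize_zip(record["zip"]),
}
```

Coarser bands put more people in each group, which raises k at the cost of analytical detail.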
Differential Privacy (Laplace mechanism)
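import numpy as np

def laplace_mechanism(query_result, sensitivity, epsilon):
    """Return an ε-differentially private answer to a numeric query."""
    # Noise scale is sensitivity / ε: a smaller ε means more noise and stronger privacy.
    noise = np.random.laplace(0, sensitivity / epsilon)
    return query_result + noise

# Example: count of users with attribute X (assumes a DataFrame `df` with a boolean `has_x` column).
# Adding or removing one user changes a count by at most 1, so sensitivity = 1.
count_true = df['has_x'].sum()
count_dp = laplace_mechanism(count_true, sensitivity=1, epsilon=1.0)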
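Every noisy answer released from the same data spends part of the privacy budget. Under basic sequential composition, the ε values of successive queries add up, so a simple accountant can refuse queries once the total is used. The class below is a hypothetical helper, not part of any library, and it ignores tighter composition theorems.

```python
class PrivacyBudget:
    """Track cumulative ε under basic sequential composition."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> float:
        """Reserve ε for one query and return the remaining budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent


budget = PrivacyBudget(total_epsilon=1.0)
remaining = budget.charge(0.25)  # call before each noisy query is released
```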
DP-SGD (training)
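import torch.nn.functional as F
from opacus import PrivacyEngine

# Assumes `model`, `optimizer`, and `train_loader` are already defined (standard PyTorch objects).
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,  # Gaussian noise std relative to the clipping bound
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

# Training loop with a DP guarantee
for x, y in train_loader:
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()

# Privacy budget spent so far
delta = 1e-5
epsilon = privacy_engine.get_epsilon(delta=delta)
print(f'(ε={epsilon:.2f}, δ={delta})-DP')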
Federated Learning (Flower)
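import flwr as fl

# Assumes `model`, `local_dataset`, and the helpers `set_parameters`, `train`, and `test`
# are defined elsewhere.
class Client(fl.client.NumPyClient):
    def get_parameters(self, config):
        return [val.cpu().numpy() for val in model.state_dict().values()]

    def fit(self, parameters, config):
        # Local training: raw data stays on the client; only weights are returned.
        set_parameters(model, parameters)
        train(model, local_dataset)
        return self.get_parameters(config), len(local_dataset), {}

    def evaluate(self, parameters, config):
        # Evaluate the global weights received from the server.
        set_parameters(model, parameters)
        loss, accuracy = test(model, local_dataset)
        return float(loss), len(local_dataset), {'accuracy': float(accuracy)}

# Recent Flower releases expect a Client here, hence .to_client();
# the exact entry point depends on the installed Flower version.
fl.client.start_client(server_address='central:8080', client=Client().to_client())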
Right to Erasure (GDPR)
def erase_user(user_id):
    """Handle an erasure request under GDPR Art. 17.

    `db`, `cache`, `analytics_db`, `audit_log`, and the helper functions are
    placeholders for the application's own services.
    """
    # 1. Primary store
    db.users.delete(user_id)
    # 2. Derived data (cache, analytics)
    cache.invalidate(f'user:{user_id}')
    analytics_db.delete_user_events(user_id)
    # 3. Backups (purged on a delay)
    schedule_backup_purge(user_id, after_days=30)
    # 4. ML models: try unlearning first, fall back to a scheduled retrain
    if user_in_training_data(user_id):
        if not machine_unlearning(user_id):
            schedule_retrain()
    # 5. Log for audit
    audit_log.write({'action': 'erase', 'user_id': user_id, 'date': now()})
Access control + audit
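from datetime import datetime, timezone

def access_pii(user_id, requester, reason):
    """Return contact fields for a user, writing an audit record before access."""
    if not requester.has_role('data_steward'):
        raise PermissionError('data_steward role required')
    fields = ['email', 'phone']
    audit_log.write({
        'requester': requester.id,
        'subject': user_id,
        'reason': reason,
        'timestamp': datetime.now(timezone.utc),
        'data_accessed': fields,
    })
    return db.users.get(user_id, fields=fields)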
LLM verbatim leak detection
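import random

def check_training_data_leak(model, training_corpus, n_samples=100):
    """Check whether the model reproduces training text verbatim."""
    leaks = []
    sample_size = min(n_samples, len(training_corpus))
    for chunk in random.sample(training_corpus, sample_size):
        prefix, continuation = chunk[:100], chunk[100:300]
        if not continuation:
            continue  # an empty continuation would match any completion
        completion = model.generate(prefix, max_tokens=200)
        if continuation in completion:
            leaks.append({'chunk': chunk[:50] + '...', 'leaked': True})
    return leaks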
Membership inference attack (defense check)
import numpy as np

def membership_inference_test(model, member_data, non_member_data):
    """Check whether training members can be told apart from non-members by loss.

    Assumes a helper `loss(model, x, y)` that returns the per-example loss.
    """
    member_loss = [loss(model, x, y) for x, y in member_data]
    non_member_loss = [loss(model, x, y) for x, y in non_member_data]
    # Simple threshold attack: members tend to have lower loss than non-members.
    threshold = np.median(member_loss + non_member_loss)
    correct = sum(1 for v in member_loss if v < threshold) + \
        sum(1 for v in non_member_loss if v > threshold)
    accuracy = correct / (len(member_loss) + len(non_member_loss))
    # 0.5 is chance level when both sets are the same size.
    if accuracy > 0.55:
        return 'WARN: membership inference vulnerability'
    return 'OK'
Data retention policy
def retention_purge():
    """Run daily from cron."""
    # Retention windows driven by GDPR and business needs.
    db.events.delete(where="created_at < NOW() - INTERVAL '7 years'")
    db.user_logs.delete(where="created_at < NOW() - INTERVAL '90 days'")
    db.analytics_raw.delete(where="created_at < NOW() - INTERVAL '13 months'")
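The same windows can be kept in one table-driven policy so that purge jobs and tests share a single source of truth. A minimal sketch, assuming day-based approximations of the windows above (396 days for 13 months, 2555 days for 7 years); the names `RETENTION`, `purge_cutoffs`, and `is_expired` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Retention window per table, approximated in days.
RETENTION = {
    "user_logs": timedelta(days=90),
    "analytics_raw": timedelta(days=396),  # roughly 13 months
    "events": timedelta(days=7 * 365),     # roughly 7 years
}


def purge_cutoffs(now=None):
    """Return, per table, the timestamp before which rows should be purged."""
    now = now or datetime.now(timezone.utc)
    return {table: now - keep for table, keep in RETENTION.items()}


def is_expired(table, created_at, now=None):
    """True if a row created at `created_at` is past its retention window."""
    return created_at < purge_cutoffs(now)[table]
```

Passing the computed cutoff to the database as a query parameter, rather than formatting it into the SQL string, keeps the purge job free of string-built queries.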
Decision criteria
| Situation | Approach |
|---|---|
| EU users | GDPR compliance (DPO + DPIA + erasure) |
| US users | CCPA/CPRA (California) + sector laws (HIPAA, FERPA) |
| Sensitive aggregate stats | Differential Privacy |
| Cross-org training | Federated Learning |
| Encrypted compute | Homomorphic / MPC |
| LLM training | DP-SGD + dedup + scrub |
| Identity-needed | Pseudonymize + access control |
| Public data | Anonymize (k-anonymity) |
Default: Privacy by Design + access control + retention policy + erasure pipeline.
🔗 Graph
- Parent: AI-Ethics · Privacy
- Variants: GDPR · CCPA · Differential-Privacy · Federated-Learning · Homomorphic-Encryption
- Applications: k-Anonymity
- Adjacent: Algorithmic Fairness · Authenticity · AI-Sovereignty · Data-Flywheel-Effect · Anthropomorphism
🤖 LLM usage
When to use: product privacy reviews, GDPR audits, ML privacy, cross-border data transfers.
When not to use: specific legal advice (consult a lawyer), clinical medical matters (consult a HIPAA expert).
❌ Anti-patterns
- False sense of security from anonymization (linkage attacks remain possible): combine k-anonymity with l-diversity.
- Long retention without a business need: a GDPR violation.
- No erasure pipeline: the right to erasure cannot be fulfilled.
- PII in logs: an invisible leak.
- Raw data leaking from federated learning (via model inversion): DP-SGD is needed as well.
- Invalid consent (forced or vague).
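The "PII in logs" item is one of the easier ones to guard against in code. The sketch below attaches a redacting filter to a log handler using the standard `logging` module; it assumes two illustrative regex patterns (email and US SSN) and is not a substitute for a full PII detector.

```python
import logging
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace emails and US SSNs with placeholders."""
    return SSN_RE.sub("<US_SSN>", EMAIL_RE.sub("<EMAIL>", text))


class RedactPII(logging.Filter):
    """Rewrite each record's formatted message before it reaches the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # the message is already fully formatted
        return True


handler = logging.StreamHandler()
handler.addFilter(RedactPII())
logger = logging.getLogger("app")
logger.addHandler(handler)
```

Attaching the filter to the handler rather than to a single logger means records propagated from child loggers are redacted too.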
🧪 Verification / Duplicates
- Verified (GDPR text, Apple DP, Dwork DP paper, Sweeney k-anonymity).
- Trust level: A.
- Related: Algorithmic Fairness · Authenticity · Bias-Correction-Algorithm · Anthropomorphism · Atmospheric-Intelligence (privacy challenges).
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: GDPR + DP + FL + Presidio / k-anon / Opacus / Flower / erasure code |