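koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)
Replaced [[wikilinks]] that differed only in notation (spelling variants) with the canonical title of the target document,
reconnecting 1,200 broken links. Only exact title/filename normalization matches were applied; alias matching was
excluded because of the risk of over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs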

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00


| Field | Value |
|---|---|
| id | wiki-2026-0508-ensuring-data-privacy |
| title | Ensuring Data Privacy |
| category | 10_Wiki/Topics |
| status | verified |
| canonical_id | self |
| aliases | Data Privacy, Privacy Engineering, GDPR Compliance |
| duplicate_of | none |
| source_trust_level | A |
| confidence_score | 0.9 |
| verification_status | applied |
| tags | privacy, gdpr, security, compliance |
| raw_sources | |
| last_reinforced | 2026-05-10 |
| github_commit | applied |
| tech_stack | language: Python/TypeScript; framework: OneTrust/Fides/OPA |

Ensuring Data Privacy

One-liner

"All personal data is handled with a lawful basis, kept to the minimum, and limited to its stated purpose." Data privacy engineering implements the legal requirements of GDPR/CCPA/LGPD/K-PIPA as deterministic controls at every stage: storage, processing, transfer, and retention. 2026 stack: classification + DLP + tokenization/PETs (DP, FHE, TEE) + consent management + DSAR automation + privacy-by-design.

Core

Privacy principles (GDPR Art. 5)

  1. Lawfulness, fairness, transparency: a lawful basis (Art. 6) such as consent or legitimate interest.
  2. Purpose limitation: no processing incompatible with the purpose for which the data was collected.
  3. Data minimization: collect and keep only what is necessary.
  4. Accuracy: data kept accurate and correctable.
  5. Storage limitation: enforce a retention schedule.
  6. Integrity & confidentiality: appropriate security, e.g. encryption.
  7. Accountability: DPO, audits, DPIAs.
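Purpose limitation and data minimization can be enforced in code as a per-purpose field allowlist: each processing purpose may read only the fields registered for it. The sketch below is a minimal illustration; `PURPOSE_FIELDS` and `minimize_for` are hypothetical names, and the purposes and fields are examples, not a recommended schema.

```python
from typing import Any, Mapping

# Hypothetical registry: the only fields each processing purpose may see.
PURPOSE_FIELDS: dict[str, frozenset[str]] = {
    "shipping": frozenset({"name", "address", "postcode"}),
    "analytics": frozenset({"postcode", "signup_month"}),
}

def minimize_for(purpose: str, record: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` restricted to the fields allowed for `purpose`.

    Unknown purposes raise instead of falling back to "all fields",
    so a new use of the data must be registered before it can run."""
    try:
        allowed = PURPOSE_FIELDS[purpose]
    except KeyError:
        raise PermissionError(f"no registered purpose: {purpose!r}") from None
    return {k: v for k, v in record.items() if k in allowed}

user = {"name": "Kim", "email": "kim@example.com", "address": "Seoul",
        "postcode": "04524", "signup_month": "2026-05"}
print(minimize_for("analytics", user))  # {'postcode': '04524', 'signup_month': '2026-05'}
```

Failing closed on unregistered purposes is the design choice that matters here: the allowlist doubles as a record of which purposes exist.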

PETs (Privacy-Enhancing Technologies), 2026

  • Pseudonymization: tokenization, format-preserving encryption (FPE).
  • Anonymization: k-anonymity, l-diversity, t-closeness.
  • Differential Privacy: calibrated (ε, δ) noise; used by Apple, the US Census Bureau, Chrome.
  • Federated learning: the model travels, the data stays.
  • Homomorphic encryption (FHE): compute on encrypted data; Microsoft SEAL, OpenFHE.
  • Confidential computing (TEE): Intel TDX, AMD SEV-SNP, Apple Private Cloud Compute.
  • Zero-Knowledge Proofs: prove an identity claim without disclosing the underlying data.
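On the server side, the federated-learning bullet comes down to aggregating client updates without ever receiving client data. A minimal sketch of the FedAvg aggregation step (McMahan et al.) with NumPy, assuming each client returns its locally trained weights as a list of arrays plus its local sample count; `fedavg` is a hypothetical helper, and local training, secure aggregation, and DP noise are out of scope.

```python
import numpy as np

def fedavg(client_weights: list[list[np.ndarray]], client_sizes: list[int]) -> list[np.ndarray]:
    """Average client model weights layer by layer, weighting each client by its
    number of local training examples (the FedAvg aggregation rule).
    Only weights reach the server; raw training data stays on the clients."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]
```

Shared weights can still leak information about individual training examples, which is why FL is usually paired with DP for ML training.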

Applications

  1. EU GDPR, Korean PIPA, and Chinese PIPL compliance.
  2. HIPAA for healthcare, PCI-DSS for payments.
  3. ML training without raw data (FL, DP).
  4. Cross-border transfer (SCC, BCR, DPF).
  5. Right to be forgotten (RTBF) automation.

💻 Patterns

Data classification + DLP

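# PII detection with Microsoft Presidio, then masking with presidio-anonymizer
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer, anonymizer = AnalyzerEngine(), AnonymizerEngine()
results = analyzer.analyze(text=user_input, language='en',
  # KR_RRN may not be built in; register a custom PatternRecognizer if your Presidio version lacks it
  entities=['EMAIL_ADDRESS', 'PHONE_NUMBER', 'CREDIT_CARD', 'PERSON', 'KR_RRN'])
# Mask all spans in one pass so earlier replacements never shift later offsets;
# the default operator replaces each span with an entity-type placeholder such as <EMAIL_ADDRESS>
masked = anonymizer.anonymize(text=user_input, analyzer_results=results).text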
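When masking detector spans by hand rather than with a library anonymizer, two details matter: replacing a span with a placeholder of a different length shifts every later offset, and detectors can return overlapping spans. A pure-Python sketch, assuming spans carry `start`, `end`, and `entity_type` like Presidio's `RecognizerResult`; `Span` and `mask_spans` are hypothetical names.

```python
from typing import NamedTuple

class Span(NamedTuple):
    start: int
    end: int
    entity_type: str

def mask_spans(text: str, spans: list[Span]) -> str:
    """Replace each span with <ENTITY_TYPE>, offset-safely."""
    # Resolve overlaps: prefer longer spans, drop any span overlapping one already kept.
    kept: list[Span] = []
    for s in sorted(spans, key=lambda s: s.end - s.start, reverse=True):
        if all(s.end <= k.start or s.start >= k.end for k in kept):
            kept.append(s)
    # Replace right to left so each edit leaves the offsets of earlier spans valid.
    out = text
    for s in sorted(kept, key=lambda s: s.start, reverse=True):
        out = out[:s.start] + f"<{s.entity_type}>" + out[s.end:]
    return out

text = "Mail kim@example.com or call 010-1234-5678"
spans = [Span(5, 20, "EMAIL_ADDRESS"), Span(29, 42, "PHONE_NUMBER"), Span(5, 8, "PERSON")]
print(mask_spans(text, spans))  # Mail <EMAIL_ADDRESS> or call <PHONE_NUMBER>
```

Preferring the longest span is one simple policy; a detector score could be used as the tie-breaker instead.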

Format-preserving encryption (FF3-1)

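# FF3-1 format-preserving encryption with the `ff3` package: output keeps the input's length and digit alphabet
import os
from ff3 import FF3Cipher

key = os.environ['FPE_KEY_HEX']   # assumed env var: AES-128/192/256 key as hex, ideally fetched from a KMS
tweak = 'CBD09280979564'          # 56-bit (14 hex chars) FF3-1 tweak; example value only
c = FF3Cipher(key, tweak)         # default radix 10 (digits); check current NIST SP 800-38G guidance on FF3-1 vs FF1
token = c.encrypt('4242424242424242')  # another 16-digit string; reversible by anyone holding the key
assert c.decrypt(token) == '4242424242424242'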
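FPE output is reversible by anyone holding the key. Vault tokenization, the other pseudonymization option in the PET list, instead replaces the value with a random token and keeps the mapping in a separately secured store. A minimal in-memory sketch, assuming a 16-digit card number and that keeping the last four digits visible is acceptable; `TokenVault` is hypothetical, and a real vault would be a hardened, access-controlled service rather than a Python dict.

```python
import secrets

class TokenVault:
    """In-memory sketch of vault tokenization: random, format-preserving tokens,
    with the real values held only in the vault's lookup tables."""

    def __init__(self) -> None:
        self._by_value: dict[str, str] = {}
        self._by_token: dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        if not (pan.isdigit() and len(pan) == 16):
            raise ValueError("expected a 16-digit card number")
        if pan in self._by_value:          # deterministic: same value, same token
            return self._by_value[pan]
        while True:
            # 12 random digits + the real last four; retry on the unlikely collision
            token = "".join(secrets.choice("0123456789") for _ in range(12)) + pan[-4:]
            if token not in self._by_token and token != pan:
                break
        self._by_value[pan] = token
        self._by_token[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        return self._by_token[token]       # KeyError for unknown tokens
```

Unlike FPE, the token has no mathematical relationship to the card number, so a stolen token reveals nothing beyond the last four digits; the trade-off is a stateful service on the critical path.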

Differential Privacy noise

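import numpy as np

# Sketch only: production releases should use a vetted DP library (e.g. OpenDP)
rng = np.random.default_rng()

def laplace_mechanism(true_val: float, sensitivity: float, epsilon: float) -> float:
    # Noise scale b = sensitivity / epsilon gives epsilon-DP for a query with that L1 sensitivity
    return true_val + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Query: count of users in a segment; one user changes the count by at most 1, so sensitivity = 1
noisy_count = laplace_mechanism(true_val=1234, sensitivity=1, epsilon=1.0)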
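Each noisy release spends privacy budget. Under basic sequential composition, answering k queries with ε₁, …, ε_k on the same dataset is (ε₁ + … + ε_k)-DP, so a service needs to track the running total and refuse queries once it reaches a chosen limit. A minimal sketch; `PrivacyBudget` is hypothetical and uses only basic composition, which is looser than the advanced-composition or Rényi-DP accountants that vetted libraries provide.

```python
class PrivacyBudget:
    """Running epsilon total under basic sequential composition: mechanisms run with
    eps_1, ..., eps_k on the same data are together (eps_1 + ... + eps_k)-DP."""

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        """Reserve epsilon for one query, or refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:   # small tolerance for float rounding
            raise RuntimeError(
                f"privacy budget exhausted: spent {self.spent}, requested {epsilon}, total {self.total}")
        self.spent += epsilon

    @property
    def remaining(self) -> float:
        return self.total - self.spent

budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.5)   # e.g. the noisy segment count
budget.charge(0.5)   # a second query; the budget is now fully spent
```

Call `charge` before computing the noisy answer, so a refused query never touches the data.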

k-anonymity check

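import pandas as pd

def k_anonymity(df: pd.DataFrame, quasi_ids: list[str]) -> int:
    # k = size of the smallest equivalence class; dropna=False keeps rows with missing quasi-IDs
    return int(df.groupby(quasi_ids, dropna=False).size().min())

# Require k >= 5 before release (an explicit check, unlike assert, survives python -O)
k = k_anonymity(df, ['zip', 'age', 'gender'])
if k < 5:
    raise ValueError(f"k={k} < 5: generalize or suppress quasi-identifiers before release")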
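k-anonymity alone does not stop attribute disclosure: if every record in an equivalence class shares the same sensitive value, that value leaks. The l-diversity entry in the PET list addresses this; below is a sketch of its simplest variant, distinct l-diversity, in the same pandas style as the k-anonymity check. `distinct_l_diversity` and the sample table are hypothetical.

```python
import pandas as pd

def distinct_l_diversity(df: pd.DataFrame, quasi_ids: list[str], sensitive: str) -> int:
    """Smallest number of distinct sensitive values in any equivalence class.
    l = 1 means some quasi-identifier group shares a single sensitive value,
    so anyone who can place a person in that group learns it (homogeneity attack)."""
    return int(df.groupby(quasi_ids, dropna=False)[sensitive].nunique(dropna=False).min())

release = pd.DataFrame({
    "zip":       ["12345", "12345", "12345", "67890", "67890", "67890"],
    "age":       [30, 30, 30, 40, 40, 40],
    "diagnosis": ["flu", "flu", "cold", "flu", "flu", "flu"],
})
# k = 3 for both groups, yet the (67890, 40) group has only one diagnosis
print(distinct_l_diversity(release, ["zip", "age"], "diagnosis"))  # 1
```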

DSAR (Data Subject Access Request) automation

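import json
from datetime import datetime, timezone

# db: async MongoDB handle (e.g. Motor); s3: boto3 S3 client; elasticsearch_export: app helper (all assumed)
async def dsar_export(user_id: str) -> bytes:
    bundle = {
        'profile': await db.users.find_one({'_id': user_id}),
        'orders':  [o async for o in db.orders.find({'userId': user_id})],
        'logs':    await elasticsearch_export(user_id),
    }
    return json.dumps(bundle, default=str).encode()

async def dsar_erasure(user_id: str) -> None:
    await db.users.update_one(
        {'_id': user_id},
        {'$set': {'email': None, 'name': None, 'erasedAt': datetime.now(timezone.utc)}})
    # delete_objects takes explicit keys (no Prefix): list under the prefix, delete page by page (<= 1,000 keys)
    # boto3 is synchronous; in async code use aioboto3 or run these calls in an executor
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket='pii', Prefix=f'users/{user_id}/'):
        keys = [{'Key': obj['Key']} for obj in page.get('Contents', [])]
        if keys:
            s3.delete_objects(Bucket='pii', Delete={'Objects': keys})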
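
Consent ledger entry

import { createHmac } from 'node:crypto';

// Build the record first, then sign it, so the signature covers every field
const unsigned = {
  userId: 'u_123',
  purposes: { analytics: true, marketing: false, personalization: true },
  vendors: { google: true },
  timestamp: new Date().toISOString(),
  version: 'tcf-2.2',
};
// CONSENT_HMAC_KEY (assumed env var) holds the ledger signing key; db is an assumed MongoDB handle
const signature = createHmac('sha256', process.env.CONSENT_HMAC_KEY!)
  .update(JSON.stringify(unsigned))
  .digest('hex');
// Append-only: a change or withdrawal is a new record, never an update
await db.consents.insertOne({ ...unsigned, signature });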
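On the read side, a consent ledger is useful only if records can be verified and the most recent decision wins, which is what makes consent withdrawable. A Python sketch, assuming records are dicts with `userId`, `purposes`, `timestamp` (ISO 8601 UTC in one fixed format), and `signature` fields, signed with HMAC-SHA256 over canonical JSON (sorted keys, no whitespace); `sign`, `verify`, and `has_consent` are hypothetical helpers. Canonical serialization matters because a verifier that serializes fields in a different order computes a different MAC, so whatever writes the ledger must serialize the same way.

```python
import hashlib
import hmac
import json
from typing import Any, Iterable

def _canonical(record: dict[str, Any]) -> bytes:
    # Every field except the signature, serialized deterministically.
    body = {k: v for k, v in record.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

def sign(record: dict[str, Any], key: bytes) -> dict[str, Any]:
    return {**record, "signature": hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()}

def verify(record: dict[str, Any], key: bytes) -> bool:
    expected = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(record.get("signature", "")))

def has_consent(ledger: Iterable[dict[str, Any]], user_id: str, purpose: str, key: bytes) -> bool:
    """True only if the user's latest valid record grants `purpose`.
    A later record (e.g. a withdrawal) overrides earlier ones, tampered
    records are ignored, and no record at all means no consent."""
    valid = [r for r in ledger if r.get("userId") == user_id and verify(r, key)]
    if not valid:
        return False
    latest = max(valid, key=lambda r: r["timestamp"])  # same-format ISO 8601 UTC strings sort chronologically
    return bool(latest["purposes"].get(purpose, False))
```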

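Decision criteria

| Situation | Approach |
|---|---|
| EU users | GDPR + SCCs (post-Schrems II) |
| Korean users | PIPA: published privacy policy; disclosure of entrusted (outsourced) processing, with consent where required |
| Aggregate analytics | Differential Privacy |
| Payment data | PCI-DSS tokenization |
| ML training | Federated learning + DP |
| Cross-org compute | TEE (Confidential Computing) |

Default: minimize + classify + tokenize + consent ledger + DSAR API.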

🔗 Graph

🤖 Using LLMs

When to use: reviewing privacy policies, drafting DSAR responses, generating PIA questions. When not to: sending raw PII to a third-party LLM; anonymize it first.
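For the rule against sending raw PII to a third-party LLM, one common shape is to swap PII for placeholders before the prompt leaves your network and restore the originals in the model's reply locally. A minimal sketch that only catches email addresses and Korean mobile numbers by regex; the patterns, placeholder format, and the `pseudonymize_for_llm` / `restore` names are assumptions, and in practice a detector such as Presidio would supply the spans. Strictly this is pseudonymization, not anonymization: the mapping stays on your side, and free-text context can still identify a person.

```python
import re

# Illustrative detectors only: email addresses and Korean mobile numbers such as 010-1234-5678.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b01[016789]-?\d{3,4}-?\d{4}\b"),
}

def pseudonymize_for_llm(text: str) -> tuple[str, dict[str, str]]:
    """Swap each PII match for a numbered placeholder like <EMAIL_1>.

    Returns the prompt-safe text and a placeholder -> original mapping;
    the mapping must stay local and never be sent to the LLM."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        placeholders: dict[str, str] = {}  # original value -> placeholder, so repeats share one

        def _sub(m: re.Match) -> str:
            value = m.group(0)
            if value not in placeholders:
                placeholders[value] = f"<{label}_{len(placeholders) + 1}>"
                mapping[placeholders[value]] = value
            return placeholders[value]

        text = pattern.sub(_sub, text)
    return text, mapping

def restore(reply: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the model's reply, locally."""
    for placeholder, value in mapping.items():
        reply = reply.replace(placeholder, value)
    return reply
```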

Anti-patterns

  • Hash = anonymized misconception: a hash is pseudonymization, so GDPR still applies.
  • Consent on entry only: consent must be ongoing, withdrawable, and granular.
  • Log PII: loggers are a common leak source; add a redaction filter.
  • Forever retention: violates GDPR storage limitation; use TTLs plus erasure.
  • Plaintext backup: encryption at rest is required.
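For the "Log PII" item, Python's standard `logging` module lets you attach a `logging.Filter` to a handler and rewrite each record before it is emitted. Attaching it to the handler rather than the logger matters: logger-level filters do not see records propagated from child loggers. A minimal sketch that masks email addresses only; the regex and the `RedactPII` name are assumptions, and it does not touch exception tracebacks or structured `extra=` fields.

```python
import logging
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

class RedactPII(logging.Filter):
    """Masks email addresses in the formatted message before the handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() merges msg with args, so PII passed as a %s argument is caught too.
        record.msg = EMAIL.sub("<EMAIL>", record.getMessage())
        record.args = ()   # already merged into msg; prevents a second %-formatting
        return True        # keep the record, just redacted

handler = logging.StreamHandler()
handler.addFilter(RedactPII())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.warning("password reset requested by %s", "kim@example.com")
# writes to stderr: password reset requested by <EMAIL>
```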

🧪 Verification / Duplicates

  • Verified: GDPR Art.5/17/25; ISO/IEC 27701; NIST SP 800-188; Microsoft Presidio docs.
  • Confidence: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: principles + PETs + DSAR/DP patterns |