---
id: wiki-2026-0508-ensuring-data-privacy
title: Ensuring Data Privacy
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Data Privacy, Privacy Engineering, GDPR Compliance]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [privacy, gdpr, security, compliance]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: applied
tech_stack:
  language: Python/TypeScript
  framework: OneTrust/Fides/OPA
---

# Ensuring Data Privacy

## One-line summary

> **"All personal data is handled with a lawful basis, minimized, and purpose-limited."**

Data privacy engineering implements the legal requirements of GDPR/CCPA/LGPD/K-PIPA as deterministic controls at every stage of storage, processing, transfer, and retention. 2026 stack: classification + DLP + tokenization/PETs (DP, FHE, TEE) + consent management + DSAR automation + privacy-by-design.

## Core

### Privacy Principles (GDPR Art. 5)

1. **Lawfulness, fairness, transparency**: consent / legitimate interest.
2. **Purpose limitation**: no use beyond the purpose the data was collected for.
3. **Data minimization**: collect only the minimum needed.
4. **Accuracy**: data must be correctable.
5. **Storage limitation**: retention schedule.
6. **Integrity & confidentiality**: encryption.
7. **Accountability**: DPO, audit, DPIA.

### PETs (Privacy-Enhancing Technologies), 2026

- **Pseudonymization**: tokenization, format-preserving encryption (FPE).
- **Anonymization**: k-anonymity, l-diversity, t-closeness.
- **Differential Privacy**: (ε, δ)-calibrated noise; used by Apple, US Census, Chrome.
- **Federated learning**: the model travels, the data stays.
- **Homomorphic encryption (FHE)**: compute on encrypted data; Microsoft SEAL, OpenFHE.
- **Confidential computing (TEE)**: Intel TDX, AMD SEV-SNP, Apple Private Cloud Compute.
- **Zero-Knowledge Proofs**: prove identity without disclosing the underlying data.

### Applications

1. EU GDPR + Korean PIPA + Chinese PIPL compliance.
2. Healthcare (HIPAA), payments (PCI-DSS).
3. ML training without raw data (FL, DP).
4. Cross-border transfer (SCC, BCR, DPF).
5. Right to be forgotten (RTBF) automation.
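The purpose-limitation and storage-limitation principles above are the easiest to turn into a deterministic control. The following is a minimal sketch, not a reference implementation: `FieldPolicy`, `POLICIES`, `is_use_allowed`, and `is_expired` are hypothetical names, and the categories, purposes, and retention periods are illustrative values that would in practice come from legal review.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FieldPolicy:
    """Hypothetical per-category policy: declared purposes and retention period."""
    purposes: frozenset[str]
    retention: timedelta


# Illustrative values only (assumption); not taken from any regulation.
POLICIES: dict[str, FieldPolicy] = {
    "email": FieldPolicy(frozenset({"account", "marketing"}), timedelta(days=730)),
    "order_history": FieldPolicy(frozenset({"account", "fulfilment"}), timedelta(days=1825)),
    "access_log": FieldPolicy(frozenset({"security"}), timedelta(days=90)),
}


def is_use_allowed(category: str, purpose: str) -> bool:
    """Purpose limitation: deny unless the purpose was declared for this category."""
    policy = POLICIES.get(category)
    return policy is not None and purpose in policy.purposes


def is_expired(category: str, collected_at: datetime, now: datetime) -> bool:
    """Storage limitation: True once a record has outlived its retention period."""
    policy = POLICIES.get(category)
    if policy is None:
        return True  # unclassified data has no declared basis for being kept
    return now - collected_at > policy.retention
```

Both checks deny by default: an unknown category is never usable and is always treated as expired, so data that was never classified cannot slip past the control. A scheduled job could call `is_expired` per record and feed the hits into the erasure path.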
## 💻 Patterns

### Data classification + DLP

```python
# PII detection with Microsoft Presidio
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()
results = analyzer.analyze(
    text=user_input, language="en",
    # KR_RRN is only detected if a recognizer for it is registered for this language
    entities=["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD", "PERSON", "KR_RRN"])
# Mask spans right to left so earlier offsets stay valid
redacted = user_input
for r in sorted(results, key=lambda r: r.start, reverse=True):
    redacted = redacted[:r.start] + f"<{r.entity_type}>" + redacted[r.end:]
```

### Format-preserving tokenization

```python
# FF3-1 preserves the input format (e.g., a card number stays 16 digits)
from ff3 import FF3Cipher

# key and tweak are hex strings; load them from a secret store, never hard-code them
c = FF3Cipher(key, tweak)
token = c.encrypt("4242424242424242")  # 16-digit string
plain = c.decrypt(token)
```

### Differential Privacy noise

```python
import numpy as np

def laplace_mechanism(true_val, sensitivity, epsilon):
    # Laplace noise with scale = sensitivity / epsilon gives epsilon-DP
    return true_val + np.random.laplace(0, sensitivity / epsilon)

# Query: count of users in a segment (a count has sensitivity 1)
noisy_count = laplace_mechanism(true_val=1234, sensitivity=1, epsilon=1.0)
```

### k-anonymity check

```python
import pandas as pd

def k_anonymity(df: pd.DataFrame, quasi_ids: list[str]) -> int:
    # Size of the smallest group sharing the same quasi-identifier values
    return int(df.groupby(quasi_ids).size().min())

# Ensure k >= 5 before release
assert k_anonymity(df, ['zip', 'age', 'gender']) >= 5
```

### DSAR (Data Subject Access Request) automation

```python
import json
from datetime import datetime, timezone

# db (async MongoDB client), s3 (async S3 client), and elasticsearch_export
# are assumed to be provided by the application.

async def dsar_export(user_id: str) -> bytes:
    bundle = {
        'profile': await db.users.find_one({'_id': user_id}),
        'orders': [o async for o in db.orders.find({'userId': user_id})],
        'logs': await elasticsearch_export(user_id),
    }
    return json.dumps(bundle, default=str).encode()

async def dsar_erasure(user_id: str):
    await db.users.update_one(
        {'_id': user_id},
        {'$set': {'email': None, 'name': None, 'erasedAt': datetime.now(timezone.utc)}})
    # delete_objects takes explicit keys, not a prefix: list first, then delete.
    # A single listing returns at most 1000 keys; paginate for larger prefixes.
    listing = await s3.list_objects_v2(Bucket='pii', Prefix=f'users/{user_id}/')
    keys = [{'Key': o['Key']} for o in listing.get('Contents', [])]
    if keys:
        await s3.delete_objects(Bucket='pii', Delete={'Objects': keys})
```

### Consent record (Fides/IAB TCF)

```typescript
import { createHmac } from "node:crypto";

const record = {
  userId: 'u_123',
  purposes: { analytics: true, marketing: false, personalization: true },
  vendors: { google: true },
  timestamp: new Date().toISOString(),
  version: 'tcf-2.2',
};
// Sign the serialized record so later tampering is detectable.
// CONSENT_HMAC_KEY is an assumed environment variable holding the signing key.
const signature = createHmac('sha256', process.env.CONSENT_HMAC_KEY!)
  .update(JSON.stringify(record))
  .digest('hex');
await db.consents.insertOne({ ...record, signature });
```

## Decision criteria

| Situation | Approach |
|---|---|
| EU users | GDPR + Schrems II SCC |
| Korean users | PIPA: privacy policy, consent for outsourced processing |
| Aggregate analytics | Differential Privacy |
| Payment data | PCI-DSS tokenization |
| ML training | Federated learning + DP |
| Cross-org compute | TEE (Confidential Computing) |

**Default**: minimize + classify + tokenize + consent ledger + DSAR API.

## 🔗 Graph

- Parent: [[Practical-Cryptography]] · [[보안_및_시스템_신뢰성_표준|Symmetric-Encryption]]
- Variant: [[보안_및_시스템_신뢰성_표준|Zero-Trust Architecture]]
- Applications: [[Anomaly-Detection]] · [[Information-Society]]
- Adjacent: [[Digital Intellectual Property Rights]]

## 🤖 LLM usage

**When**: reviewing privacy policies, drafting DSAR responses, generating PIA questions.

**When not**: sending raw PII to a third-party LLM; anonymize first.

## ❌ Anti-patterns

- **Mistaking a hash for anonymization**: hashing is pseudonymization, so GDPR still applies.
- **Consent at entry only**: consent is ongoing; it must be withdrawable and granular.
- **Logging PII**: loggers are a leak source; add a redaction filter.
- **Forever retention**: violates GDPR storage limitation; use TTLs and erasure.
- **Plaintext backups**: encryption at rest is required.

## 🧪 Verification / duplicates

- Verified: GDPR Art. 5/17/25; ISO/IEC 27701; NIST SP 800-188; Microsoft Presidio docs.
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: principles + PETs + DSAR/DP patterns |