---
id: wiki-2026-0508-ai-data-sovereignty
title: AI & Data Sovereignty
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [데이터 주권, data sovereignty, AI sovereignty, sovereign cloud, data colonialism, data localization]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: [data-sovereignty, ai-policy, privacy, gdpr, data-localization, federated-learning, sovereign-cloud, geopolitics]
raw_sources: []
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack:
  language: policy / engineering
  applicable_to: [Compliance, Architecture, Government, Privacy]
---

# AI & Data Sovereignty

## 📌 One-Line Insight (The Karpathy Summary)

> **"Who owns each piece of data?"** Ownership plays out on three layers: individual, organizational, and national. Training data is the hidden cost behind Big Tech AI. The modern technical answer is **federated learning + differential privacy + sovereign cloud**.

## 📖 Structured Knowledge (Synthesized Content)

### The three layers of sovereignty

#### 1. Individual sovereignty

- Each user owns their own data.
- Right to know (which data is used, and for what).
- Right to erasure (GDPR Article 17).
- Right to object (GDPR Article 21).
- Right to data portability.
- Opt-in / opt-out for the use of one's data in AI training.

#### 2. Organizational sovereignty

- The company's customer data.
- IP and trade secrets.
- Each vendor's DPA (Data Processing Agreement).
- The list of sub-processors.
- Dependency on cloud providers.

#### 3. National sovereignty

- Where citizens' data is physically stored.
- Geopolitical risk (access by foreign governments).
- Strategic AI capability.
- Industrial policy.
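
The individual-layer opt-in for AI training can be enforced when the training dataset is built. Below is a minimal sketch. `ConsentRecord`, the `"ai_training"` purpose key, and the record shape are all hypothetical illustrations. The design choice is default deny: a missing consent record or a missing purpose counts as "no".

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical per-user consent record: purpose -> explicit opt-in flag."""
    user_id: str
    purposes: dict[str, bool] = field(default_factory=dict)


def training_eligible(records, consents, purpose="ai_training"):
    """Keep only records whose owner explicitly opted in to `purpose`.

    Missing consent records and missing purposes both count as "no" (default
    deny), so new users or newly added purposes never silently enter training.
    """
    allowed = {uid for uid, c in consents.items() if c.purposes.get(purpose) is True}
    return [r for r in records if r["user_id"] in allowed]


consents = {
    "u1": ConsentRecord("u1", {"ai_training": True}),
    "u2": ConsentRecord("u2", {"ai_training": False}),  # explicit opt-out
    "u3": ConsentRecord("u3", {"analytics": True}),     # consented to something else
}
records = [{"user_id": u, "text": f"note from {u}"} for u in ("u1", "u2", "u3", "u4")]
print([r["user_id"] for r in training_eligible(records, consents)])  # ['u1']
```

Withdrawing consent (flipping `u1` to `False`) removes that user's records the next time the dataset is rebuilt. A model that has already been trained still needs unlearning, which is a separate and harder problem.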
### Major regulations

| Regulation | Region | Key points |
|---|---|---|
| **GDPR** | EU | Individual rights + extraterritorial scope |
| **CCPA / CPRA** | California | Opt-out of data sale, sensitive data |
| **PIPL** | China | Strict cross-border transfer rules |
| **DPDPA** | India | Enacted 2023 |
| **PIPEDA** | Canada | Federal private-sector privacy law |
| **POPIA** | South Africa | |
| **LGPD** | Brazil | Similar to GDPR |
| **Korea PIPA** | Korea | GDPR-aligned (EU adequacy decision, 2021) |

→ Every country sets different rules, which fragments compliance.

### Cross-border transfer challenges

- **Schrems II** (EU, 2020): invalidated the EU-US Privacy Shield → every transfer now needs SCCs (Standard Contractual Clauses) plus a transfer impact assessment.
- **EU-US Data Privacy Framework** (2023): the replacement.
- **China data export**: strict (CSL, DSL, PIPL).
- **Russia data localization** (2014+).

### Data colonialism critique

- Global data collection by (largely US) Big Tech.
- Data extractivism in the Global South.
- Local contexts are underrepresented.
- AI inherits a Western-perspective bias.

→ An academic concept from Couldry & Mejias.

### Sovereign cloud

- Each country or region runs its own infrastructure.
- Examples:
  - **GAIA-X** (EU): federated cloud.
  - **Bleu** (France): French sovereign cloud built on Microsoft Azure.
  - **S3NS** (France): sovereign Google Cloud offering.
  - **Confidential Computing** (Azure / GCP): hardware isolation.
  - **AWS Sovereign Cloud** (EU, 2024+).

→ A vendor's "sovereign" claim is hard to verify.

### Sovereign AI capability

- Countries building their own LLMs.
- Examples:
  - **Mistral AI** (France).
  - **Falcon** (UAE).
  - **EXAONE** (LG AI Research, Korea).
  - **HyperCLOVA X** (Naver, Korea).
  - **Yi** / **Qwen** (China).
  - **tsuzumi** (NTT, Japan).
- Compute (constrained by GPU export controls).
- Data (domestic-language corpora).
- Talent.

→ AI sovereignty has become a strategic priority.

### Privacy-preserving AI

#### Federated Learning

- Each device or hospital keeps its own data.
- Only model updates are shared.
- A central server aggregates the updates.
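
In the FedAvg algorithm, the server-side aggregation step is a sample-weighted average of the clients' weights. Below is a framework-free sketch. It assumes each client reports its weights as a list of NumPy arrays (one per layer) together with its local sample count, which is the same `(weights, num_examples)` shape a Flower `NumPyClient.fit` returns. The hospital values are made up for illustration.

```python
import numpy as np


def fed_avg(client_updates):
    """FedAvg aggregation: average each layer, weighted by local sample count.

    client_updates: list of (weights, num_examples), where weights is a list of
    np.ndarray, one per layer. Only these arrays reach the server, never raw data.
    """
    total = sum(n for _, n in client_updates)
    num_layers = len(client_updates[0][0])
    return [
        sum(weights[i] * (n / total) for weights, n in client_updates)
        for i in range(num_layers)
    ]


# Two hospitals: A trained on 100 records, B on 300, so B counts 3x as much.
hospital_a = ([np.array([1.0, 1.0]), np.array([0.0])], 100)
hospital_b = ([np.array([3.0, 5.0]), np.array([4.0])], 300)
global_weights = fed_avg([hospital_a, hospital_b])
print(global_weights)  # layer 0 -> [2.5, 4.0], layer 1 -> [3.0]
```

Weighting by sample count keeps a small site from pulling the global model as hard as a large one. The trade-off is that a site's sample count is itself revealed to the server.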
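```python
# Conceptual sketch of a Flower client. `model` is assumed to be Keras-style
# (get_weights / set_weights / fit); x, y are this site's private data.
import flwr as fl

class Client(fl.client.NumPyClient):
    def __init__(self, model, x, y):
        self.model, self.x, self.y = model, x, y

    def fit(self, parameters, config):
        self.model.set_weights(parameters)        # start from the global model
        self.model.fit(self.x, self.y, epochs=1)  # train on local data only
        # Return updated weights, local sample count (for weighting), metrics.
        return self.model.get_weights(), len(self.x), {}

# Each hospital / phone keeps its own data, yet all sites learn one shared model.
```

#### Differential Privacy

- Add calibrated noise to each query result.
- Bounds how much any single individual's contribution can change the output.

```python
# Central DP via the Laplace mechanism. (Apple iOS and Google Chrome use *local*
# DP instead: the client-side randomized-response pattern in the Engineering section.)
import numpy as np

def dp_mean(data, lower, upper, epsilon=1.0, rng=None):
    """Epsilon-DP mean of values clipped to public bounds [lower, upper]."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = np.clip(np.asarray(data, dtype=float), lower, upper)
    # One record can shift the mean by at most (upper - lower) / n. The bounds
    # must be fixed in advance: deriving them from the data itself leaks it.
    sensitivity = (upper - lower) / len(clipped)
    noise = rng.laplace(0.0, sensitivity / epsilon)
    return clipped.mean() + noise
# Aggregate statistics with a formal privacy guarantee.
```

#### Homomorphic encryption

- Compute directly on encrypted data.
- The result is also encrypted.
- Decrypt to obtain the result.
- High computational cost.

#### Secure Multi-Party Computation (MPC)

- Each party keeps its own data while the parties jointly compute a result.
- Relies on cryptographic protocols.

#### Confidential computing

- Hardware enclaves (Intel SGX, AMD SEV-SNP, AWS Nitro).
- Protects computation running in the cloud.
- Critical for government and sovereign workloads.

### Industry challenges

#### Healthcare

- Per-country health-data localization rules.
- HIPAA (US) + GDPR (EU) + local laws.
- Multinational clinical trials become difficult.

#### Finance

- Cross-border transaction data.
- Country-specific banking regulation.

#### Government / defense

- Isolation of classified data.
- Supply chain (chips, software).
- Air-gapped + sovereign.

#### Big Tech enterprise (Salesforce, AWS)

- Contractual commitments on where customer data lives.
- Region selection.
- EU-only processing for EU customers.

### AI training data issues

#### Copyright lawsuits (2023+)

- NYT v. OpenAI: training on paywalled articles.
- Getty Images v. Stability AI (Stable Diffusion): generated images reproducing Getty's watermark.
- Copyright class actions by authors and artists.

#### Opt-out mechanisms

- robots.txt + AI crawler user-agent identifiers.
- The ai.txt proposal.
- Publisher opt-outs and licensing deals (e.g., NYT, Reddit).

#### Right to be forgotten in training data

- GDPR's right to erasure.
- Making a trained model "unlearn" specific data is hard (active research).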
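
The robots.txt opt-out listed above only works if crawlers honor it. The check a compliant crawler performs can be sketched with Python's standard `urllib.robotparser`. The robots.txt rules, the `may_fetch` helper, and the `example.com` URLs are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block two AI-training agents, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch("GPTBot", "https://example.com/articles/1"))         # False
print(may_fetch("SomeSearchBot", "https://example.com/articles/1"))  # True
```

Nothing in the protocol forces a crawler to run this check. The opt-out therefore stays a signal rather than an enforcement mechanism.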
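### Organizational patterns

#### Data classification

- Public / Internal / Confidential / Restricted.
- Define which AI tools may access each level.

#### Data localization

- Store each customer's data in their region.
- Deploy services per region.
- Replicate across regions only when done explicitly.

#### Privacy by design

- Privacy is the default in every system.
- Minimum data collection.
- Purpose limitation.
- Storage minimization.

### Future trends

- Countries pushing for AI sovereignty (chips, data, models).
- Fragmentation into tech blocs (US, EU, China, India).
- Portable user identity (Solid Pods, Web3-style).
- Personal AI (on-device).

## 💻 Patterns (Engineering)

### Region-aware data routing

```ts
// Hypothetical types for illustration.
interface User { id: string; country: string; }
interface RegionClient { put(data: unknown): Promise<void>; }

const EU_COUNTRIES = new Set([
  'AT', 'BE', 'BG', 'HR', 'CY', 'CZ', 'DK', 'EE', 'FI', 'FR', 'DE', 'GR', 'HU', 'IE',
  'IT', 'LV', 'LT', 'LU', 'MT', 'NL', 'PL', 'PT', 'RO', 'SK', 'SI', 'ES', 'SE',
]);

class DataRouter {
  constructor(private clients: Record<string, RegionClient>) {}

  determineRegion(user: User): string {
    if (user.country === 'DE') return 'eu-central';
    // `x in array` tests indices, not values; use a Set for membership.
    if (EU_COUNTRIES.has(user.country)) return 'eu-west';
    if (user.country === 'CN') return 'cn-north';
    if (user.country === 'IN') return 'ap-south';
    return 'us-east';
  }

  async store(data: unknown, user: User): Promise<void> {
    const region = this.determineRegion(user);
    const client = this.clients[region];
    if (!client) throw new Error(`No storage client for region ${region}`);
    await client.put(data); // write only to the region-local store
  }
}
```

### Local differential privacy (randomized response)

```python
import math
import random

def collect_with_dp(events, epsilon=1.0):
    """Basic randomized response (the building block of RAPPOR), one bit per user."""
    # With probability f the client reports a fair coin flip instead of the truth.
    # f = 2 / (1 + e^epsilon) makes each report epsilon-locally-DP, because
    # P(1 | truth=1) / P(1 | truth=0) = (1 - f/2) / (f/2) = e^epsilon.
    f = 2.0 / (1.0 + math.exp(epsilon))
    randomized = []
    for e in events:
        if random.random() < f:
            randomized.append(random.choice([0, 1]))  # noise
        else:
            randomized.append(e)                       # truth
    return randomized

def estimate_rate(reports, epsilon=1.0):
    """Server side: unbiased estimate of the true fraction of 1s."""
    f = 2.0 / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - f / 2) / (1 - f)

# Google Chrome (RAPPOR) and Apple iOS deploy local-DP schemes built on this idea.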
```

### Federated learning

```python
import flwr as fl

# --- Server process ---
def run_server():
    strategy = fl.server.strategy.FedAvg(
        fraction_fit=0.5,          # sample half of the connected clients per round
        min_available_clients=10,  # wait until 10 clients are online
    )
    fl.server.start_server(
        server_address='[::]:8080',
        config=fl.server.ServerConfig(num_rounds=3),
        strategy=strategy,
    )

# --- Client process (one per hospital; raw data never leaves the site) ---
class HospitalClient(fl.client.NumPyClient):
    def __init__(self, model, train, test):
        self.model = model  # Keras-style model (assumed)
        self.local_x, self.local_y = train
        self.test_x, self.test_y = test

    def fit(self, parameters, config):
        self.model.set_weights(parameters)
        self.model.fit(self.local_x, self.local_y, epochs=1)
        return self.model.get_weights(), len(self.local_x), {}

    def evaluate(self, parameters, config):
        self.model.set_weights(parameters)  # evaluate the *global* model
        loss, acc = self.model.evaluate(self.test_x, self.test_y)
        return float(loss), len(self.test_x), {'accuracy': float(acc)}

def run_client(model, train, test):
    fl.client.start_numpy_client(
        server_address='central:8080',
        client=HospitalClient(model, train, test),
    )
```

### Confidential computing (AWS Nitro)

```bash
# Build an enclave image file (EIF) from a Docker image, then launch it in a Nitro Enclave.
nitro-cli build-enclave --docker-uri my-app:latest --output-file my.eif
nitro-cli run-enclave --eif-path my.eif --memory 2048 --cpu-count 2
# The enclave is isolated and attestable; the parent instance cannot access its memory.
```

### Data classification + DLP

```python
import re

# (regex, label) pairs. Illustrative only: the card pattern matches Visa-style
# 13-16 digit numbers without separators; real DLP needs far broader coverage.
SENSITIVE_PATTERNS = [
    (r'\b\d{3}-\d{2}-\d{4}\b', 'SSN'),
    (r'\b4\d{12,15}\b', 'CreditCard'),
    (r'(?i)passport[:= ]+\w+', 'Passport'),
]

def classify(text: str) -> str:
    """Return 'restricted' if any sensitive pattern matches, else 'internal'."""
    for pattern, _label in SENSITIVE_PATTERNS:
        if re.search(pattern, text):
            return 'restricted'
    return 'internal'

# Run on every outgoing prompt before it reaches an external AI tool.
```

### Opt-out signaling (ai.txt / robots.txt)

```txt
# robots.txt
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /
```

→ Opts the site out of crawling for LLM training. Compliance depends on each vendor choosing to honor it.

### Vendor DPA template (excerpt)

```markdown
## Data Processing Addendum

Vendor agrees:
1. Process Data only per Customer instructions.
2. NOT use Customer Data for AI training without explicit opt-in.
3. Maintain ISO 27001 / SOC 2 Type II.
4. Sub-processors listed at: vendor.com/subprocessors.
5. Data location: EU (Frankfurt + Dublin).
6. 30-day notification of new sub-processor.
7. Customer right to audit (60-day notice).
8. Data deletion within 30 days of contract end.
9. Breach notification within 72 hours.
```

### Region pinning (data residency)

```yaml
# Data residency: EU traffic is served only by EU pods.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    # GKE: provision an internal (non-internet-facing) load balancer
    cloud.google.com/load-balancer-type: 'Internal'
spec:
  type: LoadBalancer
  selector:
    app: my-app
    region: eu-west   # only pods carrying this label receive traffic
# Caveat: the label does not place pods in the EU by itself; also pin the
# Deployment's pods with nodeAffinity on topology.kubernetes.io/region.
```

### Audit log (sovereignty compliance)

```ts
// Hypothetical types and `db` handle for illustration.
interface User { id: string; region: string; }
interface StoredData { region: string; }
declare const db: { auditLog: { insert(row: Record<string, unknown>): Promise<void> } };

async function auditDataAccess(user: User, data: StoredData, action: string): Promise<void> {
  await db.auditLog.insert({
    userId: user.id,
    userRegion: user.region,
    dataLocation: data.region,
    action,
    timestamp: new Date(),
    crossBorder: user.region !== data.region, // flag every cross-border access
  });
}
```

→ Every cross-border access becomes visible.

## 🤔 Decision Criteria

| Situation | Recommendation |
|---|---|
| EU customers | EU storage + GDPR compliance |
| Data of Chinese citizens | Data localization (PIPL) |
| Government | Sovereign cloud |
| Cross-country healthcare | Federated learning |
| Aggregate statistics | Differential privacy |
| Joint computation across organizations | Secure MPC |
| Hardware-enforced isolation | Confidential computing |
| AI training | Opt-in / explicit consent |

**Default**: privacy by design + region-aware storage + audit log + opt-in for AI training.

## ⚠️ Contradictions & Updates

- **Open data vs. sovereignty**: the historical preference for open access clashes with control over strategic data.
- **Limits of federated learning**: model updates can leak training data (gradient inversion attacks).
- **Utility loss from differential privacy**: a smaller epsilon means stronger privacy but lower utility.
- **Sovereign cloud vendor lock-in**: the "sovereign" label still rests on a dependency on the vendor's underlying technology.
- **Cross-border enforcement is hard**: every country has different rules.
- **AI training data lawsuits**: the outcomes are still unclear.
- **Tension between individual and national sovereignty**: government access to data (China, etc.).
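
The gradient-inversion risk can be made concrete with a linear (fully connected) layer with a bias, trained on a single example. In that case the shared gradient reveals the input exactly. The reason is that dL/dw = (dL/dz) * x and dL/db = dL/dz, so x = (dL/dw) / (dL/db). One local SGD step on that example makes the weight update proportional to this gradient, so the leak carries over to weight-sharing FL.

Below is a minimal NumPy sketch under stated assumptions: a squared-error loss, arbitrary sizes, and random values. Attacks on deeper networks and batched updates are harder and typically optimization-based rather than closed-form.

```python
import numpy as np

rng = np.random.default_rng(0)

# One client's private training example (e.g. a patient feature vector).
x_private = rng.normal(size=5)
y_private = 1.0

# Shared global model: z = w . x + b, loss = 0.5 * (z - y)^2
w = rng.normal(size=5)
b = 0.1

# What the client sends in a naive federated round: the gradient for this example.
z = w @ x_private + b
dL_dz = z - y_private
grad_w = dL_dz * x_private   # dL/dw = dL/dz * x
grad_b = dL_dz               # dL/db = dL/dz

# What an honest-but-curious server can compute from the update alone.
x_recovered = grad_w / grad_b
print(np.allclose(x_recovered, x_private))  # True (whenever dL/dz != 0)
```

This is why the anti-pattern "federated learning without leak protection" matters. Secure aggregation or differential-privacy noise on updates is needed so that the server never sees a single client's raw gradient.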
## 🔗 Knowledge Links (Graph)

- Parent: [[Privacy]] · [[AI-Ethics]]
- Variants: [[GDPR-Compliance]] · [[Data-Localization]] · [[Sovereign-Cloud]]
- Techniques: [[Federated-Learning]] · [[Differential-Privacy]] · [[Homomorphic-Encryption]]
- Critique: [[Data-Colonialism]]
- Applications: [[AI 거버넌스 정책(AI Usage Policy)|AI-Governance-Policy]] · [[AI Accountability]]
- Policy: [[EU-AI-Act]]

## 🤖 LLM Usage Hints (How to Use This Knowledge)

**When to use this knowledge:**

- Architecting multi-region SaaS.
- Negotiating DPAs with AI vendors.
- Deploying in government or regulated industries.
- Designing cross-border data flows.
- Implementing privacy-preserving ML.

**When not to use it:**

- Country-specific legal advice (consult counsel).
- Immediate crisis response (incident team).
- Over-engineering for a small team (KISS first).

## ❌ Anti-Patterns

- **Single-region global service**: violates customers' data residency requirements.
- **No DPA**: the vendor can use your data freely.
- **No opt-in for AI training**: loss of user trust + lawsuits.
- **Not verifying sovereign-cloud marketing claims**: false sense of security.
- **Federated learning without leak protection**: exposed to gradient inversion.
- **No audit log**: compliance failure.
- **GDPR only, ignoring other regulations**: fragmented violations.

## 🧪 Validation Status

- **Information status:** verified (concept-level).
- **Source trust level:** B (GDPR text, EU AI Act, IAPP / privacy bar association resources, academic data-colonialism literature).
- **Review reason:** Manual cleanup. Regulation is actively evolving; review every 6 months.

## 🧬 Duplicate Check

- **Existing similar documents:** [[AI 거버넌스 정책(AI Usage Policy)|AI-Governance-Policy]] (related), [[Privacy]] (parent), [[AI Accountability]] (related).
- **Action:** KEEP (the specific lens of sovereignty).
- **Reason:** Sits at the intersection of geopolitics and technology.

## 🕓 Changelog

| Date | Change | Action | Confidence |
|------|--------|--------|------------|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: added the 3 layers, privacy-preserving tech, regulation map, and anti-patterns | UPDATE | B |