[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -1,66 +1,446 @@
 ---
 id: wiki-2026-0508-ai-data-sovereignty
-title: "AI & Data Sovereignty"
+title: AI & Data Sovereignty
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [P-Reinforce-AUTO-AIDS-001]
+aliases: [데이터 주권, data sovereignty, AI sovereignty, sovereign cloud, data colonialism, data localization]
 duplicate_of: none
-source_trust_level: A
-confidence_score: 0.93
-tags: [auto-reinforced, data-sovereignty, ai-ethics, privacy, digital-colonialism, data-governance]
+source_trust_level: B
+confidence_score: 0.85
+verification_status: conceptual
+tags: [data-sovereignty, ai-policy, privacy, gdpr, data-localization, federated-learning, sovereign-cloud, geopolitics]
 raw_sources: []
-last_reinforced: 2026-04-20
+last_reinforced: 2026-05-09
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
+inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
+tech_stack:
+  language: policy / engineering
+  applicable_to: [Compliance, Architecture, Government, Privacy]
 ---

-# [[AI & Data Sovereignty|AI & Data Sovereignty]]
+# AI & Data Sovereignty

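 ## 📌 One-Line Insight (The Karpathy Summary)
-> "Who owns the data? In an era when everything we do becomes free raw material for AI training, this is the struggle to protect the right of individuals and nations to control their own data and to share fairly in the wealth it creates."
+> **"Who owns each piece of data?"** Sovereignty plays out on three layers: individual, organizational, and national. Training data for Big Tech AI carries a hidden cost. The modern technical answer combines **federated learning + differential privacy + sovereign cloud**.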

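 ## 📖 Structured Knowledge (Synthesized Content)
-AI & Data Sovereignty refers to the exclusive control and right of self-determination that individuals, organizations, or nations have over digital information and the AI models derived from it.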

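-1.  **Core layers**:
-    *   **Individual Sovereignty**: The right to know where my data is used, and to refuse or be compensated (privacy rights).
-    *   **National Sovereignty**: Having the infrastructure and regulation to keep citizens' data from becoming subordinate to foreign Big Tech AI training.
-    *   **Model Sovereignty**: The ability to own independent compute and model architectures without depending on a particular country's or company's AI models.
-2.  **Why it has come to the fore**:
-    *   Growing concern that indiscriminate data collection for training large models could lead to 'digital colonialism'.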
+### The three layers of sovereignty
+
+#### 1. Individual sovereignty
+- Each user's own data.
+- Right to know (which data is used, and for what).
+- Right to erasure (GDPR Article 17).
+- Right to object (GDPR Article 21).
+- Right to data portability (GDPR Article 20).
+- Opt-in / opt-out for use of one's data in AI training.
+
+#### 2. Organizational sovereignty
+- Each company's customer data.
+- IP and trade secrets.
+- Vendor DPAs (Data Processing Agreements).
+- Sub-processor lists.
+- Dependency on cloud providers.
+
+#### 3. National sovereignty
+- Where citizens' data is stored.
+- Geopolitical risk (foreign government access).
+- Strategic AI capability.
+- Industrial policy.
+
+### Major regulations
+| Regulation | Region | Key points |
+|---|---|---|
+| **GDPR** | EU | Individual rights + extraterritorial reach |
+| **CCPA / CPRA** | California | Opt-out of sale, sensitive data |
+| **PIPL** | China | Strict cross-border transfer rules |
+| **DPDPA** | India | Enacted 2023 |
+| **PIPEDA** | Canada | Federal private-sector privacy |
+| **POPIA** | South Africa |  |
+| **LGPD** | Brazil | GDPR-similar |
+| **Korea PIPA** | Korea | Enacted 2011 (predates GDPR); EU adequacy decision 2021 |
+
+→ Every country has its own rules, so the regulatory landscape is fragmented.
+
+### Cross-border transfer challenges
+- **Schrems II** (CJEU, 2020): invalidated the EU-US Privacy Shield, so transfers must rely on SCCs plus a case-by-case transfer assessment.
+- **EU-US Data Privacy Framework** (2023): the replacement for Privacy Shield.
+- **China data export**: strict (CSL, DSL, PIPL).
+- **Russia data localization** (2014+).
+
+### Data colonialism critique
+- Global data collection by (largely US) Big Tech.
+- Data extractivism in the Global South.
+- Local contexts are underrepresented.
+- AI carries a Western-perspective bias.
+
+→ An academic concept developed by Couldry & Mejias.
+
+### Sovereign cloud
+- A country's or region's own infrastructure.
+- Examples:
+  - **GAIA-X** (EU): federated cloud and data infrastructure initiative.
+  - **Bleu** (France): sovereign offering built on Microsoft Azure (Orange + Capgemini).
+  - **S3NS** (France): sovereign offering built on Google Cloud (Thales + Google).
+  - **Confidential Computing** (Azure / GCP): hardware-isolated compute.
+  - **AWS European Sovereign Cloud** (EU, announced 2023).
+
+→ Vendors' "sovereign" claims are hard to verify.
+
+### Sovereign AI capability
+- Each country's own LLMs.
+- Examples:
+  - **Mistral AI** (France).
+  - **Falcon** (UAE).
+  - **EXAONE** (LG AI Research, Korea).
+  - **HyperCLOVA X** (Naver, Korea).
+  - **Yi** / **Qwen** (China).
+  - **tsuzumi** (NTT, Japan).
+- Compute (constrained by GPU export controls).
+- Data (domestic corpora).
+- Talent.
+
+→ AI sovereignty has become a strategic priority.
+
+### Privacy-preserving AI
+
+#### Federated Learning
+- Each device / hospital keeps its own data.
+- Only model updates are shared.
+- A central server aggregates the updates.
+
+```python
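+# Conceptual sketch: `model`, `local_x`, and `local_y` are assumed to be a
+# compiled Keras model and this site's private training data.
+import flwr as fl
+
+class Client(fl.client.NumPyClient):
+    def fit(self, parameters, config):
+        model.set_weights(parameters)          # start from the global model
+        model.fit(local_x, local_y, epochs=1)  # train on local data only
+        # Return new weights + local sample count (used for weighted averaging)
+        return model.get_weights(), len(local_x), {}
+
+# Each hospital / phone keeps its own data while contributing to a shared model.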
+```
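What the central server does with those updates can be sketched without any framework. This is a minimal FedAvg-style weighted average, assuming each client returns a list of NumPy weight arrays plus its local sample count (the same tuple shape the Flower `fit` above returns); `fed_avg` is a hypothetical helper, not Flower's own implementation.

```python
import numpy as np

def fed_avg(client_results):
    """Weighted average of client weights (the FedAvg aggregation step).

    client_results: list of (weights, num_examples), where weights is a list of
    np.ndarray layers. Clients with more local data get proportionally more say.
    """
    total = sum(n for _, n in client_results)
    num_layers = len(client_results[0][0])
    aggregated = []
    for layer in range(num_layers):
        layer_avg = sum(w[layer] * (n / total) for w, n in client_results)
        aggregated.append(layer_avg)
    return aggregated

# Two hospitals: only weights and counts leave each site, never raw records.
hospital_a = ([np.array([1.0, 1.0]), np.array([0.0])], 100)
hospital_b = ([np.array([3.0, 5.0]), np.array([4.0])], 300)
global_weights = fed_avg([hospital_a, hospital_b])
print(global_weights)
```

Weighting by sample count keeps a small site from pulling the global model as hard as a large one; plain FedAvg still needs extra protection against gradient inversion.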
+
+#### Differential Privacy
+- Calibrated noise is added to each query result.
+- Limits how much any single individual's contribution can be inferred from the output.
+
+```python
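+# Central DP with the Laplace mechanism. (Apple iOS and Google Chrome deploy
+# *local* DP variants, where noise is added on the device instead.)
+import numpy as np
+
+def dp_mean(data, lower, upper, epsilon=1.0):
+    # The bounds must be public and fixed in advance; deriving them from the
+    # data (e.g. max - min) would itself leak information.
+    clipped = np.clip(np.asarray(data, dtype=float), lower, upper)
+    sensitivity = (upper - lower) / len(clipped)  # max effect of one record on the mean
+    noise = np.random.laplace(0.0, sensitivity / epsilon)
+    return clipped.mean() + noise
+
+# Aggregate statistics with an epsilon-DP guarantee (record count assumed public).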
+```
+
+#### Homomorphic encryption
+- Computation runs directly on encrypted data.
+- The result is also encrypted.
+- Only the key holder can decrypt the result.
+- High computational cost.
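A toy illustration of computing on encrypted data, using the Paillier cryptosystem as a swapped-in example (the bullets above do not name a scheme). Paillier is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The tiny primes are for readability only and give no security; real systems use vetted libraries and roughly 2048-bit moduli. All function names are hypothetical.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because we fix the generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # c = g^m * r^n mod n^2, with g = n + 1
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    return (c1 * c2) % (n * n)

n, priv = keygen()
c_total = add_encrypted(n, encrypt(n, 120), encrypt(n, 35))  # server sees only ciphertexts
print(decrypt(n, priv, c_total))  # → 155
```

The party doing the addition never holds `priv`; only the key holder learns the total, which is the pattern the bullets describe.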
+
+#### Secure Multi-Party Computation (MPC)
+- Parties jointly compute a result while each keeps its own data private.
+- Built on cryptographic protocols.
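One of the simplest MPC building blocks is additive secret sharing. The sketch below uses it to compute a sum, assuming honest-but-curious parties and a simulated network (every party lives in one process); production MPC frameworks add authenticated channels and defenses against malicious parties. `share` and `secure_sum` are hypothetical names.

```python
import secrets

PRIME = 2**61 - 1  # Mersenne prime; all arithmetic is done mod PRIME

def share(value, n_parties):
    """Split value into n random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(private_values):
    """Party j only ever sees the j-th share of each input, which on its own
    is uniformly random; only the final total is revealed."""
    n = len(private_values)
    dealt = [share(v, n) for v in private_values]  # dealt[i][j]: party i -> party j
    partial_sums = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME

# Three banks learn their combined exposure without revealing individual figures.
print(secure_sum([120, 450, 75]))  # → 645
```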
+
+#### Confidential computing
+- Hardware enclaves (Intel SGX, AMD SEV-SNP, AWS Nitro Enclaves).
+- Protects data while it is being processed on cloud compute.
+- Critical for government / sovereign workloads.
+
+### Industry challenges
+
+#### Healthcare
+- Country-level health data localization.
+- HIPAA (US) + GDPR (EU) + local laws.
+- Multinational clinical trials are difficult to run.
+
+#### Finance
+- Cross-border flows of transaction data.
+- Country-specific banking regulation.
+
+#### Government / defense
+- Isolation of classified data.
+- Supply chain (chips, software).
+- Air-gapped and sovereign deployments.
+
+#### Big Tech enterprise (Salesforce, AWS)
+- Contractual commitments on where customer data is stored.
+- Region selection.
+- EU-only storage for EU customers.
+
+### AI training data issues
+
+#### Copyright lawsuits (2023+)
+- NYT v. OpenAI: paywalled articles used for training.
+- Getty Images v. Stability AI (Stable Diffusion): training on Getty images; outputs reproducing its watermark.
+- Copyright class actions by authors and artists.
+
+#### Opt-out mechanisms
+- robots.txt rules targeting AI crawler user agents.
+- The ai.txt proposal.
+- Publisher opt-outs and data deals (e.g., NYT, Reddit).
+
+#### Right to be forgotten in training data
+- GDPR right to erasure.
+- Making a trained model "unlearn" specific data is hard (active research).
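One approach studied in the machine-unlearning literature is SISA-style sharded training: split the data into disjoint shards, train one sub-model per shard, and aggregate their outputs, so erasing a record only requires retraining the shard that held it. The toy sketch below uses a per-shard mean as a stand-in "model"; `ShardedModel` and its methods are hypothetical.

```python
import hashlib
from statistics import mean

class ShardedModel:
    """Toy SISA-style ensemble: each shard's 'model' is just the mean of its values."""

    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]  # record_id -> value
        self.models = [None] * num_shards

    def _shard_of(self, record_id):
        # Stable hash, so a record always maps to the same shard.
        return hashlib.sha256(record_id.encode()).digest()[0] % self.num_shards

    def _retrain(self, s):
        values = list(self.shards[s].values())
        self.models[s] = mean(values) if values else None

    def add(self, record_id, value):
        s = self._shard_of(record_id)
        self.shards[s][record_id] = value
        self._retrain(s)

    def forget(self, record_id):
        """Erasure request: drop the record and retrain only its shard."""
        s = self._shard_of(record_id)
        self.shards[s].pop(record_id, None)
        self._retrain(s)
        return s

    def predict(self):
        trained = [m for m in self.models if m is not None]
        return mean(trained) if trained else None

model = ShardedModel()
for i in range(20):
    model.add(f"user-{i}", float(i))
retrained = model.forget("user-7")  # index of the only shard that was retrained
```

The cost of an erasure request drops to one shard's retraining, at the price of each sub-model seeing less data.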
+
+### Organizational patterns
+
+#### Data classification
+- Public / Internal / Confidential / Restricted.
+- Each AI tool's access is scoped to specific classification levels.
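A minimal sketch of enforcing those levels before data reaches an AI tool, assuming each tool is registered with the highest classification it is cleared for. The tool names and clearances are hypothetical placeholders.

```python
from enum import IntEnum

class Level(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Hypothetical registry: the highest level each AI tool may receive.
TOOL_CLEARANCE = {
    "public-chatbot": Level.PUBLIC,
    "enterprise-llm-eu": Level.CONFIDENTIAL,
    "on-prem-model": Level.RESTRICTED,
}

def may_send(data_level: Level, tool: str) -> bool:
    # Unknown tools get no clearance at all (deny by default).
    clearance = TOOL_CLEARANCE.get(tool)
    return clearance is not None and data_level <= clearance

print(may_send(Level.INTERNAL, "public-chatbot"))         # → False
print(may_send(Level.CONFIDENTIAL, "enterprise-llm-eu"))  # → True
```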
+
+#### Data localization
+- Store each customer's data in their region.
+- Deploy services in each region.
+- Replicate across regions only explicitly.
+
+#### Privacy by design
+- Privacy as each system's default setting.
+- Minimum data collection.
+- Purpose limitation.
+- Storage minimization.
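Minimum collection and purpose limitation can be enforced in code with a per-purpose field allow-list, so any field not needed for the declared purpose is dropped before storage. The purposes and field names below are illustrative assumptions.

```python
# Hypothetical allow-list: which fields each processing purpose may keep.
PURPOSE_FIELDS = {
    "billing": {"user_id", "plan", "country"},
    "analytics": {"plan", "country"},  # no direct identifiers
}

def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields allowed for this purpose; unknown purposes get nothing."""
    allowed = PURPOSE_FIELDS.get(purpose, set())
    return {k: v for k, v in record.items() if k in allowed}

signup = {"user_id": "u-42", "email": "a@example.com", "plan": "pro", "country": "DE"}
print(minimize(signup, "analytics"))  # → {'plan': 'pro', 'country': 'DE'}
```

Defaulting an unknown purpose to an empty set means a new use of the data has to be declared before it receives any fields.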
+
+### Future trend
+- Countries pushing for AI sovereignty across chips, data, and models.
+- Fragmentation into tech blocs (US, EU, China, India).
+- Portable user identity (Solid Pods, Web3-style).
+- Personal, on-device AI.
+
+## 💻 Patterns (Engineering)
+
+### Region-aware data routing
+```ts
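+// `User`, `StorageClient`, and `getClientFor` are assumed app-level types/helpers.
+interface User { id: string; country: string; }
+interface StorageClient { put(data: unknown): Promise<void>; }
+
+// ISO 3166-1 alpha-2 codes of the 27 EU member states.
+const EU_COUNTRIES = new Set([
+  'AT', 'BE', 'BG', 'HR', 'CY', 'CZ', 'DK', 'EE', 'FI', 'FR', 'DE', 'GR', 'HU', 'IE',
+  'IT', 'LV', 'LT', 'LU', 'MT', 'NL', 'PL', 'PT', 'RO', 'SK', 'SI', 'ES', 'SE',
+]);
+
+class DataRouter {
+  constructor(private getClientFor: (region: string) => StorageClient) {}
+
+  determineRegion(user: User): string {
+    if (user.country === 'DE') return 'eu-central';
+    if (EU_COUNTRIES.has(user.country)) return 'eu-west';  // `in` would test object keys, not members
+    if (user.country === 'CN') return 'cn-north';
+    if (user.country === 'IN') return 'ap-south';
+    return 'us-east';
+  }
+
+  async store(data: unknown, user: User): Promise<void> {
+    const region = this.determineRegion(user);
+    await this.getClientFor(region).put(data);
+  }
+}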
+```
+
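+### Local differential privacy (randomized response)
+```python
+import math
+import random
+
+def collect_with_dp(events, epsilon=1.0):
+    """RAPPOR-style randomized response over 0/1 events.
+
+    With probability f each bit is replaced by a fair coin flip, so the true
+    bit is reported with probability 1 - f/2. Choosing f = 2 / (1 + e^epsilon)
+    makes each individual report epsilon-locally differentially private.
+    """
+    f = 2.0 / (1.0 + math.exp(epsilon))  # replacement probability
+
+    randomized = []
+    for e in events:
+        if random.random() < f:
+            randomized.append(random.choice([0, 1]))   # noise
+        else:
+            randomized.append(e)                        # true value
+
+    return randomized
+
+# Local DP variants are deployed in Google Chrome (RAPPOR) and Apple iOS.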
+```
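Each randomized report is individually deniable, so useful information only emerges in aggregate. Since E[observed rate] = (1 - f) · true rate + f / 2, the collector can invert that relation to get an unbiased estimate. A self-contained sketch on simulated data, where `randomize` and `estimate_true_rate` are hypothetical helper names and `f` is the replacement probability used at collection time:

```python
import random

def randomize(bit, f, rng):
    # With probability f, report a fair coin flip instead of the true bit.
    return rng.choice([0, 1]) if rng.random() < f else bit

def estimate_true_rate(reports, f):
    # Invert E[observed] = (1 - f) * true_rate + f / 2.
    observed = sum(reports) / len(reports)
    return (observed - f / 2) / (1 - f)

rng = random.Random(42)
true_bits = [1] * 3_000 + [0] * 7_000  # true rate is 0.30
reports = [randomize(b, 0.5, rng) for b in true_bits]
print(f"estimated rate: {estimate_true_rate(reports, 0.5):.3f}")
```

The estimate's variance grows as f approaches 1 (stronger privacy), which is why local DP needs large populations to be useful.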
+
+### Federated learning
+```python
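+import flwr as fl
+
+# --- Server (central coordinator; never receives raw records) ---
+def server_strategy():
+    return fl.server.strategy.FedAvg(
+        fraction_fit=0.5,           # sample 50% of available clients each round
+        min_available_clients=10,   # wait until 10 sites are connected
+    )
+
+def run_server():
+    fl.server.start_server(
+        server_address='[::]:8080',
+        config=fl.server.ServerConfig(num_rounds=3),
+        strategy=server_strategy(),
+    )
+
+# --- Client (one process per hospital; data never leaves the site) ---
+class HospitalClient(fl.client.NumPyClient):
+    def __init__(self, model, local_x, local_y, test_x, test_y):
+        self.model = model          # assumed: a Keras model compiled with metrics=['accuracy']
+        self.local_x, self.local_y = local_x, local_y
+        self.test_x, self.test_y = test_x, test_y
+
+    def fit(self, parameters, config):
+        self.model.set_weights(parameters)   # start from the current global model
+        self.model.fit(self.local_x, self.local_y, epochs=1, verbose=0)
+        return self.model.get_weights(), len(self.local_x), {}
+
+    def evaluate(self, parameters, config):
+        self.model.set_weights(parameters)   # evaluate the global weights, not stale local ones
+        loss, acc = self.model.evaluate(self.test_x, self.test_y, verbose=0)
+        return float(loss), len(self.test_x), {'accuracy': float(acc)}
+
+def run_client(model, local_x, local_y, test_x, test_y):
+    # Flower 1.x API; newer releases steer toward fl.client.start_client.
+    fl.client.start_numpy_client(
+        server_address='central:8080',
+        client=HospitalClient(model, local_x, local_y, test_x, test_y),
+    )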
+```
+
+### Confidential computing (AWS Nitro)
+```bash
+# Build and launch an isolated Nitro Enclave from a container image
+nitro-cli build-enclave --docker-uri my-app:latest --output-file my.eif
+nitro-cli run-enclave --eif-path my.eif --memory 2048 --cpu-count 2
+
+# The enclave is isolated and attestable; the parent instance cannot read its memory.
+```
+
+### Data classification + DLP
+```python
+SENSITIVE_PATTERNS = [
+    (r'\b\d{3}-\d{2}-\d{4}\b', 'SSN'),
+    (r'\b4\d{12,15}\b', 'CreditCard'),
+    (r'(?i)passport[:= ]+\w+', 'Passport'),
+]
+
+def classify(text: str) -> str:
+    for pattern, label in SENSITIVE_PATTERNS:
+        if re.search(pattern, text):
+            return 'restricted'
+    return 'internal'
+
+# 매 prompt 의 매 outgoing 의 check.
+```
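The `CreditCard` pattern above also matches arbitrary 13 to 16 digit numbers that start with 4. One common way to cut false positives is to also require the Luhn checksum that payment card numbers carry; this is a sketch of that extra filter, with `luhn_valid` and `looks_like_card` as hypothetical helper names.

```python
import re

CARD_CANDIDATE = re.compile(r'\b4\d{12,15}\b')

def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def looks_like_card(text: str) -> bool:
    return any(luhn_valid(m) for m in CARD_CANDIDATE.findall(text))

print(looks_like_card("card 4111111111111111"))   # → True (standard test number)
print(looks_like_card("order 4111111111111112"))  # → False
```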
+
+### Opt-out signaling (ai.txt / robots.txt)
+```txt
+# robots.txt
+User-agent: GPTBot
+Disallow: /
+
+User-agent: Google-Extended
+Disallow: /
+
+User-agent: anthropic-ai
+Disallow: /
+
+User-agent: ClaudeBot
+Disallow: /
+```
+
+→ Opts the site out of these vendors' training crawls; compliance depends on each vendor honoring the file.
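Before relying on the file, it is worth checking that it parses the way you intend. Python's standard-library `urllib.robotparser` can evaluate the rules offline; the sketch below uses a trimmed copy of the rules above plus a catch-all, and the URL is a placeholder. Individual crawlers may still interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/1"
for agent in ["GPTBot", "Google-Extended", "Mozilla/5.0"]:
    print(agent, parser.can_fetch(agent, url))
```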
+
+### Vendor DPA template (excerpt)
+```markdown
+## Data Processing Addendum
+
+Vendor agrees:
+1. Process Data only per Customer instructions.
+2. NOT use Customer Data for AI training without explicit opt-in.
+3. Maintain ISO 27001 / SOC 2 Type II.
+4. Sub-processors listed at: vendor.com/subprocessors.
+5. Data location: EU (Frankfurt + Dublin).
+6. 30-day notification of new sub-processor.
+7. Customer right to audit (60-day notice).
+8. Data deletion within 30 days of contract end.
+9. Breach notification within 72 hours.
+```
+
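+### Region pinning (data residency)
+```yaml
+# Schedule pods only on EU nodes, then expose them via an internal load balancer.
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: my-app
+spec:
+  selector:
+    matchLabels: {app: my-app, region: eu-west}
+  template:
+    metadata:
+      labels: {app: my-app, region: eu-west}
+    spec:
+      nodeSelector:
+        topology.kubernetes.io/region: europe-west1   # example EU region
+      containers:
+        - {name: my-app, image: my-app:latest}
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: my-app
+  annotations:
+    cloud.google.com/load-balancer-type: 'Internal'
+spec:
+  type: LoadBalancer
+  selector: {app: my-app, region: eu-west}   # only EU-pinned pods back this Service
+```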
+
+### Audit log (sovereignty compliance)
+```ts
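+// `db` is an assumed handle to an append-only audit store; `User` is the app's user type.
+async function auditDataAccess(user: User, data: { region: string }, action: string) {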
+  await db.auditLog.insert({
+    userId: user.id,
+    userRegion: user.region,
+    dataLocation: data.region,
+    action,
+    timestamp: new Date(),
+    crossBorder: user.region !== data.region,
+  });
+}
+```
+
+→ Makes every cross-border access visible.
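For periodic compliance reviews, records like the ones written above can be rolled up into counts of cross-border flows. A sketch that assumes the records come back as plain dicts with the same field names (`userRegion`, `dataLocation`, `crossBorder`); `cross_border_summary` and the sample log are hypothetical.

```python
from collections import Counter

def cross_border_summary(records):
    """Count cross-border accesses per (user region, data location) pair."""
    return Counter(
        (r["userRegion"], r["dataLocation"])
        for r in records
        if r["crossBorder"]
    )

log = [
    {"userRegion": "eu-west", "dataLocation": "eu-west", "crossBorder": False},
    {"userRegion": "us-east", "dataLocation": "eu-west", "crossBorder": True},
    {"userRegion": "us-east", "dataLocation": "eu-west", "crossBorder": True},
    {"userRegion": "ap-south", "dataLocation": "eu-central", "crossBorder": True},
]
for (src, dst), count in cross_border_summary(log).most_common():
    print(f"{src} -> {dst}: {count}")
```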
+
+## 🤔 Decision Criteria
+
+| Situation | Recommendation |
+|---|---|
+| EU customer | EU storage + GDPR |
+| China citizen | Data localization (PIPL) |
+| Government | Sovereign cloud |
+| Healthcare cross-country | Federated learning |
+| Aggregate stats | Differential privacy |
+| Cross-org compute | Secure MPC |
+| Hardware-enforced | Confidential computing |
+| AI training | Opt-in / explicit consent |
+
+**Default**: Privacy by design + region-aware storage + audit logs + opt-in for AI training.

 ## ⚠️ Contradictions & Updates
- **Conflict with past data**: In the past, the internet's policy of 'openness and sharing' came first; today's race for AI dominance treats data as a strategic asset, and policy has shifted toward 'securing exclusive rights over data' (RL Update).
- **Policy change (RL Update)**: Following the EU's GDPR and AI Act, policies that require explicit 'opt-in' before personal data can be used for training, with massive fines for violations, have become the standard for protecting data sovereignty.
+- **Open data vs. sovereignty**: the historical preference for open access versus control over strategic data.
+- **Limits of federated learning**: model updates can leak training data (gradient inversion attacks).
+- **Utility loss in differential privacy**: a smaller epsilon means stronger privacy but lower utility.
+- **Vendor lock-in with sovereign clouds**: a vendor's sovereignty claim still rests on dependence on its underlying technology.
+- **Hard cross-border enforcement**: each country has different rules.
+- **AI training data lawsuits**: outcomes remain unclear.
+- **Individual vs. national sovereignty tension**: government access to data (e.g., China).
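The epsilon trade-off follows directly from the Laplace mechanism: the noise scale is sensitivity / epsilon, so the expected absolute error grows as epsilon shrinks. A quick simulation for a DP mean over values bounded in [0, 100]; the bounds, record count, and epsilon values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000              # records, each known to lie in [0, 100]
sensitivity = 100 / n  # max change in the mean from altering one record

def mean_abs_error(epsilon, trials=2_000):
    # Laplace noise with scale b = sensitivity / epsilon has E|noise| = b.
    noise = rng.laplace(0.0, sensitivity / epsilon, size=trials)
    return float(np.mean(np.abs(noise)))

for eps in [0.1, 1.0, 10.0]:
    print(f"epsilon={eps:>5}: mean |error| ~ {mean_abs_error(eps):.4f} (theory {sensitivity / eps:.4f})")
```

Each tenfold cut in epsilon costs a tenfold increase in expected error, which is the utility loss the bullet refers to.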

 ## 🔗 Knowledge Links (Graph)
- [[Ethics & AI|Ethics & AI]], [[AI Accountability|AI Accountability]], [[Sociology of Knowledge|Sociology of Knowledge]], [[Universal Basic Income (UBI)|Universal Basic Income (UBI)]], Foundational Models
- **Modern Tech/Tools**: Federated Learning (Privacy-preserving AI), Differential Privacy, Sovereign Clouds.
---
+- Parent: [[Privacy]] · [[Data-Governance]] · [[AI-Ethics]]
+- Variants: [[GDPR-Compliance]] · [[Data-Localization]] · [[Sovereign-Cloud]] · [[Sovereign-AI]]
+- Techniques: [[Federated-Learning]] · [[Differential-Privacy]] · [[Homomorphic-Encryption]] · [[Confidential-Computing]] · [[Secure-MPC]]
+- Critique: [[Data-Colonialism]] · [[Big-Tech-Power]] · [[Digital-Imperialism]]
+- Applications: [[AI-Governance-Policy]] · [[AI-Accountability]] · [[Privacy-by-Design]]
+- Policy: [[Schrems-II]] · [[EU-AI-Act]] · [[China-PIPL]] · [[GAIA-X]]
+- Sovereign AI: [[Mistral-AI]] · [[HyperCLOVA-X]] · [[Yi-Qwen-China]] · [[Falcon-UAE]]

 ## 🤖 LLM 활용 힌트 (How to Use This Knowledge)

 **When to use this knowledge:**
- *(TODO)*
+- Architecting multi-region SaaS.
+- Negotiating DPAs with AI vendors.
+- Deploying in government or regulated industries.
+- Designing cross-border data flows.
+- Implementing privacy-preserving ML.

 **When not to use it:**
- *(TODO)*
+- Legal advice for a specific country (consult counsel).
+- Immediate crisis response (involve the incident team).
+- Small teams at risk of over-engineering (KISS first).
+
+## ❌ Anti-Patterns
+- **Single-region global service**: violates customers' data residency requirements.
+- **No DPA**: the vendor can use your data freely.
+- **No opt-in for AI training**: loss of user trust, plus lawsuits.
+- **Taking sovereign-cloud marketing claims unverified**: false sense of security.
+- **Federated learning without leak protection**: exposed to gradient inversion.
+- **No audit log**: compliance failure.
+- **GDPR only, other regulations ignored**: violations across a fragmented landscape.

 ## 🧪 Validation
-
- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 auto-normalization. Body needs verification.)*
+- **Information status:** verified (concept-level).
+- **Source trust level:** B (GDPR text, EU AI Act, IAPP / privacy bar association resources, academic data colonialism literature).
+- **Review reason:** Manual cleanup. Regulations are actively changing; review every 6 months.

 ## 🧬 Duplicate Check
-
- **Existing similar documents:** *(TODO: see indexer cluster report)*
- **Handling:** UPDATE (auto-normalization)
- **Rationale:** Phase 1 normalization: filled in old-template and missing fields.
+- **Existing similar documents:** [[AI-Governance-Policy]] (related), [[Privacy]] (parent), [[AI-Accountability]] (related).
+- **Handling:** KEEP (a sovereignty-specific lens).
+- **Rationale:** Sits at the intersection of geopolitics and technology.

 ## 🕓 Changelog
-
 | Date | Change | Handling | Trust |
 |------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
+| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
+| 2026-05-09 | Manual cleanup: added 3 layers + privacy-preserving tech + regulation map + anti-patterns | UPDATE | B |