id: wiki-2026-0508-ai-data-sovereignty
title: AI & Data Sovereignty
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: 데이터 주권, data sovereignty, AI sovereignty, sovereign cloud, data colonialism, data localization
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: data-sovereignty, ai-policy, privacy, gdpr, data-localization, federated-learning, sovereign-cloud, geopolitics
raw_sources:
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack:
  language: policy / engineering
  applicable_to: Compliance, Architecture, Government, Privacy

AI & Data Sovereignty

📌 One-Line Insight (The Karpathy Summary)

"Who owns each piece of data?" Sovereignty has three layers: individual, organizational, and national. Training data is a hidden cost of Big Tech AI. Federated learning, differential privacy, and sovereign cloud are the modern technical answers.

📖 Structured Knowledge (Synthesized Content)

The three layers of sovereignty

1. Individual sovereignty

  • Each user owns their own data.
  • Right to know which data is used, and for what purpose.
  • Right to erasure (GDPR Article 17).
  • Right to object (GDPR Article 21).
  • Right to data portability (GDPR Article 20).
  • Opt-in / opt-out for the use of personal data in AI training.

2. Organizational sovereignty

  • Each company's customer data.
  • IP and trade secrets.
  • A DPA (Data Processing Agreement) with every vendor.
  • A list of each vendor's sub-processors.
  • Dependency on each cloud provider.

3. National sovereignty

  • Where citizens' data is physically located.
  • Geopolitical risk (foreign government access).
  • Strategic AI capability.
  • Industrial policy.

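Major regulations

| Regulation | Region | Key points |
| --- | --- | --- |
| GDPR | EU | Individual rights + extraterritorial reach |
| CCPA / CPRA | California | Sale opt-out, sensitive data |
| PIPL | China | Strict cross-border transfer rules |
| DPDPA (Digital Personal Data Protection Act) | India | Enacted 2023 |
| PIPEDA | Canada | Federal private-sector privacy law |
| POPIA | South Africa | |
| LGPD | Brazil | GDPR-like |
| PIPA | South Korea | Aligned with GDPR (EU adequacy decision, 2021) |

→ Every country sets different rules, so the regulatory landscape is fragmented.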

Cross-border transfer challenges

  • Schrems II (CJEU, 2020): invalidated the EU-US Privacy Shield → each transfer now needs SCCs (Standard Contractual Clauses) plus a transfer impact assessment.
  • EU-US Data Privacy Framework (2023): the replacement for Privacy Shield.
  • China data export: strict (CSL, DSL, PIPL).
  • Russia: data localization (law adopted in 2014).

The data colonialism critique

  • US Big Tech collects data globally.
  • Data is extracted from the Global South (data extractivism).
  • Local contexts are underrepresented.
  • AI systems carry a Western-perspective bias.

→ An academic concept from Couldry & Mejias.

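Sovereign cloud

  • Each country or region runs its own infrastructure.
  • Examples:
    • Gaia-X (EU): federated data-infrastructure initiative.
    • Bleu (France): French sovereign cloud built on Microsoft Azure.
    • S3NS (France): sovereign cloud built on Google Cloud.
    • Confidential computing (Azure / GCP): hardware-isolated workloads, a building block rather than a sovereign cloud in itself.
    • AWS European Sovereign Cloud (EU, announced 2023).

→ A vendor's "sovereign" claim is hard to verify.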

Sovereign AI capability

  • Each country builds its own LLM.
  • Examples:
    • Mistral AI (France).
    • Falcon (UAE).
    • EXAONE (LG AI Research, Korea).
    • HyperCLOVA X (Naver, Korea).
    • Yi / Qwen (China).
    • tsuzumi (NTT, Japan).
  • Compute (subject to GPU export controls).
  • Data (a domestic corpus).
  • Talent.

→ AI sovereignty is a strategic priority.

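Privacy-preserving AI

Federated Learning

  • Each device or hospital keeps its own data.
  • Only model updates are shared.
  • A central server aggregates them.
# Conceptual sketch: assumes a Keras-style `model` and on-site arrays `local_x`, `local_y`.
import flwr as fl

class Client(fl.client.NumPyClient):
    def __init__(self, model, local_x, local_y):
        self.model, self.local_x, self.local_y = model, local_x, local_y

    def fit(self, parameters, config):
        self.model.set_weights(parameters)          # load the current global weights
        self.model.fit(self.local_x, self.local_y)  # train on data that never leaves the site
        return self.model.get_weights(), len(self.local_x), {}

# Each hospital / phone keeps its own data while contributing to collective learning.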
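
The "central server aggregates" step above can be sketched without any framework. This is a minimal illustration of federated averaging under simplifying assumptions: each client's weights are a flat list of floats, and `fed_avg`, `site_a`, and `site_b` are hypothetical names, not part of Flower.

```python
def fed_avg(client_updates):
    """Average client weights, weighted by each client's number of local examples.

    client_updates: list of (weights, num_examples) pairs, where weights is a
    flat list of floats of the same length for every client.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(w[i] * n for w, n in client_updates) / total
        for i in range(dim)
    ]

# Two hypothetical sites: only weights and example counts reach the server.
site_a = ([1.0, 1.0], 100)
site_b = ([5.0, 5.0], 300)
global_weights = fed_avg([site_a, site_b])
```

The site with more local examples pulls the global weights further toward its own, which is the weighting that Flower's `FedAvg` strategy also applies.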

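Differential Privacy

  • Noise is added to each query result.
  • This bounds what the output reveals about any single individual's contribution.
# Differential privacy is deployed in Apple's iOS and Google's Chrome (both as local-DP variants).
import numpy as np

def dp_mean(data, lower, upper, epsilon=1.0):
    # Clip to public bounds: deriving the range from the data itself would leak information.
    clipped = np.clip(data, lower, upper)
    sensitivity = (upper - lower) / len(clipped)  # max change in the mean from one record
    noise = np.random.laplace(0, sensitivity / epsilon)
    return np.mean(clipped) + noise

# Aggregate statistics with a privacy guarantee.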

Homomorphic encryption

  • Computation runs directly on encrypted data.
  • The result is also encrypted.
  • Only the key holder decrypts the result.
  • High computational cost.
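
To make "compute on encrypted data" concrete, here is a toy sketch of the Paillier scheme, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The tiny primes and the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) are assumptions chosen for illustration; this is not secure and is not how a production library would be used.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy key generation. Real deployments use primes of 1024+ bits."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    n2 = n * n
    l_value = (pow(c, lam, n2) - 1) // n
    return (l_value * mu) % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)

n, private_key = keygen()
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
plain_sum = decrypt(n, private_key, c_sum)  # the server never saw 20 or 22
```

Whoever holds only `n` can add values it cannot read; only the private-key holder recovers the result. The modular exponentiations on every operation are where the computational cost comes from.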

Secure Multi-Party Computation (MPC)

  • Each party keeps its own data while all parties compute a joint result.
  • Security rests on cryptographic protocols rather than on a trusted third party.
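
One of the simplest MPC building blocks is additive secret sharing, sketched below for a joint sum. The sketch simulates all parties in one process and assumes they follow the protocol honestly; `share`, `secure_sum`, and the hospital counts are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic is done modulo this prime

def share(value, num_parties):
    """Split value into random shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(private_values):
    n = len(private_values)
    # Party i splits its value and sends the j-th share to party j.
    distributed = [share(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME
        for j in range(n)
    ]
    return sum(partial_sums) % PRIME

hospital_counts = [120, 75, 310]  # hypothetical per-hospital patient counts
total = secure_sum(hospital_counts)
```

Each individual share is uniformly random, so no single party learns another party's input; only the combined total is revealed.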

Confidential computing

  • Hardware-isolated execution (Intel SGX, AMD SEV-SNP, AWS Nitro Enclaves).
  • Protects data in use while it is processed in the cloud.
  • Critical for government and sovereign workloads.

Industry challenges

Healthcare

  • Health-data localization rules differ by country.
  • HIPAA (US) + GDPR (EU) + local laws all apply.
  • Multinational clinical trials are hard to run.

Finance

  • Transaction data crosses borders.
  • Banking regulation differs by country.

Government / defense

  • Classified data must be isolated.
  • Supply chain (chips, software).
  • Air-gapped + sovereign deployments.

Big Tech enterprise (Salesforce, AWS)

  • Commitments on where each customer's data is located.
  • Region selection.
  • EU-only processing for EU customers.
AI training data issues

  • NYT v. OpenAI: training on paywalled articles.
  • Getty Images v. Stability AI (Stable Diffusion): watermarked images.
  • Copyright class actions by authors and artists.

Opt-out mechanisms

  • robots.txt + AI bot identifiers.
  • The ai.txt proposal.
  • Publisher-level opt-outs and licensing deals (NYT, Reddit).

Right to be forgotten in training data

  • GDPR's right to erasure.
  • Making a trained model unlearn specific data is hard (active research).

Organizational patterns

Data classification

  • Public / Internal / Confidential / Restricted.
  • Define which AI tools may access each level.
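
The four levels and the per-tool access rule can be expressed as an ordered enum plus a ceiling per tool. The sketch below is one possible shape; the tool names and their ceilings are hypothetical, and unknown tools are denied by default.

```python
from enum import IntEnum

class Level(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Hypothetical tools and the highest data level each one may receive.
TOOL_CEILING = {
    "public-chatbot": Level.PUBLIC,
    "vendor-llm-with-dpa": Level.INTERNAL,
    "self-hosted-llm": Level.CONFIDENTIAL,
}

def may_send(tool, data_level):
    """Return True only if the tool is known and cleared for this data level."""
    ceiling = TOOL_CEILING.get(tool)
    return ceiling is not None and data_level <= ceiling

ok = may_send("self-hosted-llm", Level.INTERNAL)
blocked = may_send("public-chatbot", Level.CONFIDENTIAL)
```

In this example no tool is cleared for `Level.RESTRICTED`, so restricted data never reaches an AI tool unless the table is changed deliberately.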

Data localization

  • Store each customer's data in that customer's region.
  • Deploy each service in-region.
  • Make cross-region replication explicit.
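
"Explicit" replication can mean that every destination must appear on an allowlist, so data never leaves its home jurisdiction by accident. The following is a sketch under that assumption; the region names and the `REPLICATION_ALLOWLIST` policy are hypothetical.

```python
# Hypothetical policy: destinations each home region may replicate to.
REPLICATION_ALLOWLIST = {
    "eu-central": {"eu-west"},
    "eu-west": {"eu-central"},
    "us-east": {"us-west"},
}

def replication_targets(home_region, requested):
    """Return the requested targets, or raise if any is not allowlisted."""
    allowed = REPLICATION_ALLOWLIST.get(home_region, set())
    denied = sorted(set(requested) - allowed)
    if denied:
        raise PermissionError(
            f"replication from {home_region} to {denied} is not allowlisted"
        )
    return sorted(requested)

targets = replication_targets("eu-central", ["eu-west"])
```

A request to copy EU data to `us-east` raises `PermissionError` instead of succeeding silently, which forces the exception to be reviewed and recorded.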

Privacy by design

  • Privacy is the default for every system.
  • Data minimization: collect only what is needed.
  • Purpose limitation.
  • Storage limitation.
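
Data minimization and purpose limitation can be enforced in code by tying each processing purpose to the fields it is allowed to see. This is a minimal sketch; the purposes and field names in `ALLOWED_FIELDS` are hypothetical.

```python
# Hypothetical mapping from processing purpose to permitted fields.
ALLOWED_FIELDS = {
    "billing": {"user_id", "country", "plan"},
    "support": {"user_id", "email"},
}

def minimize(record, purpose):
    """Keep only the fields permitted for this purpose; unknown purposes get nothing."""
    allowed = ALLOWED_FIELDS.get(purpose, set())
    return {key: value for key, value in record.items() if key in allowed}

record = {
    "user_id": 7,
    "email": "a@example.com",
    "country": "DE",
    "plan": "pro",
    "birth_date": "1990-01-01",
}
billing_view = minimize(record, "billing")
```

Fields that no purpose lists, such as `birth_date` here, never leave the record, which is also a prompt to ask whether they need to be collected at all.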

Future trends

  • More countries push for AI sovereignty (chips, data, models).
  • Fragmentation into tech blocs (US, EU, China, India).
  • Portable user identity (Solid Pods, Web3-style).
  • Personal AI (on-device).

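💻 Patterns (Engineering)

Region-aware data routing

// Assumes `User.country` is an ISO 3166-1 alpha-2 code and `getClientFor`
// returns a storage client bound to one region (both defined elsewhere).
const EU_COUNTRIES = new Set(['FR', 'IT', 'ES', 'NL', 'IE' /* ...other EU member states */]);

class DataRouter {
  determineRegion(user: User): string {
    if (user.country === 'DE') return 'eu-central';
    if (EU_COUNTRIES.has(user.country)) return 'eu-west';  // Set lookup; `in` on an array tests indices
    if (user.country === 'CN') return 'cn-north';
    if (user.country === 'IN') return 'ap-south';
    return 'us-east';  // default for all other countries
  }

  async store(data: unknown, user: User) {
    const region = this.determineRegion(user);  // choose the region before any write
    const client = this.getClientFor(region);
    await client.put(data);
  }
}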

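Differential privacy (local, randomized response)

import math
import random

def collect_with_dp(events, epsilon=1.0):
    """RAPPOR-style randomized response over 0/1 events."""
    # With probability f, report a fair coin flip instead of the true bit.
    # f = 2 / (e^epsilon + 1) makes each report epsilon-locally-DP.
    f = 2.0 / (math.exp(epsilon) + 1.0)

    randomized = []
    for e in events:
        if random.random() < f:
            randomized.append(random.choice([0, 1]))   # noise
        else:
            randomized.append(e)                        # true value
    return randomized

# Local DP of this family has been deployed in Apple iOS and Google Chrome.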

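Federated learning

import flwr as fl

# Legacy Flower 1.x entry points; later releases deprecate them in favor of ServerApp / ClientApp.
# The server and each client run as separate processes.

# Server
def server_strategy():
    return fl.server.strategy.FedAvg(
        fraction_fit=0.5,           # sample half of the available clients per round
        min_available_clients=10,   # wait until 10 clients are connected
    )

fl.server.start_server(server_address='[::]:8080', strategy=server_strategy())

# Client (one per hospital)
class HospitalClient(fl.client.NumPyClient):
    def __init__(self, model, local_x, local_y, test_x, test_y):
        self.model = model
        self.local_x, self.local_y = local_x, local_y
        self.test_x, self.test_y = test_x, test_y

    def fit(self, parameters, config):
        self.model.set_weights(parameters)
        self.model.fit(self.local_x, self.local_y, epochs=1)
        return self.model.get_weights(), len(self.local_x), {}

    def evaluate(self, parameters, config):
        self.model.set_weights(parameters)  # evaluate the global model, not stale local weights
        loss, acc = self.model.evaluate(self.test_x, self.test_y)
        return float(loss), len(self.test_x), {'accuracy': float(acc)}

# `model` and the arrays are assumed to be a compiled Keras model and this hospital's local data.
fl.client.start_numpy_client(
    server_address='central:8080',
    client=HospitalClient(model, local_x, local_y, test_x, test_y),
)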

Confidential computing (AWS Nitro)

# Build an enclave image, then run it as an isolated Nitro Enclave
nitro-cli build-enclave --docker-uri my-app:latest --output-file my.eif
nitro-cli run-enclave --eif-path my.eif --memory 2048 --cpu-count 2

# The enclave is isolated and attestable; the parent host cannot access its memory.

Data classification + DLP

import re

SENSITIVE_PATTERNS = [
    (r'\b\d{3}-\d{2}-\d{4}\b', 'SSN'),
    (r'\b4\d{12,15}\b', 'CreditCard'),        # Visa-style numbers only
    (r'(?i)passport[:= ]+\w+', 'Passport'),
]

def classify(text: str) -> str:
    """Return 'restricted' if any sensitive pattern matches, else 'internal'."""
    for pattern, _label in SENSITIVE_PATTERNS:
        if re.search(pattern, text):
            return 'restricted'
    return 'internal'

# Run this check on every outgoing prompt.

Opt-out signaling (ai.txt / robots.txt)

# robots.txt
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

→ Opts the site out of LLM training, but compliance depends on each vendor's willingness to honor it.
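
A publisher can check that its rules actually block the intended crawlers using Python's standard `urllib.robotparser`. The sketch below parses an inline copy of two of the rules above; the `is_allowed` helper and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/article")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article")
```

A crawler without a matching `User-agent` entry is allowed by default, so each AI bot has to be listed by name (or covered by a `User-agent: *` rule).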

Vendor DPA template (excerpt)

## Data Processing Addendum

Vendor agrees:
1. Process Data only per Customer instructions.
2. NOT use Customer Data for AI training without explicit opt-in.
3. Maintain ISO 27001 / SOC 2 Type II.
4. Sub-processors listed at: vendor.com/subprocessors.
5. Data location: EU (Frankfurt + Dublin).
6. 30-day notification of new sub-processor.
7. Customer right to audit (60-day notice).
8. Data deletion within 30 days of contract end.
9. Breach notification within 72 hours.

Region affinity (data residency)

# K8s region affinity
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    cloud.google.com/load-balancer-type: 'Internal'
spec:
  type: LoadBalancer
  selector:
    app: my-app
    region: eu-west   # select only pods labeled region=eu-west, so requests are served by EU pods

Audit log (sovereignty compliance)

async function auditDataAccess(user: User, data: any, action: string) {
  await db.auditLog.insert({
    userId: user.id,
    userRegion: user.region,
    dataLocation: data.region,
    action,
    timestamp: new Date(),
    crossBorder: user.region !== data.region,
  });
}

→ Every cross-border access becomes visible.

🤔 Decision Criteria

| Situation | Recommendation |
| --- | --- |
| EU customer | EU storage + GDPR |
| Chinese citizen | Data localization (PIPL) |
| Government | Sovereign cloud |
| Healthcare across countries | Federated learning |
| Aggregate statistics | Differential privacy |
| Cross-organization compute | Secure MPC |
| Hardware-enforced isolation | Confidential computing |
| AI training | Opt-in / explicit consent |

Default: privacy by design + region-aware storage + audit log + opt-in for AI training.

⚠️ Contradictions & Updates

  • Open data vs. sovereignty: the historical preference for open access conflicts with control over strategic data.
  • Limits of federated learning: model updates can leak training data (gradient inversion attacks).
  • Utility loss in differential privacy: a smaller epsilon means stronger privacy but lower utility.
  • Vendor lock-in in sovereign cloud: the vendor's sovereign claim still depends on its underlying technology.
  • Cross-border enforcement is hard: every country has different rules.
  • AI training data lawsuits: outcomes remain unclear.
  • Tension between individual and national sovereignty: government access to personal data (China, etc.).
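
The differential-privacy trade-off in the list above can be quantified for the Laplace mechanism: the noise scale is sensitivity divided by epsilon, and the expected absolute error equals that scale. The sketch below assumes a hypothetical counting query with sensitivity 1; `expected_abs_error` is an illustrative name.

```python
def expected_abs_error(sensitivity, epsilon):
    """Mean absolute error of the Laplace mechanism (equal to its scale b)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon

# Hypothetical counting query (sensitivity 1) at three privacy budgets.
errors = {eps: expected_abs_error(1.0, eps) for eps in (0.1, 1.0, 10.0)}
```

Tightening epsilon by a factor of ten makes the expected error ten times larger, so the budget should be set against the accuracy the statistic actually needs.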

🔗 Knowledge Links (Graph)

🤖 LLM Usage Hints (How to Use This Knowledge)

When to use this knowledge:

  • Architecting a multi-region SaaS.
  • Negotiating a DPA with an AI vendor.
  • Deploying for government or regulated industries.
  • Designing cross-border data flows.
  • Implementing privacy-preserving ML.

When not to use it:

  • Legal advice for a specific country (consult counsel).
  • Immediate crisis response (hand off to the incident team).
  • Over-engineering for a small team (KISS first).

Anti-Patterns

  • Single region for a global service: violates customers' data-residency requirements.
  • No DPA: the vendor can use your data freely.
  • No opt-in for AI training: loss of user trust + lawsuits.
  • Not verifying a sovereign cloud's marketing claims: false sense of security.
  • Federated learning alone, with no leak protection: exposed to gradient inversion.
  • No audit log: compliance failure.
  • GDPR only, ignoring other regulations: violations across fragmented jurisdictions.
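
One common mitigation for the federated-learning anti-pattern above is to clip each client update and add Gaussian noise before it leaves the device, in the spirit of DP-SGD-style training. The sketch below shows only that mechanical step and does not compute a formal privacy guarantee; `privatize_update` and its parameters are hypothetical.

```python
import math
import random

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.0):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise.

    update: flat list of floats (a model delta or gradient).
    The noise standard deviation is noise_multiplier * clip_norm.
    """
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    sigma = noise_multiplier * clip_norm
    return [x + random.gauss(0.0, sigma) for x in clipped]

raw_update = [3.0, 4.0]  # L2 norm 5.0
noisy_update = privatize_update(raw_update, clip_norm=1.0, noise_multiplier=0.5)
```

Clipping bounds how much any single client can influence the aggregate, and the noise makes it harder to reconstruct individual training examples from a shared update.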

🧪 Validation

  • Information status: verified (concept-level).
  • Source trust level: B (GDPR text, EU AI Act, IAPP / privacy bar association resources, academic data colonialism literature).
  • Review reason: manual cleanup. Regulation is actively changing; review every 6 months.

🧬 Duplicate Check

🕓 Changelog

| Date | Change | Action | Trust level |
| --- | --- | --- | --- |
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: added 3 layers + privacy-preserving tech + regulation map + anti-patterns | UPDATE | B |