id: wiki-2026-0508-intellectual-property-in-ai
title: Intellectual Property in AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: AI IP, copyright, training data, model IP, fair use, NYT v OpenAI
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: legal, ai-ip, copyright, training-data, fair-use, regulation
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Legal
  applicable_to: AI Development, Legal, Policy

Intellectual Property in AI

One-line summary

"IP rights over training data, model output, and the model itself remain unsettled." Key disputes: NYT v OpenAI (2023+), Getty v Stability, and the GitHub Copilot lawsuits. Current regulatory reference points: the EU AI Act and US Copyright Office guidance (2023).

Key points

Issues

  • Training data: is training on copyrighted material fair use?
  • Output: is AI-generated output copyrightable?
  • Model: trade secret vs open-source release.
  • Style: does mimicking an artist's style infringe?

Notable cases and regulation

  • NYT v OpenAI (2023+): training on news articles.
  • Getty v Stability (2023+): Getty watermarks reproduced in output.
  • Andersen v Stability: artists vs Stable Diffusion.
  • Doe v GitHub: Copilot and code.
  • Authors Guild v OpenAI (2023).
  • US Copyright Office (2023): purely AI-generated output is not copyrightable (no human authorship).
  • EU AI Act (2024): transparency through training data disclosure.
  • Japan: broadly permits training on copyrighted works (2018 amendment).
  • UK: narrow text-and-data-mining exception.

Applied risk areas

  1. Training data sourcing.
  2. Output deployment.
  3. Style mimicking.
  4. Model release.
  5. Watermark / provenance.

💻 Patterns

Training data audit

from dataclasses import dataclass

@dataclass
class DataSource:
    source: str
    license: str
    provenance: str
    can_train: bool

def audit_training_corpus(sources):
    # A source is risky if training is not permitted or its license is unknown.
    risky = [s for s in sources if not s.can_train or s.license == 'unknown']
    return {'safe': len(sources) - len(risky), 'risky': risky}
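To show what the audit returns, here is a small self-contained run. The three sources are hypothetical sample data, not entries from any real corpus.

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    source: str
    license: str
    provenance: str
    can_train: bool

def audit_training_corpus(sources):
    # A source is risky if training is not permitted or its license is unknown.
    risky = [s for s in sources if not s.can_train or s.license == 'unknown']
    return {'safe': len(sources) - len(risky), 'risky': risky}

# Hypothetical sample corpus, for illustration only.
corpus = [
    DataSource('wikipedia-dump', 'cc-by-sa', 'public dump', True),
    DataSource('scraped-news', 'unknown', 'web crawl', True),
    DataSource('licensed-images', 'proprietary', 'vendor contract', False),
]

report = audit_training_corpus(corpus)
risky_names = [s.source for s in report['risky']]
print(report['safe'], risky_names)
```

Note that a source marked can_train=True is still flagged when its license is unknown, so the flag alone is not trusted.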

License compatibility

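# Policy table: True = allowed, False = blocked, string = condition to review.
COMPATIBLE = {
    'cc0': True, 'cc-by': True, 'mit': True, 'apache-2.0': True,
    'cc-by-nc': 'check_purpose',
    'cc-by-sa': 'derivative_must_share',
    'gpl-3.0': 'derivative_must_open',
    'proprietary': False, 'unknown': False,
}

def can_train(license, purpose='commercial'):
    # Licenses missing from the table are treated as blocked instead of returning None.
    rule = COMPATIBLE.get(license, False)
    if rule == 'check_purpose':
        return purpose != 'commercial'
    # Either True/False or a condition string the caller must still satisfy.
    return rule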
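The lookup above mixes booleans and condition strings in one return value, which is easy to misread because a condition string is truthy. One possible variant, sketched here, separates the decision from the obligation. The names LicenseDecision, RULES, and decide are hypothetical, and the table is project policy, not legal advice.

```python
from typing import NamedTuple, Optional

class LicenseDecision(NamedTuple):
    allowed: bool
    condition: Optional[str]  # obligation to review, if any

# Same policy entries as the COMPATIBLE table in this section.
RULES = {
    'cc0': True, 'cc-by': True, 'mit': True, 'apache-2.0': True,
    'cc-by-nc': 'check_purpose',
    'cc-by-sa': 'derivative_must_share',
    'gpl-3.0': 'derivative_must_open',
    'proprietary': False, 'unknown': False,
}

def decide(license_id: str, purpose: str = 'commercial') -> LicenseDecision:
    rule = RULES.get(license_id.lower(), False)  # unlisted licenses are rejected
    if rule == 'check_purpose':
        # Non-commercial licenses are only usable for non-commercial purposes.
        return LicenseDecision(purpose != 'commercial', 'non_commercial_only')
    if isinstance(rule, str):
        # Allowed, but the named obligation still has to be met.
        return LicenseDecision(True, rule)
    return LicenseDecision(rule, None)
```

For example, decide('gpl-3.0') reports the license as usable while still carrying the 'derivative_must_open' obligation for review.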

Output attribution / watermark

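# C2PA is a provenance standard for media. Signer(...).sign(..., claims=...) below
# is an illustrative interface, not a verified c2pa-python call; check that
# library's documentation for its actual signing API.
from datetime import datetime, timezone
from c2pa import Signer

def attach_provenance(media_path, model_id, signer_cert):
    Signer(signer_cert).sign(media_path, claims={
        'generator': model_id,
        'training_data_summary': 'public_domain + licensed',
        'timestamp': datetime.now(timezone.utc).isoformat(),
    })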
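The idea behind signed provenance can be sketched with the standard library alone. This is not C2PA: C2PA embeds manifests in the media and signs them with certificates, while this stand-in signs a sidecar manifest with a shared-secret HMAC. The function names and manifest fields are assumptions for illustration.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

def build_manifest(media_bytes: bytes, model_id: str, key: bytes) -> dict:
    claims = {
        'generator': model_id,
        'content_sha256': hashlib.sha256(media_bytes).hexdigest(),
        'timestamp': datetime.now(timezone.utc).isoformat(),
    }
    # Canonical JSON (sorted keys) so signer and verifier hash the same bytes.
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {'claims': claims, 'signature': signature}

def verify_manifest(media_bytes: bytes, manifest: dict, key: bytes) -> bool:
    payload = json.dumps(manifest['claims'], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    same_sig = hmac.compare_digest(expected, manifest['signature'])
    content_hash = hashlib.sha256(media_bytes).hexdigest()
    same_content = manifest['claims']['content_sha256'] == content_hash
    return same_sig and same_content
```

Verification fails if either the media bytes or the claims were altered after signing, which is the property a provenance record needs.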

Artist style detection (defensive)

def style_similarity(generated, reference_artist_works):
    """How close is the generated image's style to the reference artist's works?

    clip_encode and cosine are assumed helpers: an image-embedding function
    and a cosine-similarity function.
    """
    gen_features = clip_encode(generated)
    artist_features = [clip_encode(w) for w in reference_artist_works]
    sim = max(cosine(gen_features, f) for f in artist_features)
    return sim  # flag for review when sim > 0.9
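The comparison step can be tried without an embedding model. In this sketch plain lists of floats stand in for CLIP embeddings, and should_flag is a hypothetical wrapper around the 0.9 threshold mentioned above; a real threshold would need calibration on labeled examples.

```python
import math

def cosine(a, b):
    # Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def max_style_similarity(generated_vec, reference_vecs):
    # Closest match against any single reference work.
    return max(cosine(generated_vec, r) for r in reference_vecs)

def should_flag(generated_vec, reference_vecs, threshold=0.9):
    return max_style_similarity(generated_vec, reference_vecs) > threshold
```

Taking the maximum instead of the mean means one near-copy of a single work is enough to raise a flag, even when the rest of the artist's portfolio looks different.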

Opt-out registry

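# load_registry is an assumed helper that returns a set of opted-out creator IDs.
OPT_OUT = load_registry('https://spawning.ai/opt-out')

def filter_training_data(images):
    # Drop every image whose creator appears in the opt-out registry.
    return [img for img in images if img.creator not in OPT_OUT]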
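A runnable version of the filter, assuming the registry is already available as a local collection of creator names. How the registry is actually fetched is not shown in the source, so it is left out here. The Image class and the normalization step are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Image:
    url: str
    creator: str

def filter_training_data(images, opt_out):
    # Normalize case and whitespace so 'Alice ' and 'alice' hit the same entry.
    blocked = {c.strip().lower() for c in opt_out}
    kept = [img for img in images if img.creator.strip().lower() not in blocked]
    # Also report how many items were dropped, for the audit trail.
    return kept, len(images) - len(kept)
```

Returning the dropped count alongside the kept items makes it easier to document that opt-outs were honored.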

Memorization detection (training data leakage)

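import random

def detect_memorization(model, training_examples, n_test=100):
    """Estimate how often the model reproduces training text verbatim."""
    n = min(n_test, len(training_examples))  # random.sample fails if n > population
    leaks = 0
    for ex in random.sample(training_examples, n):
        prompt = ex.text[:100]
        gen = model.generate(prompt, max_tokens=200)
        # Compare against the continuation only, so an echoed prompt is not a leak.
        # longest_common_substring returns the shared text, so compare its length.
        if len(longest_common_substring(gen, ex.text[100:])) > 50:
            leaks += 1
    return leaks / n if n else 0.0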
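The longest_common_substring helper is used in this section but never defined. A minimal dynamic-programming sketch follows; it returns the matched text (as the code verbatim check in this document expects), so take len() of the result for a length threshold. It runs in O(len(a) * len(b)) time, which is fine for single documents but not for scanning a large corpus.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous run of characters shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths for the previous row of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]
```

Only two rows of the table are kept in memory at a time, so memory use grows with len(b) and not with the product of both lengths.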

Fair use 4-factor analysis

def fair_use_analysis(use_case):
    """Return the four US fair use factors as questions to answer for use_case."""
    return {
        'purpose': 'transformative? commercial?',
        'nature': 'creative or factual? published?',
        'amount': 'how much used? heart of work?',
        'effect': 'market harm? substitute?',
    }
# Each case has to be evaluated on its own facts; a lawyer is needed.

EU AI Act compliance (training data summary)

def eu_training_data_disclosure(corpus):
    return {
        'general_purpose_ai': True,
        'training_data_summary': summarize_corpus(corpus),
        'compute_used': estimate_compute(corpus),
        'systemic_risk': flops_above_threshold(),
    }
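The compute and systemic-risk fields can be approximated with simple arithmetic. The sketch below assumes the common rule of thumb of about 6 FLOPs per parameter per training token for dense transformers, and uses the 10^25 FLOP figure at which the EU AI Act presumes systemic risk for general-purpose models. The function signatures are hypothetical and differ from the argument-free flops_above_threshold() call above.

```python
SYSTEMIC_RISK_FLOPS = 1e25  # EU AI Act presumption threshold for GPAI models

def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    # Rule of thumb: forward plus backward pass costs about 6 FLOPs
    # per parameter per token. This is an estimate, not a measurement.
    return 6.0 * n_params * n_tokens

def flops_above_threshold(n_params: float, n_tokens: float,
                          threshold: float = SYSTEMIC_RISK_FLOPS) -> bool:
    return estimate_training_flops(n_params, n_tokens) > threshold
```

As a worked case, a 7e9-parameter model trained on 2e12 tokens comes to roughly 8.4e22 FLOPs, well under the threshold. Logged accelerator hours are a better source than this estimate when they are available.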

Model release license

# Trade-offs between model release licenses
licenses:
  - name: Llama Community License
    type: permissive_with_exceptions
    commercial: yes (with conditions)
    
  - name: Apache 2.0
    type: permissive
    commercial: yes
    
  - name: AGPL-3.0
    type: copyleft
    commercial: yes (must share derivatives)
    
  - name: CC-BY-NC
    type: non_commercial
    commercial: no

Output cleansing (preserve user IP)

def output_clean_for_user_ip(generated, user_input):
    """The generated output may contain the user's input verbatim."""
    if generated_contains_user_input(generated, user_input):
        # The user retains rights to their own part.
        return mark_user_section(generated, user_input)
    return generated

# System prompt for IP-aware generation.
LEGAL_SYSTEM = """You generate legal-aware output.

When asked about IP-sensitive content:
1. Note that AI-generated work may not be copyrightable in some jurisdictions.
2. Cite training data limitations when relevant.
3. Flag if a request seems to ask for verbatim copyrighted material.
4. Recommend lawyer consultation for legal decisions."""

Code verbatim check (Copilot-style)

def code_verbatim_check(generated_code, public_repos):
    """Detect long verbatim matches against public repos so the user can be warned."""
    matches = []
    for repo in public_repos:
        for file in repo.files:
            common = longest_common_substring(generated_code, file.content)
            if len(common) > 100:
                matches.append({'repo': repo.name, 'license': repo.license, 'lines': common})
    return matches
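For small inputs the same check can be run with the standard library's difflib, which avoids writing a substring routine by hand. The (name, license, content) tuple shape for files is a hypothetical simplification of the repo objects above, and the 100-character cutoff is the one used in this section.

```python
from difflib import SequenceMatcher

def longest_verbatim_match(generated: str, source: str) -> str:
    # autojunk=False disables the popularity heuristic, which would otherwise
    # skip frequent characters in long inputs.
    matcher = SequenceMatcher(None, generated, source, autojunk=False)
    m = matcher.find_longest_match(0, len(generated), 0, len(source))
    return generated[m.a:m.a + m.size]

def verbatim_matches(generated, files, min_chars=100):
    """files: iterable of (name, license, content) tuples."""
    hits = []
    for name, license_id, content in files:
        common = longest_verbatim_match(generated, content)
        if len(common) > min_chars:
            hits.append({'file': name, 'license': license_id, 'chars': len(common)})
    return hits
```

This scans every file in full, so at repository scale an index over hashed token windows would be a more realistic design; the sketch only shows the matching rule.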

Decision criteria

| Situation | Approach |
|---|---|
| Build model | License audit + opt-out respect |
| Deploy output | Watermark + provenance |
| Style mimicking | Detection + flag |
| EU market | AI Act disclosure |
| Open-source | Apache / Llama license |
| User-generated | Preserve user rights |

Default: license-clean training (audit + opt-out) + watermarked output (C2PA) + EU disclosure + lawyer consultation for edge cases.

🔗 Graph

🤖 LLM usage

When to apply: commercial AI deployment; dataset construction. When it matters less: academic-only research (limited exposure).

Anti-patterns

  • Training on anything available: invites lawsuits.
  • No watermark: enables misuse and impersonation.
  • Ignoring opt-outs: brand risk.
  • No EU AI Act preparation: risk of fines.
  • Skipping the lawyer: case-specific decisions need legal review.

🧪 Verification / Duplicates

  • Verified (US Copyright Office 2023, EU AI Act 2024, court filings).
  • Confidence: B+.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: IP issues + audit / watermark / fair use / disclosure code |