| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-intellectual-property-in-ai | Intellectual Property in AI | 10_Wiki/Topics | verified | self | | none | A | 0.85 | applied | | | 2026-05-10 | pending | |
Intellectual Property in AI
One-line summary
"The IP status of training data, model output, and the model itself remains unsettled." Key disputes: NYT v OpenAI (2023+), Getty v Stability, and the GitHub Copilot lawsuits. Recent regulatory developments: the EU AI Act and US Copyright Office guidance (2023).
Key points
Issues
- Training data: is training on copyrighted material fair use?
- Output: is AI-generated content copyrightable?
- Model: trade secret vs open-source.
- Style: does mimicking an artist's style infringe?
Famous cases
- NYT v OpenAI (2023+): training on articles.
- Getty v Stability (2023+): watermarks in output.
- Andersen v Stability (artists vs Stable Diffusion).
- Doe v GitHub (Copilot, code).
- Authors Guild v OpenAI (2023).
Legal stance (current, evolving)
- US Copyright Office (2023): purely AI-generated output is not copyrightable (no human authorship).
- EU AI Act (2024): transparency through training data disclosure.
- Japan: broad permitted training (2018 amendment).
- UK: narrow text-and-data-mining exception.
Applied risks
- Training data sourcing.
- Output deployment.
- Style mimicking.
- Model release.
- Watermark / provenance.
💻 Patterns
Training data audit
from dataclasses import dataclass

@dataclass
class DataSource:
    source: str
    license: str
    provenance: str
    can_train: bool

def audit_training_corpus(sources):
    # A source is risky if training is not permitted or its license is unknown.
    risky = [s for s in sources if not s.can_train or s.license == 'unknown']
    return {'safe': len(sources) - len(risky), 'risky': risky}
License compatibility
COMPATIBLE = {
    'cc0': True, 'cc-by': True, 'mit': True, 'apache-2.0': True,
    'cc-by-nc': 'check_purpose',
    'cc-by-sa': 'derivative_must_share',
    'gpl-3.0': 'derivative_must_open',
    'proprietary': False, 'unknown': False,
}

def can_train(license, purpose='commercial'):
    # Licenses missing from the table are treated as not trainable.
    rule = COMPATIBLE.get(license, False)
    if rule == 'check_purpose':
        return purpose != 'commercial'
    # True/False, or a string naming the obligation attached to derivatives.
    return rule
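The lookup above mixes booleans with obligation strings, so a caller has to handle three kinds of result. As a minimal sketch of one way to keep them apart, the block below partitions a corpus into allowed, conditional, and blocked buckets. The `LICENSE_RULES` table, the `Source` class, and `partition_by_license` are hypothetical names introduced for illustration, and the rule values are assumptions rather than legal conclusions.

```python
from dataclasses import dataclass

# Hypothetical rule table for this sketch: True = allowed, False = blocked,
# a string = allowed only under the named condition.
LICENSE_RULES = {
    'cc0': True,
    'mit': True,
    'cc-by-nc': 'non_commercial_only',
    'gpl-3.0': 'derivative_must_open',
    'proprietary': False,
}

@dataclass
class Source:
    name: str
    license: str

def partition_by_license(sources, rules=LICENSE_RULES):
    """Split sources into allowed, conditional, and blocked buckets."""
    buckets = {'allowed': [], 'conditional': [], 'blocked': []}
    for src in sources:
        rule = rules.get(src.license, False)  # unlisted licenses are blocked
        if rule is True:
            buckets['allowed'].append(src.name)
        elif rule is False:
            buckets['blocked'].append(src.name)
        else:
            # Keep the obligation next to the source so a reviewer can act on it.
            buckets['conditional'].append((src.name, rule))
    return buckets

corpus = [
    Source('wiki_dump', 'cc0'),
    Source('photo_set', 'cc-by-nc'),
    Source('scraped_forum', 'unknown'),
]
result = partition_by_license(corpus)
print(result)
```

Treating unlisted licenses as blocked is a conservative default; the conditional bucket is the part that typically needs human or legal review.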
Output attribution / watermark
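# C2PA (modern provenance standard).
# Illustrative pseudocode: this Signer interface is a simplification and may
# not match the actual API of the c2pa package; check its documentation.
from datetime import datetime, timezone
from c2pa import Signer

def attach_provenance(media_path, model_id, signer_cert):
    Signer(signer_cert).sign(media_path, claims={
        'generator': model_id,
        'training_data_summary': 'public_domain + licensed',
        'timestamp': datetime.now(timezone.utc).isoformat(),
    })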
Artist style detection (defensive)
def style_similarity(generated, reference_artist_works):
    """How close is the generated style to the reference artist's works?"""
    # clip_encode and cosine are assumed helpers (CLIP embedding, cosine similarity).
    gen_features = clip_encode(generated)
    artist_features = [clip_encode(w) for w in reference_artist_works]
    sim = max(cosine(gen_features, f) for f in artist_features)
    return sim  # flag when > 0.9
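The `cosine` helper used above is not defined in this note. A minimal pure-Python sketch follows, assuming embeddings are plain lists of floats; `flag_style_match` is a hypothetical wrapper that applies the 0.9 threshold mentioned above. The threshold itself is a heuristic, not a legal test of infringement.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as no match
    return dot / (norm_a * norm_b)

def flag_style_match(gen_vec, artist_vecs, threshold=0.9):
    """Return (max similarity, flagged) against a set of reference embeddings."""
    sim = max((cosine(gen_vec, v) for v in artist_vecs), default=0.0)
    return sim, sim > threshold

# Toy 2-D vectors standing in for real image embeddings.
sim, flagged = flag_style_match([1.0, 0.0], [[1.0, 0.1], [0.0, 1.0]])
print(round(sim, 3), flagged)
```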
Opt-out registry
# load_registry is an assumed helper that returns a set of opted-out creators.
OPT_OUT = load_registry('https://spawning.ai/opt-out')

def filter_training_data(images):
    return [img for img in images if img.creator not in OPT_OUT]
Memorization detection (training data leakage)
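import random

def detect_memorization(model, training_examples, n_test=100):
    """Does the model reproduce training text verbatim?"""
    n_test = min(n_test, len(training_examples))
    leaks = 0
    for ex in random.sample(training_examples, n_test):
        prompt = ex.text[:100]
        gen = model.generate(prompt, max_tokens=200)
        # longest_common_substring is assumed to return the matched string.
        # Compare against the held-back continuation so an echoed prompt is not counted.
        if len(longest_common_substring(gen, ex.text[100:])) > 50:
            leaks += 1
    return leaks / n_test if n_test else 0.0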
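The `longest_common_substring` helper is used in this note but not defined. One possible sketch is the standard dynamic-programming approach below. It returns the matched string itself, so a length threshold would be applied with `len()` on the result. It runs in O(len(a) × len(b)) time, which is fine for short generations but would need a faster index (for example a suffix structure) at corpus scale.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b."""
    best_len = 0
    best_end = 0  # exclusive end index of the best match within a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous character pair.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]

match = longest_common_substring("the quick brown fox", "a quick brown dog")
print(repr(match), len(match))
```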
Fair use 4-factor analysis
def fair_use_analysis(use_case):
    # Checklist of the four fair use factors; use_case is not scored automatically.
    return {
        'purpose': 'transformative? commercial?',
        'nature': 'creative or factual? published?',
        'amount': 'how much used? heart of work?',
        'effect': 'market harm? substitute?',
    }
# Each case must be evaluated individually; a lawyer is needed.
EU AI Act compliance (training data summary)
def eu_training_data_disclosure(corpus):
    # summarize_corpus, estimate_compute, and flops_above_threshold are assumed helpers.
    return {
        'general_purpose_ai': True,
        'training_data_summary': summarize_corpus(corpus),
        'compute_used': estimate_compute(corpus),
        'systemic_risk': flops_above_threshold(),
    }
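The `systemic_risk` field above depends on a compute threshold. As a rough sketch, the block below assumes the common 6 × parameters × tokens approximation for dense-transformer training compute (an assumption, not something this note specifies) and the AI Act's 10^25 FLOP presumption threshold for general-purpose models with systemic risk. The function names and the example model sizes are hypothetical.

```python
# Presumption threshold for systemic risk, in cumulative training FLOPs.
SYSTEMIC_RISK_FLOPS = 10**25

def estimate_training_flops(n_params: int, n_tokens: int) -> int:
    """Rough training compute: about 6 FLOPs per parameter per token (heuristic)."""
    return 6 * n_params * n_tokens

def exceeds_systemic_risk_threshold(n_params: int, n_tokens: int) -> bool:
    """True when the estimated compute is above the presumption threshold."""
    return estimate_training_flops(n_params, n_tokens) > SYSTEMIC_RISK_FLOPS

# Hypothetical model sizes, both trained on 15 trillion tokens.
small = exceeds_systemic_risk_threshold(70_000_000_000, 15_000_000_000_000)
large = exceeds_systemic_risk_threshold(405_000_000_000, 15_000_000_000_000)
print(small, large)
```

The estimate is only a first-pass screen; an actual filing would rely on measured compute and legal review.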
Model release license
# Trade-offs
licenses:
  - name: Llama Community License
    type: permissive_with_exceptions
    commercial: yes (with conditions)
  - name: Apache 2.0
    type: permissive
    commercial: yes
  - name: AGPL-3.0
    type: copyleft
    commercial: yes (must share derivatives)
  - name: CC-BY-NC
    type: non_commercial
    commercial: no
Output cleansing (preserve user IP)
def output_clean_for_user_ip(generated, user_input):
    """The generated output may contain the user's input verbatim."""
    # generated_contains_user_input and mark_user_section are assumed helpers.
    if generated_contains_user_input(generated, user_input):
        # The user retains rights to their part.
        return mark_user_section(generated, user_input)
    return generated
LLM legal-compliance prompt
LEGAL_SYSTEM = """You generate legal-aware output.
When asked about IP-sensitive content:
1. Note that AI-generated work may not be copyrightable in some jurisdictions.
2. Cite training data limitations when relevant.
3. Flag if a request seems to ask for verbatim copyrighted material.
4. Recommend lawyer consultation for legal decisions."""
Code verbatim check (Copilot-style)
def code_verbatim_check(generated_code, public_repos):
    """Detect long verbatim matches against public repos so the user can be warned."""
    matches = []
    for repo in public_repos:
        for file in repo.files:
            # longest_common_substring is assumed to return the matched string.
            common = longest_common_substring(generated_code, file.content)
            if len(common) > 100:
                matches.append({'repo': repo.name, 'license': repo.license, 'lines': common})
    return matches
Decision criteria
| Situation | Approach |
|---|---|
| Build model | License audit + opt-out respect |
| Deploy output | Watermark + provenance |
| Style mimicking | Detection + flag |
| EU market | AI Act disclosure |
| Open-source | Apache / Llama license |
| User-generated | Preserve user rights |
Default: license-clean training (audit + opt-out), watermarked output (C2PA), EU disclosure, and lawyer consultation for edge cases.
🔗 Graph
- Parent: Ethics & AI
- Variant: Model-IP
- Applications: EU-AI-Act · GDPR · C2PA
- Adjacent: Generative-AI · Copyright
🤖 LLM usage
When to use: commercial AI deployment; dataset construction. When not to use: academic-research-only work (limited applicability).
❌ Anti-patterns
- Train on anything: invites lawsuits.
- No watermark: enables misuse / impersonation.
- Ignore opt-out: brand risk.
- No EU AI Act prep: fines.
- Skip lawyer: case-specific decisions need legal counsel.
🧪 Verification / Duplicates
- Verified (US Copyright Office 2023, EU AI Act 2024, court filings).
- Confidence: B+.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: IP issues + audit / watermark / fair use / disclosure code |