---
id: wiki-2026-0508-intellectual-property-in-ai
title: Intellectual Property in AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI IP, copyright, training data, model IP, fair use, NYT v OpenAI]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [legal, ai-ip, copyright, training-data, fair-use, regulation]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Legal
  applicable_to: [AI Development, Legal, Policy]
---

# Intellectual Property in AI

## One-liner

> **"The IP status of training data, model outputs, and the models themselves is still unsettled."** Key disputes: NYT v OpenAI (2023+), Getty v Stability, and the GitHub Copilot lawsuits. Current reference points: the EU AI Act and the US Copyright Office (2023).

## Core

### Issues

- **Training data**: Is training on copyrighted material fair use?
- **Output**: Is AI-generated output copyrightable?
- **Model**: Trade secret vs. open-source release.
- **Style**: Does mimicking an artist's style infringe?

### Notable cases

- **NYT v OpenAI** (2023+): training on news articles.
- **Getty v Stability** (2023+): Getty watermarks reproduced in outputs.
- **Andersen v Stability** (artists vs. Stable Diffusion).
- **Doe v GitHub** (Copilot, code).
- **Authors Guild v OpenAI** (2023).

### Legal stance (current, evolving)

- **US Copyright Office (2023)**: purely AI-generated output is not copyrightable (no human authorship).
- **EU AI Act (2024)**: transparency obligations, including disclosure of training data.
- **Japan**: broadly permits training (2018 amendment).
- **UK**: narrow text-and-data-mining exception.

### Applied risk areas

1. Training data sourcing.
2. Output deployment.
3. Style mimicking.
4. Model release.
5. Watermarking / provenance.
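The US Copyright Office position implies a bookkeeping task. To register a work that mixes human and AI contributions, the applicant must identify which parts are AI-generated and claim only the human-authored parts. Below is a minimal sketch, assuming each work is tracked as text segments tagged by origin. The `Segment` type, the origin labels, and `authorship_report` are hypothetical names for illustration. The report is input for counsel, not a legal determination.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical origin labels for who produced a span of the work.
Origin = Literal['human', 'ai', 'ai_edited_by_human']

@dataclass
class Segment:
    text: str
    origin: Origin

def authorship_report(segments: list[Segment]) -> dict:
    """Summarize human-authored vs AI-generated parts of a work.

    Pure 'ai' spans are listed for exclusion from a copyright claim.
    'ai_edited_by_human' spans are flagged for review, because whether
    the human edits amount to authorship is decided case by case.
    """
    total = sum(len(s.text) for s in segments) or 1  # avoid division by zero on an empty work
    human = sum(len(s.text) for s in segments if s.origin == 'human')
    return {
        'human_share': human / total,  # share of characters written by a human
        'exclude_from_claim': [s.text for s in segments if s.origin == 'ai'],
        'needs_review': [s.text for s in segments if s.origin == 'ai_edited_by_human'],
    }
```

Measuring by character count is a crude choice made for simplicity. A real workflow would also record what kind of human contribution was made, such as selection, arrangement, or modification.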
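## 💻 Patterns

### Training data audit

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    source: str      # dataset name or URL
    license: str     # license identifier, or 'unknown'
    provenance: str  # how the data was obtained (scrape, purchase, partner, ...)
    can_train: bool  # outcome of the license / permission review

def audit_training_corpus(sources):
    """Count cleared sources and list risky ones (not cleared, or license unknown)."""
    risky = [s for s in sources if not s.can_train or s.license == 'unknown']
    return {'safe': len(sources) - len(risky), 'risky': risky}
```

### License compatibility

```python
# Training rules per license (illustrative, not legal advice).
# True = allowed, False = blocked, string = allowed only under that condition.
COMPATIBLE = {
    'cc0': True,
    'cc-by': True,  # attribution required
    'mit': True,
    'apache-2.0': True,
    'cc-by-nc': 'check_purpose',
    'cc-by-sa': 'derivative_must_share',
    'gpl-3.0': 'derivative_must_open',
    'proprietary': False,
    'unknown': False,
}

def can_train(license_id, purpose='commercial'):
    """Return True, False, or a condition string.

    Callers must route condition strings to human review; do not treat them as plain True.
    """
    rule = COMPATIBLE.get(license_id.lower(), False)  # unlisted licenses are blocked
    if rule == 'check_purpose':
        return purpose != 'commercial'  # NC licenses: non-commercial use only
    return rule
```

### Output attribution / watermark

```python
# C2PA (modern content-provenance standard).
# `sign_c2pa_manifest` is a HYPOTHETICAL wrapper: real C2PA SDKs
# (e.g. c2pa-python) have their own signing API, so adapt this to the SDK you use.
from datetime import datetime, timezone

def attach_provenance(media_path, model_id, signer_cert):
    claims = {
        'generator': model_id,
        'training_data_summary': 'public_domain + licensed',
        'timestamp': datetime.now(timezone.utc).isoformat(),
    }
    return sign_c2pa_manifest(media_path, claims, signer_cert)
```

### Artist style detection (defensive)

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def style_similarity(generated, reference_artist_works, threshold=0.9):
    """How close is the generated image's style to the reference artist's works?

    `clip_encode` is assumed (image -> embedding, e.g. a CLIP image encoder).
    CLIP embeddings mix content and style, so the score is only a rough proxy.
    """
    gen_features = clip_encode(generated)
    artist_features = [clip_encode(w) for w in reference_artist_works]
    sim = max(cosine(gen_features, f) for f in artist_features)
    return sim, sim > threshold  # (score, flag for review)
```

### Opt-out registry

```python
# `load_registry` is a hypothetical loader that returns creator IDs that opted out
# of AI training (e.g. from Spawning's registry); the real endpoint/API may differ.
OPT_OUT = set(load_registry('https://spawning.ai/opt-out'))

def filter_training_data(images):
    """Drop images whose creator has opted out."""
    return [img for img in images if img.creator not in OPT_OUT]
```

### Memorization detection (training data leakage)

```python
import random
from difflib import SequenceMatcher

def longest_common_substring_len(a, b):
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return m.size

def detect_memorization(model, training_examples, n_test=100, prefix_chars=100, min_overlap=50):
    """Fraction of sampled training examples the model reproduces verbatim
    when prompted with their first `prefix_chars` characters."""
    sample = random.sample(training_examples, min(n_test, len(training_examples)))
    if not sample:
        return 0.0
    leaks = 0
    for ex in sample:
        gen = model.generate(ex.text[:prefix_chars], max_tokens=200)  # model API assumed
        # compare against the continuation only, so an echoed prompt is not counted as a leak
        if longest_common_substring_len(gen, ex.text[prefix_chars:]) > min_overlap:
            leaks += 1
    return leaks / len(sample)
```

### Fair use 4-factor analysis

```python
def fair_use_analysis(use_case):
    """Questions for the four statutory fair-use factors (17 U.S.C. § 107).
    Returns a checklist, not a verdict."""
    return {
        'use_case': use_case,
        'purpose': 'Transformative? Commercial?',
        'nature': 'Creative or factual? Published?',
        'amount': 'How much was used? The heart of the work?',
        'effect': 'Market harm? Does it substitute for the original?',
    }
# Each case must be evaluated individually; a lawyer is needed.
```

### EU AI Act compliance (training data summary)

```python
# EU AI Act, Art. 51: a general-purpose AI model is presumed to have systemic risk
# when its cumulative training compute is greater than 10^25 FLOPs.
SYSTEMIC_RISK_FLOPS = 1e25

def eu_training_data_disclosure(corpus, training_flops):
    """Assemble disclosure fields. `summarize_corpus` is a hypothetical helper
    (sources, licenses, opt-out handling)."""
    return {
        'general_purpose_ai': True,
        'training_data_summary': summarize_corpus(corpus),
        'compute_used_flops': training_flops,
        'systemic_risk': training_flops > SYSTEMIC_RISK_FLOPS,
    }
```

### Model release license

```yaml
# Trade-offs when releasing model weights
licenses:
  - name: Llama Community License
    type: permissive_with_exceptions
    commercial: "yes (with conditions)"
  - name: Apache 2.0
    type: permissive
    commercial: "yes"
  - name: AGPL-3.0
    type: copyleft
    commercial: "yes (must share derivatives)"
  - name: CC-BY-NC
    type: non_commercial
    commercial: "no"
```

### Output cleansing (preserve user IP)

```python
def output_clean_for_user_ip(generated, user_input):
    """Generated output may contain the user's input verbatim; mark that span so
    the user's rights in their own text stay identifiable.
    Exact-substring match only; the [user]...[/user] markers are an arbitrary choice."""
    if user_input and user_input in generated:
        # the user retains rights to their part
        return generated.replace(user_input, f'[user]{user_input}[/user]')
    return generated
```

### LLM legal-compliance prompt

```python
LEGAL_SYSTEM = """You generate legal-aware output.
When asked about IP-sensitive content:
1. Note that AI-generated work may not be copyrightable in some jurisdictions.
2. Cite training data limitations when relevant.
3. Flag if a request seems to ask for verbatim copyrighted material.
4. Recommend lawyer consultation for legal decisions."""
```

### Code verbatim check (Copilot-style)

```python
from difflib import SequenceMatcher

def code_verbatim_check(generated_code, public_repos, min_chars=100):
    """Detect long verbatim overlaps with public code so the user can be warned.
    Repo/file objects are assumed to expose .name, .license, .files and .content.
    Pairwise matching is slow; use an index (e.g. suffix array) at scale."""
    matches = []
    for repo in public_repos:
        for file in repo.files:
            m = SequenceMatcher(None, generated_code, file.content, autojunk=False).find_longest_match(
                0, len(generated_code), 0, len(file.content))
            if m.size > min_chars:
                matches.append({'repo': repo.name, 'license': repo.license,
                                'snippet': generated_code[m.a:m.a + m.size]})
    return matches
```

## Decision criteria

| Situation | Approach |
|---|---|
| Building a model | License audit + respect opt-outs |
| Deploying outputs | Watermark + provenance |
| Style mimicking | Detection + flag |
| EU market | AI Act disclosure |
| Open-source release | Apache / Llama license |
| User-generated content | Preserve user rights |

**Default**: license-clean training (audit + opt-out), watermarked outputs (C2PA), EU disclosure, and a lawyer consultation for edge cases.

## 🔗 Graph

- Parent: [[Ethics & AI]]
- Variant: [[Model-IP]]
- Applications: [[EU-AI-Act]] · [[GDPR]] · [[C2PA]]
- Adjacent: [[Generative-AI]] · [[Copyright]]

## 🤖 LLM usage

**When**: commercial AI deployment; dataset construction.
**When not**: purely academic research (limited relevance).

## ❌ Anti-patterns

- **Train on anything**: invites lawsuits.
- **No watermark**: enables misuse and impersonation.
- **Ignore opt-outs**: brand risk.
- **No EU AI Act preparation**: fines.
- **Skip the lawyer**: specific cases need case-by-case legal decisions.

## 🧪 Verification / duplicates

- Verified (US Copyright Office 2023, EU AI Act 2024, court filings).
- Confidence: B+.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: IP issues + audit / watermark / fair use / disclosure code |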