2nd/10_Wiki/Topics/AI_and_ML/Generative-AI.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
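Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture and incomplete stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link name normalization: ~2,400 fixes covering broken links, direct redirect targets, and concept mapping
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections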

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


---
id: wiki-2026-0508-generative-ai
title: Generative AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [generative AI, gen-AI, generative model, LLM, image gen, video gen, audio gen, multimodal]
duplicate_of: none
source_trust_level: A
confidence_score: 0.98
verification_status: applied
tags: [generative-ai, ai, llm, diffusion, multimodal, foundation-model]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Anthropic / OpenAI / Stability / HuggingFace
---
# Generative AI
## One-line summary
> **"Models that generate new content: text, image, audio, video, code, 3D."** Modern examples: Claude, GPT, Gemini, Llama (text), Midjourney/DALL-E/SD (image), Suno (audio), Sora/Veo (video). Transformer and diffusion architectures dominate.
## Core concepts
### Modalities
- **Text**: GPT, Claude, Gemini, Llama.
- **Image**: Stable Diffusion, Midjourney, DALL-E 3, FLUX, Imagen 3.
- **Video**: Sora (OpenAI), Veo (Google), Runway, Pika.
- **Audio / Music**: Suno, Udio, MusicLM.
- **Speech**: ElevenLabs, OpenAI TTS, Whisper (STT).
- **3D**: Meshy, Tripo, Luma Genie.
- **Code**: Codex, CodeLlama, Claude.
### Architectures
- **Transformer** (text, code).
- **Diffusion** (image, video, audio).
- **Latent diffusion** (SD).
- **DiT** (Diffusion Transformer): SD3, Sora, FLUX.
- **Mamba / SSM** (emerging).
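
To make the diffusion entry concrete, here is a minimal sketch of the forward (noising) step that diffusion models learn to reverse. It assumes a DDPM-style linear beta schedule. The function names, the schedule constants, and the list-of-floats "image" are illustrative assumptions, not taken from any specific library.

```python
import math
import random

def linear_alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps of a linear schedule."""
    alpha_bar = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        alpha_bar *= 1.0 - beta
    return alpha_bar

def add_noise(x0, t, num_steps=1000, rng=random):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1)."""
    a = linear_alpha_bar(t, num_steps)
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]
```

At `t=0` the input is returned unchanged. Near `t=num_steps` the signal term is almost zero and the sample is close to pure Gaussian noise, which is where generation starts.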
### Current landscape (2025-2026)
- **Frontier**: Claude Opus 4.7, GPT-5, Gemini 2 Ultra.
- **Open**: Llama 3.x, Qwen 2.5, FLUX.
- **Multimodal**: Sora, Veo 2, Genie 2.
- **Reasoning**: o1, o3, R1.
### Applications
1. **Productivity**: writing, coding.
2. **Creative**: art, music, video.
3. **Customer service**: chatbot.
4. **Education**: tutor.
5. **Marketing**: ad copy, image.
6. **Research**: literature review.
7. **Game**: NPC, content.
### Risks
- **Hallucination**.
- **Copyright** (training, output).
- **Misinformation** (deepfake).
- **Bias**.
- **Energy use**.
- **Job displacement**.
## 💻 Patterns
### Text generation (Claude)
```python
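from anthropic import Anthropic
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
r = client.messages.create(model='claude-opus-4-7', max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Write a haiku about AI'}])
print(r.content[0].text)  # first content block holds the generated text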
```
### Image (Stable Diffusion)
```python
import torch
from diffusers import StableDiffusionXLPipeline
pipe = StableDiffusionXLPipeline.from_pretrained('stabilityai/sdxl-turbo', torch_dtype=torch.float16).to('cuda')
img = pipe('a sunset over mountains', num_inference_steps=4).images[0]
```
### FLUX (modern)
```python
import torch
from diffusers import FluxPipeline
pipe = FluxPipeline.from_pretrained('black-forest-labs/FLUX.1-schnell', torch_dtype=torch.bfloat16).to('cuda')
img = pipe('photorealistic forest', num_inference_steps=4).images[0]
```
### Video (Sora-like)
```python
# Hypothetical call shape for the OpenAI Sora API (when available); not a verified SDK method
client.videos.generate(model='sora-1', prompt='a cat playing piano', duration_s=10)
```
### Audio (Suno-like)
```python
# Commercial APIs; suno_client is a hypothetical client wrapper
suno_client.generate_song(prompt='upbeat synth-pop', duration_s=180)
```
### TTS (ElevenLabs)
```python
import elevenlabs
audio = elevenlabs.generate(text='Hello world', voice='Adam', model='eleven_multilingual_v2')
```
### 3D (Tripo / Meshy)
```python
# Image → 3D model; tripo_client is a hypothetical client wrapper
mesh = tripo_client.image_to_mesh('input.png')
mesh.save('output.glb')
```
### Multimodal (Claude vision)
```python
# img_b64: base64-encoded JPEG bytes prepared by the caller
client.messages.create(model='claude-opus-4-7', max_tokens=1024,
    messages=[{'role': 'user', 'content': [
        {'type': 'image', 'source': {'type': 'base64', 'media_type': 'image/jpeg', 'data': img_b64}},
        {'type': 'text', 'text': 'Describe this image'},
    ]}])
```
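
The vision call above takes an already encoded `img_b64` string. One possible way to build the whole image block from a local file is sketched below. The helper name `image_block` is an assumption for illustration, and the media type is guessed from the file extension rather than from the file contents.

```python
import base64
import mimetypes
from pathlib import Path

def image_block(path):
    """Build a base64 image content block from a local file (hypothetical helper)."""
    media_type, _ = mimetypes.guess_type(str(path))
    if media_type is None or not media_type.startswith('image/'):
        raise ValueError(f'not a recognized image file: {path}')
    data = base64.b64encode(Path(path).read_bytes()).decode('ascii')
    return {'type': 'image',
            'source': {'type': 'base64', 'media_type': media_type, 'data': data}}
```

The returned dict can be placed directly in the `content` list next to the text block.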
### Agent (multi-step)
```python
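def agent_loop(goal, tools, run_tool, max_steps=10):
    # run_tool(name, args) is an assumed callable that executes one tool call and returns a string
    history = [{'role': 'user', 'content': goal}]
    for _ in range(max_steps):
        r = client.messages.create(model='claude-opus-4-7', max_tokens=1024,
                                   tools=tools, messages=history)
        if r.stop_reason != 'tool_use': return r
        # Execute the requested tools, then append the call and its results
        history.append({'role': 'assistant', 'content': r.content})
        history.append({'role': 'user', 'content': [
            {'type': 'tool_result', 'tool_use_id': b.id, 'content': run_tool(b.name, b.input)}
            for b in r.content if b.type == 'tool_use']})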
```
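
The tool-execution step of the loop above can be filled in several ways. A minimal sketch, assuming tools are plain Python functions kept in a dict, is shown here. `make_tool_runner` and the `add` tool are hypothetical names used only for illustration.

```python
def make_tool_runner(registry):
    """Return run_tool(name, args) that dispatches to plain Python functions."""
    def run_tool(name, args):
        if name not in registry:
            return f'error: unknown tool {name!r}'
        try:
            return str(registry[name](**args))
        except Exception as exc:  # report failures to the model instead of crashing
            return f'error: {exc}'
    return run_tool

# Example registry with a single toy tool
run_tool = make_tool_runner({'add': lambda a, b: a + b})
```

Returning errors as strings lets the model see what went wrong and retry, which is usually preferable to aborting the whole loop on one bad tool call.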
### Watermark (C2PA)
```python
# Illustrative call shape only; check the c2pa library documentation for the actual signing API
from c2pa import Signer
Signer(cert).sign('output.png', claims={
    'generator': 'AI', 'model': 'flux-1-schnell',
})
```
### Prompt engineering
```python
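def well_formed_prompt(task, context, examples=None, format='json'):
    # format_examples is an assumed helper that renders the examples as text
    examples = examples or []  # avoid a shared mutable default argument
    return f"""## Context
{context}
## Examples
{format_examples(examples)}
## Task
{task}
## Output format
{format}"""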
```
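
The prompt builder above calls `format_examples`, which the note does not define. One plausible shape, assuming each example is an `(input, output)` pair, is sketched here. The numbering and the `(none)` placeholder are illustrative choices.

```python
def format_examples(examples):
    """Render (input, output) pairs as numbered few-shot examples."""
    if not examples:
        return '(none)'
    return '\n'.join(f'{i}. Input: {inp}\n   Output: {out}'
                     for i, (inp, out) in enumerate(examples, start=1))
```

An explicit placeholder for the empty case keeps the `## Examples` section from rendering as a blank heading.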
### RAG-augmented gen
```python
def rag_generate(question, retriever, llm):
    # retriever and llm are assumed interfaces: retrieve(question, k) and generate(prompt)
    docs = retriever.retrieve(question, k=5)
    context = '\n'.join(d.text for d in docs)
    return llm.generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer with citations:")
```
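
To try `rag_generate` without a vector store or an API key, stand-ins for the two interfaces it expects can be useful. The sketch below is a toy: `KeywordRetriever`, `Doc`, and `EchoLLM` are hypothetical names, and word overlap is a deliberately crude substitute for embedding search.

```python
import re
from dataclasses import dataclass

@dataclass
class Doc:
    text: str

class KeywordRetriever:
    """Toy retriever: ranks documents by how many question words they share."""
    def __init__(self, texts):
        self.docs = [Doc(t) for t in texts]

    @staticmethod
    def _words(s):
        return set(re.findall(r'\w+', s.lower()))

    def retrieve(self, question, k=5):
        q = self._words(question)
        scored = [(len(q & self._words(d.text)), d) for d in self.docs]
        scored = [p for p in scored if p[0] > 0]          # drop documents with no overlap
        scored.sort(key=lambda p: p[0], reverse=True)     # highest overlap first
        return [d for _, d in scored[:k]]

class EchoLLM:
    """Stand-in for a real model client: returns the prompt it was given."""
    def generate(self, prompt):
        return prompt
```

Passing a `KeywordRetriever` and an `EchoLLM()` to `rag_generate` returns the assembled prompt, which makes it easy to inspect what context the model would actually receive.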
### Fine-tune (LoRA)
```python
from peft import LoraConfig, get_peft_model
# base_model is assumed to be an already loaded Hugging Face transformers model
config = LoraConfig(r=16, lora_alpha=32, target_modules=['q_proj', 'v_proj'])
model = get_peft_model(base_model, config)
# Train on task data
```
### Generation cost monitoring
```python
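def cost_track(usage, model='claude-opus-4-7'):
    # USD per token; rates are illustrative and should be checked against current pricing
    pricing = {'claude-opus-4-7': {'in': 15/1e6, 'out': 75/1e6}}
    cost = usage.input_tokens * pricing[model]['in'] + usage.output_tokens * pricing[model]['out']
    return cost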
```
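
Per-call cost is more useful when it is accumulated and compared against a budget. The following is a minimal sketch of that idea. `CostTracker`, its fields, and the rates passed in by the caller are assumptions for illustration, not part of any SDK.

```python
from dataclasses import dataclass, field

@dataclass
class CostTracker:
    """Accumulates spend per model and flags when a budget is exceeded."""
    pricing: dict                      # model -> {'in': USD/token, 'out': USD/token}
    budget_usd: float
    spent: dict = field(default_factory=dict)

    def record(self, model, input_tokens, output_tokens):
        rate = self.pricing[model]
        cost = input_tokens * rate['in'] + output_tokens * rate['out']
        self.spent[model] = self.spent.get(model, 0.0) + cost
        return cost

    @property
    def total(self):
        return sum(self.spent.values())

    @property
    def over_budget(self):
        return self.total > self.budget_usd
```

Checking `over_budget` before each request is one simple way to turn the tracker into a hard spending cap.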
### Eval (LLM-as-judge)
```python
import json

def llm_judge(output, criteria):
    # judge is an assumed client whose generate() returns a JSON string
    prompt = f'Rate {criteria}. Response: {output}. Output JSON: score 0-10.'
    return json.loads(judge.generate(prompt))['score']
```
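
Judge models do not always return clean JSON, so parsing the reply defensively tends to pay off. A minimal sketch, assuming the judge is asked for an object with a numeric `score` field, is given below. `parse_judge_score` is a hypothetical helper name.

```python
import json
import re

def parse_judge_score(raw, lo=0, hi=10):
    """Extract a numeric 'score' from judge output; return None if unusable."""
    match = re.search(r'\{.*\}', raw, flags=re.DOTALL)   # tolerate text around the JSON
    if match is None:
        return None
    try:
        score = json.loads(match.group(0)).get('score')
    except (json.JSONDecodeError, AttributeError):
        return None
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return None
    return min(hi, max(lo, score))                       # clamp to the allowed range
```

Returning `None` instead of raising lets the evaluation harness count unparseable replies separately from low scores.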
### Brand safety
```python
def brand_safe(output):
    # classify_toxicity, has_competitor, and has_brand_voice are assumed helpers
    return classify_toxicity(output) < 0.05 and not has_competitor(output) and has_brand_voice(output)
```
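
Of the three checks in `brand_safe`, the competitor check is the simplest to sketch without a model. The version below assumes a fixed list of competitor names and matches them as whole words. The default names are placeholders, not real data.

```python
import re

def has_competitor(text, competitors=('AcmeCorp', 'Globex')):
    """True if any competitor name appears as a whole word, case-insensitively."""
    pattern = r'\b(?:' + '|'.join(re.escape(c) for c in competitors) + r')\b'
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

Whole-word matching avoids false positives on longer words that merely contain a competitor name.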
## Decision criteria
| Situation | Tool |
|---|---|
| Best text quality | Claude Opus 4.7 / GPT-5 |
| Cost-aware text | Claude Sonnet / GPT-4o-mini |
| Best image | FLUX / Midjourney v7 |
| Fast image | SDXL Turbo / FLUX schnell |
| Video | Sora / Veo 2 |
| Audio | Suno / ElevenLabs |
| Local | Llama 3.x + SDXL local |
| Code | Claude / Codex |
**Default**: frontier API + RAG + prompt engineering + LLM-judge eval + brand safety + cost tracking.
## 🔗 Graph
- Parent: [[AI]] · [[Foundation-Models]]
- Variants: [[Transformer_Architecture_and_LLM_Foundations|LLM]] · [[Diffusion-Models]] · [[Multimodal-LLM]]
- Applications: [[Generative-Adversarial-Networks]] · [[Stable-Diffusion]]
- Adjacent: [[RAG]] · [[Fine-tuning]] · [[Prompt_Engineering|Prompt-Engineering]] · [[Ethics & AI]]
## 🤖 LLM usage
**When to use**: productivity, creative, and customer-facing work.
**When not to use**: deterministic computation. Use with care in IP-strict settings.
## ❌ Anti-patterns
- **Shipping hallucinations**: verify output before release.
- **No watermark**: enables misinformation.
- **No copyright check**: legal risk.
- **Single model lock-in**: API down → outage.
- **No cost monitoring**: bill shock.
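
One way to soften the single-model lock-in problem listed above is to try providers in order and fall back on failure. The sketch below assumes each provider is wrapped as a plain callable that takes a prompt. `generate_with_fallback` and the `(name, call)` pair convention are illustrative assumptions.

```python
def generate_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; return the first successful result."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch its SDK's specific error types
            errors.append(f'{name}: {exc}')
    raise RuntimeError('all providers failed: ' + '; '.join(errors))
```

Returning the provider name alongside the output makes it possible to log how often the fallback path is taken.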
## 🧪 Verification / duplicates
- Verified (Anthropic, OpenAI, Stability, Google docs).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-04-20 | Auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: modalities + text / image / video / audio / 3D / agent code |