---
id: wiki-2026-0508-generative-ai
title: Generative AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [generative AI, gen-AI, generative model, LLM, image gen, video gen, audio gen, multimodal]
duplicate_of: none
source_trust_level: A
confidence_score: 0.98
verification_status: applied
tags: [generative-ai, ai, llm, diffusion, multimodal, foundation-model]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Anthropic / OpenAI / Stability / HuggingFace
---

# Generative AI

## One-liner
> **"Models that generate new content: text, images, audio, video, code, and 3D."** Current examples: Claude, GPT, Gemini, Llama (text); Midjourney / DALL-E / Stable Diffusion (image); Suno (audio); Sora / Veo (video). Transformers and diffusion models are the dominant architectures.

## Core concepts

### Modalities
- **Text**: GPT, Claude, Gemini, Llama.
- **Image**: Stable Diffusion, Midjourney, DALL-E 3, FLUX, Imagen 3.
- **Video**: Sora (OpenAI), Veo (Google), Runway, Pika.
- **Audio / Music**: Suno, Udio, MusicLM.
- **Speech**: ElevenLabs, OpenAI TTS, Whisper (STT).
- **3D**: Meshy, Tripo, Luma Genie.
- **Code**: Codex, CodeLlama, Claude.

### Architectures
- **Transformer** (text, code).
- **Diffusion** (image, video, audio).
- **Latent diffusion** (SD).
- **DiT** (Diffusion Transformer): SD3, Sora, FLUX.
- **Mamba / SSM** (emerging).
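Diffusion models learn to undo a fixed noising process. The sketch below shows the DDPM-style forward process in plain Python, under these assumptions: a linear noise schedule from 1e-4 to 0.02 over T = 1000 steps (the values used in the original DDPM setup), and a short list of scalars standing in for an image. The closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε lets training jump straight to any step t.

```python
import math
import random

def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linearly spaced noise variances beta_1..beta_T
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    # Cumulative product of (1 - beta_s): fraction of signal variance kept at step t
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng=random):
    # Closed-form forward step: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, eps ~ N(0, 1)
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

betas = linear_betas()
abar = alpha_bars(betas)
x0 = [1.0, -0.5, 0.25]
noisy = q_sample(x0, 999, abar, random.Random(0))  # at the last step almost no signal remains
```

A trained denoiser runs this in reverse: starting from pure noise, it predicts ε at each step and removes it.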

### Current models (2025-2026)
- **Frontier**: Claude Opus 4.7, GPT-5, Gemini 2 Ultra.
- **Open**: Llama 3.x, Qwen 2.5, FLUX.
- **Multimodal**: Sora, Veo 2, Genie 2.
- **Reasoning**: o1, o3, R1.

### Applications
1. **Productivity**: writing, coding.
2. **Creative**: art, music, video.
3. **Customer service**: chatbot.
4. **Education**: tutor.
5. **Marketing**: ad copy, image.
6. **Research**: literature review.
7. **Game**: NPC, content.

### Risks
- **Hallucination**.
- **Copyright** (training, output).
- **Misinformation** (deepfake).
- **Bias**.
- **Energy use**.
- **Job displacement**.

## 💻 Patterns

### Text generation (Claude)
```python
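from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
r = client.messages.create(model='claude-opus-4-7', max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Write a haiku about AI'}])
print(r.content[0].text)  # the first content block holds the generated text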
```
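Text models generate one token at a time. At each step the model emits logits over its vocabulary and a sampler picks the next token. Hosted APIs expose knobs such as `temperature`, but their exact internals are not shown here. The sketch below is a minimal pure-Python version of temperature plus top-k sampling. The vocabulary and logits are made up for illustration.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from raw logits using temperature and optional top-k."""
    if temperature <= 0:
        # Greedy decoding: treat temperature 0 as argmax
        return max(range(len(logits)), key=lambda i: logits[i])
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        idx = idx[:top_k]  # keep only the k most likely tokens
    scaled = [logits[i] / temperature for i in idx]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]  # unnormalized softmax
    return rng.choices(idx, weights=weights, k=1)[0]

vocab = ['the', 'a', 'cat', 'piano']
logits = [2.0, 1.5, 0.3, -1.0]
rng = random.Random(42)
print(vocab[sample_next(logits, temperature=0.7, top_k=2, rng=rng)])
```

A lower temperature sharpens the distribution toward the top token. `top_k` removes the long tail of unlikely tokens entirely.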

### Image (Stable Diffusion)
```python
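import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained('stabilityai/sdxl-turbo', torch_dtype=torch.float16).to('cuda')
# SDXL Turbo is distilled for 1-4 steps and is run with guidance disabled
img = pipe('a sunset over mountains', num_inference_steps=4, guidance_scale=0.0).images[0]
img.save('sunset.png')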
```
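Diffusers pipelines take a `guidance_scale` argument that sets the strength of classifier-free guidance (CFG). At each denoising step the model predicts noise both with and without the prompt, and the two predictions are blended as ε = ε_uncond + w·(ε_cond − ε_uncond). Distilled few-step models such as SDXL Turbo are meant to run with guidance off (`guidance_scale=0.0` per its model card). The sketch below shows the blending rule, with plain Python lists standing in for noise tensors.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: eps = eps_uncond + w * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.1, -0.2, 0.0]
eps_cond = [0.3, 0.2, -0.1]
guided = cfg_combine(eps_uncond, eps_cond, 7.5)  # w > 1 extrapolates past the conditional prediction
```

With w = 0 only the unconditional prediction is used, and with w = 1 only the conditional one. Larger w follows the prompt more strongly, at some cost to diversity.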

### FLUX (modern)
```python
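import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained('black-forest-labs/FLUX.1-schnell', torch_dtype=torch.bfloat16).to('cuda')
# schnell is timestep-distilled: ~4 steps, guidance disabled
img = pipe('photorealistic forest', num_inference_steps=4, guidance_scale=0.0, max_sequence_length=256).images[0]
img.save('forest.png')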
```

### Video (Sora-like)
```python
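# Hypothetical sketch: method name, model ID and parameters are placeholders.
# Check OpenAI's current video API docs; video generation is typically an async job.
job = client.videos.generate(model='sora-1', prompt='a cat playing piano', duration_s=10)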
```
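Video generation APIs are commonly asynchronous: you submit a job, then poll until it finishes. The sketch below is provider-agnostic and assumes a hypothetical `get_status(job_id)` callable that returns a dict with a `status` field (`'completed'` or `'failed'`). These field names are placeholders, not any vendor's schema.

```python
import time

def poll_job(get_status, job_id, interval_s=2.0, max_interval_s=30.0,
             timeout_s=600.0, sleep=time.sleep):
    """Poll a long-running generation job with exponential backoff until it finishes."""
    deadline = time.monotonic() + timeout_s
    wait = interval_s
    while True:
        job = get_status(job_id)  # hypothetical provider call returning a dict
        if job['status'] == 'completed':
            return job
        if job['status'] == 'failed':
            raise RuntimeError(f'job {job_id} failed: {job.get("error")}')
        if time.monotonic() + wait > deadline:
            raise TimeoutError(f'job {job_id} not done after {timeout_s}s')
        sleep(wait)
        wait = min(wait * 2, max_interval_s)  # back off to avoid hammering the API

# Usage with a fake status function standing in for a real provider
states = iter(['queued', 'in_progress', 'completed'])
fake_status = lambda job_id: {'status': next(states), 'id': job_id}
job = poll_job(fake_status, 'job_123', interval_s=0.01, sleep=lambda s: None)
print(job['status'])  # completed
```

Injecting `sleep` keeps the loop testable. In production you would also persist `job_id` so a crashed worker can resume polling.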

### Audio (Suno-like)
```python
# Hypothetical: `suno_client` is a placeholder for whichever commercial music API you use
suno_client.generate_song(prompt='upbeat synth-pop', duration_s=180)
```

### TTS (ElevenLabs)
```python
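from elevenlabs.client import ElevenLabs
client = ElevenLabs()  # v1+ SDK; reads ELEVENLABS_API_KEY from the environment
# voice_id is an ID from your voice library (placeholder here), not a display name like 'Adam'
audio = client.text_to_speech.convert(text='Hello world', voice_id='VOICE_ID', model_id='eleven_multilingual_v2')
with open('hello.mp3', 'wb') as f:
    f.write(b''.join(audio))  # the response is an iterator of byte chunks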
```
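TTS endpoints generally cap how much text a single request may contain. Long scripts are therefore split into chunks, ideally at sentence boundaries so the prosody stays natural. The sketch below does this. The 2,500-character default is an assumption, so check your provider's actual limit.

```python
import re

def chunk_for_tts(text, max_chars=2500):
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ''
    for s in sentences:
        # Hard-split any single sentence longer than the limit
        while len(s) > max_chars:
            if current:
                chunks.append(current)
                current = ''
            chunks.append(s[:max_chars])
            s = s[max_chars:]
        candidate = f'{current} {s}' if current else s
        if len(candidate) <= max_chars:
            current = candidate  # sentence fits in the current chunk
        else:
            chunks.append(current)
            current = s
    if current:
        chunks.append(current)
    return chunks

print(chunk_for_tts('Hello world. How are you? Fine.', max_chars=15))
# ['Hello world.', 'How are you?', 'Fine.']
```

Each chunk is then sent as its own request, and the returned audio segments are concatenated in order.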

### 3D (Tripo / Meshy)
```python
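# Hypothetical: `tripo_client` is a placeholder, not the vendor SDK's real interface
# image → 3D mesh, exported as glTF binary
mesh = tripo_client.image_to_mesh('input.png')
mesh.save('output.glb')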
```

### Multimodal (Claude vision)
```python
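import base64
from anthropic import Anthropic

client = Anthropic()
with open('photo.jpg', 'rb') as f:
    img_b64 = base64.standard_b64encode(f.read()).decode('utf-8')
r = client.messages.create(model='claude-opus-4-7', max_tokens=1024,
    messages=[{'role': 'user', 'content': [
        {'type': 'image', 'source': {'type': 'base64', 'media_type': 'image/jpeg', 'data': img_b64}},
        {'type': 'text', 'text': 'Describe this image'},
    ]}])
print(r.content[0].text)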
```
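A vision request needs the correct `media_type` for its base64 payload. The sketch below sniffs the type from the file's leading magic bytes and builds an image content block in the same shape as the request above. The supported set (JPEG, PNG, GIF, WebP) reflects Anthropic's documented image formats, but verify it against the current docs.

```python
import base64

MAGIC = [
    (b'\xff\xd8\xff', 'image/jpeg'),
    (b'\x89PNG\r\n\x1a\n', 'image/png'),
    (b'GIF87a', 'image/gif'),
    (b'GIF89a', 'image/gif'),
]

def sniff_media_type(data: bytes) -> str:
    """Guess an image MIME type from its leading magic bytes."""
    for prefix, mime in MAGIC:
        if data.startswith(prefix):
            return mime
    # WebP: 'RIFF' + 4-byte little-endian size + 'WEBP'
    if data[:4] == b'RIFF' and data[8:12] == b'WEBP':
        return 'image/webp'
    raise ValueError('unsupported or unrecognized image format')

def image_block(data: bytes) -> dict:
    """Build a base64 image content block for a Messages API request."""
    return {
        'type': 'image',
        'source': {
            'type': 'base64',
            'media_type': sniff_media_type(data),
            'data': base64.standard_b64encode(data).decode('ascii'),
        },
    }
```

Sniffing bytes instead of trusting the file extension avoids rejected requests when a `.jpg` file is actually a PNG.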

### Agent (multi-step)
```python
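def agent_loop(client, goal, tools, run_tool, max_steps=10):
    # run_tool(name, input) -> str is your own tool dispatcher (hypothetical here)
    history = [{'role': 'user', 'content': goal}]
    for _ in range(max_steps):
        r = client.messages.create(model='claude-opus-4-7', max_tokens=1024, tools=tools, messages=history)
        history.append({'role': 'assistant', 'content': r.content})
        if r.stop_reason != 'tool_use':
            return r  # model finished (end_turn, max_tokens, ...)
        # Execute every requested tool call and send all results back in one user turn
        results = [{'type': 'tool_result', 'tool_use_id': b.id, 'content': run_tool(b.name, b.input)}
                   for b in r.content if b.type == 'tool_use']
        history.append({'role': 'user', 'content': results})
    raise RuntimeError('agent did not finish within max_steps')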
```
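The "execute tool, append result" step of an agent loop needs a dispatcher from tool names to code. The sketch below is a minimal registry. It assumes tools are plain Python functions that take keyword arguments, and the `tool` decorator and `run_tool` names are hypothetical. Errors come back as text so the model can see them and react, instead of the loop crashing.

```python
import json

TOOLS = {}

def tool(fn):
    """Register a Python function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a: float, b: float) -> float:
    return a + b

@tool
def word_count(text: str) -> int:
    return len(text.split())

def run_tool(name: str, tool_input: dict) -> str:
    """Execute a tool by name and return its result as a string for a tool_result block."""
    fn = TOOLS.get(name)
    if fn is None:
        return f'error: unknown tool {name!r}'
    try:
        return json.dumps(fn(**tool_input))
    except Exception as exc:  # surface failures to the model instead of crashing the loop
        return f'error: {type(exc).__name__}: {exc}'

print(run_tool('add', {'a': 2, 'b': 3}))  # 5
```

The JSON schemas sent in `tools=` must describe the same names and parameters registered here.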

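### Provenance / watermark (C2PA)
```python
# Illustrative pseudocode: the real c2pa-python API (builder, signer, manifest JSON) differs.
# C2PA attaches signed provenance metadata (content credentials); it is not a pixel watermark,
# though the two are often combined.
from c2pa import Signer
Signer(cert).sign('output.png', claims={
    'generator': 'AI', 'model': 'flux-1-schnell',
})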
```

### Prompt engineering
```python
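def format_examples(examples):
    # Render (input, output) pairs as few-shot examples
    return '\n\n'.join(f'Input: {inp}\nOutput: {out}' for inp, out in examples)

def well_formed_prompt(task, context, examples=None, output_format='json'):
    # None instead of a mutable [] default; `output_format` avoids shadowing built-in format()
    return f"""## Context
{context}

## Examples
{format_examples(examples or [])}

## Task
{task}

## Output format
{output_format}"""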
```

### RAG-augmented gen
```python
def rag_generate(question, retriever, llm):
    docs = retriever.retrieve(question, k=5)
    # Number each passage so the model can cite sources as [1], [2], ...
    context = '\n'.join(f'[{i}] {d.text}' for i, d in enumerate(docs, 1))
    return llm.generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer with citations:")
```
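`rag_generate` only needs an object with a `retrieve(question, k)` method that returns items with a `.text` attribute. The sketch below is a dependency-free retriever of that shape, ranking documents by bag-of-words cosine similarity. A production system would use embeddings and a vector index instead. The `Doc` and `KeywordRetriever` names are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

@dataclass
class Doc:
    id: str
    text: str

def _vec(text):
    # Lowercased word counts as a sparse vector
    return Counter(re.findall(r'\w+', text.lower()))

def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class KeywordRetriever:
    def __init__(self, docs):
        self.docs = [(d, _vec(d.text)) for d in docs]

    def retrieve(self, question, k=5):
        q = _vec(question)
        ranked = sorted(self.docs, key=lambda dv: _cosine(q, dv[1]), reverse=True)
        return [d for d, _ in ranked[:k]]

docs = [Doc('1', 'Diffusion models denoise images step by step.'),
        Doc('2', 'LoRA adds low-rank adapters to attention weights.'),
        Doc('3', 'RAG retrieves documents to ground generation.')]
print([d.id for d in KeywordRetriever(docs).retrieve('How does LoRA adapt weights?', k=1)])  # ['2']
```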

### Fine-tune (LoRA)
```python
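from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.1-8B')  # any causal LM
config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                    target_modules=['q_proj', 'v_proj'], task_type='CAUSAL_LM')
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# then train on task data (e.g. with the transformers Trainer)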
```
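LoRA freezes a base weight W of shape d × k and learns a low-rank update, giving W + (α/r)·B·A, where B is d × r and A is r × k. Each adapted matrix therefore trains r·(d + k) parameters instead of d·k. The quick arithmetic sketch below applies this to the `r=16`, `lora_alpha=32` config, assuming a 4096 × 4096 projection matrix purely as an example size.

```python
def lora_param_counts(d, k, r):
    """Trainable parameters for one d x k weight: full fine-tune vs. LoRA (B: d x r, A: r x k)."""
    full = d * k
    lora = r * (d + k)
    return full, lora

full, lora = lora_param_counts(4096, 4096, 16)
print(full, lora)  # 16777216 131072
scaling = 32 / 16  # lora_alpha / r: multiplier applied to the B @ A update
```

Here the adapter trains 1/128 of that matrix's parameters. This is why LoRA checkpoints are small and several adapters can share one base model.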

### Generation cost monitoring
```python
def cost_track(usage, model):
    # USD per token; illustrative figures, check the provider's current price list
    pricing = {'claude-opus-4-7': {'in': 15 / 1e6, 'out': 75 / 1e6}}
    p = pricing[model]
    return usage.input_tokens * p['in'] + usage.output_tokens * p['out']
```

### Eval (LLM-as-judge)
```python
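import json

def llm_judge(judge, output, criteria):
    # `judge` is any client exposing generate(prompt) -> str (hypothetical interface)
    prompt = (f'Rate the response on: {criteria}.\nResponse:\n{output}\n'
              'Reply with only JSON of the form {"score": <integer 0-10>}.')
    return json.loads(judge.generate(prompt))['score']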
```
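Judge models do not always return bare JSON and may wrap it in prose, so calling `json.loads` on the raw reply can fail. The sketch below is a defensive parser. It extracts the first flat JSON object, reads its `score`, and clamps it to the 0-10 range. The clamping policy and the no-nested-objects assumption are choices made for this sketch.

```python
import json
import re

def parse_judge_score(reply: str, lo: float = 0.0, hi: float = 10.0) -> float:
    """Extract {"score": ...} from a judge model's reply and clamp it to [lo, hi]."""
    match = re.search(r'\{.*?\}', reply, re.DOTALL)  # first {...} span; assumes no nested objects
    if match is None:
        raise ValueError(f'no JSON object in judge reply: {reply[:80]!r}')
    data = json.loads(match.group(0))
    score = float(data['score'])
    return max(lo, min(hi, score))

print(parse_judge_score('Here is my rating:\n{"score": 8}\nThanks.'))  # 8.0
```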

### Brand safety
```python
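def brand_safe(output, max_toxicity=0.05):
    # classify_toxicity / has_competitor / has_brand_voice are placeholders for your own checks
    return (classify_toxicity(output) < max_toxicity
            and not has_competitor(output)
            and has_brand_voice(output))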
```

## Decision guide
| Situation | Tool |
|---|---|
| Best text quality | Claude Opus 4.7 / GPT-5 |
| Cost-aware text | Claude Sonnet / GPT-4o-mini |
| Best image | FLUX / Midjourney v7 |
| Fast image | SDXL Turbo / FLUX schnell |
| Video | Sora / Veo 2 |
| Audio | Suno / ElevenLabs |
| Local | Llama 3.x + SDXL local |
| Code | Claude / Codex |

**Default**: frontier API + RAG + prompt engineering + LLM-as-judge eval + brand-safety checks + cost tracking.

## 🔗 Graph
- Parent: [[AI]] · [[Foundation-Models]]
- Variants: [[Transformer_Architecture_and_LLM_Foundations|LLM]] · [[Diffusion-Models]] · [[Multimodal-LLM]]
- Applications: [[Generative-Adversarial-Networks]] · [[Stable-Diffusion]]
- Adjacent: [[RAG]] · [[Fine-tuning]] · [[Prompt_Engineering|Prompt-Engineering]] · [[Ethics & AI]]

## 🤖 When to use
**When**: productivity, creative, and customer-facing work.
**When not**: deterministic computation; IP-sensitive work (only with care).

## ❌ Anti-patterns
- **Shipping hallucinations**: verify outputs before release.
- **No watermark / provenance**: enables misinformation.
- **No copyright check**: legal risk.
- **Single-model lock-in**: if the API goes down, you have an outage.
- **No cost monitoring**: bill shock.
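One way to avoid single-model lock-in is to try providers in order and fall back when one fails. The sketch below assumes each provider is a hypothetical callable of type `prompt -> str`. Real code would catch each SDK's specific error types (rate limit, timeout, server error) rather than a bare `Exception`.

```python
from typing import Callable, Sequence, Tuple

def generate_with_fallback(prompt: str,
                           providers: Sequence[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Return (provider_name, output) from the first provider that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. outage, rate limit, timeout
            errors.append(f'{name}: {exc}')
    raise RuntimeError('all providers failed: ' + '; '.join(errors))

def flaky(prompt):
    raise ConnectionError('503 Service Unavailable')

def backup(prompt):
    return f'echo: {prompt}'

print(generate_with_fallback('hi', [('primary', flaky), ('backup', backup)]))  # ('backup', 'echo: hi')
```

Returning the provider name makes it easy to log which model actually answered, which matters when outputs are evaluated or billed per provider.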

## 🧪 Verification / duplicates
- Verified (Anthropic, OpenAI, Stability, Google docs).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-04-20 | Auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: modalities + text / image / video / audio / 3D / agent code |