---
id: wiki-2026-0508-generative-ai
title: Generative AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [generative AI, gen-AI, generative model, LLM, image gen, video gen, audio gen, multimodal]
duplicate_of: none
source_trust_level: A
confidence_score: 0.98
verification_status: applied
tags: [generative-ai, ai, llm, diffusion, multimodal, foundation-model]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Anthropic / OpenAI / Stability / HuggingFace
---

# Generative AI

## In one line

> **"Models that generate new content: text, image, audio, video, code, 3D."** Current examples: Claude, GPT, Gemini, Llama (text), Midjourney/DALL-E/SD (image), Suno (audio), Sora/Veo (video). Transformers and diffusion are the dominant architectures.

## Core concepts

### Modalities

- **Text**: GPT, Claude, Gemini, Llama.
- **Image**: Stable Diffusion, Midjourney, DALL-E 3, FLUX, Imagen 3.
- **Video**: Sora (OpenAI), Veo (Google), Runway, Pika.
- **Audio / Music**: Suno, Udio, MusicLM.
- **Speech**: ElevenLabs, OpenAI TTS, Whisper (STT).
- **3D**: Meshy, Tripo, Luma Genie.
- **Code**: Codex, CodeLlama, Claude.

### Architectures

- **Transformer** (text, code).
- **Diffusion** (image, video, audio).
- **Latent diffusion** (SD).
- **DiT** (Diffusion Transformer): SD3, Sora, FLUX.
- **Mamba / SSM** (emerging).

### Current landscape (2025-2026)

- **Frontier**: Claude Opus 4.7, GPT-5, Gemini 2 Ultra.
- **Open**: Llama 3.x, Qwen 2.5, FLUX.
- **Multimodal**: Sora, Veo 2, Genie 2.
- **Reasoning**: o1, o3, R1.

### Applications

1. **Productivity**: writing, coding.
2. **Creative**: art, music, video.
3. **Customer service**: chatbots.
4. **Education**: tutoring.
5. **Marketing**: ad copy, images.
6. **Research**: literature review.
7. **Games**: NPCs, content.

### Risks

- **Hallucination**.
- **Copyright** (training data, output).
- **Misinformation** (deepfakes).
- **Bias**.
- **Energy use**.
- **Job displacement**.
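
### Sketch: forward diffusion noising

To make the diffusion entry in the architecture list above more concrete, here is a toy sketch of the forward (noising) half of a diffusion model. It assumes the linear beta schedule and the closed-form sample `x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps` from the DDPM formulation. The function names and the use of plain Python lists are illustrative choices, not code from any model named in this note. A real system trains a network to predict the noise and works on image tensors or latents.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly (DDPM-style defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t: the surviving signal fraction."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng=random):
    """Sample x_t directly from x_0 without stepping through 0..t-1."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [1.0, -1.0, 0.5]  # stand-in for pixel or latent values
    for t in (0, 250, 999):
        xt, _ = add_noise(x0, t, alpha_bars, random.Random(0))
        print(t, round(alpha_bars[t], 5), [round(v, 3) for v in xt])
```

Early steps leave the signal almost untouched, while the last step is close to pure Gaussian noise. Generation runs the process in reverse, which is the part a trained denoiser supplies.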
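## 💻 Patterns

### Text generation (Claude)

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
r = client.messages.create(
    model='claude-opus-4-7',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Write a haiku about AI'}],
)
print(r.content[0].text)
```

### Image (Stable Diffusion)

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/sdxl-turbo', torch_dtype=torch.float16
).to('cuda')
# Turbo checkpoints are distilled for few-step sampling without
# classifier-free guidance, so guidance_scale is set to 0.
img = pipe('a sunset over mountains', num_inference_steps=4, guidance_scale=0.0).images[0]
```

### FLUX (modern)

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    'black-forest-labs/FLUX.1-schnell', torch_dtype=torch.bfloat16
).to('cuda')
# schnell is a distilled few-step model, sampled without guidance.
img = pipe('photorealistic forest', num_inference_steps=4, guidance_scale=0.0).images[0]
```

### Video (Sora-like)

```python
# Hypothetical call shape for a hosted text-to-video API. The method name,
# model ID, and parameters are placeholders; check the provider's current docs.
client.videos.generate(model='sora-1', prompt='a cat playing piano', duration_s=10)
```

### Audio (Suno-like)

```python
# Hypothetical client for a commercial music-generation API.
suno_client.generate_song(prompt='upbeat synth-pop', duration_s=180)
```

### TTS (ElevenLabs)

```python
import elevenlabs

# Module-level generate() is the older SDK interface; newer SDK releases
# expose a client object instead, so match this to the installed version.
audio = elevenlabs.generate(text='Hello world', voice='Adam', model='eleven_multilingual_v2')
```

### 3D (Tripo / Meshy)

```python
# Image -> 3D model. tripo_client and image_to_mesh are hypothetical names.
mesh = tripo_client.image_to_mesh('input.png')
mesh.save('output.glb')
```

### Multimodal (Claude vision)

```python
import base64

# 'photo.jpg' is a placeholder path.
with open('photo.jpg', 'rb') as f:
    img_b64 = base64.standard_b64encode(f.read()).decode('utf-8')

r = client.messages.create(model='claude-opus-4-7', max_tokens=1024, messages=[{'role': 'user', 'content': [
    {'type': 'image', 'source': {'type': 'base64', 'media_type': 'image/jpeg', 'data': img_b64}},
    {'type': 'text', 'text': 'Describe this image'},
]}])
```

### Agent (multi-step)

```python
def agent_loop(goal, tools, run_tool, max_steps=10):
    # run_tool(name, input) is a caller-supplied dispatcher (hypothetical helper).
    history = [{'role': 'user', 'content': goal}]
    for _ in range(max_steps):
        r = client.messages.create(model='claude-opus-4-7', max_tokens=1024,
                                   tools=tools, messages=history)
        if r.stop_reason != 'tool_use':
            return r  # finished (or stopped for another reason)
        # Echo the assistant turn, then answer every tool_use block.
        history.append({'role': 'assistant', 'content': r.content})
        results = [{'type': 'tool_result', 'tool_use_id': b.id,
                    'content': str(run_tool(b.name, b.input))}
                   for b in r.content if b.type == 'tool_use']
        history.append({'role': 'user', 'content': results})
    return None  # step budget exhausted
```

### Watermark (C2PA)

```python
# Simplified pseudocode: the constructor and sign() arguments shown here are
# shorthand and may not match the c2pa-python API. See its docs for how to
# build a manifest and sign a file.
from c2pa import Signer

Signer(cert).sign('output.png', claims={
    'generator': 'AI',
    'model': 'flux-1-schnell',
})
```

### Prompt engineering

```python
def well_formed_prompt(task, context, examples=(), output_format='json'):
    # examples: iterable of (input, output) pairs used as few-shot demonstrations.
    shots = '\n'.join(f'Input: {i}\nOutput: {o}' for i, o in examples)
    return f"""## Context
{context}

## Examples
{shots}

## Task
{task}

## Output format
{output_format}"""
```

### RAG-augmented gen

```python
def rag_generate(question, retriever, llm):
    # retriever.retrieve() and llm.generate() are hypothetical interfaces.
    docs = retriever.retrieve(question, k=5)
    # Number the passages so the model has something to cite.
    context = '\n'.join(f'[{i}] {d.text}' for i, d in enumerate(docs, 1))
    return llm.generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer with citations:")
```

### Fine-tune (LoRA)

```python
from peft import LoraConfig, get_peft_model

# base_model: an already-loaded transformers causal LM whose attention
# layers are named q_proj / v_proj (as in Llama-style models).
config = LoraConfig(r=16, lora_alpha=32, target_modules=['q_proj', 'v_proj'], task_type='CAUSAL_LM')
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
# Train on task data from here.
```

### Generation cost monitoring

```python
# Illustrative USD rates per token; check the provider's current price list.
PRICING = {'claude-opus-4-7': {'in': 15 / 1e6, 'out': 75 / 1e6}}

def cost_track(usage, model='claude-opus-4-7'):
    rates = PRICING[model]
    return usage.input_tokens * rates['in'] + usage.output_tokens * rates['out']
```

### Eval (LLM-as-judge)

```python
import json

def llm_judge(output, criteria):
    # judge is a hypothetical client with a generate(prompt) -> str method.
    prompt = (f'Rate the response on: {criteria}.\nResponse: {output}\n'
              'Reply with JSON only: {"score": <0-10>}')
    return json.loads(judge.generate(prompt))['score']
```

### Brand safety

```python
def brand_safe(output):
    # classify_toxicity, has_competitor, and has_brand_voice are hypothetical helpers.
    return classify_toxicity(output) < 0.05 and not has_competitor(output) and has_brand_voice(output)
```

## Decision criteria

| Situation | Tool |
|---|---|
| Best text quality | Claude Opus 4.7 / GPT-5 |
| Cost-aware text | Claude Sonnet / GPT-4o-mini |
| Best image | FLUX / Midjourney v7 |
| Fast image | SDXL Turbo / FLUX schnell |
| Video | Sora / Veo 2 |
| Audio | Suno / ElevenLabs |
| Local | Llama 3.x + SDXL local |
| Code | Claude / Codex |

**Default**: frontier API + RAG + prompt engineering + LLM-judge eval + brand safety + cost tracking.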
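
The default stack named above can be wired together roughly as follows. This is a minimal sketch under stated assumptions: `run_default_pipeline`, `PipelineResult`, the callable signatures, and the `min_score` threshold are all hypothetical names introduced for illustration, and every model call is injected by the caller so the sketch is not tied to any provider SDK.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class PipelineResult:
    text: str
    judge_score: float
    brand_safe: bool
    cost_usd: float
    shippable: bool


def run_default_pipeline(
    question: str,
    retrieve: Callable[[str], Sequence[str]],         # RAG retriever (assumed interface)
    generate: Callable[[str], Tuple[str, int, int]],  # returns (text, tokens_in, tokens_out)
    judge: Callable[[str], float],                    # LLM-as-judge score, 0-10
    is_brand_safe: Callable[[str], bool],
    usd_per_mtok_in: float,
    usd_per_mtok_out: float,
    min_score: float = 7.0,                           # illustrative threshold
) -> PipelineResult:
    # 1. RAG: fetch supporting passages and number them for citation.
    passages = retrieve(question)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    # 2. Prompt engineering: fixed sections for context and task.
    prompt = f"## Context\n{context}\n\n## Task\n{question}\n\nCite passages by number."
    # 3. Generation through whichever frontier API the caller wraps.
    text, tokens_in, tokens_out = generate(prompt)
    # 4. Quality and brand-safety gates.
    score = judge(text)
    safe = is_brand_safe(text)
    # 5. Cost tracking from token counts and per-million-token prices.
    cost = (tokens_in * usd_per_mtok_in + tokens_out * usd_per_mtok_out) / 1_000_000
    return PipelineResult(text, score, safe, cost, shippable=safe and score >= min_score)


if __name__ == "__main__":
    # Stub callables stand in for real retriever, model, and judge clients.
    demo = run_default_pipeline(
        "What is RAG?",
        retrieve=lambda q: ["RAG retrieves documents before generating."],
        generate=lambda p: ("RAG adds a retrieval step [1].", 1000, 500),
        judge=lambda t: 8.0,
        is_brand_safe=lambda t: True,
        usd_per_mtok_in=15.0,
        usd_per_mtok_out=75.0,
    )
    print(demo)
```

Passing the stages in as callables keeps each one swappable, which also limits the single-provider lock-in risk: a fallback model is just a different `generate` function.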
## 🔗 Graph

- Parent: [[AI]] · [[Foundation-Models]]
- Variants: [[Transformer_Architecture_and_LLM_Foundations|LLM]] · [[Diffusion-Models]] · [[Multimodal-LLM]]
- Applications: [[Generative-Adversarial-Networks]] · [[Stable-Diffusion]]
- Adjacent: [[RAG]] · [[Fine-tuning]] · [[Prompt_Engineering|Prompt-Engineering]] · [[Ethics & AI]]

## 🤖 LLM usage

**When to use**: productivity, creative, and customer-facing work.
**When not to use**: deterministic computation. In IP-strict settings, use with care.

## ❌ Anti-patterns

- **Shipping hallucinations**: verify outputs before release.
- **No watermark**: unlabeled output feeds misinformation.
- **No copyright check**: legal risk.
- **Single model lock-in**: an API outage becomes your outage.
- **No cost monitoring**: bill shock.

## 🧪 Verification / duplicates

- Verified (Anthropic, OpenAI, Stability, Google docs).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-20 | Auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: modalities plus text / image / video / audio / 3D / agent code |