Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
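10_Wiki/Topics large-scale cleanup:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: ~2,400 fixes (broken links repaired, redirects linked directly, concept mapping)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections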

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-generative-ai
title: Generative AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: generative AI, gen-AI, generative model, LLM, image gen, video gen, audio gen, multimodal
duplicate_of: none
source_trust_level: A
confidence_score: 0.98
verification_status: applied
tags: generative-ai, ai, llm, diffusion, multimodal, foundation-model
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: Python; framework: Anthropic / OpenAI / Stability / HuggingFace

Generative AI

One-line summary

Models that generate new content: text, images, audio, video, code, and 3D. Current examples: Claude, GPT, Gemini, Llama (text); Midjourney, DALL-E, SD (image); Suno (audio); Sora, Veo (video). Transformers and diffusion models are the dominant architectures.

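Under the hood, transformer text models generate one token at a time: they score every vocabulary entry, convert the scores to probabilities, and sample. A stdlib-only sketch of that final step with temperature, assuming a three-word toy vocabulary and made-up logits where a real model would supply them:

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Divide by temperature first: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    # Draw one token according to the temperature-scaled probabilities.
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab = ['cat', 'dog', 'piano']
logits = [2.0, 1.0, 0.1]  # toy scores, not from a real model
print(softmax(logits, temperature=0.5))
print(sample_next_token(vocab, logits, temperature=0.7, rng=random.Random(0)))
```

Repeating this step, appending each sampled token to the input, is what produces a full completion.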
Core concepts

Modalities

  • Text: GPT, Claude, Gemini, Llama.
  • Image: Stable Diffusion, Midjourney, DALL-E 3, FLUX, Imagen 3.
  • Video: Sora (OpenAI), Veo (Google), Runway, Pika.
  • Audio / Music: Suno, Udio, MusicLM.
  • Speech: ElevenLabs, OpenAI TTS, Whisper (STT).
  • 3D: Meshy, Tripo, Luma Genie.
  • Code: Codex, CodeLlama, Claude.

Architectures

  • Transformer (text, code).
  • Diffusion (image, video, audio).
  • Latent diffusion (SD).
  • DiT (Diffusion Transformer): SD3, Sora, FLUX.
  • Mamba / SSM (emerging).
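Diffusion models learn to reverse a fixed noising process. In the DDPM formulation the noisy sample at step t has a closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε ~ N(0, I), where ᾱ_t is the running product of (1 − β_s). A stdlib-only sketch of that forward step on a tiny vector, assuming the linear β schedule from 1e-4 to 0.02 over 1000 steps used in the original DDPM paper:

```python
import math
import random

def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linearly spaced noise variances beta_1 .. beta_T.
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def alpha_bars(betas):
    # alpha_bar_t = prod_{s<=t} (1 - beta_s): the fraction of signal left at step t.
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng=random):
    # Jump directly from clean x0 to noisy x_t using the closed form.
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

abar = alpha_bars(linear_betas())
x0 = [1.0, -1.0, 0.5]
print(abar[0], abar[-1])  # almost all signal at the first step, almost none at the last
print(q_sample(x0, 500, abar, random.Random(0)))
```

The network is trained to predict ε from x_t and t; generation runs that learned denoiser backwards from pure noise.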

Current landscape (2025-2026)

  • Frontier: Claude Opus 4.7, GPT-5, Gemini 2 Ultra.
  • Open: Llama 3.x, Qwen 2.5, FLUX.
  • Multimodal: Sora, Veo 2, Genie 2.
  • Reasoning: o1, o3, R1.

Applications

  1. Productivity: writing, coding.
  2. Creative: art, music, video.
  3. Customer service: chatbot.
  4. Education: tutor.
  5. Marketing: ad copy, image.
  6. Research: literature review.
  7. Game: NPC, content.

Risks

  • Hallucination.
  • Copyright (training, output).
  • Misinformation (deepfake).
  • Bias.
  • Energy use.
  • Job displacement.

💻 Patterns

Text generation (Claude)

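from anthropic import Anthropic
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
r = client.messages.create(model='claude-opus-4-7', max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Write a haiku about AI'}])
print(r.content[0].text)  # the first content block holds the generated text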

Image (Stable Diffusion)

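import torch
from diffusers import StableDiffusionXLPipeline
pipe = StableDiffusionXLPipeline.from_pretrained('stabilityai/sdxl-turbo', torch_dtype=torch.float16).to('cuda')
# SDXL Turbo is distilled for 1-4 steps and is meant to run without classifier-free guidance
img = pipe('a sunset over mountains', num_inference_steps=4, guidance_scale=0.0).images[0]
img.save('sunset.png')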

FLUX (modern)

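import torch
from diffusers import FluxPipeline
pipe = FluxPipeline.from_pretrained('black-forest-labs/FLUX.1-schnell', torch_dtype=torch.bfloat16).to('cuda')
# schnell is the few-step distilled variant and is run with guidance_scale=0.0
img = pipe('photorealistic forest', num_inference_steps=4, guidance_scale=0.0).images[0]
img.save('forest.png')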

Video (Sora-like)

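# OpenAI Sora API (when available). Hypothetical call shape: the method, model name,
# and parameters are placeholders; check OpenAI's current video API reference
client.videos.generate(model='sora-1', prompt='a cat playing piano', duration_s=10)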

Audio (Suno-like)

# Commercial APIs. Hypothetical client: suno_client / generate_song are placeholders
suno_client.generate_song(prompt='upbeat synth-pop', duration_s=180)

TTS (ElevenLabs)

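import elevenlabs
# Module-level generate() from the legacy (pre-1.0) elevenlabs SDK; newer SDK versions use a client object instead
audio = elevenlabs.generate(text='Hello world', voice='Adam', model='eleven_multilingual_v2')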

3D (Tripo / Meshy)

# Image → 3D model. Hypothetical client: tripo_client / image_to_mesh stand in for the Tripo or Meshy API
mesh = tripo_client.image_to_mesh('input.png')
mesh.save('output.glb')  # binary glTF

Multimodal (Claude vision)

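import base64
with open('photo.jpg', 'rb') as f:  # any local JPEG; media_type below must match the file
    img_b64 = base64.standard_b64encode(f.read()).decode('utf-8')
r = client.messages.create(model='claude-opus-4-7', max_tokens=1024,  # client = Anthropic() as above
    messages=[{'role': 'user', 'content': [
        {'type': 'image', 'source': {'type': 'base64', 'media_type': 'image/jpeg', 'data': img_b64}},
        {'type': 'text', 'text': 'Describe this image'},
    ]}])
print(r.content[0].text)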

Agent (multi-step)

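def agent_loop(goal, tools, run_tool, max_steps=10):
    # run_tool(name, input) -> str is your own dispatcher (hypothetical here)
    history = [{'role': 'user', 'content': goal}]
    for _ in range(max_steps):
        r = client.messages.create(model='claude-opus-4-7', max_tokens=1024, tools=tools, messages=history)
        history.append({'role': 'assistant', 'content': r.content})
        if r.stop_reason != 'tool_use':
            return r  # end_turn (or max_tokens): no further tool calls requested
        results = [{'type': 'tool_result', 'tool_use_id': b.id, 'content': run_tool(b.name, b.input)}
                   for b in r.content if b.type == 'tool_use']
        history.append({'role': 'user', 'content': results})
    return r  # step budget exhausted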
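The loop needs tools declared in the Messages API `tools` format (`name`, `description`, and an `input_schema` written as JSON Schema) plus local code that executes whatever the model calls. A minimal sketch with one hypothetical `get_weather` tool whose lookup is faked with a dict:

```python
import json

# Tool declaration in the Anthropic Messages API shape; get_weather is a made-up example.
TOOLS = [{
    'name': 'get_weather',
    'description': 'Return the current temperature in Celsius for a city.',
    'input_schema': {
        'type': 'object',
        'properties': {'city': {'type': 'string'}},
        'required': ['city'],
    },
}]

_FAKE_WEATHER = {'Seoul': 18, 'Paris': 12}  # stand-in for a real weather service

def get_weather(city):
    return {'city': city, 'temp_c': _FAKE_WEATHER.get(city)}

HANDLERS = {'get_weather': get_weather}

def run_tool(name, tool_input):
    # Route a tool_use block's name/input to local code; return a JSON string
    # that can be sent back as the content of a tool_result block.
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({'error': f'unknown tool {name}'})
    return json.dumps(handler(**tool_input))

print(run_tool('get_weather', {'city': 'Seoul'}))
```

Each `tool_use` block's `name` and `input` go to the dispatcher, and its string result is returned to the model as a `tool_result` carrying the matching `tool_use_id`.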

Watermark (C2PA)

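# Illustrative placeholder: Signer(cert).sign(...) is a simplified sketch, not necessarily
# the real c2pa-python API; consult its docs for how manifests are built and signed
from c2pa import Signer
Signer(cert).sign('output.png', claims={
    'generator': 'AI', 'model': 'flux-1-schnell',
})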
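C2PA attaches a signed provenance manifest to the asset. When no signing setup is available, a much weaker fallback is a plain JSON sidecar recording the file's SHA-256 and generator details. A sketch of that idea, which is not C2PA and is unsigned (anyone can rewrite the sidecar); the `.provenance.json` naming and the field names are made up for illustration:

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def sha256_file(path):
    # Hash the file in chunks so large images do not need to fit in memory.
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def write_provenance(path, model, generator='AI'):
    # Write <file>.provenance.json next to the asset (unsigned, illustrative fields).
    record = {
        'file': Path(path).name,
        'sha256': sha256_file(path),
        'generator': generator,
        'model': model,
        'created_utc': datetime.now(timezone.utc).isoformat(),
    }
    sidecar = Path(str(path) + '.provenance.json')
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar

def matches_provenance(path):
    # True if the asset still hashes to the value recorded in its sidecar.
    record = json.loads(Path(str(path) + '.provenance.json').read_text())
    return record['sha256'] == sha256_file(path)

with tempfile.TemporaryDirectory() as d:
    img = Path(d) / 'output.png'
    img.write_bytes(b'fake image bytes')
    write_provenance(img, model='flux-1-schnell')
    print(matches_provenance(img))  # True until the file is modified
```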

Prompt engineering

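def format_examples(examples):
    # Examples are plain strings here, separated by blank lines
    return '\n\n'.join(examples) if examples else '(none)'

def well_formed_prompt(task, context, examples=None, output_format='json'):
    # examples=None avoids a shared mutable default; output_format avoids shadowing format()
    return f"""## Context
{context}

## Examples
{format_examples(examples or [])}

## Task
{task}

## Output format
{output_format}"""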
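When the requested output format is JSON, it helps to validate what comes back and re-ask on failure. A sketch assuming a hypothetical `generate(prompt) -> str` callable and a list of required keys; the fake generator in the demo lines stands in for a model:

```python
import json

def generate_json(generate, prompt, required_keys, max_attempts=3):
    # Ask, parse, check keys; on failure re-ask with the error message appended.
    last_error = None
    for _ in range(max_attempts):
        if last_error is None:
            full_prompt = prompt
        else:
            full_prompt = f'{prompt}\n\nYour previous answer was invalid ({last_error}). Return only valid JSON.'
        raw = generate(full_prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as e:
            last_error = f'not JSON: {e.msg}'
            continue
        if not isinstance(data, dict):
            last_error = 'top-level value is not an object'
            continue
        missing = [k for k in required_keys if k not in data]
        if not missing:
            return data
        last_error = f'missing keys: {missing}'
    raise ValueError(f'no valid JSON after {max_attempts} attempts ({last_error})')

# Fake model for the demo: the first reply is prose, the second is valid JSON.
replies = iter(['Sure! Here is the answer.', '{"title": "Haiku", "lines": 3}'])
print(generate_json(lambda p: next(replies), 'Summarize as JSON', ['title', 'lines']))
```

Feeding the parse error back into the prompt gives the model a concrete reason to correct itself rather than repeating the same reply.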

RAG-augmented gen

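def rag_generate(question, retriever, llm, k=5):
    # Assumed interfaces: retriever.retrieve(query, k) -> docs with .text; llm.generate(prompt) -> str
    docs = retriever.retrieve(question, k=k)
    # Number each passage so the model can cite sources as [1], [2], ...
    context = '\n'.join(f'[{i}] {d.text}' for i, d in enumerate(docs, start=1))
    return llm.generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer with citations like [1]:")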
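The `retriever` argument only needs a `retrieve(question, k)` method returning objects with a `.text` attribute. A toy keyword-overlap retriever with that shape, useful for testing the pipeline before wiring in a real embedding index; the class and its scoring are illustrative, not a production retriever:

```python
import re
from dataclasses import dataclass

@dataclass
class Doc:
    text: str

def _tokens(s):
    # Lowercased alphanumeric words; no stemming, so "generate" != "generates".
    return set(re.findall(r'[a-z0-9]+', s.lower()))

class KeywordRetriever:
    # Scores each document by how many query words it shares; ties keep corpus order.
    def __init__(self, texts):
        self.docs = [Doc(t) for t in texts]

    def retrieve(self, question, k=5):
        q = _tokens(question)
        scored = [(len(q & _tokens(d.text)), i, d) for i, d in enumerate(self.docs)]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [d for score, _, d in scored[:k] if score > 0]

corpus = [
    'FLUX is a text-to-image model from Black Forest Labs.',
    'Suno generates music from text prompts.',
    'Sora and Veo generate video.',
]
r = KeywordRetriever(corpus)
print([d.text for d in r.retrieve('Which tool can generate video?', k=2)])
```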

Fine-tune (LoRA)

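from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
base_model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.1-8B')  # example id; any model with q_proj/v_proj
config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, target_modules=['q_proj', 'v_proj'], task_type='CAUSAL_LM')
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# then train on task data (e.g. with transformers.Trainer)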
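LoRA freezes the pretrained weight W (d_out × d_in) and learns a low-rank update, W' = W + (α/r)·B·A, with B of shape d_out × r and A of shape r × d_in, so each adapted matrix adds only r·(d_in + d_out) trainable parameters. Quick arithmetic for r=16, lora_alpha=32, assuming a 4096 × 4096 projection (the size is an example, not tied to a specific model):

```python
def lora_params(d_in, d_out, r):
    # A is r x d_in, B is d_out x r; W itself stays frozen.
    return r * (d_in + d_out)

def lora_scale(alpha, r):
    # The product B @ A is multiplied by alpha / r before being added to W.
    return alpha / r

full = 4096 * 4096
lora = lora_params(4096, 4096, r=16)
print(full, lora, f'{lora / full:.3%}')  # 16777216 131072 0.781%
print(lora_scale(32, 16))  # 2.0
```

Under 1% of the matrix is trained, which is why LoRA fine-tuning fits on much smaller hardware than full fine-tuning.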

Generation cost monitoring

def cost_track(usage, model='claude-opus-4-7'):
    # USD per token (per-million price / 1e6); illustrative figures, check current pricing
    pricing = {'claude-opus-4-7': {'in': 15/1e6, 'out': 75/1e6}}
    p = pricing[model]
    return usage.input_tokens * p['in'] + usage.output_tokens * p['out']
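Per-call cost is most useful when accumulated against a budget. A sketch of a running tracker, assuming usage objects expose `input_tokens` / `output_tokens` (the field names on the Anthropic SDK's `usage`); the per-token prices are the illustrative figures from the pattern above, and the budget is arbitrary:

```python
from types import SimpleNamespace

PRICING = {'claude-opus-4-7': {'in': 15 / 1e6, 'out': 75 / 1e6}}  # USD per token, illustrative

class CostTracker:
    def __init__(self, budget_usd):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def record(self, model, usage):
        # Add one call's cost; return True once total spend reaches the budget.
        p = PRICING[model]
        self.spent_usd += usage.input_tokens * p['in'] + usage.output_tokens * p['out']
        return self.spent_usd >= self.budget_usd

tracker = CostTracker(budget_usd=0.10)
usage = SimpleNamespace(input_tokens=2000, output_tokens=500)  # fake usage record
print(tracker.record('claude-opus-4-7', usage), round(tracker.spent_usd, 4))  # False 0.0675
```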

Eval (LLM-as-judge)

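import json
def llm_judge(output, criteria, judge):
    # judge.generate(prompt) -> str is an assumed interface to a separate judge model
    prompt = (f'Rate the response on: {criteria}.\nResponse:\n{output}\n'
              'Return only JSON of the form {"score": <integer 0-10>}.')
    return json.loads(judge.generate(prompt))['score']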
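Judge models often wrap the JSON in extra prose, so `json.loads` on the raw reply can fail. A more forgiving parser that extracts the first `{...}` object and clamps the score to 0-10; the regex is a heuristic (it does not handle nested objects), not a guarantee:

```python
import json
import re

def parse_judge_score(reply, lo=0.0, hi=10.0):
    # Grab the first {...} span (non-greedy, so nested objects are not supported).
    m = re.search(r'\{.*?\}', reply, re.DOTALL)
    if m is None:
        raise ValueError('no JSON object in judge reply')
    score = float(json.loads(m.group(0))['score'])
    return max(lo, min(hi, score))  # clamp to the requested range

print(parse_judge_score('Sure. {"score": 8} Hope that helps.'))   # 8.0
print(parse_judge_score('{"score": 14, "reason": "very good"}'))  # 10.0
```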

Brand safety

def brand_safe(output):
    # classify_toxicity, has_competitor, has_brand_voice are hypothetical checks you supply
    return classify_toxicity(output) < 0.05 and not has_competitor(output) and has_brand_voice(output)
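Of the three checks, `has_competitor` is the easiest to make concrete. A sketch using whole-word, case-insensitive matching against a competitor list; the names are placeholders, and a production filter would likely need more than exact-word matching:

```python
import re

def make_competitor_check(names):
    # One compiled regex with word boundaries, so "Globex" matches but "Globexville" does not.
    pattern = re.compile(r'\b(' + '|'.join(re.escape(n) for n in names) + r')\b', re.IGNORECASE)

    def has_competitor(text):
        return pattern.search(text) is not None

    return has_competitor

has_competitor = make_competitor_check(['Acme Corp', 'Globex'])  # placeholder names
print(has_competitor('Try globex for cheaper plans'))  # True
print(has_competitor('Visit Globexville tomorrow'))    # False
```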

Decision guide

| Situation | Tool |
|---|---|
| Best text quality | Claude Opus 4.7 / GPT-5 |
| Cost-aware text | Claude Sonnet / GPT-4o-mini |
| Best image | FLUX / Midjourney v7 |
| Fast image | SDXL Turbo / FLUX schnell |
| Video | Sora / Veo 2 |
| Audio | Suno / ElevenLabs |
| Local | Llama 3.x + SDXL |
| Code | Claude / Codex |

Default: frontier API + RAG + prompt engineering + LLM-as-judge eval + brand-safety checks + cost tracking.

🔗 Graph

🤖 LLM usage

When to use: nearly all productivity, creative, and customer-facing work. When not to use: deterministic computation. IP-strict contexts: only with care.

Anti-patterns

  • Shipping hallucinations: verify outputs before release.
  • No watermark: generated content can feed misinformation.
  • No copyright check: legal risk.
  • Single-model lock-in: when that API goes down, so does your product.
  • No cost monitoring: bill shock.
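A common mitigation for single-model lock-in is to try providers in order and fall back on failure. A sketch where each provider is a plain callable `prompt -> str` (hypothetical wrappers around real SDKs); which exceptions count as retryable is a design choice left open here:

```python
def generate_with_fallback(prompt, providers):
    # providers: list of (name, callable). Returns (name, text) from the first success.
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as e:  # in practice, catch only timeouts, rate limits, 5xx errors
            errors.append(f'{name}: {e}')
    raise RuntimeError('all providers failed: ' + '; '.join(errors))

def flaky_primary(prompt):
    raise TimeoutError('primary API timed out')  # simulated outage

def backup(prompt):
    return f'backup answer to: {prompt}'

print(generate_with_fallback('hello', [('primary', flaky_primary), ('backup', backup)]))
```

Returning the provider name alongside the text makes it easy to log how often the fallback path is actually used.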

🧪 Verification / duplicates

  • Verified (Anthropic, OpenAI, Stability, Google docs).
  • Reliability: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-20 | Auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: modalities + text / image / video / audio / 3D / agent code |