---
id: wiki-2026-0508-ai-이미지-생성-ai-image-generation
title: AI Image Generation
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI 이미지 생성, image gen, text-to-image, Midjourney, DALL-E, Stable Diffusion, Flux, Imagen, diffusion model]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: [image-generation, diffusion-model, stable-diffusion, midjourney, dalle, flux, prompt-engineering, controlnet, lora]
raw_sources: []
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack:
  language: Python / API
  framework: Diffusers / ComfyUI / Automatic1111 / Flux / SD WebUI
---

# AI Image Generation

## 📌 One-Line Insight (The Karpathy Summary)
> **Text-to-image with diffusion models**: start from noise and progressively denoise it, conditioned on the prompt. Each platform has its specialty: **Midjourney (artistic), DALL-E (natural language), Stable Diffusion / Flux (open weights + control)**. The four levers are **prompt, parameters, references, and negative prompts**.

## 📖 Structured Knowledge (Synthesized Content)

### Core architecture

#### Diffusion model
1. **Forward diffusion**: image → noise (used in training).
2. **Reverse diffusion**: noise → image (inference).
3. **Text encoder**: prompt → embedding.
4. **Cross-attention**: the text embedding guides image denoising.
5. **Sampler** (DDIM, DPM++, Euler): runs the denoising steps.

→ This is the basis of Stable Diffusion, Flux, and Imagen.
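
The forward and reverse directions can be illustrated with a toy sketch. It assumes a DDPM-style linear beta schedule and a four-value "image"; the helper names (`make_alpha_bars`, `add_noise`, `recover_x0`) are hypothetical and only show the closed-form noising step and how a correct noise estimate lets the clean signal be recovered. Real models predict the noise with a neural network.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * e for x, e in zip(x0, eps)]
    return xt, eps

def recover_x0(xt, eps_pred, t, alpha_bars):
    """Invert the forward step given a noise estimate (the network's job)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [(x - sigma * e) / signal for x, e in zip(xt, eps_pred)]

alpha_bars = make_alpha_bars(1000)
rng = random.Random(0)
pixels = [0.5, -0.25, 1.0, 0.0]                       # toy 4-value "image"
noisy, eps = add_noise(pixels, 500, alpha_bars, rng)  # training-time noising
restored = recover_x0(noisy, eps, 500, alpha_bars)    # perfect noise estimate
```

With a perfect noise estimate the original values come back exactly; in practice the estimate is imperfect, which is why samplers take many small steps.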

#### GAN (legacy, less common now)
- StyleGAN.
- Photorealistic within a narrow domain.
- Now limited to specific use cases.

#### Autoregressive
- DALL-E 1 (legacy).
- Generates discrete image tokens (VQ-VAE style).

→ Most modern systems are diffusion-based.

### Platforms

#### Midjourney (artistic / cinematic)
- **Subscription**: roughly $10-120 / month depending on plan.
- **Discord-based** originally; a web app is now available.
- Parameters: `--ar`, `--v`, `--s`, `--c`.
- References: `--sref` (style), `--cref` (character), `--oref` (omni).
- V7 (2025) adds draft mode (about 10x faster).
- Commercial use is allowed on paid plans.

#### DALL-E 3 (natural language)
- **OpenAI** / ChatGPT integration.
- GPT-4 expands the prompt before generation.
- Follows instructions accurately.
- Strong text rendering.
- Weak negative prompting.

#### Stable Diffusion (open / control)
- **Open weights** (CreativeML OpenRAIL-M for SD 1.x; later releases use different licenses).
- Can be self-hosted locally.
- UIs: ComfyUI / Automatic1111 / Forge.
- LoRA / fine-tuning / ControlNet.
- Weighted prompts: `(keyword:1.2)`.
- Strong negative prompting.

#### Flux (modern open, 2024+)
- **Black Forest Labs** (founded by researchers from the original Stable Diffusion team).
- Flux.1 [dev] / [schnell] / [pro].
- Generally regarded as stronger than SDXL (open-weight SoTA in 2024).
- More accurate hands and in-image text.

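#### Imagen / Veo (Google)
- Imagen 3 for images (Veo is Google's video model).
- Available through a cloud API.

#### Adobe Firefly
- Positioned as commercially safe from a licensing standpoint.
- Integrated with Adobe Creative Cloud.

#### Others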
- Ideogram (text in image).
- Recraft (vector).
- Krea (real-time).
- NovelAI (anime).

### Prompt structure (universal)

#### Four layers
1. **Subject**: "young woman, age 25, blue eyes".
2. **Medium / style**: "oil painting, Renaissance style".
3. **Composition / environment**: "close-up portrait, golden hour, mountain background".
4. **Technical**: "85mm lens, shallow depth of field, --ar 3:2".

→ The more specific each layer is, the better the output tends to be.

### Parameters (Midjourney)
- `--ar 16:9`: aspect ratio.
- `--v 7`: version.
- `--s 250`: stylize (artistic strength, 0-1000).
- `--c 50`: chaos (variety, 0-100).
- `--sref [URL]`: style reference.
- `--cref [URL]`: character reference (V6; V7 uses `--oref` instead).
- `--oref [URL]`: omni reference (V7).
- `--no [thing]`: simple negative.
- `--niji`: anime model.
- `--draft`: draft mode (10x faster).
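
A small helper can keep these parameters consistent across prompts. The sketch below is a hypothetical function (`midjourney_prompt` is not part of any Midjourney tooling); it only assembles the text you would paste after `/imagine` and checks the value ranges listed above.

```python
def midjourney_prompt(text, ar=None, version=None, stylize=None,
                      chaos=None, no=None, draft=False):
    """Assemble a Midjourney prompt string (hypothetical helper)."""
    # Ranges taken from the parameter list above.
    if stylize is not None and not 0 <= stylize <= 1000:
        raise ValueError("stylize must be in 0-1000")
    if chaos is not None and not 0 <= chaos <= 100:
        raise ValueError("chaos must be in 0-100")

    parts = [text.strip()]
    if ar is not None:
        parts.append(f"--ar {ar}")
    if version is not None:
        parts.append(f"--v {version}")
    if stylize is not None:
        parts.append(f"--s {stylize}")
    if chaos is not None:
        parts.append(f"--c {chaos}")
    if no:
        parts.append("--no " + ", ".join(no))   # simple negative list
    if draft:
        parts.append("--draft")
    return " ".join(parts)

print(midjourney_prompt("portrait of a knight, fantasy, oil painting",
                        ar="3:2", version=7, stylize=500))
```

Validating ranges before submission avoids wasting a generation on a rejected or silently clamped parameter.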

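### Additional controls in Stable Diffusion

#### Weighted prompt
```
(masterpiece:1.3), (8k:1.2), portrait, (background:0.8)
```

→ `(keyword:weight)` raises (> 1) or lowers (< 1) a keyword's emphasis. This is Automatic1111-style UI syntax; in that UI, square brackets with a number (`[keyword:0.3]`) mean prompt scheduling, not a weight.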
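
To make the `(keyword:weight)` syntax concrete, here is a minimal parser sketch. It is a simplification written for illustration (the name `parse_weighted_prompt` is hypothetical): it handles flat `(text:number)` groups and plain comma-separated keywords, and ignores nesting, square brackets, and escapes that real UIs support.

```python
import re

# Either a "(text:weight)" group or a run of plain text between commas.
_TOKEN = re.compile(r"\(([^()]+):(\d+(?:\.\d+)?)\)|([^,()]+)")

def parse_weighted_prompt(prompt):
    """Return a list of (keyword, weight) pairs; unweighted keywords get 1.0."""
    pairs = []
    for match in _TOKEN.finditer(prompt):
        if match.group(1) is not None:
            pairs.append((match.group(1).strip(), float(match.group(2))))
        else:
            text = match.group(3).strip()
            if text:
                pairs.append((text, 1.0))
    return pairs

print(parse_weighted_prompt("(masterpiece:1.3), (8k:1.2), portrait"))
```

A UI then scales each keyword's embedding by its weight before passing it to the model.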

#### Negative prompt (strong)
```
ugly, deformed, blurry, bad anatomy, extra fingers, watermark, signature, low quality
```

→ Explicitly excludes unwanted elements.

#### CFG Scale (1-30)
- Classifier-Free Guidance.
- Higher values follow the prompt more closely; lower values leave more room for variation.
- Typical default: 7-12.
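
The guidance rule itself is a single line of arithmetic. The sketch below assumes the standard classifier-free guidance formula applied to two noise predictions, one conditioned on the prompt and one unconditioned (in Stable Diffusion pipelines the negative prompt typically takes the place of the unconditioned input). `cfg_combine` is a hypothetical name, and plain lists stand in for tensors.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """eps = eps_uncond + scale * (eps_cond - eps_uncond), element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.0, 0.2]   # prediction without the prompt (or with the negative prompt)
eps_cond = [1.0, 0.5]     # prediction with the prompt

print(cfg_combine(eps_uncond, eps_cond, 1.0))   # scale 1: just the conditioned prediction
print(cfg_combine(eps_uncond, eps_cond, 7.5))   # higher scale: pushed further toward the prompt
```

This also explains why a scale that is too high over-saturates or distorts images: the prediction is extrapolated far past what the model actually produced.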

#### Sampling steps (10-50)
- Number of denoising iterations.
- More steps raise both quality and cost, with diminishing returns.
- DPM++ 2M Karras has a sweet spot around 20-30 steps.

#### Sampler choice
- Euler a, DPM++ 2M Karras, UniPC, ...
- Each gives a slightly different look.

### Advanced control

#### LoRA (Low-Rank Adaptation)
- Fine-tunes a model toward a specific style or character.
- Small files (often around 100 MB).
- Multiple LoRAs can be stacked.
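
Why LoRA files are small follows from the low-rank update itself. The sketch below uses the usual formulation `W' = W + scale * (B @ A)`, where `B` is `d_out x r` and `A` is `r x d_in`; the function names and the 768-wide layer are illustrative assumptions, and nested lists stand in for tensors.

```python
def lora_param_counts(d_out, d_in, rank):
    """Parameters in a full weight matrix vs. its rank-r LoRA factors."""
    return d_out * d_in, rank * (d_out + d_in)

def apply_lora(W, A, B, scale):
    """Return W + scale * (B @ A) using plain nested lists."""
    rank = len(A)
    return [
        [
            W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
            for j in range(len(W[0]))
        ]
        for i in range(len(W))
    ]

full, lora = lora_param_counts(768, 768, rank=8)
print(full, lora)   # the LoRA factors are a small fraction of the full matrix

W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]          # d_out x r
A = [[3.0, 4.0]]            # r x d_in
print(apply_lora(W, A, B, scale=0.5))
```

Stacking several LoRAs amounts to adding several such terms, each with its own scale, which is what per-adapter weights control.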

#### ControlNet
- Forces the output to follow a pose, depth map, or edge map.
- Canny edge → image.
- OpenPose → image.
- Depth map → image.

#### IP-Adapter
- Uses a reference image to steer style.

#### Inpainting
- Regenerates a specific region.
- Inputs: mask + prompt.
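
Conceptually, the mask decides per pixel whether the generated or the original content is kept. The following pixel-space sketch illustrates that blend; it is a simplification (real pipelines usually work in latent space and may feed the mask to the model directly), and `composite` is a hypothetical helper.

```python
def composite(original, generated, mask):
    """Blend two images: mask 1.0 = take generated, 0.0 = keep original."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]

original = [[10.0, 20.0], [30.0, 40.0]]
generated = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1.0, 0.0], [0.5, 0.0]]     # 0.5 = soft (feathered) edge

print(composite(original, generated, mask))
```

Soft mask edges are what keep the regenerated region from showing a visible seam.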

#### Outpainting / zoom out
- Extends the canvas beyond the original borders.

### Image-to-image (img2img)
```
Input image + prompt → modified image
```

→ Used for style transfer and variations.
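
How much of the input survives is usually governed by a strength value. The sketch below shows one common mapping from strength to the number of denoising steps actually run; it mirrors the approach used by Diffusers img2img pipelines as far as I know, but treat the exact rounding as an assumption. `img2img_schedule` is a hypothetical name.

```python
def img2img_schedule(num_inference_steps, strength):
    """Return (start_index, steps_run) for a given img2img strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Higher strength adds more noise to the input, so more steps are run.
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    start_index = num_inference_steps - steps_run
    return start_index, steps_run

print(img2img_schedule(40, 0.75))   # skips the first quarter of the schedule
print(img2img_schedule(40, 1.0))    # full schedule: the input is essentially ignored
```

Low strength therefore both preserves the input and runs faster, since fewer steps are executed.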

### Modern workflow patterns

#### Draft → upscale
1. **Draft mode**: generate a dozen cheap variants.
2. **Select the best**.
3. **Upscale + refine**.

→ The standard workflow in Midjourney / Flux.

#### LoRA stacking
1. **Base model** (SDXL / Flux).
2. **Style LoRA** (e.g. anime, oil paint).
3. **Character LoRA** (specific person).
4. **Concept LoRA** (specific pose / object).

#### Img2img + ControlNet (precise)
1. **Sketch**.
2. **Line-art guidance via ControlNet**.
3. **Generate + iterate**.

#### Inpainting workflow
1. **Generate base**.
2. **Identify defect** (extra finger, watermark).
3. **Mask + inpaint with negative**.

### Common defects + fix

| Defect | Fix |
|---|---|
| Extra fingers | Negative: "extra fingers, malformed hands" + LoRA |
| Asian-only faces | Specific ethnicity in prompt |
| Anime-only style | "photorealistic" + a non-anime model |
| Watermark | Negative: "watermark, signature, text" |
| Bad anatomy | Negative + ControlNet OpenPose |
| Blurry | Negative: "blurry" + more steps |
| Wrong aspect | `--ar 16:9` (Midjourney) or explicit width / height (Stable Diffusion) |
| Generic face | "specific name, distinct features" |
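
The negative-prompt fixes in the table can be kept in a small lookup so they are applied consistently. This is a hypothetical helper (`DEFECT_NEGATIVES` and `negative_prompt_for` are illustrative names), using only the terms already listed in this note.

```python
# Defect name -> negative prompt terms, taken from the table above.
DEFECT_NEGATIVES = {
    "extra_fingers": ["extra fingers", "malformed hands"],
    "watermark": ["watermark", "signature", "text"],
    "bad_anatomy": ["bad anatomy"],
    "blurry": ["blurry"],
}

def negative_prompt_for(defects):
    """Build a de-duplicated negative prompt for the observed defects."""
    terms = []
    for defect in defects:
        for term in DEFECT_NEGATIVES.get(defect, []):
            if term not in terms:
                terms.append(term)
    return ", ".join(terms)

print(negative_prompt_for(["watermark", "blurry"]))
```

This only helps on platforms with a real negative prompt field; elsewhere the fix has to go into the positive prompt.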

### Platform differences

#### Negative prompt
- **Stable Diffusion**: dedicated negative prompt field, very effective.
- **Flux**: Flux.1 [dev] / [schnell] are guidance-distilled, so a classic negative prompt has little or no effect by default; describe what you want instead.
- **Midjourney**: `--no [thing]` (limited).
- **DALL-E 3**: weak (often generates the excluded thing).

#### Prompt style
- **DALL-E 3**: natural language sentence.
- **Midjourney**: comma-separated keyword + parameter.
- **Stable Diffusion**: tag-based, weighted.

#### Photorealism
- **Stable Diffusion / Flux**: "photorealistic" works.
- **Midjourney**: implicit (cinematic feel).
- **DALL-E 3**: "photo style" + lens info works better than "photorealistic", which tends to give an airbrushed look.

### Commercial use / IP

#### License
- Midjourney: commercial OK (paid plans).
- DALL-E 3: commercial OK.
- Stable Diffusion: open weights (CreativeML OpenRAIL-M for SD 1.x allows commercial use; check the license of each later release).
- Adobe Firefly: commercial-safe (training data licensed).

#### Lawsuits
- Getty Images vs Stability AI (training data).
- Artists vs Midjourney (style mimicry).

#### Transparent disclosure
- Some jurisdictions require labeling AI-generated content (e.g. the EU AI Act).

## 💻 Code Patterns

### Stable Diffusion (Diffusers library)
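```python
from diffusers import StableDiffusionPipeline
import torch

# Load SD 1.5 in half precision on the GPU.
# The original "runwayml/stable-diffusion-v1-5" repo may no longer be hosted;
# "stable-diffusion-v1-5/stable-diffusion-v1-5" is a commonly used mirror.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Generate. Diffusers does not parse "(keyword:1.2)" weights on its own
# (that is UI syntax), so the prompt here is plain text.
image = pipe(
    prompt="masterpiece, portrait of a young woman, blue eyes, golden hour, 85mm lens, shallow depth of field",
    negative_prompt="blurry, deformed, watermark, signature",
    num_inference_steps=30,   # denoising iterations
    guidance_scale=7.5,       # CFG scale
).images[0]

image.save("output.png")
```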

### Flux (modern)
```python
from diffusers import FluxPipeline
import torch

# FLUX.1-dev is a gated model: accept its license on Hugging Face and log in first.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="A cat holding a sign that says 'Hello World'",
    height=1024, width=1024,
    guidance_scale=3.5,        # Flux uses lower guidance values than SD 1.5
    num_inference_steps=50,
).images[0]

image.save("flux_output.png")
```

### LoRA loading
```python
from diffusers import StableDiffusionPipeline

# "base-model" and the .safetensors paths are placeholders for your own files.
# Named adapters and set_adapters() require the `peft` package to be installed.
pipe = StableDiffusionPipeline.from_pretrained("base-model")
pipe.load_lora_weights("lora-style.safetensors", adapter_name="style")
pipe.load_lora_weights("lora-character.safetensors", adapter_name="character")

# Stack both LoRAs with per-adapter strengths
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])

image = pipe(prompt="...").images[0]
```

### ControlNet (pose-controlled)
```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image

# The ControlNet must match the base model family (SD 1.5 here).
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # mirror of the SD 1.5 weights
    controlnet=controlnet,
).to("cuda")

# Pose skeleton image already extracted with OpenPose
pose_image = Image.open("pose.png")

image = pipe(
    prompt="elegant woman, evening gown, studio lighting",
    image=pose_image,          # conditioning image, not an init image
    num_inference_steps=30,
).images[0]
```

### Img2img
```python
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# "base-model" is a placeholder for an SD checkpoint ID or local path.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("base-model")
init = Image.open("sketch.png").convert("RGB")

image = pipe(
    prompt="oil painting of mountain, sunset, masterpiece",
    image=init,
    strength=0.7,   # 0 = no change, 1 = input almost fully replaced
    guidance_scale=7.5,
).images[0]
```

### Inpainting
```python
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained("inpainting-model")

original = Image.open("photo.png")
mask = Image.open("mask.png")   # white = redo, black = keep

image = pipe(
    prompt="clean background, professional photo",
    image=original,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
```

### Midjourney (Discord bot, no official API)
```
# Discord
/imagine prompt: portrait of a knight, fantasy, oil painting, --ar 3:2 --v 7 --s 500 --sref https://...
```

→ Automation means monitoring the Discord bot's output or using an unofficial API; check Midjourney's terms of service before doing either.

### DALL-E 3 (OpenAI API)
```python
from openai import OpenAI
client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A cute corgi puppy in a sunny park, professional photo, 85mm lens",
    n=1,
    size="1024x1024",
    quality="hd",
    style="natural",   # or "vivid"
)

print(response.data[0].url)
```

### Flux Replicate API
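```python
import replicate

# Requires REPLICATE_API_TOKEN in the environment.
output = replicate.run(
    "black-forest-labs/flux-dev",
    input={
        "prompt": "A cat holding a sign...",
        # Input names follow the model's published schema on Replicate;
        # "guidance" is assumed here, so verify it against that schema.
        "guidance": 3.5,
        "num_inference_steps": 50,
    }
)

print(output[0])  # URL (newer client versions may return a file-like object)
```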

### Batch generation (cost-efficient)
```python
prompts = [f"variant {i}: cat with hat" for i in range(10)]

# Batch (faster than serial)
images = pipe(prompts, num_inference_steps=30).images
for i, img in enumerate(images):
    img.save(f"batch_{i}.png")
```
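
A single large batch can exceed GPU memory, so prompts are often split into fixed-size chunks first. The sketch below is a generic helper (`chunked` is a hypothetical name, and the batch size of 4 is an arbitrary assumption to tune per GPU); the commented lines show how it would wrap the pipeline call above.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

prompts = [f"variant {i}: cat with hat" for i in range(10)]
batches = list(chunked(prompts, 4))
print([len(batch) for batch in batches])

# With a loaded pipeline (assumed to be `pipe`), each chunk becomes one call:
# for batch in chunked(prompts, 4):
#     images = pipe(batch, num_inference_steps=30).images
```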

### ComfyUI workflow (visual node)
```
[CheckpointLoaderSimple] → [LoraLoader] → [CLIPTextEncode] → [ControlNetApply] → [KSampler] → [VAEDecode] → [SaveImage]
```

→ LoRA and ControlNet nodes are optional. Nodes can be rewired freely, so each user builds their own pipeline.

### Custom prompt template
```python
def build_prompt(subject, style, lighting, lens):
    return f"({style}:1.2), {subject}, {lighting}, {lens}, masterpiece, best quality"

prompt = build_prompt(
    subject="young woman, blue eyes",
    style="oil painting, Renaissance",
    lighting="golden hour, volumetric",
    lens="85mm portrait lens, shallow depth of field"
)
```

### Quality eval (CLIP score)
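```python
from transformers import CLIPProcessor, CLIPModel
import torch
from PIL import Image

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

prompt = "portrait of a young woman, golden hour"   # the prompt used for generation
image = Image.open("output.png")
inputs = processor(text=[prompt], images=image, return_tensors="pt",
                   padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# A softmax over a single prompt is always 1.0, so compare embeddings directly.
similarity = torch.nn.functional.cosine_similarity(
    outputs.image_embeds, outputs.text_embeds
).item()
print(f"CLIP similarity: {similarity:.3f}")
```

→ A quantitative measure of prompt-image alignment; compare scores across images rather than reading them as absolute quality.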

## 🤔 Decision Criteria

| Task | Recommendation |
|---|---|
| Quick prototype | DALL-E 3 / Midjourney |
| Cinematic / artistic | Midjourney V7 |
| Natural language | DALL-E 3 |
| Open / control / privacy | Stable Diffusion / Flux |
| Photorealism | Flux / SDXL + LoRA |
| Anime / illustration | NovelAI / Niji |
| Commercial-safe | Adobe Firefly |
| Specific character | LoRA + reference |
| Pose-controlled | ControlNet |
| Text in image | Flux / Ideogram |

**Defaults**: Midjourney (artistic), Flux (open + control), DALL-E 3 (natural language).

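## ⚠️ Contradictions & Updates
- **DALL-E 3 handles negation poorly**: "no X" can end up adding X. Specify what you want in positive terms.
- **Stable Diffusion hardware requirements**: a GPU is needed in practice. SD 1.5 runs on modest VRAM, while SDXL / Flux benefit from a high-VRAM card (RTX 3090 class or better recommended).
- **Midjourney is closed**: its internal optimizations are unknown.
- **Training-data lawsuits**: the future legal status of these models is uncertain.
- **Rapid model evolution**: the best model changes roughly every six months.
- **Flux is emerging**: it is widely seen as having surpassed SDXL.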

## 🔗 Knowledge Graph
- Parents: [[Generative-AI]] · [[Diffusion-Models]] · [[Computer Vision|Computer-Vision]]
- Variants: [[Stable-Diffusion]] · [[Flux]] · [[Midjourney]] · [[DALL-E]] · [[Imagen]]
- Applications: [[ControlNet]] · [[LoRA]] · [[Inpainting]] · [[IP-Adapter]]
- Techniques: [[Prompt_Engineering|Prompt-Engineering]] · [[Negative Prompt]] · [[CFG Scale]] · [[Sampling-Steps]]
- Tools: [[ComfyUI]]

## 🤖 LLM Usage Hints (How to Use This Knowledge)

**When to use this knowledge:**
- Integrating AI into an art / design workflow.
- Choosing a platform (Midjourney vs DALL-E vs Flux).
- Weighing license considerations for a commercial project.
- Iterating on prompts systematically.
- Deciding on self-hosting / privacy / cost.

**When not to use it:**
- Specific art critique (artist-level).
- Country-specific copyright questions (consult a lawyer).
- Deepfakes / harmful generation (ethics).
- Photo retouching, where Photoshop is the better tool.

## ❌ Anti-Patterns
- **Vague prompt** ("nice picture"): generic output.
- **Long word salad**: contradictory output.
- **DALL-E 3 + negative prompt**: tends to add the very thing you excluded.
- **Same syntax for Midjourney and Stable Diffusion**: parameters do not carry over.
- **No iteration**: accepting the first attempt.
- **Cloud generation + sensitive content**: privacy risk.
- **Commercial use + unclear license**: legal risk.
- **No prompt template / library**: reinventing every generation from scratch.

## 🧪 Validation
- **Information status:** verified (concept-level).
- **Source trust level:** B (Stability AI, Midjourney, OpenAI documentation, Hugging Face Diffusers).
- **Review reason:** Manual cleanup. Platforms change roughly every six months.

## 🧬 Duplicate Check
- **Existing similar documents:** [[AI_Image_Generation_Workflow]] (related), [[AI 이미지 생성 및 편집 워크플로우 (AI Image Generation & Editing Workflow)]] (related), [[Diffusion-Models]] (parent).
- **Handling:** KEEP (focused on platform / prompt comparison).
- **Reason:** Each file covers a different angle.

## 🕓 Changelog
| Date | Change | Handling | Trust |
|------|-----------|-----------|--------|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: added four-layer prompt, platform comparison, Diffusers code, LoRA / ControlNet, anti-patterns | UPDATE | B |