---
id: wiki-2026-0508-ai-이미지-생성-ai-image-generation
title: AI Image Generation
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI 이미지 생성, image gen, text-to-image, Midjourney, DALL-E, Stable Diffusion, Flux, Imagen, diffusion model]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: [image-generation, diffusion-model, stable-diffusion, midjourney, dalle, flux, prompt-engineering, controlnet, lora]
raw_sources: []
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack:
  language: Python / API
  framework: Diffusers / ComfyUI / Automatic1111 / Flux / SD WebUI
---
# AI Image Generation
## 📌 One-Line Insight (The Karpathy Summary)
> **Text → image with diffusion models**: each prompt steers the progressive denoising of random noise. Each platform has its specialty: **Midjourney (artistic), DALL-E (natural language), Stable Diffusion / Flux (open + control)**. The four levers are **prompt + parameters + reference + negative prompt**.
## 📖 Structured Knowledge (Synthesized Content)
### Core architecture
#### Diffusion model
1. **Forward diffusion**: image → noise (training).
2. **Reverse diffusion**: noise → image (inference).
3. **Text encoder**: prompt → embedding.
4. **Cross-attention**: the text embedding guides image generation.
5. **Sampler** (DDIM, DPM++, Euler): performs the denoising steps.
→ The basis of Stable Diffusion / Flux / Imagen.
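
The forward (noising) half of the process can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a DDPM-style linear beta schedule; the schedule endpoints, the step count, and the function names are assumptions chosen for the example, not values taken from any specific model.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (endpoint values are assumptions)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for all s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

pixels = [0.5, -0.25, 1.0, 0.0]                       # a toy "image" of four values
slightly_noisy = add_noise(pixels, 10, alpha_bars, rng)   # still mostly signal
almost_noise = add_noise(pixels, 999, alpha_bars, rng)    # almost pure noise
```

Training teaches a network to predict the added noise; inference runs the chain in reverse, starting from pure noise.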
#### GAN (legacy, less common now)
- StyleGAN.
- Photorealistic output.
- Now limited to specific use cases.
#### Autoregressive
- DALL-E 1 (legacy).
- VQ-VAE.
→ Modern = diffusion.
### Platforms
#### Midjourney (artistic / cinematic)
- **Subscription**: $10-60 / month.
- **Discord-based** (legacy) → **web app** (alpha).
- Parameters: `--ar`, `--v`, `--s`, `--c`.
- References: `--sref` (style), `--cref` (character), `--oref` (omni).
- V7 (2025) adds draft mode (about 10x faster).
- Commercial-friendly.
#### DALL-E 3 (natural language)
- **OpenAI** / ChatGPT integration.
- Prompt expansion by GPT-4.
- Accurate instruction following.
- Strong text rendering.
- Weak negative-prompt handling.
#### Stable Diffusion (open / control)
- **Open weights** (CreativeML OpenRAIL-M).
- Can be self-hosted locally.
- ComfyUI / Automatic1111 / Forge UI.
- LoRA / fine-tune / ControlNet.
- Weighted prompts: `(keyword:1.2)`.
- Strong negative prompts.
#### Flux (modern open, 2024+)
- **Black Forest Labs** (founded by members of the original Stable Diffusion team).
- Flux.1 [dev] / [schnell] / [pro].
- Better than SDXL (2024 SoTA).
- More accurate hands and text.
#### Imagen / Veo (Google)
- Imagen 3.
- Cloud API.
#### Adobe Firefly
- Commercially license-safe.
- Adobe Creative Cloud.
#### Others
- Ideogram (text in image).
- Recraft (vector).
- Krea (real-time).
- NovelAI (anime).
### Prompt structure (universal)
#### 4 layers
1. **Subject**: "young woman, age 25, blue eyes".
2. **Medium / style**: "oil painting, Renaissance style".
3. **Composition / environment**: "close-up portrait, golden hour, mountain background".
4. **Technical**: "85mm lens, shallow depth of field, --ar 3:2".
#### More specificity in each layer = higher quality.
### Parameters (Midjourney)
- `--ar 16:9`: aspect ratio.
- `--v 7`: version.
- `--s 250`: stylize (artistic strength, 0-1000).
- `--c 50`: chaos (variety, 0-100).
- `--sref [URL]`: style reference.
- `--cref [URL]`: character reference.
- `--oref [URL]`: omni reference (V7).
- `--no [thing]`: simple negative.
- `--niji`: anime model.
- `--draft`: draft mode (10x faster).
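
Since these flags are plain text appended to the prompt, a small helper can assemble them and reject out-of-range values before anything is sent. This is a sketch only: `midjourney_params` is a hypothetical helper name, and the range checks simply mirror the ranges listed above.

```python
def midjourney_params(ar=None, version=None, stylize=None, chaos=None,
                      no=None, draft=False):
    """Build a Midjourney parameter suffix, checking the ranges listed above."""
    parts = []
    if ar is not None:
        parts.append(f"--ar {ar}")
    if version is not None:
        parts.append(f"--v {version}")
    if stylize is not None:
        if not 0 <= stylize <= 1000:
            raise ValueError("stylize must be within 0-1000")
        parts.append(f"--s {stylize}")
    if chaos is not None:
        if not 0 <= chaos <= 100:
            raise ValueError("chaos must be within 0-100")
        parts.append(f"--c {chaos}")
    if no:
        # Several unwanted items are joined with commas after a single --no.
        parts.append("--no " + ", ".join(no))
    if draft:
        parts.append("--draft")
    return " ".join(parts)

prompt = "portrait of a knight, fantasy, oil painting " + midjourney_params(
    ar="3:2", version=7, stylize=500, no=["text", "watermark"]
)
print(prompt)
```

Keeping parameter assembly in one place makes it harder to mix Midjourney flags into prompts meant for other platforms.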
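### Additional controls in Stable Diffusion
#### Weighted prompt
```
(masterpiece:1.3), (8k:1.2), portrait, (low quality:0.3)
```
→ Raises (> 1) or lowers (< 1) the weight of each keyword. In AUTOMATIC1111-style syntax, square brackets with a number, as in `[x:0.3]`, are read as prompt scheduling rather than a weight, so the down-weighted term uses parentheses.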
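
To make the `(keyword:weight)` convention concrete, here is a minimal parser sketch that turns a comma-separated prompt into `(text, weight)` pairs. It is an illustration, not the parser any WebUI actually uses: it ignores nesting and escapes, and it would split a weighted group that itself contains commas.

```python
import re

# Matches a whole chunk of the form "(some text:1.3)".
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_weighted_prompt(prompt):
    """Split a prompt into (text, weight) pairs; plain terms get weight 1.0."""
    pairs = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = WEIGHTED.fullmatch(chunk)
        if match:
            pairs.append((match.group(1).strip(), float(match.group(2))))
        else:
            pairs.append((chunk, 1.0))
    return pairs

pairs = parse_weighted_prompt("(masterpiece:1.3), (8k:1.2), portrait")
print(pairs)
```

In a real pipeline these weights scale the corresponding token embeddings (or attention) before they reach the denoiser.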
#### Negative prompt (strong)
```
ugly, deformed, blurry, bad anatomy, extra fingers, watermark, signature, low quality
```
→ Explicitly excludes unwanted elements.
#### CFG Scale (1-30)
- Classifier-Free Guidance.
- Trades prompt adherence (higher values) against creativity (lower values).
- Typical default: 7-12.
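
The guidance scale enters through one line of arithmetic: the sampler predicts noise twice (with and without the prompt) and extrapolates from the unconditional prediction toward the conditional one. A minimal sketch on plain lists follows; `apply_cfg` is a name chosen for this example.

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]

uncond = [0.0, 1.0]   # prediction with an empty (or negative) prompt
cond = [1.0, 1.0]     # prediction with the actual prompt

print(apply_cfg(uncond, cond, 1.0))   # scale 1 reproduces the conditional prediction
print(apply_cfg(uncond, cond, 7.5))   # larger scales push further along the prompt direction
```

This also explains why negative prompts are strong in Stable Diffusion: the negative prompt typically replaces the empty prompt in the unconditional branch, so the result is pushed away from it.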
#### Sampling steps (10-50)
- Number of denoising iterations.
- More steps: higher quality, higher cost.
- DPM++ 2M Karras: sweet spot at 20-30 steps.
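
The "Karras" suffix refers to how noise levels are spaced across the steps: the schedule from Karras et al. (2022) spends more steps at low noise, which is one reason 20-30 steps can be enough. The sketch below computes that spacing; `rho=7` follows the paper, while the `sigma_min` and `sigma_max` defaults are illustrative assumptions rather than values read from a particular checkpoint.

```python
def karras_sigmas(num_steps, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, spaced as in Karras et al."""
    if num_steps < 2:
        raise ValueError("need at least two steps")
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    return [
        (max_inv + i / (num_steps - 1) * (min_inv - max_inv)) ** rho
        for i in range(num_steps)
    ]

sigmas = karras_sigmas(20)
# Early gaps between noise levels are large, late gaps are small.
first_gap = sigmas[0] - sigmas[1]
last_gap = sigmas[-2] - sigmas[-1]
print(f"first gap {first_gap:.3f}, last gap {last_gap:.4f}")
```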
#### Sampler choice
- Euler a, DPM++ 2M Karras, UniPC, ...
- Each sampler yields a slightly different style.
### Advanced control
#### LoRA (Low-Rank Adaptation)
- Fine-tunes a specific style or character.
- Small files (~100 MB).
- Multiple LoRAs can be stacked.
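
The small file size follows from the math: instead of storing a full weight update, a LoRA stores two thin matrices whose product is the update. Below is a toy sketch with nested lists; the `alpha / rank` scaling follows the common LoRA convention, and `scale` stands in for a per-adapter weight when stacking. All names are chosen for this example.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def apply_lora(weight, lora_b, lora_a, alpha, scale=1.0):
    """W' = W + scale * (alpha / rank) * (B @ A), where rank = rows of A."""
    rank = len(lora_a)
    delta = matmul(lora_b, lora_a)
    factor = scale * alpha / rank
    return [
        [w + factor * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

def lora_param_count(d_out, d_in, rank):
    """Parameters stored by the adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)

W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]        # 2 x 1
A = [[3.0, 4.0]]          # 1 x 2 (rank 1)
merged = apply_lora(W, B, A, alpha=1.0)
half_strength = apply_lora(W, B, A, alpha=1.0, scale=0.5)

# A rank-8 adapter on a 768 x 768 layer stores far fewer values than the layer.
print(lora_param_count(768, 768, 8), "vs", 768 * 768)
```

Stacking several LoRAs amounts to adding several such deltas to the same base weight, each with its own scale.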
#### ControlNet
- Forces pose / depth / edge structure.
- Canny edge → image.
- OpenPose → image.
- Depth map → image.
#### IP-Adapter
- Uses an image as a style reference.
#### Inpainting
- Redoes a specific region.
- Mask + prompt.
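
The role of the mask can be shown with a per-pixel blend: where the mask is 1 the regenerated pixel is used, where it is 0 the original is kept. This is a simplified sketch of the compositing idea only; actual inpainting pipelines also apply the mask inside the denoising loop, typically in latent space.

```python
def composite(original, generated, mask):
    """Per-pixel blend: mask 1.0 takes the generated pixel, 0.0 keeps the original."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]

original = [[10.0, 20.0], [30.0, 40.0]]
generated = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1.0, 0.0], [0.0, 1.0]]     # redo the two diagonal pixels only

result = composite(original, generated, mask)
print(result)
```

Fractional mask values (a feathered edge) blend the two images and help hide the seam.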
#### Outpainting / zoom out
- Extends the canvas.
### Image-to-image (img2img)
```
Input image + prompt → modified image
```
→ Style transfer / variation.
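
How far the result drifts from the input is usually governed by a `strength` value. As far as I understand the Hugging Face Diffusers img2img pipeline, strength decides how many of the scheduled denoising steps actually run: the input is noised to an intermediate level and only the remaining steps are executed. The sketch below reproduces that bookkeeping; treat it as an approximation of the library's behavior, with `img2img_steps` being a name chosen here.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (index of first step to run, number of steps that run)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be within 0-1")
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    first_step = num_inference_steps - steps_to_run
    return first_step, steps_to_run

print(img2img_steps(40, 0.5))   # half of the schedule is skipped
print(img2img_steps(40, 1.0))   # full schedule: the input image is effectively ignored
```

A practical consequence: low strength with few scheduled steps leaves very few real denoising steps, so raising the step count helps when strength is small.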
### Modern workflow patterns
#### Draft → upscale
1. **Draft mode**: generate a dozen variants (cheap).
2. **Select best**.
3. **Upscale + refine**.
→ The standard workflow for Midjourney / Flux.
#### LoRA stacking
1. **Base model** (SDXL / Flux).
2. **Style LoRA** (e.g. anime, oil paint).
3. **Character LoRA** (specific person).
4. **Concept LoRA** (specific pose / object).
#### Img2img + ControlNet (precise)
1. **Sketch**.
2. **Line-art guidance via ControlNet**.
3. **Generate + iterate**.
#### Inpainting workflow
1. **Generate base**.
2. **Identify defect** (extra finger, watermark).
3. **Mask + inpaint with negative**.
### Common defects + fix
| Defect | Fix |
|---|---|
| Extra fingers | Negative: "extra fingers, malformed hands" + LoRA |
| Asian-only faces | Specific ethnicity in prompt |
| Anime-only style | "photorealistic" + a non-anime model |
| Watermark | Negative: "watermark, signature, text" |
| Bad anatomy | Negative + ControlNet OpenPose |
| Blurry | Negative: "blurry" + steps ↑ |
| Wrong aspect | `--ar 16:9` |
| Generic face | "specific name, distinct features" |
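
The negative-prompt rows of this table lend themselves to a small lookup that assembles a negative prompt from the defects observed in a draft. This is a sketch under assumptions: `DEFECT_NEGATIVES` and `build_negative_prompt` are hypothetical names, and the term lists are copied from the table rather than from any tool.

```python
# Negative-prompt terms per defect, taken from the table above.
DEFECT_NEGATIVES = {
    "extra fingers": ["extra fingers", "malformed hands"],
    "watermark": ["watermark", "signature", "text"],
    "bad anatomy": ["bad anatomy"],
    "blurry": ["blurry"],
}

def build_negative_prompt(defects, base=()):
    """Merge the terms for each observed defect into one de-duplicated string."""
    terms = list(base)
    for defect in defects:
        for term in DEFECT_NEGATIVES.get(defect.lower(), []):
            if term not in terms:
                terms.append(term)
    return ", ".join(terms)

negative = build_negative_prompt(
    ["Watermark", "blurry"], base=["low quality", "blurry"]
)
print(negative)
```

Defects whose fix is not a negative prompt (wrong aspect ratio, generic face) are deliberately left out of the lookup.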
### Platform differences
#### Negative prompt
- **Stable Diffusion / Flux**: explicit negative section, very strong.
- **Midjourney**: `--no [thing]` (limited).
- **DALL-E 3**: weak (often makes the thing).
#### Prompt style
- **DALL-E 3**: natural language sentence.
- **Midjourney**: comma-separated keyword + parameter.
- **Stable Diffusion**: tag-based, weighted.
#### Photorealism
- **Stable Diffusion / Flux**: "photorealistic" works.
- **Midjourney**: implicit (cinematic feel).
- **DALL-E 3**: "photo style" + lens info > "photorealistic" (which tends to produce an airbrushed feel).
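
One way to respect these differences is to keep a single layered prompt spec and render it per platform. The function below is a rough sketch: the platform keys, the sentence template for DALL-E, and the returned dictionary shape are all assumptions made for illustration.

```python
def format_prompt(platform, subject, style, composition, technical, negative=()):
    """Render one layered prompt spec in a platform-specific style."""
    layers = [subject, style, composition, technical]
    if platform == "dalle":
        # Natural-language sentence; negatives are dropped because "no X" can add X.
        return {"prompt": f"{subject}, rendered as {style}, {composition}, {technical}."}
    if platform == "midjourney":
        # Comma-separated keywords, with negatives passed through --no.
        prompt = ", ".join(layers)
        if negative:
            prompt += " --no " + ", ".join(negative)
        return {"prompt": prompt}
    if platform == "stable-diffusion":
        # Tag-style prompt plus a separate negative prompt.
        return {"prompt": ", ".join(layers), "negative_prompt": ", ".join(negative)}
    raise ValueError(f"unknown platform: {platform}")

spec = dict(
    subject="a knight",
    style="oil painting",
    composition="close-up portrait",
    technical="85mm lens",
    negative=["watermark", "blurry"],
)
for name in ("dalle", "midjourney", "stable-diffusion"):
    print(name, format_prompt(name, **spec))
```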
### Commercial use / IP
#### License
- Midjourney: commercial OK (paid).
- DALL-E 3: commercial OK.
- Stable Diffusion: open (CreativeML OpenRAIL-M, commercial OK).
- Adobe Firefly: commercial-safe (training data licensed).
#### Lawsuits
- Getty Images vs Stability AI (training data).
- Artists vs Midjourney (style mimicry).
#### Transparent disclosure
- Some jurisdictions require labeling AI-generated content (e.g. EU AI Act).
## 💻 Code Patterns
### Stable Diffusion (Diffusers library)
```python
from diffusers import StableDiffusionPipeline
import torch

# Model ID as given in the source; the repository may have moved on the Hub.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Generate. Plain Diffusers does not parse "(keyword:1.2)" weights (that is
# WebUI syntax), so the prompt is written without them.
image = pipe(
    prompt="masterpiece, portrait of a young woman, blue eyes, golden hour, 85mm lens, shallow depth of field",
    negative_prompt="blurry, deformed, watermark, signature",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("output.png")
```
### Flux (modern)
```python
from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="A cat holding a sign that says 'Hello World'",
    height=1024, width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
```
### LoRA loading
```python
from diffusers import StableDiffusionPipeline

# "base-model" and the .safetensors paths are placeholders.
pipe = StableDiffusionPipeline.from_pretrained("base-model")
pipe.load_lora_weights("lora-style.safetensors", adapter_name="style")
pipe.load_lora_weights("lora-character.safetensors", adapter_name="character")

# Stack both LoRAs, each with its own strength
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])
image = pipe(prompt="...").images[0]
```
### ControlNet (pose-controlled)
```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
).to("cuda")

pose_image = Image.open("pose.png")  # pose skeleton extracted with OpenPose
image = pipe(
    prompt="elegant woman, evening gown, studio lighting",
    image=pose_image,
    num_inference_steps=30,
).images[0]
```
### Img2img
```python
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("base-model")  # placeholder ID
init = Image.open("sketch.png")
image = pipe(
    prompt="oil painting of mountain, sunset, masterpiece",
    image=init,
    strength=0.7,  # 0 = no change, 1 = input is fully replaced
    guidance_scale=7.5,
).images[0]
```
### Inpainting
```python
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained("inpainting-model")  # placeholder ID
original = Image.open("photo.png")
mask = Image.open("mask.png")  # white = redo, black = keep
image = pipe(
    prompt="clean background, professional photo",
    image=original,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
```
### Midjourney (Discord bot, no official API)
```
# Discord
/imagine prompt: portrait of a knight, fantasy, oil painting, --ar 3:2 --v 7 --s 500 --sref https://...
```
→ Automation relies on monitoring Discord (webhook) or on an unofficial API.
### DALL-E 3 (OpenAI API)
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.images.generate(
    model="dall-e-3",
    prompt="A cute corgi puppy in a sunny park, professional photo, 85mm lens",
    n=1,
    size="1024x1024",
    quality="hd",
    style="natural",  # or "vivid"
)
print(response.data[0].url)
```
### Flux Replicate API
```python
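import replicate

# Input field names depend on the model's schema on Replicate; check the
# model page before relying on the ones shown here.
output = replicate.run(
    "black-forest-labs/flux-dev",
    input={
        "prompt": "A cat holding a sign...",
        "guidance_scale": 3.5,
        "num_inference_steps": 50,
    },
)
print(output[0])  # URL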
```
### Batch generation (cost-efficient)
```python
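# Assumes `pipe` is a pipeline loaded as in the Stable Diffusion example above.
prompts = [f"variant {i}: cat with hat" for i in range(10)]

# One batched call (faster than serial calls, but needs more VRAM)
images = pipe(prompts, num_inference_steps=30).images
for i, img in enumerate(images):
    img.save(f"batch_{i}.png")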
```
### ComfyUI workflow (visual node)
```
[CheckpointLoader] → [PromptText] → [Sampler] → [VAEDecode] → [SaveImage]
[LoRALoader] → [ControlNet]
```
→ Nodes can be reordered; each user builds their own pipeline.
### Custom prompt template
```python
def build_prompt(subject, style, lighting, lens):
    # Style gets a 1.2 weight (WebUI syntax); quality tags are appended last.
    return f"({style}:1.2), {subject}, {lighting}, {lens}, masterpiece, best quality"

prompt = build_prompt(
    subject="young woman, blue eyes",
    style="oil painting, Renaissance",
    lighting="golden hour, volumetric",
    lens="85mm portrait lens, shallow depth of field",
)
```
### Quality eval (CLIP score)
```python
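from transformers import CLIPProcessor, CLIPModel
import torch
from PIL import Image

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

prompt = "A cat holding a sign that says 'Hello World'"  # prompt used for output.png
image = Image.open("output.png")
inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Cosine similarity of image and text embeddings. A softmax over a single
# prompt would always return 1.0, so the raw similarity is used instead.
similarity = torch.nn.functional.cosine_similarity(
    outputs.image_embeds, outputs.text_embeds
).item()
print(f"CLIP score: {similarity:.3f}")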
```
→ A quantitative measure of prompt-image alignment.
## 🤔 Decision Criteria
| Task | Recommendation |
|---|---|
| Quick prototype | DALL-E 3 / Midjourney |
| Cinematic / artistic | Midjourney V7 |
| Natural language | DALL-E 3 |
| Open / control / privacy | Stable Diffusion / Flux |
| Photorealism | Flux / SDXL + LoRA |
| Anime / illustration | NovelAI / Niji |
| Commercial-safe | Adobe Firefly |
| Specific character | LoRA + reference |
| Pose-controlled | ControlNet |
| Text in image | Flux / Ideogram |
**Defaults**: Midjourney (artistic), Flux (open + control), DALL-E 3 (natural language).
## ⚠️ Contradictions & Updates
- **DALL-E 3 handles negatives poorly**: writing "no X" can add X. Specify what you want in positive terms instead.
- **Stable Diffusion hardware requirements**: needs a GPU (RTX 3090 or better recommended).
- **Midjourney is closed**: its internal optimizations are unknown.
- **Training-data lawsuits**: the future legal status of each model is uncertain.
- **Model evolution**: the best model changes roughly every 6 months.
- **Flux is emerging**: the modern SoTA that surpasses SDXL.
## 🔗 Knowledge Links (Graph)
- Parents: [[Generative-AI]] · [[Diffusion-Models]] · [[Computer Vision|Computer-Vision]]
- Variants: [[Stable-Diffusion]] · [[Flux]] · [[Midjourney]] · [[DALL-E]] · [[Imagen]]
- Applications: [[ControlNet]] · [[LoRA]] · [[Inpainting]] · [[IP-Adapter]]
- Techniques: [[Prompt_Engineering|Prompt-Engineering]] · [[Negative Prompt]] · [[CFG Scale]] · [[Sampling-Steps]]
- Tools: [[ComfyUI]]
## 🤖 LLM Usage Hints (How to Use This Knowledge)
**When to use this knowledge:**
- Integrating AI into an art / design workflow.
- Choosing a specific platform (Midjourney vs DALL-E vs Flux).
- License considerations for a commercial project.
- Making prompt iteration systematic.
- Deciding on self-hosting / privacy / cost.
**When not to use it:**
- Specific art critique (needs artist-level judgment).
- Country-specific copyright questions (consult a lawyer).
- Deepfakes / harmful generation (ethics).
- Photo retouching (Photoshop is better suited).
## ❌ Anti-Patterns
- **Vague prompt** ("nice picture"): generic output.
- **Long word salad**: contradictory output.
- **DALL-E 3 + negative prompt**: tends to add the unwanted thing.
- **Same syntax for Midjourney and Stable Diffusion**: parameters do not carry over.
- **No iteration**: accepting the first try.
- **Cloud generation + sensitive content**: privacy risk.
- **Commercial use + unclear license**: legal risk.
- **No prompt template / library**: reinventing every generation from scratch.
## 🧪 Validation
- **Information status:** verified (concept-level).
- **Source trust level:** B (Stability AI, Midjourney, OpenAI documentation, Hugging Face Diffusers).
- **Review reason:** Manual cleanup. Each platform evolves roughly every 6 months.
## 🧬 Duplicate Check
- **Existing similar documents:** [[AI_Image_Generation_Workflow]] (related), [[AI 이미지 생성 및 편집 워크플로우 (AI Image Generation & Editing Workflow)]] (related), [[Diffusion-Models]] (parent).
- **Handling:** KEEP (focused on platform / prompt comparison).
- **Reason:** Each file covers a different angle.
## 🕓 Changelog
| Date | Change | Handling | Trust |
|------|-----------|-----------|--------|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: added 4-layer prompt + platform comparison + Diffusers code + LoRA / ControlNet + anti-patterns | UPDATE | B |