---
id: wiki-2026-0508-ai-이미지-생성-ai-image-generation
title: AI Image Generation
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI 이미지 생성, image gen, text-to-image, Midjourney, DALL-E, Stable Diffusion, Flux, Imagen, diffusion model]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: [image-generation, diffusion-model, stable-diffusion, midjourney, dalle, flux, prompt-engineering, controlnet, lora]
raw_sources: []
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack:
  language: Python / API
  framework: Diffusers / ComfyUI / Automatic1111 / Flux / SD WebUI
---

# AI Image Generation

## 📌 One-Line Insight (The Karpathy Summary)
> **Text → image with diffusion models**: each prompt guides the progressive denoising of random noise. Each platform has a specialty: **Midjourney (artistic), DALL-E (natural language), Stable Diffusion / Flux (open + controllable)**. Four levers: **prompt + parameters + reference + negative**.

## 📖 Structured Knowledge (Synthesized Content)

### Core architecture

#### Diffusion model
1. **Forward diffusion**: image → noise (training).
2. **Reverse diffusion**: noise → image (inference).
3. **Text encoder**: prompt → embedding.
4. **Cross-attention**: the text embedding guides image generation.
5. **Sampler** (DDIM, DPM++, Euler): performs the denoising steps.

→ The basis of Stable Diffusion / Flux / Imagen.
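
The forward (noising) half of the process can be illustrated with a minimal sketch in plain Python. It assumes a DDPM-style linear beta schedule with 1,000 steps; the schedule values, the function names, and the four-value toy "image" are illustrative assumptions, not the configuration of any model named here.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alpha_bar(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(ab_t) * x0, (1 - ab_t) * I)."""
    ab = alpha_bars[t]
    signal, noise = math.sqrt(ab), math.sqrt(1.0 - ab)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha_bar(betas)
rng = random.Random(0)

pixels = [0.5, -0.25, 1.0, 0.0]  # toy "image" of four values
slightly_noisy = forward_diffuse(pixels, 10, alpha_bars, rng)
almost_noise = forward_diffuse(pixels, 999, alpha_bars, rng)
print(alpha_bars[10], alpha_bars[999])
```

Early timesteps keep most of the signal (`alpha_bars[10]` is close to 1), while the last timestep is almost pure Gaussian noise. Reverse diffusion trains a network to undo this one step at a time.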

#### GAN (legacy, less common now)
- StyleGAN.
- Photorealistic output.
- Limited to specific use cases.

#### Autoregressive
- DALL-E 1 (legacy).
- VQ-VAE.

→ Most modern systems are diffusion-based.

### Platforms

#### Midjourney (artistic / cinematic)
- **Subscription**: $10-60 / month.
- **Discord-based** (legacy) → **web app** (alpha).
- Parameters: `--ar`, `--v`, `--s`, `--c`.
- References: `--sref` (style), `--cref` (character), `--oref` (omni).
- V7 (2025) adds draft mode (10x faster).
- Commercial-friendly.

#### DALL-E 3 (natural language)
- **OpenAI** / ChatGPT integration.
- GPT-4 expands the prompt before generation.
- Accurate instruction following.
- Strong text rendering.
- Weak negative prompting.

#### Stable Diffusion (open / control)
- **Open weights** (CreativeML OpenRAIL-M for SD 1.x; later releases use other licenses).
- Runs locally (self-hosted).
- ComfyUI / Automatic1111 / Forge UI.
- LoRA / fine-tune / ControlNet.
- Weighted prompts: `(keyword:1.2)`.
- Strong negative prompting.

#### Flux (modern open, 2024+)
- **Black Forest Labs** (founded by original Stable Diffusion researchers).
- Flux.1 [dev] / [schnell] / [pro].
- Outperforms SDXL (2024 SoTA).
- More accurate hands and text.

#### Imagen / Veo (Google)
- Imagen 3.
- Cloud API.

#### Adobe Firefly
- Commercially license-safe.
- Adobe Creative Cloud.

#### Others
- Ideogram (text in image).
- Recraft (vector).
- Krea (real-time).
- NovelAI (anime).

### Prompt structure (universal)

#### 4 layers
1. **Subject**: "young woman, age 25, blue eyes".
2. **Medium / style**: "oil painting, Renaissance style".
3. **Composition / environment**: "close-up portrait, golden hour, mountain background".
4. **Technical**: "85mm lens, shallow depth of field, --ar 3:2".

#### More specificity in each layer = higher quality.

### Parameters (Midjourney)
- `--ar 16:9`: aspect ratio.
- `--v 7`: version.
- `--s 250`: stylize (artistic strength, 0-1000).
- `--c 50`: chaos (variety, 0-100).
- `--sref [URL]`: style reference.
- `--cref [URL]`: character reference.
- `--oref [URL]`: omni reference (V7).
- `--no [thing]`: simple negative.
- `--niji`: anime model.
- `--draft`: draft mode (10x faster).
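
A small helper can keep these flags consistent across many prompts. The sketch below only assembles a prompt string using the flags and ranges listed above; `build_midjourney_prompt` is a hypothetical name, and nothing here talks to Midjourney itself.

```python
def build_midjourney_prompt(text, ar="1:1", version=None, stylize=None,
                            chaos=None, no=None, draft=False):
    """Assemble a Midjourney prompt string; nothing is sent anywhere."""
    # Ranges follow the parameter list above (assumed to be current).
    if stylize is not None and not 0 <= stylize <= 1000:
        raise ValueError("stylize (--s) must be within 0-1000")
    if chaos is not None and not 0 <= chaos <= 100:
        raise ValueError("chaos (--c) must be within 0-100")

    parts = [text.strip().rstrip(","), f"--ar {ar}"]
    if version is not None:
        parts.append(f"--v {version}")
    if stylize is not None:
        parts.append(f"--s {stylize}")
    if chaos is not None:
        parts.append(f"--c {chaos}")
    if no:
        parts.append("--no " + ", ".join(no))
    if draft:
        parts.append("--draft")
    return " ".join(parts)

print(build_midjourney_prompt(
    "portrait of a knight, fantasy, oil painting",
    ar="3:2", version=7, stylize=500, no=["text", "watermark"],
))
# portrait of a knight, fantasy, oil painting --ar 3:2 --v 7 --s 500 --no text, watermark
```

Validating the ranges up front catches typos such as `--c 500` before a generation is spent on them.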

### Additional controls in Stable Diffusion

#### Weighted prompt
```
(masterpiece:1.3), (8k:1.2), portrait, (low quality:0.3)
```

→ Raises or lowers the weight of each keyword (Automatic1111-style syntax; values below 1.0 de-emphasize).
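
To make the syntax concrete, here is a minimal parser sketch. It assumes only the flat `(text:weight)` form; nested parentheses, escapes, and bracket syntax are deliberately ignored, and both function names are hypothetical.

```python
import re

# Matches "(some text:1.2)" without nesting.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def _plain_chunks(text):
    """Split unweighted text on commas; each chunk gets weight 1.0."""
    return [(chunk.strip(), 1.0) for chunk in text.split(",") if chunk.strip()]

def parse_weighted_prompt(prompt):
    """Return a list of (text, weight) pairs in prompt order."""
    pairs, pos = [], 0
    for match in WEIGHTED.finditer(prompt):
        pairs.extend(_plain_chunks(prompt[pos:match.start()]))
        pairs.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    pairs.extend(_plain_chunks(prompt[pos:]))
    return pairs

print(parse_weighted_prompt("(masterpiece:1.3), (8k:1.2), portrait"))
# [('masterpiece', 1.3), ('8k', 1.2), ('portrait', 1.0)]
```

A UI that supports weighting would then scale the attention given to each chunk by its weight; that step is model-specific and not shown here.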

#### Negative prompt (strong)
```
ugly, deformed, blurry, bad anatomy, extra fingers, watermark, signature, low quality
```

→ Explicitly excludes unwanted elements.

#### CFG Scale (1-30)
- Classifier-Free Guidance.
- Higher values → stronger prompt adherence; lower values → more creative variation.
- Typical range: 7-12.
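
The guidance step itself is a one-line extrapolation between two noise predictions. The sketch below shows that formula on toy lists; in a real pipeline the two inputs come from running the denoiser with an empty (or negative) prompt and with the actual prompt. `apply_cfg` and the sample numbers are illustrative assumptions.

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + guidance_scale * (c - u)
            for u, c in zip(noise_uncond, noise_cond)]

uncond = [0.10, -0.20, 0.00]  # toy prediction without the prompt
cond = [0.30, -0.10, 0.40]    # toy prediction with the prompt

print(apply_cfg(uncond, cond, 1.0))  # scale 1.0 reproduces the conditional prediction
print(apply_cfg(uncond, cond, 7.5))  # larger scale pushes further toward the prompt
```

This also explains why negative prompts are effective in Stable Diffusion: the negative prompt typically takes the place of the unconditional input, so guidance pushes the result away from it.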

#### Sampling steps (10-50)
- Number of denoising iterations.
- More steps → higher quality, higher cost.
- DPM++ 2M Karras: sweet spot at 20-30 steps.

#### Sampler choice
- Euler a, DPM++ 2M Karras, UniPC, ...
- Each sampler yields a slightly different look.

### Advanced control

#### LoRA (Low-Rank Adaptation)
- Fine-tunes a model for a specific style or character.
- Small files (~100 MB).
- Multiple LoRAs can be stacked.
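
Why LoRA files stay small follows from the low-rank update itself: instead of storing a full weight delta, LoRA stores two thin matrices `B` and `A` whose product is added to the frozen weight. The sketch below uses plain Python lists; the layer size (768 x 768), the rank (8), and the helper names are illustrative assumptions.

```python
def lora_param_counts(d_out, d_in, rank):
    """Parameters in a full weight matrix vs. its rank-r LoRA factors."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)  # B is d_out x r, A is r x d_in
    return full, lora

def matmul(a, b):
    """Naive matrix product for small nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def merge_lora(weight, lora_b, lora_a, scale=1.0):
    """Return W + scale * (B @ A) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

full, lora = lora_param_counts(768, 768, 8)
print(full, lora, f"{lora / full:.2%}")
# 589824 12288 2.08%

identity = [[1.0, 0.0], [0.0, 1.0]]
print(merge_lora(identity, [[1.0], [2.0]], [[3.0, 4.0]], scale=0.5))
# [[2.5, 2.0], [3.0, 5.0]]
```

Stacking several LoRAs amounts to adding several such scaled deltas to the same base weight, which is why per-adapter weights matter when combining them.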

#### ControlNet
- Forces the output to follow a pose, depth map, or edge map.
- Canny edge → image.
- OpenPose → image.
- Depth map → image.

#### IP-Adapter
- Uses a reference image to steer style.

#### Inpainting
- Regenerates a specific region.
- Mask + prompt.

#### Outpainting / zoom out
- Extends the canvas.

### Image-to-image (img2img)
```
Input image + prompt → modified image
```

→ Style transfer / variations.
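
How far the result drifts from the input is usually governed by a `strength` value between 0 and 1. The sketch below shows the arithmetic as I understand Diffusers' img2img pipelines to apply it (noise the input partway, then run only the remaining denoising steps); treat the exact formula as an assumption, and `img2img_steps` as a hypothetical helper.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (start_index, steps_to_run) for a given strength in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be within 0.0-1.0")
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    start_index = max(num_inference_steps - steps_to_run, 0)
    return start_index, steps_to_run

print(img2img_steps(40, 0.75))  # (10, 30): skip 10 steps, denoise for 30
print(img2img_steps(40, 1.0))   # (0, 40): input is fully replaced by noise
print(img2img_steps(40, 0.0))   # (40, 0): input is returned unchanged
```

Under this assumption, lowering `strength` both preserves more of the input image and shortens generation time, since fewer denoising steps actually run.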

### Modern workflow patterns

#### Draft → upscale
1. **Draft mode**: generate a dozen variants cheaply.
2. **Select the best**.
3. **Upscale + refine**.

→ Standard practice with Midjourney / Flux.

#### LoRA stacking
1. **Base model** (SDXL / Flux).
2. **Style LoRA** (e.g. anime, oil paint).
3. **Character LoRA** (specific person).
4. **Concept LoRA** (specific pose / object).

#### Img2img + ControlNet (precise)
1. **Sketch**.
2. **ControlNet line-art guidance**.
3. **Generate + iterate**.

#### Inpainting workflow
1. **Generate base**.
2. **Identify defect** (extra finger, watermark).
3. **Mask + inpaint with a negative prompt**.

### Common defects + fix

| Defect | Fix |
|---|---|
| Extra fingers | Negative: "extra fingers, malformed hands" + LoRA |
| Asian-only faces | Specify ethnicity in the prompt |
| Anime-only style | "photorealistic" + non-anime model |
| Watermark | Negative: "watermark, signature, text" |
| Bad anatomy | Negative + ControlNet OpenPose |
| Blurry | Negative: "blurry" + more steps |
| Wrong aspect | `--ar 16:9` (Midjourney) or explicit width / height |
| Generic face | Describe specific, distinct features |
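
The negative-prompt rows of the table can be turned into a small lookup so that fixes are applied consistently. This is a sketch only: `DEFECT_NEGATIVES` and `build_negative_prompt` are hypothetical names, and the terms are copied from the table rather than tuned for any particular model.

```python
# Negative terms per defect, taken from the table above.
DEFECT_NEGATIVES = {
    "extra fingers": ["extra fingers", "malformed hands"],
    "watermark": ["watermark", "signature", "text"],
    "bad anatomy": ["bad anatomy"],
    "blurry": ["blurry"],
}

def build_negative_prompt(defects, base=("low quality",)):
    """Merge base terms with the terms for each observed defect, without duplicates."""
    terms = list(base)
    for defect in defects:
        for term in DEFECT_NEGATIVES.get(defect.lower(), []):
            if term not in terms:
                terms.append(term)
    return ", ".join(terms)

print(build_negative_prompt(["Extra fingers", "Watermark"]))
# low quality, extra fingers, malformed hands, watermark, signature, text
```

Defects that need a non-prompt fix (ControlNet, a different model, more steps) are intentionally left out of the lookup.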

### Platform differences

#### Negative prompt
- **Stable Diffusion**: explicit negative prompt field, very strong. The guidance-distilled Flux.1 [dev] / [schnell] models respond far less to negative prompts by default.
- **Midjourney**: `--no [thing]` (limited).
- **DALL-E 3**: weak (often generates the excluded thing).

#### Prompt style
- **DALL-E 3**: natural-language sentences.
- **Midjourney**: comma-separated keywords + parameters.
- **Stable Diffusion**: tag-based, weighted.

#### Photorealism
- **Stable Diffusion / Flux**: "photorealistic" works.
- **Midjourney**: implicit (cinematic feel).
- **DALL-E 3**: "photo style" + lens details works better than "photorealistic" (which tends to give an airbrushed look).

### Commercial use / IP

#### License
- Midjourney: commercial OK (paid).
- DALL-E 3: commercial OK.
- Stable Diffusion: open (CreativeML OpenRAIL-M, commercial OK).
- Adobe Firefly: commercial-safe (training data licensed).

#### Lawsuits
- Getty Images v. Stability AI (training data).
- Artists v. Stability AI, Midjourney, and DeviantArt (style mimicry).

#### Transparent disclosure
- Some jurisdictions require labeling AI-generated content (e.g. EU AI Act).

## 💻 Code Patterns

### Stable Diffusion (Diffusers library)
```python
from diffusers import StableDiffusionPipeline
import torch

# Load SD 1.5 in half precision; requires a CUDA GPU.
# Model ID as given in the source; substitute any SD 1.5 checkpoint you can access.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Generate. Plain Diffusers does not parse "(keyword:1.2)" weights,
# so the prompt is written without them.
image = pipe(
    prompt="masterpiece, portrait of a young woman, blue eyes, golden hour, 85mm lens, shallow depth of field",
    negative_prompt="blurry, deformed, watermark, signature",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("output.png")
```

### Flux (modern)
```python
from diffusers import FluxPipeline
import torch

# bfloat16 keeps memory use manageable; requires a CUDA GPU.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="A cat holding a sign that says 'Hello World'",
    height=1024, width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
```

### LoRA loading
```python
from diffusers import StableDiffusionPipeline

# "base-model" and the .safetensors paths are placeholders.
pipe = StableDiffusionPipeline.from_pretrained("base-model")
pipe.load_lora_weights("lora-style.safetensors", adapter_name="style")
pipe.load_lora_weights("lora-character.safetensors", adapter_name="character")

# Stack both LoRAs with per-adapter weights (requires the `peft` package).
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])

image = pipe(prompt="...").images[0]
```

### ControlNet (pose-controlled)
```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image

# Load an OpenPose-conditioned ControlNet and attach it to SD 1.5.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
).to("cuda")

pose_image = Image.open("pose.png")  # OpenPose skeleton extracted beforehand

image = pipe(
    prompt="elegant woman, evening gown, studio lighting",
    image=pose_image,
    num_inference_steps=30,
).images[0]
```

### Img2img
```python
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# "base-model" is a placeholder for the checkpoint you use.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("base-model")
init = Image.open("sketch.png")

image = pipe(
    prompt="oil painting of mountain, sunset, masterpiece",
    image=init,
    strength=0.7,  # 0 = no change, 1 = input fully replaced
    guidance_scale=7.5,
).images[0]
```

### Inpainting
```python
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# "inpainting-model" is a placeholder for an inpainting checkpoint.
pipe = StableDiffusionInpaintPipeline.from_pretrained("inpainting-model")

original = Image.open("photo.png")
mask = Image.open("mask.png")  # white = redo, black = keep

image = pipe(
    prompt="clean background, professional photo",
    image=original,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
```

### Midjourney (Discord bot, no official API)
```
# Discord
/imagine prompt: portrait of a knight, fantasy, oil painting --ar 3:2 --v 7 --s 500 --sref https://...
```

→ Automate by monitoring a Discord webhook, or use an unofficial API.

### DALL-E 3 (OpenAI API)
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",
    prompt="A cute corgi puppy in a sunny park, professional photo, 85mm lens",
    n=1,
    size="1024x1024",
    quality="hd",
    style="natural",  # or "vivid"
)

print(response.data[0].url)
```

### Flux Replicate API
```python
import replicate

# Input field names follow the model's schema on Replicate;
# check the model page if a field is rejected.
output = replicate.run(
    "black-forest-labs/flux-dev",
    input={
        "prompt": "A cat holding a sign...",
        "guidance_scale": 3.5,
        "num_inference_steps": 50,
    }
)

print(output[0])  # URL
```

### Batch generation (cost-efficient)
```python
# Assumes `pipe` is a Diffusers pipeline loaded as in the examples above.
prompts = [f"variant {i}: cat with hat" for i in range(10)]

# Batch (faster than serial, but needs more GPU memory)
images = pipe(prompts, num_inference_steps=30).images
for i, img in enumerate(images):
    img.save(f"batch_{i}.png")
```

### ComfyUI workflow (visual node)
```
[CheckpointLoader] → [PromptText] → [Sampler] → [VAEDecode] → [SaveImage]
↓
[LoRALoader] → [ControlNet]
```

→ Nodes can be rearranged freely; each user builds their own pipeline.

### Custom prompt template
```python
def build_prompt(subject, style, lighting, lens):
    """Combine the prompt layers into one weighted, tag-style prompt."""
    return f"({style}:1.2), {subject}, {lighting}, {lens}, masterpiece, best quality"

prompt = build_prompt(
    subject="young woman, blue eyes",
    style="oil painting, Renaissance",
    lighting="golden hour, volumetric",
    lens="85mm portrait lens, shallow depth of field"
)
```

### Quality eval (CLIP score)
```python
from transformers import CLIPProcessor, CLIPModel
import torch
from PIL import Image

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

prompt = "portrait of a young woman, golden hour"  # the prompt used for generation
image = Image.open("output.png")
inputs = processor(text=[prompt], images=image, return_tensors="pt",
                   padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Cosine similarity between the image and text embeddings.
# (A softmax over a single prompt would always return 1.0.)
similarity = torch.nn.functional.cosine_similarity(
    outputs.image_embeds, outputs.text_embeds
).item()
print(f"CLIP score: {similarity:.3f}")
```

→ A quantitative measure of prompt-image alignment.

## 🤔 Decision Criteria

| Task | Recommendation |
|---|---|
| Quick prototype | DALL-E 3 / Midjourney |
| Cinematic / artistic | Midjourney V7 |
| Natural language | DALL-E 3 |
| Open / control / privacy | Stable Diffusion / Flux |
| Photorealism | Flux / SDXL + LoRA |
| Anime / illustration | NovelAI / Niji |
| Commercial-safe | Adobe Firefly |
| Specific character | LoRA + reference |
| Pose-controlled | ControlNet |
| Text in image | Flux / Ideogram |

**Defaults**: Midjourney (artistic), Flux (open + control), DALL-E 3 (natural language).

## ⚠️ Contradictions & Updates
- **DALL-E 3 handles negation poorly**: "no X" can end up adding X. Describe what you want in positive terms instead.
- **Stable Diffusion hardware requirements**: needs a GPU (RTX 3090 or better recommended).
- **Midjourney is closed**: its internal optimizations are unknown.
- **Training-data lawsuits**: the future legal status of these models is uncertain.
- **Rapid model evolution**: the best model changes roughly every 6 months.
- **Flux is emerging**: the newer SoTA, surpassing SDXL.

## 🔗 Knowledge Links (Graph)
- Parents: [[Generative-AI]] · [[Diffusion-Models]] · [[Computer Vision|Computer-Vision]]
- Variants: [[Stable-Diffusion]] · [[Flux]] · [[Midjourney]] · [[DALL-E]] · [[Imagen]]
- Applications: [[ControlNet]] · [[LoRA]] · [[Inpainting]] · [[IP-Adapter]]
- Techniques: [[Prompt_Engineering|Prompt-Engineering]] · [[Negative Prompt]] · [[CFG Scale]] · [[Sampling-Steps]]
- Tools: [[ComfyUI]]

## 🤖 LLM Usage Hints (How to Use This Knowledge)

**When to use this knowledge:**
- Integrating AI into an art / design workflow.
- Choosing a platform (Midjourney vs DALL-E vs Flux).
- License considerations for commercial projects.
- Systematic prompt iteration.
- Self-hosting / privacy / cost decisions.

**When not to use it:**
- Specific art critique (needs an artist).
- Country-specific copyright questions (needs a lawyer).
- Deepfakes / harmful generation (ethics).
- Photo retouching (Photoshop is the better tool).

## ❌ Anti-Patterns
- **Vague prompt** ("nice picture"): generic output.
- **Long word salad**: contradictory output.
- **DALL-E 3 + negative prompt**: tends to add the thing you excluded.
- **Same syntax for Midjourney and Stable Diffusion**: parameters do not carry over.
- **No iteration**: accepting the first attempt.
- **Cloud generation + sensitive content**: privacy risk.
- **Commercial use + unclear license**: legal risk.
- **No prompt template / library**: reinventing every generation from scratch.

## 🧪 Validation
- **Information status:** verified (concept-level).
- **Source trust level:** B (Stability AI, Midjourney, OpenAI documentation, Hugging Face Diffusers).
- **Review reason:** Manual cleanup. Platforms change roughly every 6 months.

## 🧬 Duplicate Check
- **Existing similar documents:** [[AI_Image_Generation_Workflow]] (related), [[AI 이미지 생성 및 편집 워크플로우 (AI Image Generation & Editing Workflow)]] (related), [[Diffusion-Models]] (parent).
- **Handling:** KEEP (focused on platform / prompt comparison).
- **Reason:** Each file covers a different angle.

## 🕓 Changelog
| Date | Change | Handling | Confidence |
|------|-----------|-----------|--------|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: added 4-layer prompt, platform comparison, Diffusers code, LoRA / ControlNet, anti-patterns | UPDATE | B |