| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | inferred_by | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-ai-이미지-생성-ai-image-generation | AI Image Generation | 10_Wiki/Topics | verified | self | | none | B | 0.85 | conceptual | | | 2026-05-09 | pending | Claude Opus 4.7 (manual cleanup 2026-05-09) | |
AI Image Generation
📌 One-Line Insight (The Karpathy Summary)
Diffusion models turn text into images by progressively denoising random noise under the guidance of the prompt. Each platform has a specialty: Midjourney (artistic), DALL-E (natural language), Stable Diffusion / Flux (open weights + control). The four levers are prompt, parameters, reference images, and negative prompts.
📖 Structured Knowledge (Synthesized Content)
Core architecture
Diffusion model
- Forward diffusion: image → noise (training).
- Reverse diffusion: noise → image (inference).
- Text encoder: prompt → embedding.
- Cross-attention: the text embedding guides image generation.
- Sampler (DDIM, DPM++, Euler): performs the denoising steps.
→ The basis of Stable Diffusion, Flux, and Imagen.
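The forward (noising) half of the process above can be sketched in a few lines. This is a minimal illustration of the standard DDPM closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, assuming a linear beta schedule; the function names, the four-value toy "image", and the schedule endpoints are illustrative assumptions, not taken from any specific model.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * gaussian_noise."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * value + sigma * rng.gauss(0.0, 1.0) for value in x0]

alpha_bars = alpha_bar_schedule(1000)
rng = random.Random(0)
pixels = [0.5, -0.25, 1.0, 0.0]  # toy "image" of four values
slightly_noisy = forward_diffuse(pixels, 10, alpha_bars, rng)   # mostly signal
almost_noise = forward_diffuse(pixels, 999, alpha_bars, rng)    # mostly noise
```

Training teaches a network to predict the added noise; inference runs the learned reverse direction starting from pure noise.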
GAN (legacy, less common now)
- StyleGAN.
- Photorealistic output.
- Now limited to specific use cases.
Autoregressive
- DALL-E 1 (legacy).
- VQ-VAE.
→ Modern systems are mostly diffusion-based.
Platforms
Midjourney (artistic / cinematic)
- Subscription: $10-60 / month.
- Originally Discord-based; a web interface is now in alpha.
- Parameters: --ar, --v, --s, --c.
- References: --sref (style), --cref (character), --oref (omni).
- V7 (2024-2025) has a draft mode (10x faster).
- Commercial-friendly.
DALL-E 3 (natural language)
- OpenAI / ChatGPT integration.
- GPT-4 expands the prompt automatically.
- Follows instructions accurately.
- Strong text rendering.
- Weak negative prompting.
Stable Diffusion (open / control)
- Open weights (CreativeML OpenRAIL-M).
- Can be self-hosted locally.
- ComfyUI / Automatic1111 / Forge UI.
- LoRA / fine-tune / ControlNet.
- Weighted prompts: (keyword:1.2).
- Strong negative prompting.
Flux (modern open, 2024+)
- Black Forest Labs (founded by the original Stable Diffusion researchers).
- Flux.1 [dev] / [schnell] / [pro].
- Better than SDXL (open-model state of the art in 2024).
- Renders hands and in-image text more accurately.
Imagen / Veo (Google)
- Imagen 3.
- Cloud API.
Adobe Firefly
- Commercially safe licensing.
- Adobe Creative Cloud.
Others
- Ideogram (text in image).
- Recraft (vector).
- Krea (real-time).
- NovelAI (anime).
Prompt structure (universal)
Four layers
- Subject: "young woman, age 25, blue eyes".
- Medium / style: "oil painting, Renaissance style".
- Composition / environment: "close-up portrait, golden hour, mountain background".
- Technical: "85mm lens, shallow depth of field, --ar 3:2".
The more specific each layer is, the higher the output quality.
Parameters (Midjourney)
- --ar 16:9: aspect ratio.
- --v 7: version.
- --s 250: stylize (artistic strength, 0-1000).
- --c 50: chaos (variety, 0-100).
- --sref [URL]: style reference.
- --cref [URL]: character reference.
- --oref [URL]: omni reference (V7).
- --no [thing]: simple negative.
- --niji: anime model.
- --draft: draft mode (10x faster).
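Since these parameters are plain text appended to the prompt, a small string builder can keep them consistent and catch out-of-range values. The sketch below is a hypothetical helper (`midjourney_prompt` is not part of any Midjourney tooling); it only assembles the text you would paste after `/imagine`, using the ranges listed above.

```python
def midjourney_prompt(text, ar=None, version=None, stylize=None, chaos=None, no=None):
    """Hypothetical helper: append Midjourney-style parameters to a prompt string."""
    if stylize is not None and not 0 <= stylize <= 1000:
        raise ValueError("stylize (--s) must be between 0 and 1000")
    if chaos is not None and not 0 <= chaos <= 100:
        raise ValueError("chaos (--c) must be between 0 and 100")
    parts = [text.strip().rstrip(",")]
    if ar is not None:
        parts.append(f"--ar {ar}")
    if version is not None:
        parts.append(f"--v {version}")
    if stylize is not None:
        parts.append(f"--s {stylize}")
    if chaos is not None:
        parts.append(f"--c {chaos}")
    if no is not None:
        parts.append(f"--no {no}")
    return " ".join(parts)

knight = midjourney_prompt(
    "portrait of a knight, fantasy, oil painting",
    ar="3:2", version=7, stylize=500,
)
print(knight)
```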
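Additional controls in Stable Diffusion
Weighted prompt
(masterpiece:1.3), (8k:1.2), portrait, (low quality:0.3)
→ Raises or lowers the weight of each keyword (web-UI syntax as used in Automatic1111; values above 1 strengthen a keyword, values below 1 weaken it).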
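To make the `(keyword:weight)` notation concrete, here is a minimal parser sketch. It assumes a simplified grammar (comma-separated chunks, no nesting, no escaped parentheses), so it illustrates the idea rather than reproducing any web UI's actual parser; `parse_weighted_prompt` is a name made up for this example.

```python
import re

# Matches one "(keyword:1.2)" chunk; nested parentheses are not handled.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_weighted_prompt(prompt):
    """Split a comma-separated prompt into (keyword, weight) pairs."""
    pairs = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = WEIGHTED.fullmatch(chunk)
        if match:
            pairs.append((match.group(1).strip(), float(match.group(2))))
        else:
            pairs.append((chunk, 1.0))  # unweighted keywords default to 1.0
    return pairs

pairs = parse_weighted_prompt("(masterpiece:1.3), (8k:1.2), portrait")
print(pairs)
```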
Negative prompt (strong)
ugly, deformed, blurry, bad anatomy, extra fingers, watermark, signature, low quality
→ Explicitly excludes unwanted elements.
CFG Scale (1-30)
- Classifier-Free Guidance.
- Higher values increase prompt adherence; lower values leave more room for creativity.
- Typical default: 7-12.
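The trade-off comes from how classifier-free guidance combines two noise predictions at each step: one made without the prompt and one made with it. A minimal sketch of that combination, using plain lists in place of tensors (the function name is an assumption for illustration):

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]

uncond = [0.0, 1.0]   # prediction with an empty (or negative) prompt
cond = [1.0, 1.0]     # prediction with the text prompt
guided = apply_cfg(uncond, cond, 7.5)
```

A scale of 1.0 reproduces the conditional prediction unchanged; larger scales push the result further along the direction the prompt suggests, which is why very high values can look oversaturated or distorted.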
Sampling steps (10-50)
- The number of denoising iterations.
- More steps raise quality and cost.
- DPM++ 2M Karras has a sweet spot at 20-30 steps.
Sampler choice
- Euler a, DPM++ 2M Karras, UniPC, ...
- Each sampler gives a slightly different look.
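The "Karras" suffix in sampler names refers to how the noise levels are spaced across steps. The sketch below implements the spacing formula from Karras et al. (2022) with rho = 7; the `sigma_min` and `sigma_max` defaults are illustrative assumptions, since the actual values depend on the model.

```python
def karras_sigmas(num_steps, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, spaced as in Karras et al. (2022)."""
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    return [
        (max_inv + i / (num_steps - 1) * (min_inv - max_inv)) ** rho
        for i in range(num_steps)
    ]

sigmas = karras_sigmas(20)
for step, sigma in enumerate(sigmas):
    print(f"step {step:2d}: sigma = {sigma:.4f}")
```

The schedule takes large jumps while the image is mostly noise and small ones near the end, which is one reason this spacing tends to work well at low step counts.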
Advanced control
LoRA (Low-Rank Adaptation)
- Fine-tunes a model for a specific style or character.
- Small files (~100 MB).
- Multiple LoRAs can be stacked.
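LoRA files are small because they store two thin matrices per adapted layer instead of a full weight update. The sketch below illustrates the general idea (W' = W + scale · B·A) with plain lists; the layer size 768 and rank 8 are assumptions chosen only to make the arithmetic visible.

```python
def lora_param_counts(d_out, d_in, rank):
    """Parameters in a full weight update vs. a rank-`rank` LoRA update."""
    return d_out * d_in, rank * (d_in + d_out)

def apply_lora(weight, lora_b, lora_a, scale):
    """Return W + scale * (B @ A), where B is d_out x r and A is r x d_in."""
    rank = len(lora_a)
    return [
        [
            weight[i][j] + scale * sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            for j in range(len(weight[0]))
        ]
        for i in range(len(weight))
    ]

full, low_rank = lora_param_counts(768, 768, 8)
print(f"full update: {full} params, LoRA update: {low_rank} params")

merged = apply_lora(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_b=[[1.0], [2.0]],   # d_out x r with r = 1
    lora_a=[[3.0, 4.0]],     # r x d_in
    scale=0.5,
)
```

Stacking several LoRAs amounts to adding several such low-rank terms, each with its own scale.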
ControlNet
- Forces the output to follow a pose, depth map, or edge map.
- Canny edge → image.
- OpenPose → image.
- Depth map → image.
IP-Adapter
- Uses a reference image to guide style.
Inpainting
- Redoes a specific region.
- Requires a mask plus a prompt.
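Conceptually, the mask decides per pixel whether the newly generated content or the original image survives. A minimal sketch of that final blend, assuming images are nested lists of floats and mask values lie in [0, 1] (real pipelines also apply the mask during denoising, which this toy version omits):

```python
def composite_inpaint(original, generated, mask):
    """Blend per pixel: mask 1.0 takes the generated value, 0.0 keeps the original."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]

original = [[0.2, 0.2], [0.2, 0.2]]
generated = [[0.9, 0.9], [0.9, 0.9]]
mask = [[1.0, 0.0], [0.0, 0.0]]  # redo only the top-left pixel
result = composite_inpaint(original, generated, mask)
```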
Outpainting / zoom out
- Extends the canvas beyond the original borders.
Image-to-image (img2img)
Input image + prompt → modified image
→ Used for style transfer and variations.
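How far the result drifts from the input is governed by a strength value: the input is noised part of the way, and only the remaining denoising steps are run. The sketch below mirrors the step arithmetic commonly used by Diffusers img2img pipelines; treat it as an approximation of that behavior, with `img2img_steps` being a name invented for this example.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (steps_skipped, steps_run) for a given img2img strength in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return t_start, num_inference_steps - t_start

skipped, run = img2img_steps(30, 0.5)
print(f"skipped {skipped} steps, ran {run} steps")
```

At strength 1.0 all steps run and the input is effectively ignored; at 0.0 no steps run and the input comes back unchanged.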
Modern workflow patterns
Draft → upscale
- Draft mode: generate a dozen cheap variants.
- Select the best one.
- Upscale + refine.
→ The standard workflow for Midjourney and Flux.
LoRA stacking
- Base model (SDXL / Flux).
- Style LoRA (e.g. anime, oil paint).
- Character LoRA (specific person).
- Concept LoRA (specific pose / object).
Img2img + ControlNet (precise)
- Sketch.
- Line-art guidance via ControlNet.
- Generate + iterate.
Inpainting workflow
- Generate base.
- Identify defect (extra finger, watermark).
- Mask + inpaint with negative.
Common defects + fix
| Defect | Fix |
|---|---|
| Extra fingers | Negative: "extra fingers, malformed hands" + LoRA |
| Asian-only faces | Specific ethnicity in prompt |
| Anime-only style | "photorealistic" + a non-anime model |
| Watermark | Negative: "watermark, signature, text" |
| Bad anatomy | Negative + ControlNet OpenPose |
| Blurry | Negative: "blurry" + steps ↑ |
| Wrong aspect | --ar 16:9 |
| Generic face | "specific name, distinct features" |
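The negative-prompt fixes in the table lend themselves to a small lookup, so that observed defects translate into a consistent negative prompt. This is a hypothetical convenience helper; the defect keys are names made up for this sketch, and the terms are the ones listed in the table.

```python
# Defect keys are illustrative; the terms come from the table above.
NEGATIVE_TERMS = {
    "extra_fingers": ["extra fingers", "malformed hands"],
    "watermark": ["watermark", "signature", "text"],
    "blurry": ["blurry"],
}

def build_negative_prompt(defects):
    """Join the negative terms for the observed defects, skipping duplicates."""
    terms = []
    for defect in defects:
        for term in NEGATIVE_TERMS.get(defect, []):
            if term not in terms:
                terms.append(term)
    return ", ".join(terms)

negative = build_negative_prompt(["extra_fingers", "blurry"])
print(negative)
```

Defects whose fix is not a negative term (wrong aspect ratio, generic face) are left to the positive prompt or parameters.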
Platform differences
Negative prompt
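- Stable Diffusion / Flux: explicit negative prompt section, very strong (the guidance-distilled Flux [dev] / [schnell] models are typically run without a negative prompt, so check what your pipeline supports).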
- Midjourney: --no [thing] (limited).
- DALL-E 3: weak (often makes the thing).
Prompt style
- DALL-E 3: natural language sentence.
- Midjourney: comma-separated keyword + parameter.
- Stable Diffusion: tag-based, weighted.
Photorealism
- Stable Diffusion / Flux: "photorealistic" works.
- Midjourney: implicit (cinematic feel).
- DALL-E 3: "photo style" plus lens details works better than "photorealistic", which tends to produce an airbrushed look.
Commercial use / IP
License
- Midjourney: commercial OK (paid).
- DALL-E 3: commercial OK.
- Stable Diffusion: open (CreativeML OpenRAIL-M, commercial OK).
- Adobe Firefly: commercial-safe (training data licensed).
Lawsuits
- Getty Images vs Stability AI (Stable Diffusion training data).
- Artists vs Midjourney (style mimicry).
Transparent disclosure
- Some jurisdictions require labeling AI-generated content (e.g. the EU AI Act).
💻 Code Patterns
Stable Diffusion (Diffusers library)
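from diffusers import StableDiffusionPipeline
import torch

# Load SD 1.5 in half precision. The "runwayml/stable-diffusion-v1-5" repo id
# appears to have been removed from the Hub; this mirror id is an assumption.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Generate. Diffusers does not parse "(keyword:1.2)" weights by itself
# (that is web-UI syntax), so the prompt is plain text here.
image = pipe(
    prompt="masterpiece, portrait of a young woman, blue eyes, golden hour, 85mm lens, shallow depth of field",
    negative_prompt="blurry, deformed, watermark, signature",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("output.png")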
Flux (modern)
from diffusers import FluxPipeline
import torch

# FLUX.1 [dev] is a gated model: accept its license on the Hub and log in first.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="A cat holding a sign that says 'Hello World'",
    height=1024, width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
image.save("flux_output.png")
LoRA loading
from diffusers import StableDiffusionPipeline

# "base-model" and the .safetensors names are placeholders for your own files.
pipe = StableDiffusionPipeline.from_pretrained("base-model")
pipe.load_lora_weights("lora-style.safetensors", adapter_name="style")
pipe.load_lora_weights("lora-character.safetensors", adapter_name="character")

# Stack both LoRAs with separate weights (named adapters need the peft package).
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])
image = pipe(prompt="...").images[0]
ControlNet (pose-controlled)
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image

# Load the OpenPose ControlNet and attach it to an SD 1.5 base model.
# The base repo id is an assumed mirror of the removed runwayml repo.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
).to("cuda")

pose_image = Image.open("pose.png")  # OpenPose skeleton extracted beforehand
image = pipe(
    prompt="elegant woman, evening gown, studio lighting",
    image=pose_image,
    num_inference_steps=30,
).images[0]
Img2img
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("base-model")  # placeholder id
init = Image.open("sketch.png").convert("RGB")
image = pipe(
    prompt="oil painting of mountain, sunset, masterpiece",
    image=init,
    strength=0.7,  # 0 = no change, 1 = input fully replaced
    guidance_scale=7.5,
).images[0]
Inpainting
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained("inpainting-model")  # placeholder id
original = Image.open("photo.png")
mask = Image.open("mask.png")  # white = redo, black = keep
image = pipe(
    prompt="clean background, professional photo",
    image=original,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
Midjourney (Discord bot, no official API)
# Discord
/imagine prompt: portrait of a knight, fantasy, oil painting --ar 3:2 --v 7 --s 500 --sref https://...
→ Automating this means monitoring the Discord output or relying on an unofficial API.
DALL-E 3 (OpenAI API)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.images.generate(
    model="dall-e-3",
    prompt="A cute corgi puppy in a sunny park, professional photo, 85mm lens",
    n=1,
    size="1024x1024",
    quality="hd",
    style="natural",  # or "vivid"
)
print(response.data[0].url)
Flux Replicate API
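import replicate

# Input field names follow the model's schema on Replicate; check the model page,
# since they can differ from the Diffusers argument names used above.
output = replicate.run(
    "black-forest-labs/flux-dev",
    input={
        "prompt": "A cat holding a sign...",
        "guidance_scale": 3.5,
        "num_inference_steps": 50,
    },
)
print(output[0])  # URL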
Batch generation (cost-efficient)
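prompts = [f"variant {i}: cat with hat" for i in range(10)]
# Batch (faster than serial); `pipe` is a pipeline loaded as in the examples above.
# Ten images at once need considerable VRAM, so split the list if memory runs out.
images = pipe(prompts, num_inference_steps=30).images
for i, img in enumerate(images):
    img.save(f"batch_{i}.png")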
ComfyUI workflow (visual node)
[CheckpointLoader] → [PromptText] → [Sampler] → [VAEDecode] → [SaveImage]
↓
[LoRALoader] → [ControlNet]
→ Nodes can be rearranged freely, so each user builds their own pipeline.
Custom prompt template
def build_prompt(subject, style, lighting, lens):
    # The "(style:1.2)" weight is web-UI syntax (e.g. Automatic1111).
    return f"({style}:1.2), {subject}, {lighting}, {lens}, masterpiece, best quality"

prompt = build_prompt(
    subject="young woman, blue eyes",
    style="oil painting, Renaissance",
    lighting="golden hour, volumetric",
    lens="85mm portrait lens, shallow depth of field",
)
Quality eval (CLIP score)
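from transformers import CLIPProcessor, CLIPModel
import torch
from PIL import Image

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

prompt = "portrait of a young woman, blue eyes, golden hour"  # prompt used for output.png
image = Image.open("output.png")
inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
# Cosine similarity between image and text embeddings. A softmax over a single
# prompt would always return 1.0, so it is not used here.
similarity = torch.nn.functional.cosine_similarity(
    outputs.image_embeds, outputs.text_embeds
).item()
print(f"CLIP score: {similarity:.3f}")
→ Gives a quantitative measure of prompt-image alignment.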
🤔 Decision Criteria
| Task | Recommendation |
|---|---|
| Quick prototype | DALL-E 3 / Midjourney |
| Cinematic / artistic | Midjourney V7 |
| Natural language | DALL-E 3 |
| Open / control / privacy | Stable Diffusion / Flux |
| Photorealism | Flux / SDXL + LoRA |
| Anime / illustration | NovelAI / Niji |
| Commercial-safe | Adobe Firefly |
| Specific character | LoRA + reference |
| Pose-controlled | ControlNet |
| Text in image | Flux / Ideogram |
Defaults: Midjourney (artistic), Flux (open + control), DALL-E 3 (natural language).
⚠️ Contradictions & Updates
- DALL-E 3 handles negation poorly: writing "no X" can add X. Describe what you want in positive terms instead.
- Stable Diffusion hardware requirements: a GPU is needed (RTX 3090 or better recommended).
- Midjourney is closed: its internal optimizations are unknown.
- Training-data lawsuits: the future legal status of these models is uncertain.
- Model evolution: the best model changes roughly every six months.
- Flux is emerging: the current open state of the art, surpassing SDXL.
🔗 Knowledge Links (Graph)
- Parents: Generative-AI · Diffusion-Models · Computer Vision
- Variants: Stable-Diffusion · Flux · Midjourney · DALL-E · Imagen
- Applications: ControlNet · LoRA · Inpainting · IP-Adapter
- Techniques: Prompt_Engineering · Negative Prompt · CFG Scale · Sampling-Steps
- Tools: ComfyUI
🤖 LLM Usage Hints (How to Use This Knowledge)
When to use this knowledge:
- Integrating AI into an art / design workflow.
- Choosing a platform (Midjourney vs DALL-E vs Flux).
- License considerations for a commercial project.
- Making prompt iteration systematic.
- Deciding on self-hosting, privacy, and cost.
When not to use it:
- Specific art critique (needs artist-level judgment).
- Country-specific copyright questions (consult a lawyer).
- Deepfakes or harmful generation (ethics).
- Photo retouching, where Photoshop is the better tool.
❌ Anti-Patterns
- Vague prompt ("nice picture"): generic output.
- Long word salad: contradictory output.
- DALL-E 3 with a negative prompt: tends to add the unwanted thing.
- Same syntax for Midjourney and Stable Diffusion: parameters do not carry over.
- No iteration: accepting the first attempt.
- Cloud generation with sensitive content: privacy risk.
- Commercial use with an unclear license: legal risk.
- No prompt template or library: reinventing every generation from scratch.
🧪 Validation
- Information status: verified (concept-level).
- Source trust level: B (Stability AI, Midjourney, OpenAI documentation, Hugging Face Diffusers).
- Review reason: Manual cleanup. Each platform changes roughly every six months.
🧬 Duplicate Check
- Existing similar documents: AI_Image_Generation_Workflow (related), AI 이미지 생성 및 편집 워크플로우 (AI Image Generation & Editing Workflow) (related), Diffusion-Models (parent).
- Handling: KEEP (focused on platform / prompt comparison).
- Reason: each file covers a different angle.
🕓 Changelog
| Date | Change | Handling | Trust level |
|---|---|---|---|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: added four-layer prompt, platform comparison, Diffusers code, LoRA / ControlNet, and anti-patterns | UPDATE | B |