2nd/10_Wiki/Topics/AI_and_ML/AI 이미지 생성 (AI Image Generation).md
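koriweb d8a80f6272 chore(wiki): canonical normalization of dangling links (768 files / 1,200 links)
Replaced [[wikilinks]] that differed from their target only in spelling with the target document's canonical title,
reconnecting 1,200 broken links. Only normalized title/filename matches were applied; alias matching was
excluded because of the over-merge risk (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs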

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00

15 KiB

id: wiki-2026-0508-ai-이미지-생성-ai-image-generation
title: AI Image Generation
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: AI 이미지 생성, image gen, text-to-image, Midjourney, DALL-E, Stable Diffusion, Flux, Imagen, diffusion model
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: image-generation, diffusion-model, stable-diffusion, midjourney, dalle, flux, prompt-engineering, controlnet, lora
raw_sources:
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack: language: Python / API; framework: Diffusers / ComfyUI / Automatic1111 / Flux / SD WebUI

AI Image Generation

📌 One-Line Insight (The Karpathy Summary)

Diffusion models turn text into images by progressively denoising random noise under the guidance of the prompt. Each platform has a specialty: Midjourney (artistic), DALL-E (natural language), Stable Diffusion / Flux (open weights + control). The four levers are prompt, parameters, reference images, and negative prompt.

📖 Structured Knowledge (Synthesized Content)

Core architecture

Diffusion model

  1. Forward diffusion: image → noise (used in training).
  2. Reverse diffusion: noise → image (used at inference).
  3. Text encoder: prompt → embedding.
  4. Cross-attention: the text embedding guides image generation.
  5. Sampler (DDIM, DPM++, Euler): performs each denoising step.

→ The basis of Stable Diffusion, Flux, and Imagen.
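
The forward and reverse steps above can be illustrated with a toy example on a short list of numbers standing in for pixels. This is a minimal sketch, not how any production model is implemented: the linear beta schedule, the function names, and the values are assumptions chosen for illustration, and the "network" is replaced by the true noise so that the inversion is exact.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps

def predict_x0(xt, eps_hat, alpha_bar):
    """Invert the forward step given a noise estimate (the trained network's job)."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, eps_hat)]

rng = random.Random(0)
alpha_bars = make_alpha_bars()
x0 = [0.5, -1.0, 0.25, 0.8]                        # stand-in for image pixels
xt, eps = forward_diffuse(x0, alpha_bars[500], rng)
recovered = predict_x0(xt, eps, alpha_bars[500])   # perfect noise estimate, so x0 is recovered
```

In a real model the noise estimate comes from a neural network conditioned on the text embedding, and the sampler repeats a step like this across many timesteps instead of jumping straight to x0.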

GAN (legacy, less common now)

  • StyleGAN.
  • Strong photorealism within a narrow domain (e.g. faces).
  • Limited to specific use cases.

Autoregressive

  • DALL-E 1 (legacy).
  • VQ-VAE.

→ Modern systems are predominantly diffusion-based.

Platforms

Midjourney (artistic / cinematic)

  • Subscription: roughly $10-120 / month depending on plan tier.
  • Originally Discord-based; a web app is now available.
  • Parameters: --ar, --v, --s, --c.
  • References: --sref (style), --cref (character), --oref (omni).
  • V7 (2025) adds a draft mode (about 10x faster).
  • Commercial-friendly on paid plans.

DALL-E 3 (natural language)

  • OpenAI / ChatGPT integration.
  • ChatGPT (GPT-4) expands the prompt before generation.
  • Accurate instruction following.
  • Strong text rendering.
  • Weak negative-prompt handling.

Stable Diffusion (open / control)

  • Open weights (SD 1.x under CreativeML OpenRAIL-M; later releases use other licenses).
  • Can be self-hosted locally.
  • ComfyUI / Automatic1111 / Forge UI.
  • LoRA / fine-tuning / ControlNet.
  • Weighted prompts: (keyword:1.2).
  • Strong negative-prompt support.

Flux (modern open, 2024+)

  • Black Forest Labs (founded by researchers behind the original Stable Diffusion).
  • Flux.1 [dev] / [schnell] / [pro].
  • Outperforms SDXL (state of the art among open models in 2024).
  • More accurate hands and in-image text.

Imagen / Veo (Google)

  • Imagen 3 (image); Veo is Google's video model.
  • Cloud API.

Adobe Firefly

  • Positioned as commercially safe (trained on licensed data).
  • Part of Adobe Creative Cloud.

Others

  • Ideogram (text in image).
  • Recraft (vector).
  • Krea (real-time).
  • NovelAI (anime).

Prompt structure (universal)

Four layers

  1. Subject: "young woman, age 25, blue eyes".
  2. Medium / style: "oil painting, Renaissance style".
  3. Composition / environment: "close-up portrait, golden hour, mountain background".
  4. Technical: "85mm lens, shallow depth of field, --ar 3:2".

The more specific each layer is, the better the output tends to be.

Parameters (Midjourney)

  • --ar 16:9: aspect ratio.
  • --v 7: version.
  • --s 250: stylize (artistic strength, 0-1000).
  • --c 50: chaos (variety, 0-100).
  • --sref [URL]: style reference.
  • --cref [URL]: character reference.
  • --oref [URL]: omni reference (V7).
  • --no [thing]: simple negative.
  • --niji: anime model.
  • --draft: draft mode (10x faster).
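
The parameter list above lends itself to a small helper that assembles the final prompt string and rejects out-of-range values before anything is sent. This is only a sketch: `build_midjourney_prompt` is a hypothetical name, it covers just a subset of the flags, and the range checks simply restate the 0-1000 and 0-100 limits listed above.

```python
def build_midjourney_prompt(text, ar=None, version=None, stylize=None,
                            chaos=None, sref=None, no=None, draft=False):
    """Append Midjourney parameters to a text prompt, validating the listed ranges."""
    if stylize is not None and not 0 <= stylize <= 1000:
        raise ValueError("--s (stylize) must be between 0 and 1000")
    if chaos is not None and not 0 <= chaos <= 100:
        raise ValueError("--c (chaos) must be between 0 and 100")

    parts = [text.strip()]
    if ar is not None:
        parts.append(f"--ar {ar}")
    if version is not None:
        parts.append(f"--v {version}")
    if stylize is not None:
        parts.append(f"--s {stylize}")
    if chaos is not None:
        parts.append(f"--c {chaos}")
    if sref is not None:
        parts.append(f"--sref {sref}")
    if no:
        parts.append(f"--no {no}")
    if draft:
        parts.append("--draft")
    return " ".join(parts)

prompt = build_midjourney_prompt(
    "portrait of a knight, fantasy, oil painting",
    ar="3:2", version=7, stylize=500,
)
```

Keeping the parameters as named arguments makes it easier to vary one lever at a time during iteration.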

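Additional controls in Stable Diffusion

Weighted prompt

(masterpiece:1.3), (8k:1.2), portrait, (low quality:0.3)

→ Raises or lowers the weight of each keyword (values above 1 emphasize, values below 1 de-emphasize). This is Automatic1111-style syntax; there, square brackets with a number, as in [low quality:0.3], mean prompt scheduling rather than down-weighting.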
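
To make the (keyword:weight) convention concrete, the following sketch splits a prompt into (text, weight) pairs. It is a simplified illustration, not the parser used by Automatic1111 or ComfyUI: it handles only the flat form, ignores nesting and scheduling syntax, and `parse_weighted_prompt` is a hypothetical name.

```python
import re

# Matches exactly "(some text:1.2)" with no nested parentheses.
_WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_weighted_prompt(prompt):
    """Split a comma-separated prompt into (text, weight) pairs; default weight is 1.0."""
    pairs = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = _WEIGHTED.fullmatch(chunk)
        if match:
            pairs.append((match.group(1).strip(), float(match.group(2))))
        else:
            pairs.append((chunk, 1.0))
    return pairs

pairs = parse_weighted_prompt("(masterpiece:1.3), (8k:1.2), portrait, (low quality:0.3)")
```

A UI that supports weighting would then scale the text-encoder embedding of each chunk by its weight; the exact scaling scheme differs between tools.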

Negative prompt (strong)

ugly, deformed, blurry, bad anatomy, extra fingers, watermark, signature, low quality

→ Explicitly excludes unwanted elements.

CFG Scale (1-30)

  • Classifier-Free Guidance.
  • Trades prompt adherence (higher values) against variety (lower values).
  • Typical default: 7-12.
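
The standard classifier-free guidance combination can be written in one line: the sampler runs the model with and without the prompt and extrapolates from the unconditional prediction toward the conditional one. The sketch below applies that formula to plain lists; the numbers are made up and `apply_cfg` is a hypothetical helper name.

```python
def apply_cfg(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.10, -0.20, 0.30]   # noise prediction without the prompt
eps_cond = [0.30, -0.10, 0.10]     # noise prediction with the prompt

no_guidance = apply_cfg(eps_uncond, eps_cond, 1.0)   # scale 1 returns the conditional prediction
typical = apply_cfg(eps_uncond, eps_cond, 7.5)       # larger scale pushes further toward the prompt
```

In Stable Diffusion style pipelines the negative prompt takes the place of the empty prompt in the unconditional branch, which is why it steers the result so directly.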

Sampling steps (10-50)

  • The number of denoising iterations.
  • More steps raise quality up to a point, and raise cost.
  • DPM++ 2M Karras is a common sweet spot at 20-30 steps.

Sampler choice

  • Euler a, DPM++ 2M Karras, UniPC, ...
  • Each sampler yields a slightly different look.
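
The "Karras" suffix on sampler names refers to the noise-level spacing proposed by Karras et al. (2022), which puts more steps at low noise levels. A minimal sketch of that schedule follows; the `sigma_min` and `sigma_max` defaults are illustrative assumptions, not values read from any particular model.

```python
def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Noise levels from sigma_max down to sigma_min with Karras-style spacing."""
    if n < 2:
        raise ValueError("n must be at least 2")
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then raise back to the power rho.
    return [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]

sigmas = karras_sigmas(20)
```

Because the spacing is fixed by the step count, raising the number of steps refines the same trajectory more finely, which is one reason quality gains flatten out beyond 20-30 steps.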

Advanced control

LoRA (Low-Rank Adaptation)

  • A lightweight fine-tune for a specific style or character.
  • Small files (often around 100 MB).
  • Multiple LoRAs can be stacked.
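
The reason LoRA files are small is that each adapted weight matrix is changed by a product of two thin matrices instead of a full-size update. The toy example below merges a rank-1 adapter into a 4x4 weight using plain lists; the matrices, `alpha`, and `scale` values are illustrative assumptions, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_merge(weight, lora_a, lora_b, alpha, scale=1.0):
    """Return W + scale * (alpha / r) * (B @ A), where r is the LoRA rank."""
    rank = len(lora_a)
    delta = matmul(lora_b, lora_a)            # (out x r) @ (r x in) -> (out x in)
    factor = scale * alpha / rank
    return [[w + factor * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

weight = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]   # 4x4 base weight
lora_a = [[0.5, 0.0, 0.0, 0.0]]           # r x in  (1 x 4)
lora_b = [[1.0], [0.0], [0.0], [2.0]]     # out x r (4 x 1)

merged = lora_merge(weight, lora_a, lora_b, alpha=1.0, scale=0.7)
full_params = 4 * 4            # parameters in a full update
lora_params = 1 * 4 + 4 * 1    # parameters stored by the adapter
```

Stacking corresponds to calling the merge once per adapter with its own scale, so the per-adapter strengths act as independent weights on each low-rank update.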

ControlNet

  • Constrains generation to a given pose, depth map, or edge map.
  • Canny edge → image.
  • OpenPose → image.
  • Depth map → image.

IP-Adapter

  • Uses a reference image to steer style.

Inpainting

  • Regenerates a specific region.
  • Driven by a mask plus a prompt.

Outpainting / zoom out

  • Extends the canvas beyond the original borders.

Image-to-image (img2img)

Input image + prompt → modified image

→ Used for style transfer and variations.
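
How much of the input survives in img2img is governed by a strength value that decides how far into the noise schedule the input is pushed before denoising starts. The sketch below shows that bookkeeping; it is modeled on the scheme Diffusers img2img pipelines appear to use, which is an assumption about library internals that may change between versions, and `img2img_steps` is a hypothetical name.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (start_index, steps_run) for an img2img pass.

    The input is noised to an intermediate point of the schedule and only the
    remaining steps are denoised, so lower strength keeps more of the input.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    start_index = max(num_inference_steps - steps_run, 0)
    return start_index, steps_run

light_edit = img2img_steps(40, 0.25)   # skips most of the schedule, stays close to the input
heavy_edit = img2img_steps(40, 0.75)   # runs most of the schedule, changes more
```

One practical consequence is that a low strength also means fewer steps actually run, so very low values can look under-refined unless the step count is raised.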

Modern workflow patterns

Draft → upscale

  1. Draft mode: generate a dozen or so cheap variants.
  2. Select best.
  3. Upscale + refine.

→ Standard practice with Midjourney and Flux.
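
The draft, select, refine loop can be expressed as a small function that takes the generator, scorer, and refiner as callables. The sketch below uses stub functions so it runs without any model; `draft_select_refine` and the `fake_*` helpers are hypothetical, and in practice the scorer might be a human choice or an automatic metric.

```python
import random

def draft_select_refine(prompt, generate, score, refine, num_drafts=12):
    """Generate cheap drafts, keep the best-scoring one, then refine it once."""
    drafts = [generate(prompt, seed=seed) for seed in range(num_drafts)]
    best = max(drafts, key=score)
    return refine(best)

# Stubs standing in for real pipeline or API calls.
def fake_generate(prompt, seed):
    return {"prompt": prompt, "seed": seed, "quality": random.Random(seed).random()}

def fake_score(image):
    return image["quality"]

def fake_refine(image):
    return {**image, "upscaled": True}

result = draft_select_refine("cat with hat", fake_generate, fake_score, fake_refine)
```

Recording the seed of the winning draft matters, since the refine step usually needs to reproduce the same composition at higher quality.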

LoRA stacking

  1. Base model (SDXL / Flux).
  2. Style LoRA (e.g. anime, oil paint).
  3. Character LoRA (specific person).
  4. Concept LoRA (specific pose / object).

Img2img + ControlNet (precise)

  1. Sketch.
  2. ControlNet line-art guidance.
  3. Generate + iterate.

Inpainting workflow

  1. Generate base.
  2. Identify defect (extra finger, watermark).
  3. Mask + inpaint with negative.

Common defects + fix

| Defect | Fix |
|---|---|
| Extra fingers | Negative: "extra fingers, malformed hands" + LoRA |
| Asian-only faces | State the desired ethnicity explicitly in the prompt |
| Anime-only style | "photorealistic" + a non-anime model |
| Watermark | Negative: "watermark, signature, text" |
| Bad anatomy | Negative + ControlNet OpenPose |
| Blurry | Negative: "blurry" + more steps |
| Wrong aspect ratio | Set it explicitly, e.g. --ar 16:9 |
| Generic face | Add specific, distinctive features to the prompt |
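
Several of the fixes above amount to adding known terms to the negative prompt, which is easy to automate once defects are tagged. The following sketch maps defect labels to negative terms; the dictionary keys and the `build_negative_prompt` name are hypothetical, and the terms are taken from the fixes listed above.

```python
NEGATIVE_TERMS = {
    "extra_fingers": ["extra fingers", "malformed hands"],
    "watermark": ["watermark", "signature", "text"],
    "blurry": ["blurry"],
    "bad_anatomy": ["bad anatomy"],
}

def build_negative_prompt(defects, base=("low quality",)):
    """Merge base terms with the terms for each observed defect, without duplicates."""
    terms = list(base)
    for defect in defects:
        for term in NEGATIVE_TERMS.get(defect, []):
            if term not in terms:
                terms.append(term)
    return ", ".join(terms)

negative = build_negative_prompt(["extra_fingers", "watermark"])
```

This only helps on platforms that honor negative prompts; for the others the same information has to be rephrased positively.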

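Platform differences

Negative prompt

  • Stable Diffusion: dedicated negative-prompt field, very effective. Flux.1 [dev] / [schnell] are guidance-distilled, so negative prompts may have limited effect in default setups.
  • Midjourney: --no [thing] (limited).
  • DALL-E 3: weak (naming the unwanted thing often makes it appear).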

Prompt style

  • DALL-E 3: natural language sentence.
  • Midjourney: comma-separated keyword + parameter.
  • Stable Diffusion: tag-based, weighted.
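
Because each platform prefers a different prompt style, it can help to keep one structured prompt spec and render it per platform. The sketch below does that for the three styles listed above; `render_prompt`, the platform keys, and the sentence template are hypothetical choices for illustration.

```python
def render_prompt(subject, style, details, platform, weights=None):
    """Render one prompt spec in the style each platform tends to prefer."""
    weights = weights or {}
    if platform == "dalle":
        # Natural-language sentence
        return f"{style.capitalize()} of {subject}, with {' and '.join(details)}."
    if platform == "midjourney":
        # Comma-separated keywords; parameters such as --ar are appended separately
        return ", ".join([subject, style, *details])
    if platform == "stable_diffusion":
        # Tag list with optional (tag:weight) emphasis
        tags = [style, subject, *details]
        return ", ".join(f"({t}:{weights[t]})" if t in weights else t for t in tags)
    raise ValueError(f"unknown platform: {platform}")

spec = dict(subject="a knight", style="oil painting",
            details=["golden hour light", "a mountain background"])

dalle = render_prompt(platform="dalle", **spec)
midjourney = render_prompt(platform="midjourney", **spec)
sd = render_prompt(platform="stable_diffusion", weights={"oil painting": 1.2}, **spec)
```

Keeping the spec separate from the rendering avoids the trap of pasting one platform's syntax into another.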

Photorealism

  • Stable Diffusion / Flux: "photorealistic" works.
  • Midjourney: implicit (cinematic feel).
  • DALL-E 3: "photo style" plus lens details works better than "photorealistic" (which tends to give an airbrushed look).

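Commercial use / IP

License

  • Midjourney: commercial OK (paid).
  • DALL-E 3: commercial OK.
  • Stable Diffusion: open weights (CreativeML OpenRAIL-M for SD 1.x, commercial use allowed subject to use restrictions); check the license of the specific checkpoint.
  • Adobe Firefly: commercial-safe (training data licensed).

Lawsuits

  • Getty Images v. Stability AI (training data).
  • Andersen et al. v. Stability AI, Midjourney, and DeviantArt (artists' claims over training data and style mimicry).

Transparent disclosure

  • Some jurisdictions require AI-generated content to be labeled (e.g. the EU AI Act).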

💻 Code Patterns

Stable Diffusion (Diffusers library)

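from diffusers import StableDiffusionPipeline
import torch

# Load SD 1.5 in half precision. The Hub repo id may have moved;
# substitute any SD 1.5 checkpoint you have access to.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Diffusers treats the prompt as plain text: A1111-style weights such as
# (masterpiece:1.2) are not parsed, so they are left out here.
image = pipe(
    prompt="masterpiece, portrait of a young woman, blue eyes, golden hour, 85mm lens, shallow depth of field",
    negative_prompt="blurry, deformed, watermark, signature",
    num_inference_steps=30,   # denoising iterations
    guidance_scale=7.5,       # CFG scale
).images[0]

image.save("output.png")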

Flux (modern)

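from diffusers import FluxPipeline
import torch

# FLUX.1-dev is a gated model: accept its license on the Hub and log in first.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")   # on smaller GPUs, pipe.enable_model_cpu_offload() can replace .to("cuda")

image = pipe(
    prompt="A cat holding a sign that says 'Hello World'",
    height=1024, width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]

image.save("flux_output.png")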

LoRA loading

from diffusers import StableDiffusionPipeline

# "base-model" and the .safetensors paths are placeholders.
# Named adapters require the peft package to be installed.
pipe = StableDiffusionPipeline.from_pretrained("base-model")
pipe.load_lora_weights("lora-style.safetensors", adapter_name="style")
pipe.load_lora_weights("lora-character.safetensors", adapter_name="character")

# Stack both LoRAs with per-adapter strengths
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])

image = pipe(prompt="...").images[0]

ControlNet (pose-controlled)

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
).to("cuda")

pose_image = Image.open("pose.png")   # pose skeleton image produced by an OpenPose detector

image = pipe(
    prompt="elegant woman, evening gown, studio lighting",
    image=pose_image,
    num_inference_steps=30,
).images[0]

Img2img

from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# "base-model" is a placeholder for a Stable Diffusion checkpoint.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("base-model")
init = Image.open("sketch.png").convert("RGB")

image = pipe(
    prompt="oil painting of mountain, sunset, masterpiece",
    image=init,
    strength=0.7,   # 0 = keep the input unchanged, 1 = ignore it almost entirely
    guidance_scale=7.5,
).images[0]

Inpainting

from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained("inpainting-model")

original = Image.open("photo.png")
mask = Image.open("mask.png")   # white = redo, black = keep

image = pipe(
    prompt="clean background, professional photo",
    image=original,
    mask_image=mask,
    num_inference_steps=30,
).images[0]

Midjourney (Discord bot, no official API)

# Discord
/imagine prompt: portrait of a knight, fantasy, oil painting, --ar 3:2 --v 7 --s 500 --sref https://...

→ Automation requires monitoring Discord messages or using an unofficial API; check the terms of service before relying on either.

DALL-E 3 (OpenAI API)

from openai import OpenAI
client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A cute corgi puppy in a sunny park, professional photo, 85mm lens",
    n=1,
    size="1024x1024",
    quality="hd",
    style="natural",   # or "vivid"
)

print(response.data[0].url)

Flux Replicate API

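import replicate

# Requires REPLICATE_API_TOKEN in the environment.
output = replicate.run(
    "black-forest-labs/flux-dev",
    input={
        "prompt": "A cat holding a sign...",
        "guidance": 3.5,   # this model names the input "guidance"; verify against its schema
        "num_inference_steps": 50,
    }
)

print(output[0])  # URL (or a file-like output object, depending on client version)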

Batch generation (cost-efficient)

prompts = [f"variant {i}: cat with hat" for i in range(10)]

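# One batched call (pipe is a pipeline loaded as above) is usually faster than
# serial calls, but VRAM use grows with the batch size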
images = pipe(prompts, num_inference_steps=30).images
for i, img in enumerate(images):
    img.save(f"batch_{i}.png")

ComfyUI workflow (visual node)

[CheckpointLoader] → [PromptText] → [Sampler] → [VAEDecode] → [SaveImage]
                                ↓
                        [LoRALoader] → [ControlNet]

→ Nodes can be rearranged freely, so each user builds their own pipeline.
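
The same node graph can also be written as data. The sketch below builds a minimal text-to-image graph in what I understand to be ComfyUI's API (JSON) format, where each link is a [source_node_id, output_index] pair. The node class and input names follow the default workflow as I recall it and should be checked against your installation; the checkpoint filename is a placeholder, and `build_workflow` and `dangling_links` are hypothetical helpers.

```python
import json

def build_workflow(positive, negative, checkpoint="model.safetensors",
                   steps=30, cfg=7.5, seed=0):
    """Minimal text-to-image graph; links are [source_node_id, output_index]."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": checkpoint}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler", "inputs": {
            "model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
            "latent_image": ["4", 0], "seed": seed, "steps": steps, "cfg": cfg,
            "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "output"}},
    }

def dangling_links(workflow):
    """Return (node_id, input_name) pairs whose source node is missing."""
    missing = []
    for node_id, node in workflow.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and len(value) == 2 and value[0] not in workflow:
                missing.append((node_id, name))
    return missing

workflow = build_workflow("portrait, golden hour", "blurry, watermark")
payload = json.dumps({"prompt": workflow})   # body one would POST to the server's /prompt endpoint
```

Generating the graph from code makes it straightforward to sweep seeds, step counts, or prompts without editing nodes by hand.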

Custom prompt template

def build_prompt(subject, style, lighting, lens):
    return f"({style}:1.2), {subject}, {lighting}, {lens}, masterpiece, best quality"

prompt = build_prompt(
    subject="young woman, blue eyes",
    style="oil painting, Renaissance",
    lighting="golden hour, volumetric",
    lens="85mm portrait lens, shallow depth of field"
)

Quality eval (CLIP score)

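from transformers import CLIPProcessor, CLIPModel
import torch
from PIL import Image

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

prompt = "portrait of a young woman, blue eyes, golden hour"   # the prompt used for generation
image = Image.open("output.png")
inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# Cosine similarity between the normalized image and text embeddings.
# (A softmax over a single prompt would always return 1.0.)
similarity = (outputs.image_embeds * outputs.text_embeds).sum(dim=-1).item()
print(f"CLIP score: {similarity:.3f}")

→ A quantitative measure of prompt-image alignment.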

🤔 Decision Criteria

| Task | Recommendation |
|---|---|
| Quick prototype | DALL-E 3 / Midjourney |
| Cinematic / artistic | Midjourney V7 |
| Natural language | DALL-E 3 |
| Open / control / privacy | Stable Diffusion / Flux |
| Photorealism | Flux / SDXL + LoRA |
| Anime / illustration | NovelAI / Niji |
| Commercial-safe | Adobe Firefly |
| Specific character | LoRA + reference |
| Pose-controlled | ControlNet |
| Text in image | Flux / Ideogram |

Defaults: Midjourney (artistic), Flux (open + control), DALL-E 3 (natural language).

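⚠️ Contradictions & Updates

  • DALL-E 3 handles negation poorly: writing "no X" can cause X to appear. Describe what you want in positive terms instead.
  • Stable Diffusion hardware requirements: local generation needs a GPU. An RTX 3090-class card (24 GB VRAM) or better is recommended, though smaller models run on less.
  • Midjourney is closed: its internal optimizations are unknown.
  • Training-data lawsuits: the future legal status of these models is uncertain.
  • Rapid model evolution: the best model changes roughly every six months.
  • Flux is emerging: the current open state of the art has surpassed SDXL.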

🔗 Knowledge Links (Graph)

🤖 LLM Usage Hints (How to Use This Knowledge)

When to use this knowledge:

  • Integrating AI into an art or design workflow.
  • Choosing a platform (Midjourney vs DALL-E vs Flux).
  • Weighing license considerations for a commercial project.
  • Making prompt iteration systematic.
  • Deciding on self-hosting, privacy, and cost.

When not to use it:

  • Artist-level critique of a specific work.
  • Country-specific copyright questions (consult a lawyer).
  • Deepfakes or other harmful generation (ethics).
  • Photo retouching, where a tool such as Photoshop is the better fit.

Anti-Patterns

  • Vague prompt ("nice picture"): generic output.
  • Long word salad: contradictory output.
  • DALL-E 3 + negative prompt: tends to add the very thing being excluded.
  • Using the same syntax for Midjourney and Stable Diffusion: parameters do not carry over.
  • No iteration: accepting the first attempt.
  • Cloud generation + sensitive content: privacy risk.
  • Commercial use + unclear license: legal risk.
  • No prompt template or library: reinventing every generation from scratch.

🧪 Validation

  • Information status: verified (concept-level).
  • Source trust level: B (Stability AI, Midjourney, OpenAI documentation, Hugging Face Diffusers).
  • Review reason: manual cleanup. Each platform changes roughly every six months.

🧬 Duplicate Check

🕓 Changelog

| Date | Change | Action | Trust level |
|---|---|---|---|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: four-layer prompt + platform comparison + Diffusers code + LoRA / ControlNet + anti-patterns added | UPDATE | B |