
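koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)

Replaced [[wikilinks]] that differed only in notation (spelling variants) with the
canonical title of the target document, reconnecting 1,200 broken links. Only
normalized title/filename matches were applied; alias matching was excluded because
of the risk of over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs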

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-08 12:24:15 +09:00


AI Image Generation

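📌 One-Line Insight (The Karpathy Summary)

Diffusion models turn text into images by progressively denoising random noise under the guidance of the prompt. Each platform has its specialty: Midjourney (artistic), DALL-E (natural-language prompts), Stable Diffusion / Flux (open weights + fine-grained control). There are four levers: prompt, parameters, references, and negative prompt.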

📖 Structured Knowledge (Synthesized Content)

Core architecture

Diffusion model

Forward diffusion: image → noise (training).
Reverse diffusion: noise → image (inference).
Text encoder: prompt → embedding.
Cross-attention: the text embedding guides image generation.
Sampler (DDIM, DPM++, Euler): the algorithm that performs each denoising step.

→ The basis of Stable Diffusion / Flux / Imagen.
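To make the forward process concrete, here is a minimal NumPy sketch that noises a toy array in closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε. It assumes the linear β schedule from the original DDPM setup (1,000 steps, β from 1e-4 to 0.02); Stable Diffusion works on VAE latents rather than pixels and Flux uses a different (flow-matching) formulation, so treat the numbers as illustrative only.

```python
import numpy as np

def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Per-step noise variances beta_t, increasing linearly (DDPM-style defaults)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, alphas_cumprod, rng):
    """Sample x_t from q(x_t | x_0) in one shot instead of adding noise t times."""
    noise = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    return x_t, noise  # training teaches the model to predict `noise` from (x_t, t)

betas = linear_beta_schedule(1000)
alphas_cumprod = np.cumprod(1.0 - betas)  # a_bar_t: fraction of the original signal's variance left

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # toy "image" scaled to [-1, 1]
x_early, _ = forward_diffuse(x0, 10, alphas_cumprod, rng)   # barely noised
x_late, _ = forward_diffuse(x0, 999, alphas_cumprod, rng)   # almost pure noise
```

At small t the sample stays close to x0; at the last step it is essentially pure Gaussian noise, which is where reverse diffusion starts at inference time.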

GAN (legacy, less common now)

StyleGAN.
Highly photorealistic output.
Now used mainly for specific use cases.

Autoregressive

DALL-E 1 (legacy).
VQ-VAE.

→ Modern systems are predominantly diffusion-based.

Platforms

Midjourney (예술 / cinematic)

Subscription: $10-60 / month.
Originally Discord-based; now also available as a web app (initially in alpha).
Key parameters: --ar, --v, --s, --c.
References: --sref (style), --cref (character), --oref (omni).
V7 (2025) adds Draft Mode (about 10x faster).
Commercial-friendly (paid plans).

DALL-E 3 (자연어)

OpenAI / ChatGPT integration.
GPT-4 rewrites and expands the prompt before generation.
Precise instruction following.
Strong text rendering.
Weak negative-prompt handling.

Stable Diffusion (open / control)

Open weights (CreativeML OpenRAIL-M).
Can be self-hosted locally.
ComfyUI / Automatic1111 / Forge UI.
LoRA / fine-tune / ControlNet.
Weighted prompts: (keyword:1.2) (syntax supported by UIs such as Automatic1111 and ComfyUI).
Strong negative-prompt support.

Flux (modern open, 2024+)

Black Forest Labs (founded by the original Stable Diffusion researchers).
Flux.1 [dev] / [schnell] / [pro].
Generally better than SDXL (state of the art among open models in 2024).
More accurate hands and text.

Imagen / Veo (Google)

Imagen 3 (images); Veo (video).
Cloud API.

Adobe Firefly

Commercially safe licensing (trained on licensed content).
Integrated into Adobe Creative Cloud.

Others

Ideogram (text in image).
Recraft (vector).
Krea (real-time).
NovelAI (anime).

Prompt structure (universal)

Four layers

Subject: "young woman, age 25, blue eyes".
Medium / style: "oil painting, Renaissance style".
Composition / environment: "close-up portrait, golden hour, mountain background".
Technical: "85mm lens, shallow depth of field, --ar 3:2".

The more specific each layer, the higher the output quality.

Parameters (Midjourney)

--ar 16:9: aspect ratio.
--v 7: version.
--s 250: stylize (artistic strength, 0-1000).
--c 50: chaos (variety, 0-100).
--sref [URL]: style reference.
--cref [URL]: character reference.
--oref [URL]: omni reference (V7).
--no [thing]: simple negative.
--niji: anime model.
--draft: draft mode (10x faster).
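A small helper can assemble these parameters into one /imagine string and check the documented ranges for --s (0-1000) and --c (0-100). `build_mj_prompt` is hypothetical and only builds text; Midjourney has no official API to send it to.

```python
from __future__ import annotations

def build_mj_prompt(
    text: str,
    ar: str | None = None,
    version: int | None = None,
    stylize: int | None = None,
    chaos: int | None = None,
    no: list[str] | None = None,
    sref: str | None = None,
) -> str:
    """Hypothetical helper: append Midjourney parameters to a prompt."""
    if stylize is not None and not 0 <= stylize <= 1000:
        raise ValueError("--s must be in 0-1000")
    if chaos is not None and not 0 <= chaos <= 100:
        raise ValueError("--c must be in 0-100")
    parts = [text.strip()]
    if ar:
        parts.append(f"--ar {ar}")
    if version is not None:
        parts.append(f"--v {version}")
    if stylize is not None:
        parts.append(f"--s {stylize}")
    if chaos is not None:
        parts.append(f"--c {chaos}")
    if no:
        parts.append("--no " + ", ".join(no))  # simple negative, comma-separated
    if sref:
        parts.append(f"--sref {sref}")
    return " ".join(parts)

prompt = build_mj_prompt("portrait of a knight, fantasy, oil painting", ar="3:2", version=7, stylize=500)
# prompt == "portrait of a knight, fantasy, oil painting --ar 3:2 --v 7 --s 500"
```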

Additional controls in Stable Diffusion

Weighted prompt

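(masterpiece:1.3), (8k:1.2), portrait, (freckles:0.6)

→ Raises (>1) or lowers (<1) the weight of individual keywords. In Automatic1111, [keyword:0.3] is prompt-editing syntax (adds the keyword after 30% of the steps), not a down-weight; use (keyword:0.3) instead.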
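A minimal sketch of how the explicit (keyword:weight) form can be parsed into (keyword, weight) pairs. It is a simplification that assumes comma-separated keywords and no nesting or bare parentheses; actually applying the weights means scaling the text embeddings, which the UIs (or a library such as compel for diffusers) do internally.

```python
import re

# Matches the explicit "(text:weight)" form, e.g. "(masterpiece:1.3)".
WEIGHTED = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\)")

def parse_weighted_prompt(prompt: str) -> list[tuple[str, float]]:
    """Split a comma-separated prompt into (keyword, weight); unweighted keywords get 1.0."""
    result = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty entries such as "a,, b"
        m = WEIGHTED.fullmatch(chunk)
        if m:
            result.append((m.group(1).strip(), float(m.group(2))))
        else:
            result.append((chunk, 1.0))
    return result

pairs = parse_weighted_prompt("(masterpiece:1.3), (8k:1.2), portrait, (freckles:0.6)")
```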

Negative prompt (very effective)

ugly, deformed, blurry, bad anatomy, extra fingers, watermark, signature, low quality

→ Explicitly excludes unwanted elements.

CFG Scale (1-30)

Classifier-Free Guidance.
Higher values increase prompt adherence; lower values leave more room for creativity.
Typical default: 7-12.
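The CFG Scale is the w in the classifier-free guidance update ε̂ = ε_uncond + w·(ε_cond - ε_uncond). A minimal NumPy sketch of that one line, assuming toy arrays in place of real UNet noise predictions. In Stable Diffusion pipelines the negative prompt's embedding is used for the "unconditional" branch, which is why negative prompts work so well there: a larger w pushes the result further away from them.

```python
import numpy as np

def apply_cfg(noise_uncond: np.ndarray, noise_cond: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: start from the unconditional prediction and
    move guidance_scale times along the direction toward the conditional one."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# Toy noise predictions standing in for UNet outputs at one denoising step.
# With a negative prompt, eps_negative plays the role of the unconditional branch.
eps_negative = np.array([0.2, -0.1, 0.0])
eps_prompt = np.array([0.5, 0.3, -0.2])
guided = apply_cfg(eps_negative, eps_prompt, guidance_scale=7.5)
```

With w = 1 the result is just the conditional prediction; values above 1 extrapolate past it, which increases adherence at the cost of variety.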

Sampling steps (10-50)

Number of denoising iterations.
More steps raise quality (with diminishing returns) but also raise cost.
DPM++ 2M Karras at 20-30 steps is a good sweet spot.

Sampler choice

Euler a, DPM++ 2M Karras, UniPC, ...
Each sampler produces a slightly different look.
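In diffusers, these UI sampler names correspond to scheduler classes. A sketch of that mapping based on the equivalences listed in the diffusers scheduler documentation; `set_sampler` is a hypothetical helper, and `use_karras_sigmas=True` is what turns DPM++ 2M into its Karras variant.

```python
# Automatic1111-style sampler name -> (diffusers scheduler class name, config overrides)
SAMPLER_MAP = {
    "Euler": ("EulerDiscreteScheduler", {}),
    "Euler a": ("EulerAncestralDiscreteScheduler", {}),
    "DDIM": ("DDIMScheduler", {}),
    "DPM++ 2M": ("DPMSolverMultistepScheduler", {}),
    "DPM++ 2M Karras": ("DPMSolverMultistepScheduler", {"use_karras_sigmas": True}),
    "UniPC": ("UniPCMultistepScheduler", {}),
}

def set_sampler(pipe, name: str):
    """Swap the scheduler of an existing diffusers pipeline, reusing its config."""
    cls_name, overrides = SAMPLER_MAP[name]  # KeyError for unknown sampler names
    import diffusers  # imported lazily so the mapping itself works without diffusers
    scheduler_cls = getattr(diffusers, cls_name)
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config, **overrides)
    return pipe
```

Usage, assuming `pipe` is a loaded Stable Diffusion pipeline: `set_sampler(pipe, "DPM++ 2M Karras")`, then call `pipe(...)` with 20-30 steps.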

Advanced control

LoRA (Low-Rank Adaptation)

Lightweight fine-tune for a specific style or character.
Small files (~100 MB).
Multiple LoRAs can be stacked.
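Why LoRA files are small and stackable: instead of updating a full weight matrix W, a LoRA stores two thin matrices B (d_out × r) and A (r × d_in), and the effective weight becomes W + scale · (α/r) · B·A. A NumPy sketch for one hypothetical 768×768 layer, assuming rank 8 and α = 8; the per-adapter scales play the same role as `adapter_weights` in the diffusers LoRA code later in this note.

```python
import numpy as np

def lora_delta(A: np.ndarray, B: np.ndarray, alpha: float) -> np.ndarray:
    """Low-rank weight update: delta_W = (alpha / r) * B @ A, with r = rank."""
    r = A.shape[0]
    return (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, rank = 768, 768, 8
W = rng.standard_normal((d_out, d_in))  # frozen base weight (not modified by LoRA training)

# Two hypothetical LoRAs, e.g. a style LoRA and a character LoRA: (A, B, alpha)
loras = [
    (rng.standard_normal((rank, d_in)), rng.standard_normal((d_out, rank)), 8.0),
    (rng.standard_normal((rank, d_in)), rng.standard_normal((d_out, rank)), 8.0),
]
weights = [0.7, 0.5]  # per-adapter strengths

# Stacking = adding each scaled low-rank update to the same base weight
W_eff = W + sum(w * lora_delta(A, B, alpha) for w, (A, B, alpha) in zip(weights, loras))

full_params = d_out * d_in           # values a full fine-tune of this layer would store
lora_params = rank * (d_in + d_out)  # values one LoRA stores for this layer (about 2%)
```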

ControlNet

Forces the output to follow a given pose, depth map, or edge map.
Canny edge → image.
OpenPose → image.
Depth map → image.

IP-Adapter

Uses an image as a style reference (image prompt).

Inpainting

Regenerates a specific region.
Uses a mask + prompt.
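Inpainting pipelines take a grayscale mask alongside the image. A minimal Pillow sketch for building a rectangular mask, following the white = regenerate / black = keep convention used in the inpainting code in this note; `make_box_mask` is a hypothetical helper, and the optional Gaussian feathering is one common way to soften the seam.

```python
from __future__ import annotations

from PIL import Image, ImageDraw, ImageFilter

def make_box_mask(size: tuple[int, int], box: tuple[int, int, int, int], feather: int = 0) -> Image.Image:
    """Grayscale inpainting mask: white (255) = regenerate, black (0) = keep.

    size is (width, height); box is (left, top, right, bottom) in pixels.
    feather > 0 blurs the edge so the inpainted patch blends into its surroundings.
    """
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).rectangle(box, fill=255)
    if feather > 0:
        mask = mask.filter(ImageFilter.GaussianBlur(feather))
    return mask

# e.g. a hand with an extra finger inside (200, 300)-(320, 420) of a 512x512 image
mask = make_box_mask((512, 512), (200, 300, 320, 420), feather=4)
```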

Outpainting / zoom out

Extends the canvas beyond the original borders.
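Outpainting is often implemented as inpainting on a larger canvas: pad the image, then mask only the new border. A Pillow sketch under that assumption (`extend_canvas` is hypothetical); the padded canvas and mask would then be passed to an inpainting pipeline as `image` and `mask_image`.

```python
from PIL import Image

def extend_canvas(image: Image.Image, pad: int, fill=(127, 127, 127)):
    """Pad an image on all four sides for outpainting.

    Returns (canvas, mask): the original is pasted in the center; the mask is
    white (255) over the new border to generate and black (0) over the original.
    """
    w, h = image.size
    canvas = Image.new("RGB", (w + 2 * pad, h + 2 * pad), fill)
    canvas.paste(image.convert("RGB"), (pad, pad))
    mask = Image.new("L", canvas.size, 255)
    mask.paste(0, (pad, pad, pad + w, pad + h))  # keep the original region
    return canvas, mask

src = Image.new("RGB", (512, 512), (30, 120, 200))  # stand-in for a generated image
canvas, mask = extend_canvas(src, pad=128)           # 768x768; SD expects multiples of 8
```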

Image-to-image (img2img)

Input image + prompt → modified image

→ Style transfer / variations.
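In diffusers' img2img pipelines, `strength` decides how far into the noise schedule the input image is pushed, so only about num_inference_steps × strength denoising steps actually run. The hypothetical helper below mirrors that arithmetic; with the default 50 steps and strength=0.7 (as in the img2img code in this note), roughly 35 steps run.

```python
from __future__ import annotations

def img2img_steps(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (denoising steps actually run, steps skipped) for an img2img call.

    The input image supplies the structure for the skipped part of the schedule,
    so strength=0 keeps it unchanged and strength=1 runs the full schedule.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    return steps_run, num_inference_steps - steps_run
```

This is also why low-strength img2img is cheaper: fewer steps are executed for the same `num_inference_steps`.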

Modern workflow patterns

Draft → upscale

Draft mode: generate dozens of cheap variants.
Select best.
Upscale + refine.

→ Standard workflow in Midjourney / Flux.

LoRA stacking

Base model (SDXL / Flux).
Style LoRA (e.g. anime, oil paint).
Character LoRA (specific person).
Concept LoRA (specific pose / object).

Img2img + ControlNet (precise)

Sketch.
Apply ControlNet line-art guidance.
Generate + iterate.

Inpainting workflow

Generate base.
Identify defect (extra finger, watermark).
Mask the defect and inpaint, using a negative prompt.

Common defects and fixes

Defect	Fix
Extra fingers	Negative: "extra fingers, malformed hands" + LoRA
Asian-only faces	Specify the ethnicity in the prompt
Anime-only style	"photorealistic" + a non-anime model
Watermark	Negative: "watermark, signature, text"
Bad anatomy	Negative + ControlNet OpenPose
Blurry	Negative: "blurry" + more steps
Wrong aspect ratio	`--ar 16:9`
Generic face	"specific name, distinct features"

Platform differences

Negative prompt

Stable Diffusion / Flux: explicit negative section, very strong.
Midjourney: --no [thing] (limited).
DALL-E 3: weak (mentioning the thing often adds it).

Prompt style

DALL-E 3: natural language sentence.
Midjourney: comma-separated keyword + parameter.
Stable Diffusion: tag-based, weighted.

Photorealism

Stable Diffusion / Flux: "photorealistic" works.
Midjourney: implicit (cinematic feel).
DALL-E 3: "photo style" + lens details works better than "photorealistic" (which tends to look airbrushed).

Commercial use / IP

License

Midjourney: commercial OK (paid).
DALL-E 3: commercial OK.
Stable Diffusion: open (CreativeML OpenRAIL-M, commercial OK).
Adobe Firefly: commercial-safe (training data licensed).

Lawsuits

Getty Images vs Stability AI (training data).
Artists vs Midjourney (style mimicry).

Transparent disclosure

Some jurisdictions require AI-generated content to be labeled (e.g. the EU AI Act).
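One simple, technical form of labeling is embedding a note in the image file's metadata. A Pillow sketch that writes PNG text chunks; the key names are hypothetical, the label disappears if the file is re-encoded, and it does not by itself establish compliance with any specific regulation.

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def save_with_ai_label(image: Image.Image, path: str, generator: str) -> None:
    """Save as PNG with text chunks marking the image as AI-generated.

    Plain metadata only: easy to strip, not a tamper-proof provenance mechanism.
    """
    meta = PngInfo()
    meta.add_text("ai_generated", "true")   # hypothetical key name
    meta.add_text("generator", generator)   # e.g. the model or service used
    image.save(path, format="PNG", pnginfo=meta)
```

Reading it back: `Image.open(path).text` returns the text chunks as a dict.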

💻 Code Patterns

Stable Diffusion (Diffusers library)

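from diffusers import StableDiffusionPipeline
import torch

# The original runwayml/stable-diffusion-v1-5 repo is no longer on the Hub; this is its mirror.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Generate. diffusers does not parse (keyword:1.2) weights natively,
# so use plain keywords here (or a helper library such as compel for weighting).
image = pipe(
    prompt="masterpiece, portrait of a young woman, blue eyes, golden hour, 85mm lens, shallow depth of field",
    negative_prompt="blurry, deformed, watermark, signature",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("output.png")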

Flux (modern)

from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
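).to("cuda")
# FLUX.1-dev is a gated repo: accept its license on Hugging Face first.
# On smaller GPUs, call pipe.enable_model_cpu_offload() instead of .to("cuda").

image = pipe(
    prompt="A cat holding a sign that says 'Hello World'",
    height=1024, width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
image.save("flux.png")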

LoRA loading

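from diffusers import StableDiffusionPipeline
import torch

# "base-model" and the .safetensors paths are placeholders; LoRA loading needs the `peft` package.
pipe = StableDiffusionPipeline.from_pretrained("base-model", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("lora-style.safetensors", adapter_name="style")
pipe.load_lora_weights("lora-character.safetensors", adapter_name="character")

# Stack LoRA: activate both adapters with per-adapter strengths
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])

image = pipe(prompt="...").images[0]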

ControlNet (pose-controlled)

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # mirror of the removed runwayml repo
    controlnet=controlnet,
).to("cuda")

pose_image = Image.open("pose.png")   # OpenPose skeleton image (output of a pose detector)

image = pipe(
    prompt="elegant woman, evening gown, studio lighting",
    image=pose_image,
    num_inference_steps=30,
).images[0]

Img2img

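from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image
import torch

# "base-model" is a placeholder for an SD checkpoint
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("base-model", torch_dtype=torch.float16).to("cuda")
init = Image.open("sketch.png").convert("RGB").resize((512, 512))

image = pipe(
    prompt="oil painting of mountain, sunset, masterpiece",
    image=init,
    strength=0.7,   # 0 = keep the input unchanged, 1 = almost entirely regenerate it
    guidance_scale=7.5,
).images[0]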

Inpainting

from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained("inpainting-model")  # placeholder: an inpainting checkpoint

original = Image.open("photo.png").convert("RGB")
mask = Image.open("mask.png").convert("L")   # white = redo, black = keep

image = pipe(
    prompt="clean background, professional photo",
    image=original,
    mask_image=mask,
    num_inference_steps=30,
).images[0]

Midjourney (Discord bot, no official API)

# Discord
/imagine prompt: portrait of a knight, fantasy, oil painting, --ar 3:2 --v 7 --s 500 --sref https://...

→ Automation options are monitoring Discord via a bot or webhook, or an unofficial API (which may violate Midjourney's terms of service).

DALL-E 3 (OpenAI API)

from openai import OpenAI
client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A cute corgi puppy in a sunny park, professional photo, 85mm lens",
    n=1,
    size="1024x1024",
    quality="hd",
    style="natural",   # or "vivid"
)

print(response.data[0].url)  # temporary URL (expires after about an hour)

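Flux via the Replicate API

import replicate  # reads the API token from the REPLICATE_API_TOKEN environment variable

output = replicate.run(
    "black-forest-labs/flux-dev",
    input={
        # input names must match this model's schema on Replicate; check the model page
        "prompt": "A cat holding a sign...",
        "guidance_scale": 3.5,
        "num_inference_steps": 50,
    }
)

print(output[0])  # first image (a URL, or a URL-bearing file object depending on client version)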

Batch generation (cost-efficient)

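import torch

# Assumes `pipe` is one of the Stable Diffusion pipelines created above.
prompts = [f"variant {i}: cat with hat" for i in range(10)]
generators = [torch.Generator("cuda").manual_seed(i) for i in range(10)]  # one fixed seed per image

# One batched call is faster than 10 serial calls, if it fits in VRAM
images = pipe(prompts, num_inference_steps=30, generator=generators).images
for i, img in enumerate(images):
    img.save(f"batch_{i}.png")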

ComfyUI workflow (visual node)

[CheckpointLoader] → [LoRALoader] → [PromptText] → [ControlNet] → [Sampler] → [VAEDecode] → [SaveImage]

→ Nodes can be rearranged freely, so each user builds their own pipeline.
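ComfyUI can also be driven from code: a running ComfyUI server accepts a graph in its "API format" JSON via POST /prompt (default address http://127.0.0.1:8188). The sketch below assumes that format and the built-in node names (CheckpointLoaderSimple, CLIPTextEncode, KSampler, and so on); the checkpoint filename is a placeholder, and exporting a workflow in API format from the UI is the safest way to confirm the exact inputs for your install.

```python
import json
import urllib.request

def build_workflow(prompt: str, negative: str, ckpt_name: str, seed: int = 0) -> dict:
    """Text-to-image graph in ComfyUI's API format.

    Keys are node ids; a link is [source_node_id, output_index]. For
    CheckpointLoaderSimple the outputs are 0 = MODEL, 1 = CLIP, 2 = VAE.
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                         "latent_image": ["4", 0], "seed": seed, "steps": 25, "cfg": 7.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage", "inputs": {"images": ["6", 0], "filename_prefix": "comfy"}},
    }

def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the graph to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `queue_workflow(build_workflow("a castle at dusk", "blurry", "my_model.safetensors"))` would queue one image, provided the checkpoint name matches a file in ComfyUI's models/checkpoints folder.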

Custom prompt template

def build_prompt(subject, style, lighting, lens):
    return f"({style}:1.2), {subject}, {lighting}, {lens}, masterpiece, best quality"

prompt = build_prompt(
    subject="young woman, blue eyes",
    style="oil painting, Renaissance",
    lighting="golden hour, volumetric",
    lens="85mm portrait lens, shallow depth of field"
)

Quality eval (CLIP score)

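from transformers import CLIPProcessor, CLIPModel
import torch
from PIL import Image

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# `prompt` is the text used for generation (e.g. from build_prompt above)
image = Image.open("output.png")
inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# CLIP score = 100 * cosine similarity of image and text embeddings (floored at 0).
# A softmax over a single prompt always returns 1.0, so it cannot measure alignment.
img_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
txt_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
clip_score = max(100 * (img_emb * txt_emb).sum(dim=-1).item(), 0.0)
print(f"CLIP score: {clip_score:.1f}")

→ A quantitative measure of prompt-image alignment (higher is better; compare scores only within the same CLIP model).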

🤔 Decision Criteria

Task	Recommendation
Quick prototype	DALL-E 3 / Midjourney
Cinematic / artistic	Midjourney V7
Natural language	DALL-E 3
Open / control / privacy	Stable Diffusion / Flux
Photorealism	Flux / SDXL + LoRA
Anime / illustration	NovelAI / Niji
Commercial-safe	Adobe Firefly
Specific character	LoRA + reference
Pose-controlled	ControlNet
Text in image	Flux / Ideogram

Defaults: Midjourney (artistic), Flux (open + control), DALL-E 3 (natural language).

⚠️ Contradictions & Updates

DALL-E 3 handles negation poorly: writing "no X" can actually add X. Describe what you want positively instead.
Stable Diffusion's hardware requirements: a dedicated GPU is needed (RTX 3090 or better recommended).
Midjourney is closed: its internal optimizations are unknown.
Training-data lawsuits: the future legal status of these models is uncertain.
Rapid model evolution: the best option changes roughly every six months.
Flux is emerging: as the modern SoTA, it surpasses SDXL.

🔗 Knowledge Links (Graph)

Parent: Generative-AI · Diffusion-Models · Computer Vision
Variants: Stable-Diffusion · Flux · Midjourney · DALL-E · Imagen
Applications: ControlNet · LoRA · Inpainting · IP-Adapter
Techniques: Prompt_Engineering · Negative Prompt · CFG Scale · Sampling-Steps
Tools: ComfyUI

🤖 LLM Usage Hints (How to Use This Knowledge)

When to use this knowledge:

Integrating AI into art / design workflows.
Choosing a platform (Midjourney vs DALL-E vs Flux).
License considerations for commercial projects.
Making prompt iteration systematic.
Deciding on self-hosting, privacy, and cost.

When not to use it:

Detailed, artist-level art critique.
Country-specific copyright questions (consult a lawyer).
Deepfakes or harmful generation (ethical issues).
Photo retouching (Photoshop is the better tool).

❌ Anti-Patterns

Vague prompt ("nice picture"): generic output.
Long word salad: contradictory output.
DALL-E 3 + negative prompt: tends to add the very thing you wanted to exclude.
Same syntax for Midjourney and Stable Diffusion: parameters do not transfer.
No iteration: accepting the first attempt.
Cloud generation of sensitive content: privacy risk.
Commercial use with an unclear license: legal risk.
No prompt template / library: reinventing every prompt from scratch.

🧪 Validation Status

Information status: verified (concept-level).
Source reliability: B (Stability AI, Midjourney, OpenAI documentation, Hugging Face Diffusers).
Review note: manual cleanup. Each platform changes significantly roughly every six months.

🧬 Duplicate Check

Existing similar documents: AI_Image_Generation_Workflow (related), AI 이미지 생성 및 편집 워크플로우 (AI Image Generation & Editing Workflow) (related), Diffusion-Models (parent).
Action: KEEP (focused on platform / prompt comparison).
Reason: each of these files covers a different angle.

🕓 Changelog

Date	Change	Action	Confidence
2026-05-08	P-Reinforce Phase 1 normalization	UPDATE	A
2026-05-09	Manual cleanup: added 4-layer prompt, platform comparison, Diffusers code, LoRA / ControlNet, and anti-patterns	UPDATE	B
