---
id: wiki-2026-0508-ai-이미지-생성-ai-image-generation
title: AI Image Generation
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI 이미지 생성, image gen, text-to-image, Midjourney, DALL-E, Stable Diffusion, Flux, Imagen, diffusion model]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: [image-generation, diffusion-model, stable-diffusion, midjourney, dalle, flux, prompt-engineering, controlnet, lora]
raw_sources: []
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack:
  language: Python / API
  framework: Diffusers / ComfyUI / Automatic1111 / Flux / SD WebUI
---

# AI Image Generation

## 📌 One-Line Insight (The Karpathy Summary)

> **Text-to-image with diffusion models**: each prompt steers the progressive denoising of random noise into an image. The platforms specialize: **Midjourney (artistic), DALL-E (natural language), Stable Diffusion / Flux (open weights + control)**. The four levers are **prompt, parameters, reference images, and negative prompt**.

## 📖 Structured Knowledge (Synthesized Content)

### Core architecture

#### Diffusion model
1. **Forward diffusion**: image → noise (used in training).
2. **Reverse diffusion**: noise → image (inference).
3. **Text encoder**: prompt → embedding.
4. **Cross-attention**: the text embedding guides the image denoising.
5. **Sampler** (DDIM, DPM++, Euler): performs the denoising steps.

→ The basis of Stable Diffusion / Flux / Imagen.

#### GAN (legacy, less common now)
- StyleGAN.
- Photorealistic output.
- Now limited to specific use cases.

#### Autoregressive
- DALL-E 1 (legacy).
- VQ-VAE.

→ Modern systems are mostly diffusion-based.

### Platforms

#### Midjourney (artistic / cinematic)
- **Subscription**: roughly $10-120 / month depending on plan.
- Originally **Discord-based**; now also available as a **web app**.
- Parameters: `--ar`, `--v`, `--s`, `--c`.
- References: `--sref` (style), `--cref` (character), `--oref` (omni).
- V7 (2025) adds a draft mode (10x faster).
- Commercial use is allowed on paid plans.

#### DALL-E 3 (natural language)
- **OpenAI** / ChatGPT integration.
- GPT-4 expands the prompt before generation.
- Accurate instruction following.
- Strong text rendering.
- Weak negative-prompt handling.

#### Stable Diffusion (open / control)
- **Open weights** (CreativeML OpenRAIL-M).
- Can be self-hosted locally.
- ComfyUI / Automatic1111 / Forge UI.
- LoRA / fine-tune / ControlNet.
- Weighted prompts: `(keyword:1.2)`.
- Strong negative-prompt support.

#### Flux (modern open, 2024+)
- **Black Forest Labs** (founded by the original Stable Diffusion researchers).
- Flux.1 [dev] / [schnell] / [pro].
- Generally outperforms SDXL (open-model SoTA in 2024).
- More accurate hands and in-image text.

#### Imagen / Veo (Google)
- Imagen 3.
- Cloud API.

#### Adobe Firefly
- Commercially license-safe.
- Part of Adobe Creative Cloud.

#### Others
- Ideogram (text in image).
- Recraft (vector).
- Krea (real-time).
- NovelAI (anime).

### Prompt structure (universal)

#### 4 layers
1. **Subject**: "young woman, age 25, blue eyes".
2. **Medium / style**: "oil painting, Renaissance style".
3. **Composition / environment**: "close-up portrait, golden hour, mountain background".
4. **Technical**: "85mm lens, shallow depth of field, --ar 3:2".

#### The more specific each layer is, the better the output quality.

### Parameters (Midjourney)
- `--ar 16:9`: aspect ratio.
- `--v 7`: version.
- `--s 250`: stylize (artistic strength, 0-1000).
- `--c 50`: chaos (variety, 0-100).
- `--sref [URL]`: style reference.
- `--cref [URL]`: character reference.
- `--oref [URL]`: omni reference (V7).
- `--no [thing]`: simple negative.
- `--niji`: anime model.
- `--draft`: draft mode (10x faster).

### Additional controls in Stable Diffusion

#### Weighted prompt
```
(masterpiece:1.3), (8k:1.2), portrait, (low quality:0.3)
```
→ Raises or lowers the weight of each keyword. In Automatic1111, `[text:0.3]` is prompt-editing syntax (the text is switched on after 30% of the steps), so down-weighting uses parentheses with a weight below 1.

#### Negative prompt (strong)
```
ugly, deformed, blurry, bad anatomy, extra fingers, watermark, signature, low quality
```
→ Explicitly excludes unwanted elements.

#### CFG Scale (1-30)
- Classifier-Free Guidance.
- Trades prompt adherence (higher values) against creativity (lower values).
- Typical values: 7-12.

#### Sampling steps (10-50)
- The number of denoising iterations.
- More steps raise both quality and cost.
- DPM++ 2M Karras has a sweet spot at 20-30 steps.

#### Sampler choice
- Euler a, DPM++ 2M Karras, UniPC, ...
- Each sampler gives a slightly different look.
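
To make the CFG scale above concrete, here is a minimal sketch of the classifier-free guidance combination step, assuming the standard formulation in which the sampler blends an unconditional and a prompt-conditioned noise prediction at each denoising step. The function name `cfg_combine` and the toy numbers are illustrative only and do not come from any library; real pipelines apply the same arithmetic to tensors.

```python
def cfg_combine(uncond, cond, scale):
    """Blend two noise predictions: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy noise predictions for two latent values (hypothetical numbers).
uncond = [0.0, 1.0]  # prediction without the prompt
cond = [1.0, 1.0]    # prediction with the prompt

print(cfg_combine(uncond, cond, 1.0))  # scale 1: the plain conditional prediction
print(cfg_combine(uncond, cond, 7.5))  # typical default: prompt direction amplified
```

A higher scale pushes the result further along the direction in which the prompt changes the prediction, which is why very high values tend to look over-saturated while low values drift away from the prompt.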

### Advanced control

#### LoRA (Low-Rank Adaptation)
- Fine-tunes a model for a specific style or character.
- Small files (~100 MB).
- Multiple LoRAs can be stacked.

#### ControlNet
- Forces a given pose, depth, or edge structure.
- Canny edge → image.
- OpenPose → image.
- Depth map → image.

#### IP-Adapter
- Uses an image as a style reference.

#### Inpainting
- Regenerates a specific region.
- Uses a mask plus a prompt.

#### Outpainting / zoom out
- Extends the canvas.

### Image-to-image (img2img)
```
Input image + prompt → modified image
```
→ Used for style transfer and variations.

### Modern workflow patterns

#### Draft → upscale
1. **Draft mode**: generate a dozen variants cheaply.
2. **Select the best**.
3. **Upscale + refine**.

→ The standard approach with Midjourney / Flux.

#### LoRA stacking
1. **Base model** (SDXL / Flux).
2. **Style LoRA** (e.g. anime, oil paint).
3. **Character LoRA** (specific person).
4. **Concept LoRA** (specific pose / object).

#### Img2img + ControlNet (precise)
1. **Sketch**.
2. **ControlNet line-art guidance**.
3. **Generate + iterate**.

#### Inpainting workflow
1. **Generate the base image**.
2. **Identify defects** (extra finger, watermark).
3. **Mask the region and inpaint with a negative prompt**.

### Common defects + fix

| Defect | Fix |
|---|---|
| Extra fingers | Negative: "extra fingers, malformed hands" + LoRA |
| Asian-only faces | Specify ethnicity in the prompt |
| Anime-only style | "photorealistic" + a non-anime model |
| Watermark | Negative: "watermark, signature, text" |
| Bad anatomy | Negative + ControlNet OpenPose |
| Blurry | Negative: "blurry" + more steps |
| Wrong aspect | `--ar 16:9` |
| Generic face | "specific name, distinct features" |

### Platform differences

#### Negative prompt
- **Stable Diffusion / Flux**: explicit negative section, very strong.
- **Midjourney**: `--no [thing]` (limited).
- **DALL-E 3**: weak (often produces the very thing it was told to avoid).

#### Prompt style
- **DALL-E 3**: natural-language sentences.
- **Midjourney**: comma-separated keywords + parameters.
- **Stable Diffusion**: tag-based, weighted.
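
The negative-prompt and prompt-style differences above can be sketched as one structured prompt rendered three ways. This is an illustrative helper under stated assumptions, not an API of any platform: `PromptSpec` and `render` are hypothetical names, and the DALL-E 3 branch simply drops the negatives because negation tends to backfire there.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    subject: str
    style: str
    negatives: list[str] = field(default_factory=list)


def render(spec: PromptSpec, platform: str) -> dict:
    """Render one spec in the prompt style each platform expects."""
    if platform == "stable-diffusion":
        # Tag-based positive prompt plus a separate negative prompt.
        return {
            "prompt": f"{spec.subject}, {spec.style}",
            "negative_prompt": ", ".join(spec.negatives),
        }
    if platform == "midjourney":
        # Comma-separated keywords; negatives go through --no.
        text = f"{spec.subject}, {spec.style}"
        if spec.negatives:
            text += " --no " + ", ".join(spec.negatives)
        return {"prompt": text}
    if platform == "dall-e-3":
        # Natural-language sentence; negatives are omitted on purpose.
        return {"prompt": f"An image of {spec.subject}, in {spec.style} style."}
    raise ValueError(f"unknown platform: {platform}")


spec = PromptSpec("a knight on a cliff", "oil painting", ["watermark", "text"])
for name in ("stable-diffusion", "midjourney", "dall-e-3"):
    print(name, render(spec, name))
```

Keeping the structured spec as the source of truth means a prompt library can be reused across platforms while only the rendering step changes.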

#### Photorealism
- **Stable Diffusion / Flux**: "photorealistic" works.
- **Midjourney**: implicit (cinematic feel).
- **DALL-E 3**: "photo style" plus lens details works better than "photorealistic", which tends to produce an airbrushed look.

### Commercial use / IP

#### License
- Midjourney: commercial use OK (paid plans).
- DALL-E 3: commercial use OK.
- Stable Diffusion: open weights (CreativeML OpenRAIL-M, commercial use OK).
- Adobe Firefly: commercial-safe (trained on licensed data).

#### Lawsuits
- Getty Images vs. Stability AI (training data).
- Artists vs. Midjourney (style mimicry).

#### Transparent disclosure
- Some jurisdictions require AI-generated content to be labeled (e.g. EU AI Act).

## 💻 Code Patterns

### Stable Diffusion (Diffusers library)

```python
from diffusers import StableDiffusionPipeline
import torch

# If this repo ID no longer resolves, the same weights are mirrored as
# "stable-diffusion-v1-5/stable-diffusion-v1-5".
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Generate
# Note: plain Diffusers treats "(masterpiece:1.2)" as literal text;
# the weight syntax is interpreted by WebUI front ends, not by this pipeline.
image = pipe(
    prompt="(masterpiece:1.2), portrait of a young woman, blue eyes, golden hour, 85mm lens, shallow depth of field",
    negative_prompt="blurry, deformed, watermark, signature",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("output.png")
```

### Flux (modern)

```python
from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="A cat holding a sign that says 'Hello World'",
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
```

### LoRA loading

```python
from diffusers import StableDiffusionPipeline

# "base-model" and the .safetensors paths are placeholders.
pipe = StableDiffusionPipeline.from_pretrained("base-model")
pipe.load_lora_weights("lora-style.safetensors", adapter_name="style")
pipe.load_lora_weights("lora-character.safetensors", adapter_name="character")

# Stack LoRAs with per-adapter weights
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])

image = pipe(prompt="...").images[0]
```

### ControlNet (pose-controlled)

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
).to("cuda")

pose_image = Image.open("pose.png")  # OpenPose skeleton extracted beforehand

image = pipe(
    prompt="elegant woman, evening gown, studio lighting",
    image=pose_image,
    num_inference_steps=30,
).images[0]
```

### Img2img

```python
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("base-model")  # placeholder model ID

init = Image.open("sketch.png")
image = pipe(
    prompt="oil painting of mountain, sunset, masterpiece",
    image=init,
    strength=0.7,  # 0 = keep the input unchanged, 1 = mostly ignore the input
    guidance_scale=7.5,
).images[0]
```

### Inpainting

```python
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained("inpainting-model")  # placeholder model ID

original = Image.open("photo.png")
mask = Image.open("mask.png")  # white = regenerate, black = keep

image = pipe(
    prompt="clean background, professional photo",
    image=original,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
```

### Midjourney (Discord bot, no official API)

```
# Discord
/imagine prompt: portrait of a knight, fantasy, oil painting --ar 3:2 --v 7 --s 500 --sref https://...
```

→ Automation relies on monitoring Discord (e.g. via webhooks) or on unofficial APIs.
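
Because Midjourney prompts are submitted as plain text, out-of-range parameters are easy to miss. The following is a minimal sketch of a helper that assembles the prompt text and checks the ranges listed in the parameter section (stylize 0-1000, chaos 0-100). `build_imagine` is a hypothetical name; the function only builds a string and does not communicate with Midjourney or Discord.

```python
def build_imagine(prompt, ar="1:1", version=7, stylize=None, chaos=None):
    """Assemble Midjourney prompt text with range-checked parameters."""
    if stylize is not None and not 0 <= stylize <= 1000:
        raise ValueError("--s must be between 0 and 1000")
    if chaos is not None and not 0 <= chaos <= 100:
        raise ValueError("--c must be between 0 and 100")

    # Drop a trailing comma so parameters are not preceded by stray punctuation.
    parts = [prompt.strip().rstrip(","), f"--ar {ar}", f"--v {version}"]
    if stylize is not None:
        parts.append(f"--s {stylize}")
    if chaos is not None:
        parts.append(f"--c {chaos}")
    return " ".join(parts)


print(build_imagine("portrait of a knight, fantasy, oil painting",
                    ar="3:2", stylize=500))
```

The returned string can be pasted after `/imagine prompt:`; validating before submission avoids spending a generation on a prompt the bot would reject.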

### DALL-E 3 (OpenAI API)

```python
from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A cute corgi puppy in a sunny park, professional photo, 85mm lens",
    n=1,
    size="1024x1024",
    quality="hd",
    style="natural",  # or "vivid"
)

print(response.data[0].url)
```

### Flux Replicate API

```python
import replicate

# Input keys must match the model's input schema on Replicate
# (the guidance input may be named differently there); check the model page.
output = replicate.run(
    "black-forest-labs/flux-dev",
    input={
        "prompt": "A cat holding a sign...",
        "guidance_scale": 3.5,
        "num_inference_steps": 50,
    }
)
print(output[0])  # URL
```

### Batch generation (cost-efficient)

```python
# Assumes `pipe` is an already loaded Diffusers pipeline.
prompts = [f"variant {i}: cat with hat" for i in range(10)]

# One batched call (faster than generating serially, but uses more VRAM)
images = pipe(prompts, num_inference_steps=30).images

for i, img in enumerate(images):
    img.save(f"batch_{i}.png")
```

### ComfyUI workflow (visual node)

```
[CheckpointLoader] → [PromptText] → [Sampler] → [VAEDecode] → [SaveImage]
        ↓
[LoRALoader] → [ControlNet]
```

→ Nodes can be reordered, so each user can build their own pipeline.

### Custom prompt template

```python
def build_prompt(subject, style, lighting, lens):
    # The "(text:1.2)" weight syntax is understood by WebUI-style front ends.
    return f"({style}:1.2), {subject}, {lighting}, {lens}, masterpiece, best quality"

prompt = build_prompt(
    subject="young woman, blue eyes",
    style="oil painting, Renaissance",
    lighting="golden hour, volumetric",
    lens="85mm portrait lens, shallow depth of field"
)
```

### Quality eval (CLIP score)

```python
from transformers import CLIPProcessor, CLIPModel
import torch
from PIL import Image

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

prompt = "oil painting of mountain, sunset"  # the prompt used for generation
image = Image.open("output.png")
inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Cosine similarity between the image and text embeddings.
# (A softmax over a single prompt would always return 1.0.)
similarity = torch.nn.functional.cosine_similarity(
    outputs.image_embeds, outputs.text_embeds
).item()
print(f"CLIP score: {similarity:.3f}")
```

→ A quantitative measure of prompt-image alignment.
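
One detail worth keeping in mind when reading CLIP-based scores: a softmax taken over a single candidate caption is always 1.0, so it only becomes informative when several captions compete for the same image, whereas cosine similarity is meaningful for a single image-text pair. The sketch below illustrates this with toy embedding vectors (hypothetical numbers, not real CLIP outputs) using only the standard library; `cosine` and `softmax` are local helpers, not library functions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


image_vec = [0.9, 0.1, 0.3]           # toy image embedding
caption_vecs = {
    "cat with hat": [0.8, 0.2, 0.4],  # close to the image
    "city skyline": [0.1, 0.9, 0.2],  # far from the image
}

sims = {text: cosine(image_vec, vec) for text, vec in caption_vecs.items()}
print(sims)                             # per-caption alignment, usable on its own
print(softmax([sims["cat with hat"]]))  # single candidate: always [1.0]
print(softmax(list(sims.values())))     # several candidates: relative ranking
```

In practice this suggests reporting the raw similarity when scoring one prompt against one image, and reserving the softmax for ranking alternative captions.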

## 🤔 Decision Criteria

| Task | Recommendation |
|---|---|
| Quick prototype | DALL-E 3 / Midjourney |
| Cinematic / artistic | Midjourney V7 |
| Natural language | DALL-E 3 |
| Open / control / privacy | Stable Diffusion / Flux |
| Photorealism | Flux / SDXL + LoRA |
| Anime / illustration | NovelAI / Niji |
| Commercial-safe | Adobe Firefly |
| Specific character | LoRA + reference |
| Pose-controlled | ControlNet |
| Text in image | Flux / Ideogram |

**Defaults**: Midjourney (artistic), Flux (open + control), DALL-E 3 (natural language).

## ⚠️ Contradictions & Updates

- **DALL-E 3 handles negation poorly**: writing "no X" can add X. Describe what you want in positive terms instead.
- **Stable Diffusion hardware requirements**: a GPU is required for practical local use (RTX 3090 or better recommended).
- **Midjourney is closed**: its internal optimizations are unknown.
- **Training-data lawsuits**: the future legal status of these models is uncertain.
- **Rapid model evolution**: the best model changes roughly every 6 months.
- **Flux is emerging**: the current open state of the art has surpassed SDXL.

## 🔗 Knowledge Links (Graph)

- Parent: [[Generative-AI]] · [[Diffusion-Models]] · [[Computer-Vision]]
- Variants: [[Stable-Diffusion]] · [[Flux]] · [[Midjourney]] · [[DALL-E]] · [[Imagen]]
- Applications: [[ControlNet]] · [[LoRA]] · [[Inpainting]] · [[Img2Img]] · [[IP-Adapter]]
- Techniques: [[Prompt-Engineering]] · [[Negative-Prompt]] · [[CFG-Scale]] · [[Sampling-Steps]]
- Application areas: [[AI-Art-Commercial]] · [[Game-Asset-Generation]] · [[Marketing-AI]]
- Ethics: [[AI-Copyright]] · [[Training-Data-Lawsuit]] · [[AI-Disclosure]]
- Tools: [[ComfyUI]] · [[Automatic1111]] · [[Diffusers-Library]] · [[Replicate-API]]

## 🤖 How to Use This Knowledge (LLM Hints)

**When to use this knowledge:**
- Integrating AI into an art or design workflow.
- Choosing a platform (Midjourney vs DALL-E vs Flux).
- Weighing license considerations for a commercial project.
- Iterating on prompts systematically.
- Deciding on self-hosting, privacy, and cost.

**When not to use it:**
- Specific art critique (needs an artist's judgment).
- Country-specific copyright questions (consult a lawyer).
- Deepfakes or other harmful generation (ethical concerns).
- Photo retouching, where Photoshop is the better tool.

## ❌ Anti-Patterns

- **Vague prompt** ("nice picture"): produces generic output.
- **Long word salad**: produces contradictory output.
- **DALL-E 3 + negative prompt**: tends to add the unwanted thing.
- **Using the same syntax for Midjourney and Stable Diffusion**: parameters do not carry over.
- **No iteration**: accepting the first attempt.
- **Cloud generation + sensitive content**: privacy risk.
- **Commercial use + unclear license**: legal risk.
- **No prompt template / library**: every generation is reinvented from scratch.

## 🧪 Validation Status

- **Information status:** verified (concept-level).
- **Source trust level:** B (Stability AI, Midjourney, OpenAI documentation, Hugging Face Diffusers).
- **Review reason:** Manual cleanup. Each platform evolves on a roughly 6-month cycle.

## 🧬 Duplicate Check

- **Existing similar documents:** [[AI_Image_Generation_Workflow]] (related), [[AI 이미지 생성 및 편집 워크플로우 (AI Image Generation & Editing Workflow)]] (related), [[Diffusion-Models]] (parent).
- **Action:** KEEP (focused on platform / prompt comparison).
- **Reason:** each file covers a different angle.

## 🕓 Changelog

| Date | Change | Action | Trust level |
|------|--------|--------|-------------|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: added 4-layer prompt, platform comparison, Diffusers code, LoRA / ControlNet, and anti-patterns | UPDATE | B |