---
id: ai-image-generation-patterns
title: Image Generation - DALL-E / Flux / Stable Diffusion
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, image, generation, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["Backend"] }
applied_in: []
aliases: [DALL-E, Flux, Stable Diffusion, Imagen, Midjourney, ControlNet, LoRA]
---

# Image Generation

> Text-to-image. **DALL-E 3 (OpenAI), Imagen 4 (Google), Flux (Black Forest Labs), Stable Diffusion (open source)**. Prompt + negative prompt + seed + ControlNet (variations).

## 📖 Key concepts

- Prompt: be detailed; either tag-style ("1girl, blue hair, ...") or natural language.
- Negative prompt: what to exclude (blurry, low quality).
- Seed: determinism (same seed = nearly the same image).
- ControlNet: controls composition / pose / edges.
- LoRA: fine-tuning with a small dataset.

## 💻 Code patterns

### OpenAI DALL-E 3

```ts
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const r = await openai.images.generate({
  model: 'dall-e-3',
  prompt: 'A cat astronaut floating in space, photorealistic, dramatic lighting',
  size: '1024x1024', // '1024x1024' | '1792x1024' | '1024x1792'
  quality: 'hd', // 'standard' | 'hd'
  style: 'vivid', // 'vivid' | 'natural'
  n: 1,
});
const url = r.data[0].url;
```

### gpt-image-1 (editing / compositing)

```ts
import fs from 'fs';

const r = await openai.images.edit({
  model: 'gpt-image-1',
  image: fs.createReadStream('cat.png'),
  mask: fs.createReadStream('mask.png'), // marks the area to change
  prompt: 'A red bow tie',
});
```

### Replicate (various models)

```ts
import Replicate from 'replicate';

const replicate = new Replicate({ auth: process.env.REPLICATE_TOKEN });
const out = await replicate.run('black-forest-labs/flux-1.1-pro', {
  input: {
    prompt: 'A cyberpunk city at night',
    aspect_ratio: '16:9',
    output_format: 'webp',
  },
});
// out holds the generated image URL(s); the exact shape depends on the model and client version
```

### Together / Fireworks (Flux schnell, fast)

```ts
import Together from 'together-ai';

const t = new Together();
const r = await t.images.create({
  model:
    'black-forest-labs/FLUX.1-schnell',
  prompt: '...',
  width: 1024,
  height: 1024,
});
```

### Self-host Stable Diffusion (Diffusers)

```python
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
).to('cuda')

image = pipe(
    prompt='A scenic mountain landscape',
    negative_prompt='blurry, low quality',
    num_inference_steps=30,
    guidance_scale=7.5,
    # Diffusers takes a seeded generator rather than a `seed` argument
    generator=torch.Generator('cuda').manual_seed(42),
).images[0]
image.save('out.png')
```

### ComfyUI (workflow-based, advanced)

```
Visual node editor.
- Text → CLIP encode → KSampler → VAE decode → Image
- Add ControlNet, LoRA, IPAdapter
- Can be automated via API mode
```

```ts
// ComfyUI API: POST /prompt queues the workflow; the WebSocket only streams progress events
const clientId = crypto.randomUUID();
const ws = new WebSocket(`ws://localhost:8188/ws?clientId=${clientId}`);
await fetch('http://localhost:8188/prompt', { method: 'POST', body: JSON.stringify({ prompt: workflow, client_id: clientId }) });
```

### Prompt engineering

```
DALL-E / Imagen: rich natural language.
"A 35mm photo of a vintage espresso machine on a rustic wooden counter,
golden hour light, shallow depth of field, film grain, in the style of Wes Anderson"

SD / Flux: tag-style also works.
"masterpiece, best quality, 1girl, blue eyes, school uniform, anime style"

Negative:
"blurry, low quality, deformed, extra limbs"
```

### Seed (determinism)

```ts
// Same seed + prompt = same image
const r = await replicate.run('black-forest-labs/flux-pro', {
  input: { prompt, seed: 42 },
});
```

→ A small prompt change can shift the image a lot, so try several seeds.

### ControlNet (composition control)

```python
import cv2
import numpy as np
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

cn = ControlNetModel.from_pretrained('lllyasviel/control_v11p_sd15_canny')
# The base checkpoint is elided in the source
pipe = StableDiffusionControlNetPipeline.from_pretrained(..., controlnet=cn)

# Input = canny edge map (or pose, depth)
input_img = Image.open('reference.png').convert('RGB')
# OpenCV's Canny with thresholds 100/200 is one common choice for the edge detector
edges = cv2.Canny(np.array(input_img), 100, 200)
canny = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel control image
image = pipe(prompt, image=canny, num_inference_steps=20).images[0]
```

→ The same pose / composition is preserved.

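
The seed and prompt advice above can be sketched as a small helper that builds a reproducible batch of generation requests. This is a minimal sketch in plain Python, not any provider's API: `GenRequest`, `build_requests`, and the field names are hypothetical, and each request would still need to be mapped onto whichever client (Replicate, Diffusers, ...) is actually used.

```python
import random
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GenRequest:
    """One hypothetical generation request (not tied to any provider SDK)."""
    prompt: str
    negative_prompt: str
    seed: int


def build_requests(prompt, negative_prompt, base_seed, count):
    """Derive `count` seeds from `base_seed` so the whole sweep is reproducible."""
    rng = random.Random(base_seed)  # local RNG: leaves the global random state alone
    seeds = [rng.randrange(2**32) for _ in range(count)]
    return [GenRequest(prompt, negative_prompt, s) for s in seeds]


batch = build_requests(
    prompt='A cyberpunk city at night',
    negative_prompt='blurry, low quality',
    base_seed=42,
    count=4,
)
for req in batch:
    # Each dict carries the seed, so a good result can be regenerated later
    print(asdict(req))
```

Storing the seed next to each saved image is one way to avoid the "ignoring the seed" anti-pattern: the same base seed always yields the same sweep.
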
### LoRA (style fine-tune)

```python
pipe.load_lora_weights('path/to/anime-style-lora.safetensors')
image = pipe('a girl in a garden').images[0]
```

→ Applies a style trained on a small set (10-50) of images.

### Inpainting (region change)

```ts
const r = await openai.images.edit({
  model: 'gpt-image-1',
  image: fs.createReadStream('photo.png'),
  // OpenAI masks use alpha: fully transparent = change, opaque = preserve
  // (white = change / black = preserve is the Stable Diffusion convention)
  mask: fs.createReadStream('mask.png'),
  prompt: 'A red car instead',
});
```

### Outpainting (region extension)

```ts
// gpt-image-1 / SDXL handle this naturally
// or use a ComfyUI workflow
```

### Cost comparison (approximate)

```
DALL-E 3:                    $0.04-0.08 / image (standard to HD at 1024x1024)
gpt-image-1:                 $0.04-0.19 / image
Flux Pro:                    $0.04 / image
Imagen 4:                    $0.04 / image
Stable Diffusion self-host:  $0.001 / image (GPU time)
Midjourney:                  $10-30 / month subscription
```

### Streaming (progressive)

```ts
// Supported by some models: SD etc. can emit partial-step images
// DALL-E / Flux return only the final result
```

### Safety / NSFW

```python
from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker

# Every provider applies its own filter.
# When self-hosting, enable the safety checker:
pipe.safety_checker = StableDiffusionSafetyChecker.from_pretrained(...)
# or run a separate check (NSFW classifier)
```

### Storage / CDN

```ts
// Provider URLs usually expire after about 1 hour
// → to keep the image permanently, download it and upload it to S3
const buf = await fetch(generatedUrl).then(r => r.arrayBuffer());
// S3_BUCKET is an assumed env var; upload() requires a Bucket
await s3.upload({ Bucket: process.env.S3_BUCKET, Key: id + '.png', Body: Buffer.from(buf) }).promise();
```

### Watermark (C2PA)

```ts
// gpt-image-1 / Imagen attach C2PA metadata automatically
// Self-hosted output: add it explicitly
```

## 🤔 Decision criteria

| Situation | Recommendation |
|---|---|
| User-facing, high quality | DALL-E 3 / Flux Pro / Imagen 4 |
| Bulk / cheap | Flux schnell |
| Self-hosted / privacy | SDXL / Flux dev |
| Control needed (pose, style) | SD + ControlNet + LoRA |
| Complex workflow | ComfyUI |
| Very fast | SDXL Turbo (1 step) |

## ❌ Anti-patterns

- **Prompt too short**: generic results. Be detailed.
- **Missing negative prompt (SD)**: artifacts.
- **Ignoring the seed**: results cannot be reproduced.
- **No storage**: provider URLs expire.
- **NSFW filter disabled in prod**: liability / legal risk.
- **No C2PA**: user distrust / disinformation.
- **No cost monitoring**: large bills.
- **No output validation**: occasionally broken images.

## 🤖 LLM usage hints

- Start with DALL-E 3 / Flux schnell.
- Stronger quality = Flux Pro.
- Self-hosted = SDXL + ComfyUI.
- ControlNet / LoRA = precise control.

## 🔗 Related documents

- [[AI_Multimodal_Vision_Patterns]]
- [[AI_LLM_Cost_Optimization]]
- [[AI_Local_LLM_Inference]]
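
For the "no output validation" anti-pattern listed earlier, a cheap check before storing an image could look like the sketch below. It uses only the Python standard library and inspects just the PNG signature and IHDR header; `png_size`, `looks_valid`, and the default 1024x1024 expectation are hypothetical names and values chosen for illustration, not part of any provider SDK.

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def png_size(data: bytes):
    """Return (width, height) if `data` starts with a well-formed PNG header, else None."""
    # Layout: signature (8) + IHDR length (4) + type (4) + IHDR data (13) + CRC (4) = 33 bytes
    if len(data) < 33 or data[:8] != PNG_SIGNATURE:
        return None
    length, chunk_type = struct.unpack('>I4s', data[8:16])
    if chunk_type != b'IHDR' or length != 13:
        return None
    ihdr = data[16:29]
    (crc,) = struct.unpack('>I', data[29:33])
    if zlib.crc32(b'IHDR' + ihdr) != crc:
        return None  # header bytes are corrupted
    width, height = struct.unpack('>II', ihdr[:8])
    return (width, height) if width > 0 and height > 0 else None


def looks_valid(data: bytes, expected=(1024, 1024)) -> bool:
    """Pre-storage check: the header parses and matches the requested size."""
    return png_size(data) == expected
```

This only catches missing, non-PNG, or wrongly sized output. A truncated or visually broken image body would need a full decode (for example Pillow's `Image.verify()`) or a human or classifier review on top.
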