---
id: ai-image-generation-patterns
title: Image Generation — DALL-E / Flux / Stable Diffusion
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, image, generation, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["Backend"] }
applied_in: []
aliases: [DALL-E, Flux, Stable Diffusion, Imagen, Midjourney, ControlNet, LoRA]
---

# Image Generation

> Text-to-image. **DALL-E 3 (OpenAI), Imagen 4 (Google), Flux (Black Forest Labs), Stable Diffusion (open source)**. Core inputs: prompt + negative prompt + seed, plus ControlNet for controlled variations.

## 📖 Core Concepts
- Prompt: be detailed. Either tag-style ("1girl, blue hair, ...") or natural language.
- Negative prompt: what to exclude (blurry, low quality).
- Seed: determinism (same seed = nearly the same image).
- ControlNet: controls composition / pose / edges.
- LoRA: fine-tuning with a small dataset.

## 💻 Code Patterns

### OpenAI DALL-E 3
```ts
const r = await openai.images.generate({
  model: 'dall-e-3',
  prompt: 'A cat astronaut floating in space, photorealistic, dramatic lighting',
  size: '1024x1024',     // '1024x1024' | '1792x1024' | '1024x1792'
  quality: 'hd',         // 'standard' | 'hd'
  style: 'vivid',        // 'vivid' | 'natural'
  n: 1,
});
const url = r.data[0].url;
```

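### gpt-image-1 (editing / compositing)
```ts
import fs from 'node:fs';

const r = await openai.images.edit({
  model: 'gpt-image-1',
  image: fs.createReadStream('cat.png'),
  mask: fs.createReadStream('mask.png'),  // fully transparent pixels mark the area to change
  prompt: 'A red bow tie',
});
const b64 = r.data[0].b64_json;  // gpt-image-1 returns base64 data rather than a URL
```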

### Replicate (many models)
```ts
import Replicate from 'replicate';
const replicate = new Replicate({ auth: process.env.REPLICATE_TOKEN });

const out = await replicate.run('black-forest-labs/flux-1.1-pro', {
  input: {
    prompt: 'A cyberpunk city at night',
    aspect_ratio: '16:9',
    output_format: 'webp',
  },
});
// out: the generated image (a URL string or a file-like output object, depending on the client version)
```

### Together / Fireworks (Flux schnell, fast)
```ts
import Together from 'together-ai';
const t = new Together();

const r = await t.images.create({
  model: 'black-forest-labs/FLUX.1-schnell',
  prompt: '...',
  width: 1024, height: 1024,
});
```

### Self-host Stable Diffusion (Diffusers)
```python
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
).to('cuda')

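# Diffusers pipelines take a seeded torch.Generator, not a `seed` argument
generator = torch.Generator('cuda').manual_seed(42)

image = pipe(
    prompt='A scenic mountain landscape',
    negative_prompt='blurry, low quality',
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=generator,
).images[0]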
image.save('out.png')
```

### ComfyUI (workflow-based, advanced)
```
Visual node editor.
- Text → CLIP encode → KSampler → VAE decode → Image
- Add ControlNet, LoRA, IPAdapter
- Can be automated via API mode
```

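```ts
// ComfyUI API: queue the workflow over HTTP, follow progress over the WebSocket
const clientId = crypto.randomUUID();
const ws = new WebSocket(`ws://localhost:8188/ws?clientId=${clientId}`);
await fetch('http://localhost:8188/prompt', {
  method: 'POST',
  body: JSON.stringify({ prompt: workflow, client_id: clientId }),
});
```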
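
For scripted automation, the same queueing step can be sketched with the Python standard library. This is a minimal sketch, assuming a local ComfyUI server on the default port; `build_prompt_request` and the truncated workflow are hypothetical names and data for illustration, and the request is only built, not sent.

```python
import json
import urllib.request
import uuid


def build_prompt_request(workflow: dict, client_id: str,
                         base_url: str = "http://localhost:8188") -> urllib.request.Request:
    """Build (but do not send) the POST /prompt request for a ComfyUI server."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical, heavily truncated workflow; export a real one from ComfyUI in API format.
    workflow = {"3": {"class_type": "KSampler", "inputs": {"seed": 42, "steps": 20}}}
    req = build_prompt_request(workflow, client_id=str(uuid.uuid4()))
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would queue the job on a running server.
```

Keeping request construction separate from sending makes the payload easy to inspect and unit test without a running server.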

### Prompt engineering
```
DALL-E / Imagen: rich natural language.
"A 35mm photo of a vintage espresso machine on a rustic wooden counter,
golden hour light, shallow depth of field, film grain, in the style of Wes Anderson"

SD / Flux: tag-style also works.
"masterpiece, best quality, 1girl, blue eyes, school uniform, anime style"

Negative: "blurry, low quality, deformed, extra limbs"
```
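
The two prompt styles can be generated from one structured description, which helps when the same request is sent to different model families. A minimal sketch; `PromptSpec` and its field names are hypothetical, and the default quality tags are simply the ones from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    subject: str
    details: list[str] = field(default_factory=list)
    style: str = ""

    def natural(self) -> str:
        """Sentence-style prompt for models that favor natural language."""
        text = ", ".join([self.subject] + self.details)
        return f"{text}, in the style of {self.style}" if self.style else text

    def tags(self, quality_tags=("masterpiece", "best quality")) -> str:
        """Comma-separated tag prompt for SD-style checkpoints."""
        parts = list(quality_tags) + [self.subject] + self.details
        if self.style:
            parts.append(f"{self.style} style")
        return ", ".join(parts)


if __name__ == "__main__":
    spec = PromptSpec("a vintage espresso machine",
                      ["golden hour light", "film grain"], "Wes Anderson")
    print(spec.natural())
    print(spec.tags())
```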

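### Seed (determinism)
```ts
// Same seed + prompt + settings → (nearly) the same image
const r = await replicate.run('black-forest-labs/flux-pro', {
  input: { prompt, seed: 42 },
});
```

→ Small prompt changes can produce large changes in the output, so try several seeds.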
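
Trying several seeds is easier to reproduce when the seeds themselves are derived deterministically and stored with each job. A minimal sketch using only the standard library; `seed_sweep` and `sweep_jobs` are hypothetical helper names, and the 32-bit range is an assumption (check the seed range your provider accepts).

```python
import random


def seed_sweep(base_seed: int, count: int) -> list[int]:
    """Derive `count` reproducible 32-bit seeds from one base seed."""
    rng = random.Random(base_seed)
    return [rng.randrange(2**32) for _ in range(count)]


def sweep_jobs(prompt: str, base_seed: int, count: int) -> list[dict]:
    """Pair the prompt with each seed so any result can be regenerated later."""
    return [{"prompt": prompt, "seed": s} for s in seed_sweep(base_seed, count)]


if __name__ == "__main__":
    for job in sweep_jobs("A cyberpunk city at night", base_seed=42, count=4):
        print(job)
```

Persisting the `seed` next to each output is what makes a favorite image regenerable later.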

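### ControlNet (composition control)
```python
import cv2
import numpy as np
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

cn = ControlNetModel.from_pretrained('lllyasviel/control_v11p_sd15_canny')
pipe = StableDiffusionControlNetPipeline.from_pretrained(..., controlnet=cn)  # base SD 1.5 checkpoint id elided in the source

def canny_detect(img, low=100, high=200):
    """Canny edge map as a 3-channel PIL image (thresholds are illustrative)."""
    edges = cv2.Canny(np.array(img.convert('RGB')), low, high)
    return Image.fromarray(np.stack([edges] * 3, axis=-1))

# Conditioning input = canny edges (or pose, depth)
input_img = Image.open('reference.png')
canny = canny_detect(input_img)

prompt = 'a dancer on a rooftop'  # hypothetical prompt
image = pipe(prompt, image=canny, num_inference_steps=20).images[0]
```

→ Keeps the same pose / composition as the reference.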

### LoRA (style fine-tune)
```python
pipe.load_lora_weights('path/to/anime-style-lora.safetensors')
image = pipe('a girl in a garden').images[0]
```

→ Applies a style learned from a small set of images (10-50).

### Inpainting (changing a region)
```ts
const r = await openai.images.edit({
  model: 'gpt-image-1',
  image: fs.createReadStream('photo.png'),
  mask: fs.createReadStream('mask.png'),  // fully transparent = change, opaque = keep
  prompt: 'A red car instead',
});
```
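
Mask conventions differ between tools: OpenAI's edit endpoint treats fully transparent pixels as the region to edit, while many Stable Diffusion tools use white = edit, black = keep. The sketch below converts the latter into the former on raw pixel values. `to_alpha_mask` is a hypothetical helper, and the threshold of 128 is an arbitrary assumption.

```python
def to_alpha_mask(gray_pixels, threshold=128):
    """Map a white=edit grayscale mask to RGBA pixels.

    Pixels at or above `threshold` become fully transparent (edit here);
    darker pixels become opaque black (keep).
    """
    return [(0, 0, 0, 0) if g >= threshold else (0, 0, 0, 255) for g in gray_pixels]


if __name__ == "__main__":
    # A 2x2 mask flattened row by row: top row white (edit), bottom row black (keep).
    print(to_alpha_mask([255, 255, 0, 0]))
```

With Pillow installed, the resulting list could be written into an `Image.new("RGBA", size)` via `putdata(...)` and saved as a PNG of the same dimensions as the source image.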

### Outpainting (extending the canvas)
```ts
// gpt-image-1 / SDXL give natural-looking results
// or use a ComfyUI workflow
```

### Cost comparison (approximate)
```
DALL-E 3:       $0.04 (standard) - $0.08 (HD) / image at 1024x1024
gpt-image-1:    $0.04-0.19 / image
Flux Pro:       $0.04 / image
Imagen 4:       $0.04 / image
Stable Diffusion self-host: $0.001 / image (GPU time)
Midjourney:     $10-30 / month subscription
```
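
These figures can be turned into a quick pre-flight estimate before launching a batch. A minimal sketch that uses the rough numbers from the table above (the upper bound where a range is given); the dictionary keys are hypothetical labels rather than provider model ids, and real prices should be checked against each provider's current pricing page.

```python
# Rough per-image prices in USD, taken from the table above (upper bound of each range).
PRICE_PER_IMAGE = {
    "dall-e-3": 0.08,
    "gpt-image-1": 0.19,
    "flux-pro": 0.04,
    "imagen-4": 0.04,
    "sd-self-host": 0.001,
}


def estimate_cost(model: str, images: int) -> float:
    """Estimated batch cost in USD for `images` generations."""
    return round(PRICE_PER_IMAGE[model] * images, 4)


def within_budget(model: str, images: int, budget_usd: float) -> bool:
    """True if the estimated batch cost fits the budget."""
    return estimate_cost(model, images) <= budget_usd


if __name__ == "__main__":
    for name in PRICE_PER_IMAGE:
        print(f"{name}: ${estimate_cost(name, 1000):.2f} per 1000 images")
```

Calling `within_budget` before enqueueing a batch is a cheap guard against surprise bills.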

### Streaming (progressive)
```ts
// Supported by some models only, e.g. SD can emit partial images at intermediate steps
// DALL-E / Flux return only the final result
```

### Safety / NSFW
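```python
# Hosted providers apply their own content filters.
# When self-hosting, keep the safety checker enabled:
from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker

pipe.safety_checker = StableDiffusionSafetyChecker.from_pretrained(...)

# Or run a separate check (an NSFW classifier) on the output
```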

### Storage / CDN
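```ts
// Provider URLs usually expire after about 1 hour
// → download and copy to S3 for permanent storage
const buf = await fetch(generatedUrl).then(r => r.arrayBuffer());
await s3.upload({
  Bucket: process.env.S3_BUCKET!,  // assumed env var; `upload` requires a Bucket
  Key: id + '.png',
  Body: Buffer.from(buf),
}).promise();
```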
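
The same download-then-persist step can be sketched in Python with the storage backend injected, so the logic is testable without network or cloud credentials. All names here (`fetch_image`, `storage_key`, `persist`) are hypothetical, and the content-addressed key layout is an assumption, not something the providers require.

```python
import hashlib
import urllib.request
from typing import Callable


def fetch_image(url: str, timeout: float = 30.0) -> bytes:
    """Download the generated image before the provider URL expires."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def storage_key(data: bytes, ext: str = "png", prefix: str = "generated") -> str:
    """Content-addressed key: identical bytes always map to the same object."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{prefix}/{digest}.{ext}"


def persist(url: str, put: Callable[[str, bytes], None],
            fetch: Callable[[str], bytes] = fetch_image) -> str:
    """Fetch the image and hand it to `put` (e.g. a thin wrapper around an S3 client)."""
    data = fetch(url)
    key = storage_key(data)
    put(key, data)
    return key
```

Because the key is derived from the bytes, retrying a failed upload overwrites the same object instead of creating duplicates.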

### Watermark (C2PA)
```ts
// gpt-image-1 attaches C2PA metadata automatically; Imagen embeds a SynthID watermark
// Self-hosted = add provenance metadata explicitly
```

## 🤔 Decision Criteria
| Situation | Recommendation |
|---|---|
| User-facing, high quality | DALL-E 3 / Flux Pro / Imagen 4 |
| Bulk / cheap | Flux schnell |
| Self-hosted / privacy | SDXL / Flux dev |
| Control needed (pose, style) | SD + ControlNet + LoRA |
| Complex workflows | ComfyUI |
| Very fast | SDXL Turbo (1 step) |
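
The table can be encoded as a small routing function when the choice has to be made in code. A minimal sketch; `recommend` and its flag names are hypothetical, and the priority order used when several flags are set is an assumption rather than something the table specifies.

```python
def recommend(*, complex_workflow: bool = False, needs_control: bool = False,
              realtime: bool = False, self_host: bool = False,
              bulk: bool = False) -> str:
    """Return the table's recommendation; earlier checks win when several flags are set."""
    if complex_workflow:
        return "ComfyUI"
    if needs_control:
        return "SD + ControlNet + LoRA"
    if realtime:
        return "SDXL Turbo (1 step)"
    if self_host:
        return "SDXL / Flux dev"
    if bulk:
        return "Flux schnell"
    return "DALL-E 3 / Flux Pro / Imagen 4"


if __name__ == "__main__":
    print(recommend(bulk=True))
    print(recommend())
```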

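## ❌ Anti-patterns
- **Prompt too short**: generic results. Be detailed.
- **Missing negative prompt (SD)**: artifacts.
- **Ignoring the seed**: results cannot be reproduced.
- **Not storing outputs**: provider URLs expire.
- **NSFW filter disabled in prod**: liability / legal risk.
- **No C2PA**: user distrust / disinformation risk.
- **No cost monitoring**: large bills.
- **No output validation**: images occasionally come out broken.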
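
For the output-validation item, a cheap first check is to confirm that the returned bytes look like an image at all before storing or serving them. A minimal sketch using file signatures only; `sniff_format` and `is_plausible_image` are hypothetical helpers, the 1 KiB minimum size is an arbitrary assumption, and this does not detect visually broken images (that would need decoding or a classifier).

```python
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the image format from its leading bytes, or None if unrecognized."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def is_plausible_image(data: bytes, min_bytes: int = 1024) -> bool:
    """Reject empty, truncated-looking, or non-image payloads (e.g. an error page)."""
    return len(data) >= min_bytes and sniff_format(data) is not None
```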

## 🤖 LLM Usage Hints
- Starting point = DALL-E 3 / Flux schnell.
- Top quality = Flux Pro.
- Self-hosted = SDXL + ComfyUI.
- ControlNet / LoRA = precise control.

## 🔗 Related Documents
- [[AI_Multimodal_Vision_Patterns]]
- [[AI_LLM_Cost_Optimization]]
- [[AI_Local_LLM_Inference]]