| id | title | category | status | source_trust_level | verification_status | created_at | updated_at | tags | tech_stack | applied_in | aliases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ai-image-generation-patterns | Image Generation — DALL-E / Flux / Stable Diffusion | Coding | draft | B | conceptual | 2026-05-09 | 2026-05-09 | | | | |
Image Generation
Text-to-image generation. Main options: DALL-E 3 (OpenAI), Imagen 4 (Google), Flux (Black Forest Labs), Stable Diffusion (open source). The core controls are prompt, negative prompt, seed, and ControlNet (guided variation).
📖 Key concepts
- Prompt: be detailed; either tag-style ("1girl, blue hair, ...") or natural language.
- Negative prompt: what to exclude (blurry, low quality).
- Seed: determinism (same seed = nearly the same image).
- ControlNet: controls composition / pose / outlines.
- LoRA: fine-tuning with a small dataset.
💻 Code patterns
OpenAI DALL-E 3
import OpenAI from 'openai';
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const r = await openai.images.generate({
  model: 'dall-e-3',
  prompt: 'A cat astronaut floating in space, photorealistic, dramatic lighting',
  size: '1024x1024', // '1024x1024' | '1792x1024' | '1024x1792'
  quality: 'hd', // 'standard' | 'hd'
  style: 'vivid', // 'vivid' | 'natural'
  n: 1, // dall-e-3 only supports n = 1
});
const url = r.data[0].url;
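gpt-image-1 (editing / compositing)
import fs from 'fs';
const r = await openai.images.edit({
  model: 'gpt-image-1',
  image: fs.createReadStream('cat.png'),
  mask: fs.createReadStream('mask.png'), // transparent pixels mark the region to change
  prompt: 'A red bow tie',
});
// gpt-image-1 returns base64 data (r.data[0].b64_json), not a URL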
Replicate (many models)
import Replicate from 'replicate';
const replicate = new Replicate({ auth: process.env.REPLICATE_TOKEN });
const out = await replicate.run('black-forest-labs/flux-1.1-pro', {
  input: {
    prompt: 'A cyberpunk city at night',
    aspect_ratio: '16:9',
    output_format: 'webp',
  },
});
// out is the generated image: a URL or a file object, depending on the client version
Together / Fireworks (Flux schnell, fast)
import Together from 'together-ai';
const t = new Together();
const r = await t.images.create({
  model: 'black-forest-labs/FLUX.1-schnell',
  prompt: '...',
  width: 1024,
  height: 1024,
});
Self-host Stable Diffusion (Diffusers)
from diffusers import StableDiffusionXLPipeline
import torch
pipe = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
).to('cuda')
# The pipeline has no `seed` argument; pass a seeded generator instead
generator = torch.Generator('cuda').manual_seed(42)
image = pipe(
    prompt='A scenic mountain landscape',
    negative_prompt='blurry, low quality',
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=generator,
).images[0]
image.save('out.png')
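ComfyUI (workflow-based, advanced)
Visual node editor.
- Text → CLIP encode → KSampler → VAE decode → Image
- Add ControlNet, LoRA, IPAdapter nodes as needed
- Can be automated through API mode
// ComfyUI API: queue via HTTP POST /prompt; the WebSocket only streams progress (clientId = any unique string)
const ws = new WebSocket(`ws://localhost:8188/ws?clientId=${clientId}`);
await fetch('http://localhost:8188/prompt', {
  method: 'POST',
  body: JSON.stringify({ prompt: workflow, client_id: clientId }),
});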
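To make the API mode concrete, here is a minimal Python sketch of queueing a workflow through ComfyUI's HTTP endpoint (`POST /prompt`). It assumes a default local server at `127.0.0.1:8188` and a workflow exported in API format. `build_queue_request` is a hypothetical helper name, and the one-node workflow is a placeholder rather than a runnable graph.

```python
import json
import uuid
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # assumption: default local ComfyUI address


def build_queue_request(workflow: dict, client_id: str,
                        base_url: str = COMFY_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /prompt request that queues a workflow."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder workflow: a real one is exported from ComfyUI in API format.
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "model.safetensors"},
    }
}
client_id = uuid.uuid4().hex
request = build_queue_request(workflow, client_id)

# Sending is left commented out so the sketch runs without a server:
# with urllib.request.urlopen(request) as resp:
#     prompt_id = json.load(resp)["prompt_id"]
```

Reusing the same `client_id` on the WebSocket connection is what lets progress messages be matched to the queued job.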
Prompt engineering
DALL-E / Imagen: rich natural language works best.
"A 35mm photo of a vintage espresso machine on a rustic wooden counter,
golden hour light, shallow depth of field, film grain, in the style of Wes Anderson"
SD / Flux: tag-style prompts also work.
"masterpiece, best quality, 1girl, blue eyes, school uniform, anime style"
Negative: "blurry, low quality, deformed, extra limbs"
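The two prompt styles above can be produced from one structured description. The sketch below is illustrative only: `PromptSpec`, `natural_prompt`, and `tag_prompt` are hypothetical names, and the default quality tags are taken from the tag-style example.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    subject: str
    setting: str = ""
    lighting: str = ""
    style: str = ""
    tags: list[str] = field(default_factory=list)


def natural_prompt(spec: PromptSpec) -> str:
    """Join the filled-in fields into one descriptive natural-language prompt."""
    parts = [spec.subject, spec.setting, spec.lighting, spec.style]
    return ", ".join(p for p in parts if p)


def tag_prompt(spec: PromptSpec,
               quality_tags: tuple[str, ...] = ("masterpiece", "best quality")) -> str:
    """Build a comma-separated tag prompt: quality tags first, duplicates dropped."""
    seen, out = set(), []
    for tag in [*quality_tags, *spec.tags]:
        key = tag.strip().lower()
        if key and key not in seen:
            seen.add(key)
            out.append(tag.strip())
    return ", ".join(out)


spec = PromptSpec(
    subject="A 35mm photo of a vintage espresso machine",
    setting="on a rustic wooden counter",
    lighting="golden hour light",
    style="film grain",
    tags=["1girl", "blue eyes", "Best Quality"],
)
print(natural_prompt(spec))
print(tag_prompt(spec))
```

Keeping the description structured makes it easy to target a natural-language model and a tag-oriented model from the same input.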
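Seed (determinism)
// Same seed + same prompt (and settings) = nearly the same image
const r = await replicate.run('black-forest-labs/flux-1.1-pro', {
  input: { prompt, seed: 42 },
});
→ A small prompt change can produce a large change in the image, so try several seeds.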
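Trying several seeds stays reproducible if the seeds themselves are derived deterministically and stored with the prompt. This is a minimal sketch under that assumption; `seed_sweep` and `run_record` are hypothetical helpers and no provider is called.

```python
import json
import random


def seed_sweep(base_seed: int, count: int) -> list[int]:
    """Derive `count` reproducible 32-bit seeds from one base seed."""
    rng = random.Random(base_seed)
    return [rng.randrange(2**32) for _ in range(count)]


def run_record(prompt: str, seed: int, model: str) -> str:
    """Serialize the settings needed to regenerate an image later."""
    return json.dumps({"model": model, "prompt": prompt, "seed": seed}, sort_keys=True)


seeds = seed_sweep(base_seed=42, count=4)
records = [
    run_record("A cyberpunk city at night", s, "black-forest-labs/flux-1.1-pro")
    for s in seeds
]
for line in records:
    print(line)
```

Storing one record per generated image is enough to rerun the exact request when a particular variant turns out to be the keeper.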
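ControlNet (composition control)
import cv2
import numpy as np
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
cn = ControlNetModel.from_pretrained('lllyasviel/control_v11p_sd15_canny')
pipe = StableDiffusionControlNetPipeline.from_pretrained(..., controlnet=cn)
# Input = canny edge map (or a pose / depth map with the matching ControlNet)
input_img = Image.open('reference.png').convert('RGB')
edges = cv2.Canny(np.array(input_img), 100, 200)  # thresholds are illustrative
canny = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 1 channel -> 3 channels
image = pipe(prompt, image=canny, num_inference_steps=20).images[0]
→ The output keeps the pose / composition of the reference image.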
LoRA (style fine-tune)
pipe.load_lora_weights('path/to/anime-style-lora.safetensors')
image = pipe('a girl in a garden').images[0]
→ Applies a style learned from a small set of images (10-50).
Inpainting (changing a region)
const r = await openai.images.edit({
  model: 'gpt-image-1',
  image: fs.createReadStream('photo.png'),
  mask: fs.createReadStream('mask.png'), // transparent (alpha 0) = change, opaque = keep
  prompt: 'A red car instead',
});
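The mask file for an edit call can be generated programmatically. The sketch below writes an RGBA PNG using only the standard library, assuming the OpenAI edit endpoint's convention that fully transparent pixels mark the area to repaint and that the mask matches the source image's dimensions. `make_edit_mask` is a hypothetical helper; Stable Diffusion inpainting pipelines typically expect a white-on-black mask instead.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(tag: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def make_edit_mask(width: int, height: int, box: tuple[int, int, int, int]) -> bytes:
    """Return PNG bytes: alpha 0 inside `box` (edit), alpha 255 elsewhere (keep).

    `box` is (left, top, right, bottom) with right/bottom exclusive.
    """
    left, top, right, bottom = box
    rows = bytearray()
    for y in range(height):
        rows.append(0)  # PNG filter type 0 (none) for this scanline
        for x in range(width):
            inside = left <= x < right and top <= y < bottom
            rows += bytes((0, 0, 0, 0 if inside else 255))
    # IHDR: 8-bit depth, color type 6 (RGBA), default compression/filter, no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(bytes(rows)))
        + png_chunk(b"IEND", b"")
    )


mask_png = make_edit_mask(64, 64, (16, 16, 48, 48))
# with open("mask.png", "wb") as f:
#     f.write(mask_png)
```

For large images an imaging library is faster than this per-pixel loop; the point here is only the alpha convention.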
Outpainting (extending the canvas)
// gpt-image-1 / SDXL give natural-looking results
// or use a ComfyUI workflow
Cost comparison (approximate)
DALL-E 3: $0.04-0.08 / image (standard to HD, 1024x1024)
gpt-image-1: $0.04-0.19 / image
Flux Pro: $0.04 / image
Imagen 4: $0.04 / image
Stable Diffusion self-host: ~$0.001 / image (GPU time)
Midjourney: $10-30 / month subscription
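The per-image prices above translate into a monthly estimate with simple arithmetic. The sketch below copies the approximate figures from the list; the dictionary keys and `monthly_cost_range` are hypothetical names, and real pricing varies with size and quality settings.

```python
# Approximate (low, high) USD per image, copied from the comparison above.
PRICE_PER_IMAGE_USD = {
    "dall-e-3": (0.04, 0.08),
    "gpt-image-1": (0.04, 0.19),
    "flux-pro": (0.04, 0.04),
    "imagen-4": (0.04, 0.04),
    "sd-self-host": (0.001, 0.001),
}


def monthly_cost_range(model: str, images_per_day: int, days: int = 30) -> tuple[float, float]:
    """Return the (low, high) estimated spend in USD for the given volume."""
    low, high = PRICE_PER_IMAGE_USD[model]
    total_images = images_per_day * days
    return (round(low * total_images, 2), round(high * total_images, 2))


for model in PRICE_PER_IMAGE_USD:
    low, high = monthly_cost_range(model, images_per_day=1000)
    print(f"{model}: ${low:,.2f} to ${high:,.2f} per month")
```

At 1,000 images per day, DALL-E 3 comes to roughly $1,200 to $2,400 per month, which is why volume workloads tend to move to cheaper or self-hosted models.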
Streaming (progressive)
// Some models support this: SD, for example, can expose partial-step images
// DALL-E / Flux return only the final result
Safety / NSFW
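# Every hosted provider applies its own filter.
# When self-hosting, enable the safety checker:
from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker
pipe.safety_checker = StableDiffusionSafetyChecker.from_pretrained(...)
# Or run a separate check (an NSFW classifier) on the output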
Storage / CDN
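// Provider URLs usually expire (after about 1 hour)
// → download the image and copy it to S3 for permanent storage
const buf = await fetch(generatedUrl).then(r => r.arrayBuffer());
// bucketName is a placeholder for your own bucket
await s3.upload({ Bucket: bucketName, Key: id + '.png', Body: Buffer.from(buf) }).promise();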
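One way to organize the permanent copy is a content-addressed key, so identical images are stored once. The sketch below uses a local directory as a stand-in for the bucket; `storage_key` and `persist_image` are hypothetical helpers, and in production the write would be an object-store upload.

```python
import hashlib
from pathlib import Path


def storage_key(image_bytes: bytes, extension: str = "png") -> str:
    """Content-addressed key: identical bytes always map to the same object."""
    return f"images/{hashlib.sha256(image_bytes).hexdigest()}.{extension}"


def persist_image(image_bytes: bytes, root: Path, extension: str = "png") -> Path:
    """Write the bytes under `root` (stand-in for a bucket) and return the path."""
    target = root / storage_key(image_bytes, extension)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is already stored
        target.write_bytes(image_bytes)
    return target


# Fetching the provider URL is left commented out so the sketch runs offline:
# import urllib.request
# image_bytes = urllib.request.urlopen(generated_url).read()
# persist_image(image_bytes, Path("/var/data/generated"))
```

Downloading right after generation matters more than the key scheme: once the provider URL expires, the image cannot be fetched again.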
Watermark (C2PA)
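// gpt-image-1 attaches C2PA metadata automatically; Imagen marks its output as well (SynthID watermark)
// Self-hosted pipelines must add provenance metadata explicitly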
🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| User-facing, high quality | DALL-E 3 / Flux Pro / Imagen 4 |
| Bulk / cheap | Flux schnell |
| Self-hosted / privacy | SDXL / Flux dev |
| Control needed (pose, style) | SD + ControlNet + LoRA |
| Complex workflows | ComfyUI |
| Very fast | SDXL Turbo (1 step) |
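❌ Anti-patterns
- Prompt too short: generic results. Be detailed.
- Missing negative prompt (SD): artifacts.
- Ignoring the seed: results cannot be reproduced.
- Not storing outputs: provider URLs expire.
- NSFW filter disabled in production: liability / legal risk.
- No C2PA: user distrust / disinformation risk.
- No cost monitoring: surprise large bills.
- No output validation: occasionally broken images slip through.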
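For the output-validation point, a cheap first check is to confirm that the returned bytes are actually an image before storing or serving them. The sketch below inspects well-known file signatures only; `sniff_image_format` and `looks_valid` are hypothetical helpers, and the 1 KiB minimum is an arbitrary assumption. It will not catch visual defects such as deformed limbs.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Identify PNG, JPEG, or WebP from the leading magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def looks_valid(data: bytes, min_bytes: int = 1024) -> bool:
    """Reject empty, truncated-looking, or non-image payloads (e.g. an HTML error page)."""
    return len(data) >= min_bytes and sniff_image_format(data) is not None


print(looks_valid(b"<html>rate limited</html>"))  # not an image payload
```

A failed check is a reasonable trigger for a retry with a different seed rather than showing the result to the user.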
🤖 LLM usage hints
- Starting point = DALL-E 3 / Flux schnell.
- Higher quality = Flux Pro.
- Self-hosting = SDXL + ComfyUI.
- ControlNet / LoRA = precise control.