id: ai-image-generation-patterns
title: Image Generation - DALL-E / Flux / Stable Diffusion
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: ai, image, generation, vibe-coding
tech_stack: language: TS / Python; applicable_to: Backend
applied_in:
aliases: DALL-E, Flux, Stable Diffusion, Imagen, Midjourney, ControlNet, LoRA

Image Generation

Text-to-image generation. The main options are DALL-E 3 (OpenAI), Imagen 4 (Google), Flux (Black Forest Labs), and Stable Diffusion (open source). The core controls are the prompt, negative prompt, seed, and ControlNet (guided variations).

📖 Core concepts

  • Prompt: be detailed; either tag-style ("1girl, blue hair, ...") or natural language.
  • Negative prompt: what to exclude (blurry, low quality).
  • Seed: determinism (same seed = nearly the same image).
  • ControlNet: controls composition / pose / edges.
  • LoRA: fine-tuning with a small dataset.

💻 Code patterns

OpenAI DALL-E 3

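import OpenAI from 'openai';
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const r = await openai.images.generate({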
  model: 'dall-e-3',
  prompt: 'A cat astronaut floating in space, photorealistic, dramatic lighting',
  size: '1024x1024',     // '1024x1024' | '1792x1024' | '1024x1792'
  quality: 'hd',         // 'standard' | 'hd'
  style: 'vivid',        // 'vivid' | 'natural'
  n: 1,
});
const url = r.data[0].url;

gpt-image-1 (editing / compositing)

import fs from 'fs';

const r = await openai.images.edit({
  model: 'gpt-image-1',
  image: fs.createReadStream('cat.png'),
  mask: fs.createReadStream('mask.png'),  // region to change
  prompt: 'A red bow tie',
});

Replicate (many models)

import Replicate from 'replicate';
const replicate = new Replicate({ auth: process.env.REPLICATE_TOKEN });

const out = await replicate.run('black-forest-labs/flux-1.1-pro', {
  input: {
    prompt: 'A cyberpunk city at night',
    aspect_ratio: '16:9',
    output_format: 'webp',
  },
});
// out holds the generated image: a URL string or a file-like output object, depending on the client version

Together / Fireworks (Flux schnell, fast)

import Together from 'together-ai';
const t = new Together();

const r = await t.images.create({
  model: 'black-forest-labs/FLUX.1-schnell',
  prompt: '...',
  width: 1024, height: 1024,
});

Self-host Stable Diffusion (Diffusers)

from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
).to('cuda')

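# Diffusers takes a seeded torch.Generator rather than a seed= argument
generator = torch.Generator('cuda').manual_seed(42)

image = pipe(
    prompt='A scenic mountain landscape',
    negative_prompt='blurry, low quality',
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=generator,
).images[0]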
image.save('out.png')

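ComfyUI (workflow-based, advanced)

Visual node editor.
- Text → CLIP encode → KSampler → VAE decode → Image
- Add ControlNet, LoRA, and IPAdapter nodes as needed
- Can be automated through API mode

// ComfyUI API: POST the workflow to /prompt; the WebSocket only streams progress events (clientId = any unique string)
const ws = new WebSocket(`ws://localhost:8188/ws?clientId=${clientId}`);
await fetch('http://localhost:8188/prompt', {
  method: 'POST',
  body: JSON.stringify({ prompt: workflow, client_id: clientId }),
});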
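
ComfyUI accepts queued workflows on its HTTP `/prompt` endpoint. As a rough Python counterpart, the sketch below only builds that request so it can be inspected before sending. It assumes a local server on port 8188 and a workflow exported in API format; `build_queue_request` is a hypothetical helper and the one-node workflow is a placeholder, not a working graph.

```python
import json
import urllib.request
import uuid


def build_queue_request(workflow: dict, client_id: str,
                        host: str = "localhost:8188") -> urllib.request.Request:
    """Build (but do not send) the POST /prompt request that queues a workflow."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder workflow: a real one is exported from ComfyUI in API format.
workflow = {"3": {"class_type": "KSampler", "inputs": {"seed": 42, "steps": 20}}}
client_id = uuid.uuid4().hex
req = build_queue_request(workflow, client_id)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```

Keeping request construction separate from sending makes the payload easy to log or unit-test without a running server.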

Prompt engineering

DALL-E / Imagen: rich natural language works best.
"A 35mm photo of a vintage espresso machine on a rustic wooden counter,
golden hour light, shallow depth of field, film grain, in the style of Wes Anderson"

SD / Flux: tag-style prompts also work.
"masterpiece, best quality, 1girl, blue eyes, school uniform, anime style"

Negative: "blurry, low quality, deformed, extra limbs"
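
The two prompt styles can be assembled programmatically. The following is a minimal sketch, not a recommended library: `natural_prompt` and `tag_prompt` are hypothetical helpers, and the default quality tags are taken from the tag-style example above.

```python
def natural_prompt(subject: str, details: list[str]) -> str:
    """Join a subject and descriptive phrases into one natural-language prompt."""
    return ", ".join([subject, *details])


def tag_prompt(tags: list[str],
               quality_tags: tuple[str, ...] = ("masterpiece", "best quality")) -> str:
    """Build a tag-style prompt: quality tags first, then de-duplicated content tags."""
    seen: set[str] = set()
    ordered: list[str] = []
    for tag in [*quality_tags, *tags]:
        key = tag.strip().lower()
        if key and key not in seen:  # skip blanks and case-insensitive repeats
            seen.add(key)
            ordered.append(tag.strip())
    return ", ".join(ordered)


photo = natural_prompt(
    "A 35mm photo of a vintage espresso machine",
    ["golden hour light", "shallow depth of field"],
)
anime = tag_prompt(["1girl", "blue eyes", "1girl", "anime style"])
```

Here `anime` comes out as `masterpiece, best quality, 1girl, blue eyes, anime style`, with the repeated `1girl` dropped.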

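Seed (determinism)

// Same seed + same prompt (with the same model and settings) = same image
const r = await replicate.run('black-forest-labs/flux-pro', {
  input: { prompt, seed: 42 },
});

→ Even a small change to the seed or prompt can change the image substantially, so try several seeds.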
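
Trying several seeds is easier to reproduce when the seeds themselves come from one recorded base seed. A small sketch under that assumption; `seed_sweep` is a hypothetical helper, and the `prompt`/`seed` dictionary shape simply mirrors the `input` object used above.

```python
import random


def seed_sweep(prompt: str, base_seed: int, count: int) -> list[dict]:
    """Return one request input per variant, with seeds derived from base_seed.

    Re-running with the same base_seed yields the same seeds, so any
    variant that looks good can be regenerated later.
    """
    rng = random.Random(base_seed)  # local RNG; does not touch global random state
    return [{"prompt": prompt, "seed": rng.randrange(2**32)} for _ in range(count)]


variants = seed_sweep("A cyberpunk city at night", base_seed=42, count=4)
```

Storing `base_seed` and the variant index next to each output is enough to rebuild the exact request.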

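ControlNet (composition control)

import cv2
import numpy as np
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

cn = ControlNetModel.from_pretrained('lllyasviel/control_v11p_sd15_canny')
pipe = StableDiffusionControlNetPipeline.from_pretrained(..., controlnet=cn)  # base checkpoint elided in the source

# Input = Canny edge map (pose or depth maps work the same way with a matching ControlNet)
input_img = Image.open('reference.png').convert('RGB')
edges = cv2.Canny(np.array(input_img), 100, 200)  # thresholds are assumed starting values
canny = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel control image

prompt = 'a dancer on a stage'  # hypothetical prompt; the source leaves it undefined
image = pipe(prompt, image=canny, num_inference_steps=20).images[0]

→ The output keeps the pose / composition of the reference image.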

LoRA (style fine-tune)

pipe.load_lora_weights('path/to/anime-style-lora.safetensors')
image = pipe('a girl in a garden').images[0]

→ Applies a style learned from a small set (10-50) of images.

Inpainting (region replacement)

const r = await openai.images.edit({
  model: 'gpt-image-1',
  image: fs.createReadStream('photo.png'),
  mask: fs.createReadStream('mask.png'),  // fully transparent (alpha 0) pixels = edit, opaque = keep
  prompt: 'A red car instead',
});

Outpainting (canvas extension)

// gpt-image-1 / SDXL give natural-looking results
// or use a ComfyUI workflow

Cost comparison (approximate)

DALL-E 3:       $0.04-0.08 / image (standard to HD, 1024x1024)
gpt-image-1:    $0.04-0.19 / image
Flux Pro:       $0.04 / image
Imagen 4:       $0.04 / image
Stable Diffusion self-host: $0.001 / image (GPU time)
Midjourney:     $10-30 / month subscription
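
To turn these rough figures into a budget check, a sketch like the following may help. The prices are copied from the approximate table above and will drift; the model keys and both function names are hypothetical.

```python
# Per-image USD price ranges (low, high), copied from the approximate table above.
PRICE_PER_IMAGE = {
    "dall-e-3": (0.04, 0.08),
    "gpt-image-1": (0.04, 0.19),
    "flux-pro": (0.04, 0.04),
    "imagen-4": (0.04, 0.04),
    "sd-self-host": (0.001, 0.001),
}


def estimate_cost(model: str, images: int) -> tuple[float, float]:
    """Return the (low, high) USD estimate for generating `images` images."""
    low, high = PRICE_PER_IMAGE[model]
    return (round(low * images, 2), round(high * images, 2))


def over_budget(model: str, images: int, budget_usd: float) -> bool:
    """Be pessimistic: compare the high end of the estimate with the budget."""
    return estimate_cost(model, images)[1] > budget_usd


dalle_range = estimate_cost("dall-e-3", 1000)
```

For 1000 images, `dalle_range` is `(40.0, 80.0)`; checking against the upper bound keeps a quality or size upgrade from silently exceeding the budget.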

Streaming (progressive)

// Supported by some models only: SD, for example, can expose partial images at intermediate steps
// DALL-E / Flux return only the final result

Safety / NSFW

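# Every hosted provider applies its own filter.
# When self-hosting, enable the safety checker:
from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker
pipe.safety_checker = StableDiffusionSafetyChecker.from_pretrained(...)

# Or run a separate check (NSFW classifier)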

Storage / CDN

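// Provider URLs usually expire (typically after about 1 hour)
// → download the image and store it in S3 for permanent storage
const buf = await fetch(generatedUrl).then(r => r.arrayBuffer());
// AWS SDK v2 style; the S3_BUCKET environment variable is an assumption
await s3.upload({ Bucket: process.env.S3_BUCKET, Key: id + '.png', Body: Buffer.from(buf) }).promise();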
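
One way to make this persistence step idempotent is to name each object by the hash of its content. The sketch below assumes the bytes have already been downloaded and uses a local directory as a stand-in for a bucket; `storage_key` and `persist` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def storage_key(image_bytes: bytes, ext: str = "png") -> str:
    """Name the object by content hash so re-uploading the same image is a no-op."""
    return f"{hashlib.sha256(image_bytes).hexdigest()}.{ext}"


def persist(image_bytes: bytes, out_dir: Path, ext: str = "png") -> Path:
    """Write the bytes to durable storage; a local directory stands in for a bucket."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / storage_key(image_bytes, ext)
    path.write_bytes(image_bytes)
    return path


# Usage with placeholder bytes instead of a real download.
with tempfile.TemporaryDirectory() as tmp:
    saved = persist(b"fake image bytes", Path(tmp))
    saved_name = saved.name
    round_trip_ok = saved.read_bytes() == b"fake image bytes"
```

The same key can serve as the S3 object key, so retries after a failed upload do not create duplicates.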

Watermark (C2PA)

// gpt-image-1 / Imagen: C2PA metadata is attached automatically
// Self-hosted: add it explicitly

🤔 Decision criteria

Situation                        Recommendation
User-facing, high quality        DALL-E 3 / Flux Pro / Imagen 4
Bulk / cheap                     Flux schnell
Self-hosted / privacy            SDXL / Flux dev
Control needed (pose, style)     SD + ControlNet + LoRA
Complex workflow                 ComfyUI
Very fast                        SDXL Turbo (1 step)
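
The table can also be written as a small lookup. In this sketch the recommendation strings come from the table, but the precedence between overlapping requirements is an assumption of the example, and `recommend` is a hypothetical function.

```python
def recommend(*, complex_workflow: bool = False, needs_control: bool = False,
              realtime: bool = False, self_host: bool = False,
              bulk: bool = False) -> str:
    """Map requirements to a table row; the check order is an assumed precedence."""
    if complex_workflow:
        return "ComfyUI"
    if needs_control:
        return "SD + ControlNet + LoRA"
    if realtime:
        return "SDXL Turbo (1 step)"
    if self_host:
        return "SDXL / Flux dev"
    if bulk:
        return "Flux schnell"
    # Default row: user-facing, high-quality output from a hosted model.
    return "DALL-E 3 / Flux Pro / Imagen 4"


default_choice = recommend()
pose_choice = recommend(needs_control=True, self_host=True)
```

When several flags are set, the earlier check wins, so `pose_choice` resolves to the ControlNet stack rather than plain SDXL.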

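Anti-patterns

  • Prompt too short: generic results. Be detailed.
  • Missing negative prompt (SD): artifacts.
  • Ignoring the seed: results cannot be reproduced.
  • No storage: provider URLs expire.
  • NSFW filter disabled in production: liability / legal risk.
  • No C2PA: user distrust / disinformation risk.
  • No cost monitoring: large bills.
  • No output validation: occasionally broken images.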
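
For the output-validation point, a cheap first check is to inspect the returned bytes before storing them. This is a minimal sketch using only magic bytes and the PNG header: it catches empty or truncated responses and error pages, not visually broken images. The function names and the 1 KiB minimum are assumptions.

```python
import struct
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_format(data: bytes) -> Optional[str]:
    """Identify PNG / JPEG / WebP from the leading magic bytes."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def png_size(data: bytes) -> Optional[tuple[int, int]]:
    """Read (width, height) from the IHDR chunk, which comes first in a PNG."""
    if not data.startswith(PNG_MAGIC) or len(data) < 24 or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])


def looks_valid(data: bytes, min_bytes: int = 1024) -> bool:
    """Reject responses that are too small or not a known image format."""
    return len(data) >= min_bytes and sniff_format(data) is not None


# Synthetic PNG header for illustration: signature, IHDR length, tag, 1024x768.
header = PNG_MAGIC + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1024, 768)
```

A fuller check would decode the image (for example with Pillow) and compare its dimensions with the requested size.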

🤖 Hints for LLM use

  • Starting point = DALL-E 3 / Flux schnell.
  • Highest quality = Flux Pro.
  • Self-hosting = SDXL + ComfyUI.
  • ControlNet / LoRA = fine-grained control.

🔗 Related documents