[G1-Sync] Manual knowledge update

2026-05-09 21:08:02 +09:00
parent f0befc887a
commit 93ec7e9056
363 changed files with 68333 additions and 64 deletions
@@ -0,0 +1,243 @@
+---
+id: ai-image-generation-patterns
+title: Image Generation — DALL-E / Flux / Stable Diffusion
+category: Coding
+status: draft
+source_trust_level: B
+verification_status: conceptual
+created_at: 2026-05-09
+updated_at: 2026-05-09
+tags: [ai, image, generation, vibe-coding]
+tech_stack: { language: "TS / Python", applicable_to: ["Backend"] }
+applied_in: []
+aliases: [DALL-E, Flux, Stable Diffusion, Imagen, Midjourney, ControlNet, LoRA]
+---
+
+# Image Generation
+
+> Text-to-image. **DALL-E 3 (OpenAI), Imagen 4 (Google), Flux (Black Forest Labs), Stable Diffusion (open source)**. Prompt + negative prompt + seed + ControlNet (variations).
+
+## 📖 Key concepts
+- Prompt: be detailed; either tag-style ("1girl, blue hair, ...") or natural language.
+- Negative prompt: what to exclude (blurry, low quality).
+- Seed: determinism (same seed = nearly the same image).
+- ControlNet: controls composition / pose / edges.
+- LoRA: fine-tuning with a small dataset.
+
+## 💻 Code patterns
+
+### OpenAI DALL-E 3
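+```ts
+import OpenAI from 'openai';
+const openai = new OpenAI();  // reads OPENAI_API_KEY from the environment
+
+const r = await openai.images.generate({
+  model: 'dall-e-3',
+  prompt: 'A cat astronaut floating in space, photorealistic, dramatic lighting',
+  size: '1024x1024',     // '1024x1024' | '1792x1024' | '1024x1792'
+  quality: 'hd',         // 'standard' | 'hd'
+  style: 'vivid',        // 'vivid' | 'natural'
+  n: 1,                  // dall-e-3 accepts only n = 1
+});
+const url = r.data[0].url;  // temporary URL; download it if the image must persist
+```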
+
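+### gpt-image-1 (editing / compositing)
+```ts
+import fs from 'node:fs';
+
+const r = await openai.images.edit({
+  model: 'gpt-image-1',
+  image: fs.createReadStream('cat.png'),
+  mask: fs.createReadStream('mask.png'),  // region to change (transparent pixels are edited)
+  prompt: 'A red bow tie',
+});
+const b64 = r.data[0].b64_json;  // gpt-image-1 returns base64 data rather than a URL
+```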
+
+### Replicate (various models)
+```ts
+import Replicate from 'replicate';
+const replicate = new Replicate({ auth: process.env.REPLICATE_TOKEN });
+
+const out = await replicate.run('black-forest-labs/flux-1.1-pro', {
+  input: {
+    prompt: 'A cyberpunk city at night',
+    aspect_ratio: '16:9',
+    output_format: 'webp',
+  },
+});
+// out = the generated image; its shape (URL string, array of URLs, or file object)
+// depends on the model and the client version
+```
+
+### Together / Fireworks (Flux schnell, fast)
+```ts
+import Together from 'together-ai';
+const t = new Together();
+
+const r = await t.images.create({
+  model: 'black-forest-labs/FLUX.1-schnell',
+  prompt: '...',
+  width: 1024, height: 1024,
+});
+```
+
+### Self-hosted Stable Diffusion (Diffusers)
+```python
+from diffusers import StableDiffusionXLPipeline
+import torch
+
+pipe = StableDiffusionXLPipeline.from_pretrained(
+    'stabilityai/stable-diffusion-xl-base-1.0',
+    torch_dtype=torch.float16,
+).to('cuda')
+
+# The pipeline takes no `seed` argument; pass a seeded torch.Generator instead.
+generator = torch.Generator(device='cuda').manual_seed(42)
+
+image = pipe(
+    prompt='A scenic mountain landscape',
+    negative_prompt='blurry, low quality',
+    num_inference_steps=30,
+    guidance_scale=7.5,
+    generator=generator,
+).images[0]
+image.save('out.png')
+```
+
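+### ComfyUI (workflow-based, advanced)
+```
+Visual node editor.
+- Text → CLIP encode → KSampler → VAE decode → Image
+- Add ControlNet, LoRA, IPAdapter
+- Can be automated via API mode
+```
+
+```ts
+// ComfyUI API: queue a workflow (API-format JSON) with POST /prompt.
+// The /ws WebSocket only streams status/progress events.
+const res = await fetch('http://localhost:8188/prompt', {
+  method: 'POST', body: JSON.stringify({ prompt: workflow }),
+});
+const { prompt_id } = await res.json();
+```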
+
+### Prompt engineering
+```
+DALL-E / Imagen: rich natural language.
+"A 35mm photo of a vintage espresso machine on a rustic wooden counter,
+golden hour light, shallow depth of field, film grain, in the style of Wes Anderson"
+
+SD / Flux: tag-style also works.
+"masterpiece, best quality, 1girl, blue eyes, school uniform, anime style"
+
+Negative: "blurry, low quality, deformed, extra limbs"
+```
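
The two prompt styles above can also be assembled programmatically. The following is a minimal sketch, not any provider's API: `natural_prompt` and `tag_prompt` are hypothetical helper names, and the chosen fields (subject, setting, lighting, style) and default quality tags are assumptions made for illustration.

```python
def natural_prompt(subject: str, setting: str = "", lighting: str = "", style: str = "") -> str:
    """Join descriptive clauses into one natural-language prompt (DALL-E / Imagen style)."""
    parts = [subject]
    if setting:
        parts.append(setting)
    if lighting:
        parts.append(lighting)
    if style:
        parts.append(f"in the style of {style}")
    return ", ".join(parts)


def tag_prompt(tags: list[str], quality_tags: tuple[str, ...] = ("masterpiece", "best quality")) -> str:
    """Build a comma-separated tag prompt (SD style), dropping blanks and duplicates."""
    seen: set[str] = set()
    ordered: list[str] = []
    for tag in (*quality_tags, *tags):
        cleaned = tag.strip()
        # Compare case-insensitively so "Blue Eyes" and "blue eyes" count as one tag.
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            ordered.append(cleaned)
    return ", ".join(ordered)


print(natural_prompt("A 35mm photo of a vintage espresso machine",
                     setting="on a rustic wooden counter",
                     lighting="golden hour light"))
print(tag_prompt(["1girl", "blue eyes", "school uniform", "anime style"]))
```

Keeping the two builders separate makes it easy to switch style per provider without rewriting the call site.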
+
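+### Seed (determinism)
+```ts
+// Same seed + same prompt (and same model and settings) = nearly the same image
+const r = await replicate.run('black-forest-labs/flux-1.1-pro', {
+  input: { prompt, seed: 42 },
+});
+```
+
+→ A small prompt change can change the output a lot, so try several seeds.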
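
Why a seed gives reproducibility: a diffusion sampler starts from random noise, and the seed fixes that noise. The sketch below illustrates the idea with the standard library only; `initial_noise` is a hypothetical stand-in for a model's latent initialization, not a call from any image library.

```python
import random


def initial_noise(seed: int, size: int = 8) -> list[float]:
    """Return `size` Gaussian samples drawn from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator; global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(42)
same_b = initial_noise(42)
other = initial_noise(43)

print(same_a == same_b)  # same seed: identical starting noise
print(same_a == other)   # different seed: different starting noise
```

On hosted models the result can still vary between runs if the model version or other parameters change, so it helps to log the seed together with the full request.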
+
+### ControlNet (composition control)
+```python
+from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
+from PIL import Image
+
+cn = ControlNetModel.from_pretrained('lllyasviel/control_v11p_sd15_canny')
+pipe = StableDiffusionControlNetPipeline.from_pretrained(..., controlnet=cn)
+
+# Input = canny edge map (or pose, depth)
+input_img = Image.open('reference.png')
+# canny_detect is a placeholder for an edge detector of your choice (for example OpenCV's cv2.Canny)
+canny = canny_detect(input_img)
+
+image = pipe(prompt, image=canny, num_inference_steps=20).images[0]
+```
+
+→ Keeps the pose / composition of the reference image.
+
+### LoRA (style fine-tune)
+```python
+pipe.load_lora_weights('path/to/anime-style-lora.safetensors')
+image = pipe('a girl in a garden').images[0]
+```
+
+→ Applies a style trained on a small set of images (10-50).
+
+### Inpainting (changing a region)
+```ts
+const r = await openai.images.edit({
+  model: 'gpt-image-1',
+  image: fs.createReadStream('photo.png'),
+  mask: fs.createReadStream('mask.png'),  // transparent (alpha 0) = edit, opaque = keep
+  prompt: 'A red car instead',
+});
+```
+
+### Outpainting (extending the canvas)
+```ts
+// gpt-image-1 / SDXL give natural-looking results
+// or use a ComfyUI workflow
+```
+
+### Cost comparison (approximate)
+```
+DALL-E 3:       $0.04 (standard) to $0.08 (HD) / image at 1024x1024
+gpt-image-1:    $0.04-0.19 / image
+Flux Pro:       $0.04 / image
+Imagen 4:       $0.04 / image
+Stable Diffusion self-host: $0.001 / image (GPU time)
+Midjourney:     $10-30 / month subscription
+```
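
To turn the table into a budget estimate, multiply the expected volume by the per-image price. The sketch below assumes the approximate prices listed above, taking the low end of each range; the dictionary keys and the `monthly_cost` helper are hypothetical names, and real prices should be checked against each provider's pricing page.

```python
from decimal import Decimal

# Approximate USD price per image, copied from the table above (low end of each range).
PRICE_PER_IMAGE = {
    "dall-e-3": Decimal("0.04"),
    "gpt-image-1": Decimal("0.04"),
    "flux-pro": Decimal("0.04"),
    "imagen-4": Decimal("0.04"),
    "sd-self-host": Decimal("0.001"),
}


def monthly_cost(model: str, images_per_day: int, days: int = 30) -> Decimal:
    """Estimate the spend in USD for a given model and daily volume."""
    if images_per_day < 0 or days < 0:
        raise ValueError("images_per_day and days must be non-negative")
    return PRICE_PER_IMAGE[model] * images_per_day * days


for name in ("flux-pro", "sd-self-host"):
    print(name, monthly_cost(name, images_per_day=1000))
```

At 1,000 images per day this works out to roughly $1,200 per month for a $0.04 model versus about $30 for self-hosted Stable Diffusion, before counting idle GPU time.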
+
+### Streaming (progressive)
+```ts
+// Supported by some models: SD, for example, can expose partial-step images
+// DALL-E / Flux return only the final result
+```
+
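+### Safety / NSFW
+```python
+# Hosted providers apply their own filters.
+# When self-hosting, enable the safety checker (for pipelines that have a
+# safety_checker component, such as StableDiffusionPipeline):
+from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker
+pipe.safety_checker = StableDiffusionSafetyChecker.from_pretrained(...)
+
+# Or run a separate check (NSFW classifier)
+```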
+
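+### Storage / CDN
+```ts
+// Provider URLs expire (usually after about 1 hour)
+// → download and copy to S3 for permanent storage
+const buf = await fetch(generatedUrl).then(r => r.arrayBuffer());
+// upload() requires a Bucket; S3_BUCKET is an assumed environment variable name
+await s3.upload({ Bucket: process.env.S3_BUCKET, Key: id + '.png', Body: Buffer.from(buf) }).promise();
+```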
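
The same download-then-store step can be sketched in Python with the network and storage calls injected, so the logic can be exercised offline. `persist_image`, `fetch`, and `store` are hypothetical names introduced here; in practice `fetch` might wrap an HTTP client and `store` an S3 upload. Keying by content hash is an assumption of this sketch, chosen so that storing the same bytes twice overwrites rather than duplicates.

```python
import hashlib
from typing import Callable


def persist_image(
    url: str,
    fetch: Callable[[str], bytes],
    store: Callable[[str, bytes], None],
    extension: str = "png",
) -> str:
    """Download an image from a temporary URL and store it under a content-hash key."""
    data = fetch(url)
    if not data:
        raise ValueError(f"empty response from {url}")
    key = f"{hashlib.sha256(data).hexdigest()}.{extension}"
    store(key, data)
    return key


# Offline usage with in-memory stand-ins for the HTTP client and the bucket.
bucket: dict[str, bytes] = {}
key = persist_image(
    "https://example.invalid/tmp.png",
    fetch=lambda _url: b"fake image bytes",
    store=bucket.__setitem__,
)
print(key in bucket)
```

Returning the key lets the caller save it next to the prompt and seed, so the stored image can be traced back to the request that produced it.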
+
+### Watermark (C2PA)
+```ts
+// gpt-image-1 attaches C2PA metadata automatically; Imagen embeds a SynthID watermark
+// Self-hosted output: add provenance metadata explicitly
+```
+
+## 🤔 Decision criteria
+| Situation | Recommendation |
+|---|---|
+| User-facing, high quality | DALL-E 3 / Flux Pro / Imagen 4 |
+| Bulk / cheap | Flux schnell |
+| Self-hosted / privacy | SDXL / Flux dev |
+| Control needed (pose, style) | SD + ControlNet + LoRA |
+| Complex workflows | ComfyUI |
+| Very fast | SDXL Turbo (1 step) |
+
+## ❌ Anti-patterns
+- **Prompt too short**: generic results. Be detailed.
+- **Missing negative prompt (SD)**: artifacts.
+- **Ignoring the seed**: results cannot be reproduced.
+- **No storage**: provider URLs expire.
+- **NSFW filter disabled in prod**: liability / legal risk.
+- **No C2PA**: user distrust / disinformation.
+- **No cost monitoring**: large bills.
+- **No output validation**: images are occasionally broken.
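
For the output-validation item above, a cheap first check is to confirm that the returned bytes look like an image at all before storing or serving them. The sketch below inspects only file signatures (PNG, JPEG, WebP) and a minimum size; `sniff_image_format`, `is_plausible_image`, and the 1 KiB threshold are assumptions for illustration. It does not detect visually broken images, which would need decoding or a classifier.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return 'png', 'jpeg', or 'webp' based on the leading bytes, else None."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    # WebP is a RIFF container: 'RIFF' + 4-byte size + 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def is_plausible_image(data: bytes, min_bytes: int = 1024) -> bool:
    """Reject payloads that are too small or that carry no known image signature."""
    return len(data) >= min_bytes and sniff_image_format(data) is not None
```

A payload that fails this check (for example an HTML error page returned with status 200) can be retried or logged instead of being written to storage.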
+
+## 🤖 LLM usage hints
+- Starting point = DALL-E 3 / Flux schnell.
+- High quality = Flux Pro.
+- Self-hosted = SDXL + ComfyUI.
+- ControlNet / LoRA = precise control.
+
+## 🔗 Related documents
+- [[AI_Multimodal_Vision_Patterns]]
+- [[AI_LLM_Cost_Optimization]]
+- [[AI_Local_LLM_Inference]]