2nd/10_Wiki/Topics/AI_and_ML/이미지 생성 및 제어 파이프라인.md

---
id: wiki-2026-0508-이미지-생성-및-제어-파이프라인
title: Image Generation and Control Pipeline
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Image Generation Pipeline, Controlled Diffusion Pipeline, ControlNet Pipeline]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: [diffusion, image-gen, controlnet, flux, comfyui]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: PyTorch/diffusers/ComfyUI
---

# Image Generation and Control Pipeline

## One-liner
> **"Control is a stack of conditioning."** A 2026 image generation pipeline layers its conditioning: base model (FLUX.1 / SDXL / SD3.5) → control adapter (ControlNet / IP-Adapter / T2I-Adapter) → LoRA → refiner. ComfyUI makes these layers explicit as a node graph; diffusers abstracts them behind pipeline classes.

## Key points

### Pipeline stages
- **Prompt encoding**: T5 + CLIP text encoders for FLUX.1 and SD3.5 (dual conditioning); SDXL uses two CLIP encoders
- **Latent init**: random noise, or the VAE-encoded input image for img2img
- **Conditioning injection**: ControlNet (structure), IP-Adapter (style ref), LoRA (concept)
- **Sampling**: Euler / DPM-Solver++ for SDXL, flow-matching Euler schedulers for FLUX.1 and SD3.5; typically 20-50 steps
- **Decoding**: VAE → pixel space, optional refiner
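The sampling stage can be illustrated without any model weights. The following is a toy sketch, not how diffusers implements its schedulers. It assumes an idealized velocity field that points straight at a known target, whereas a real model predicts the velocity from the noisy latent, the timestep, and the conditioning. It also assumes time runs from t=0 (noise) to t=1 (data); actual schedulers use their own conventions. The names `toy_velocity` and `euler_sample` are hypothetical.

```python
import random


def toy_velocity(x, target, t):
    """Idealized velocity for a straight noise-to-data path (stand-in for the model)."""
    return [(tg - xi) / (1.0 - t) for xi, tg in zip(x, target)]


def euler_sample(target, num_steps=28, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    rng = random.Random(seed)                  # seeded so the run is reproducible
    x = [rng.gauss(0.0, 1.0) for _ in target]  # latent init: pure Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                             # t stays below 1, so no division by zero
        v = toy_velocity(x, target, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


target = [0.5, -1.0, 2.0]
sample = euler_sample(target)
```

In this idealized setting the final Euler step lands on the target up to floating-point error. With a learned velocity field the step count trades speed against accuracy, which is where the 20-50 step range comes from.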

### Control modalities
- **Structure**: canny, depth, pose, segmentation (spatial constraints)
- **Identity**: IP-Adapter Face, InstantID, PuLID (face preservation)
- **Style**: IP-Adapter Style, style-LoRA (reference style)
- **Concept**: textual inversion, custom LoRA (a specific subject)

### Applications
1. Batch generation for product photography (SKU × pose × background).
2. Game asset pipelines: consistency from concept → portrait → animation pose.
3. UI/UX prototyping: wireframe-to-mockup conversion.
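The SKU × pose × background batch from the first item can be enumerated as a job grid. This is a minimal sketch: the `Job` dataclass, `build_jobs`, and the sample values are hypothetical, and deriving each seed from a CRC32 of the combination is just one way to keep seeds stable across runs.

```python
import itertools
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    sku: str
    pose: str
    background: str
    seed: int


def build_jobs(skus, poses, backgrounds, base_seed=0):
    """Enumerate every sku x pose x background combination with a stable seed."""
    jobs = []
    for sku, pose, bg in itertools.product(skus, poses, backgrounds):
        key = f"{sku}|{pose}|{bg}".encode("utf-8")
        # crc32 is stable across runs (unlike hash()), so a combination keeps its seed
        seed = (base_seed + zlib.crc32(key)) % (2**32)
        jobs.append(Job(sku, pose, bg, seed))
    return jobs


jobs = build_jobs(["mug-01", "mug-02"], ["front", "side"], ["studio", "kitchen"])
```

Each `Job` can then be turned into a prompt, a control image lookup, and a `torch.Generator` seeded with `job.seed`, so that re-running a single failed combination reproduces the same image.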

## 💻 Patterns

### diffusers FLUX + ControlNet
```python
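import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

# Pre-computed canny edge map; "canny.png" is a placeholder path.
canny_image = load_image("canny.png")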

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny",
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="cyberpunk samurai, neon rain",
    control_image=canny_image,
    controlnet_conditioning_scale=0.7,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
```

### Multi-ControlNet stacking
```python
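import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# Load the ControlNets in the same dtype as the pipeline to avoid dtype mismatches.
cn_pose = ControlNetModel.from_pretrained(
    "xinsir/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
)
cn_depth = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)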

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[cn_pose, cn_depth],
    torch_dtype=torch.float16,
).to("cuda")

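# pose_img and depth_img are pre-computed control images, in the same order as the controlnet list.
result = pipe(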
    prompt="warrior pose, mountain backdrop",
    image=[pose_img, depth_img],
    controlnet_conditioning_scale=[0.8, 0.5],
    num_inference_steps=30,
).images[0]
```
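How the per-ControlNet scales interact can be shown with a toy example. To my understanding, the multi-ControlNet wrapper in diffusers scales each ControlNet's residuals by its conditioning scale and sums them before they are added to the base model's features. The sketch below assumes that behavior and uses flat lists of made-up numbers in place of feature maps; `combine_residuals` is a hypothetical helper, not a diffusers function.

```python
def combine_residuals(residuals, scales):
    """Weighted sum of per-ControlNet residuals (each residual is a flat list of floats)."""
    if len(residuals) != len(scales):
        raise ValueError("need exactly one scale per ControlNet")
    combined = [0.0] * len(residuals[0])
    for residual, scale in zip(residuals, scales):
        for i, value in enumerate(residual):
            combined[i] += scale * value
    return combined


pose_residual = [1.0, 0.0, -1.0]   # hypothetical values standing in for feature maps
depth_residual = [0.0, 2.0, 2.0]
combined = combine_residuals([pose_residual, depth_residual], [0.8, 0.5])
```

Because the contributions add, two controls that disagree at the same location partially cancel. That is one reason to lower the scale of the less important control rather than run every control at full strength.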

### IP-Adapter style transfer
```python
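# `pipe` is assumed to be an already-loaded SDXL pipeline; `style_reference` is a PIL image.
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors",
    # The *_vit-h weights expect the ViT-H image encoder, which lives outside sdxl_models/.
    image_encoder_folder="models/image_encoder",
)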
pipe.set_ip_adapter_scale(0.6)

out = pipe(
    prompt="portrait of a knight",
    ip_adapter_image=style_reference,
    num_inference_steps=30,
).images[0]
```

### LoRA composition
```python
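# Named adapters and set_adapters require the `peft` package to be installed.
pipe.load_lora_weights("lora_pack/", weight_name="anime_style.safetensors", adapter_name="anime")
pipe.load_lora_weights("lora_pack/", weight_name="my_character.safetensors", adapter_name="char")
# Per-adapter weights: style at 0.7, character at 0.9.
pipe.set_adapters(["anime", "char"], adapter_weights=[0.7, 0.9])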

img = pipe(prompt="my character in anime style, school uniform").images[0]
```
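One way to reason about the adapter weights, assuming the standard LoRA parameterization and that each adapter weight simply rescales that adapter's low-rank update (my understanding of how the PEFT-backed `set_adapters` behaves, not something this note's sources state):

$$
W' = W + \sum_{i} w_i \,\frac{\alpha_i}{r_i}\, B_i A_i
$$

Here $W$ is a frozen base weight, $A_i$ and $B_i$ are the low-rank factors of adapter $i$, $r_i$ is its rank, $\alpha_i$ its LoRA alpha, and $w_i$ the value passed in `adapter_weights` (0.7 and 0.9 above). Because the updates add linearly, two adapters that modify the same layers in conflicting directions can interfere, which is why the weights usually need tuning per combination.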

### Img2img refinement
```python
import torch
from diffusers import AutoPipelineForImage2Image

# `prompt` and `base_image` are assumed to come from the base generation pass.

refiner = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
).to("cuda")

refined = refiner(
    prompt=prompt,
    image=base_image,
    strength=0.3,
    num_inference_steps=20,
).images[0]
```
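The `strength` parameter also decides how many of the requested steps actually run. The sketch below assumes the step calculation that diffusers img2img pipelines use at the time of writing (requested steps multiplied by strength, truncated to an integer); `effective_img2img_steps` is a hypothetical helper name.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually executed for a given img2img strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


steps_run = effective_img2img_steps(20, 0.3)
```

Under that assumption the refiner call above runs only a fraction of its 20 requested steps, so raising `num_inference_steps` is the way to get more refinement passes without changing how much of the base image is preserved.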

### ComfyUI API workflow
```python
import json
import urllib.request

# Workflow exported from ComfyUI in API format; node ids "6" and "12" are specific to this file.
with open("workflows/portrait_pipeline.json", encoding="utf-8") as f:
    workflow = json.load(f)
workflow["6"]["inputs"]["text"] = "cyberpunk samurai"
workflow["12"]["inputs"]["seed"] = 12345

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # queue response from the server
```
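For repeated submissions it can help to separate building the request from sending it, so the template stays untouched and a wrong node id fails early. This is a sketch under assumptions: `build_prompt_request` is a hypothetical helper, and the two-node template is a made-up stand-in for a real exported workflow.

```python
import copy
import json
import urllib.request


def build_prompt_request(workflow, overrides, host="http://127.0.0.1:8188"):
    """Return a POST request for /prompt with per-node input overrides applied.

    overrides maps node id -> {input name: value}; the template is not mutated.
    """
    patched = copy.deepcopy(workflow)
    for node_id, inputs in overrides.items():
        if node_id not in patched:
            raise KeyError(f"node {node_id!r} not found in workflow")
        patched[node_id]["inputs"].update(inputs)
    body = json.dumps({"prompt": patched}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )


# Hypothetical two-node template; real node ids come from the exported workflow JSON.
template = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "12": {"class_type": "KSampler", "inputs": {"seed": 0}},
}
request = build_prompt_request(
    template, {"6": {"text": "cyberpunk samurai"}, "12": {"seed": 12345}}
)
```

The returned object can be passed to `urllib.request.urlopen` exactly as in the block above.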

### Batch pipeline with caching
```python
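from functools import lru_cache

import torch

# Assumes `pipe` is the FluxControlNetPipeline from above, whose encode_prompt returns
# (prompt_embeds, pooled_prompt_embeds, text_ids). SDXL pipelines return a different tuple.
@lru_cache(maxsize=8)
def encode_prompt(prompt: str):
    return pipe.encode_prompt(prompt, prompt_2=None, device="cuda")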

def generate_batch(prompts: list[str], control_imgs: list, seeds: list[int]):
    results = []
    for p, c, s in zip(prompts, control_imgs, seeds):
        embeds = encode_prompt(p)
        gen = torch.Generator("cuda").manual_seed(s)
        img = pipe(
            prompt_embeds=embeds[0],
            pooled_prompt_embeds=embeds[1],
            control_image=c,
            generator=gen,
        ).images[0]
        results.append(img)
    return results
```
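The effect of the cache can be checked without a GPU by swapping in a stub encoder. In this sketch `encode_prompt_cached` and `encode_calls` are hypothetical names, and the returned strings stand in for embedding tensors.

```python
from functools import lru_cache

encode_calls = []  # records every prompt that actually reaches the (stub) encoder


@lru_cache(maxsize=8)
def encode_prompt_cached(prompt: str):
    """Stub for an expensive text-encoder call; returns a fake embedding tuple."""
    encode_calls.append(prompt)
    return (f"embeds:{prompt}", f"pooled:{prompt}")


prompts = ["red mug", "blue mug", "red mug", "red mug"]
embeddings = [encode_prompt_cached(p) for p in prompts]
cache_stats = encode_prompt_cached.cache_info()
```

Only the distinct prompts reach the encoder; repeats are served from the cache. With real pipelines the cached values hold references to GPU tensors, so `maxsize` also bounds how much memory the cache can pin.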

## Decision criteria
| Situation | Pipeline |
|---|---|
| Highest fidelity, slow | FLUX.1-dev + ControlNet + refiner |
| Real-time / interactive | SDXL Turbo / FLUX.1-schnell, 1-4 steps |
| Face consistency | InstantID / PuLID + IP-Adapter Face |
| Style consistency batch | Style-LoRA + fixed seed offset |
| Local-only (Apple Silicon) | MLX + SDXL or DrawThings, FLUX.1 quantized |

**Default**: FLUX.1-dev + 1 ControlNet (canny/depth) + IP-Adapter, 28 steps, guidance 3.5.

## 🔗 Graph
- Parent: [[AI 이미지 생성 (AI Image Generation)]] · [[Diffusion_Models]]
- Variants: [[초상화 및 애니메이션 스타일 제어]] · [[ComfyUI]]
- Applications: [[AI 이미지 생성 및 편집 워크플로우 (AI Image Generation & Editing Workflow)]] · [[AI 이미지 품질 최적화 및 디버깅 (Image Quality Optimization & Debugging)]]
- Adjacent: [[ControlNet]] · [[LoRA]] · [[FLUX]]

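## 🤖 LLM usage
**When to use**: prompt rewriting, extracting captions from control images, generating workflow JSON, error diagnosis.
**When not to use**: the internal forward pass of the VAE/UNet. That is deterministic numerical computation, not a job for an LLM.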

## ❌ Anti-patterns
- **Conditioning over-stack**: 5+ controls at once produce conflicting constraints and blurry output.
- **CFG too high (>7 on FLUX)**: oversaturated, plastic-looking output.
- **LoRA stacking without weight tuning**: incompatible concepts blend together.
- **Missing seed control**: a random seed on every batch destroys reproducibility.
- **VAE mismatch**: using a VAE other than the one the model was trained with causes color shift.
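The anti-patterns above are mechanical enough to check before a job is queued. The following is a minimal sketch: `PipelineConfig`, `lint_config`, and the thresholds are illustrative and taken directly from the list above, not from any library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PipelineConfig:
    base_model: str
    controls: list = field(default_factory=list)      # e.g. ["canny", "depth"]
    lora_weights: dict = field(default_factory=dict)  # adapter name -> weight, None if untuned
    guidance_scale: float = 3.5
    seed: Optional[int] = None
    vae: Optional[str] = None                         # None means "use the model's own VAE"


def lint_config(cfg: PipelineConfig) -> list:
    """Return a warning for each anti-pattern from the list above that the config triggers."""
    warnings = []
    if len(cfg.controls) >= 5:
        warnings.append("conditioning over-stack")
    if "flux" in cfg.base_model.lower() and cfg.guidance_scale > 7:
        warnings.append("guidance too high for FLUX")
    if len(cfg.lora_weights) > 1 and any(w is None for w in cfg.lora_weights.values()):
        warnings.append("stacked LoRAs without explicit weights")
    if cfg.seed is None:
        warnings.append("missing seed")
    if cfg.vae is not None:
        warnings.append("custom VAE: check it matches the base model")
    return warnings


risky = PipelineConfig(
    base_model="FLUX.1-dev",
    controls=["canny", "depth", "pose", "seg", "tile"],
    lora_weights={"anime": None, "char": 0.9},
    guidance_scale=9.0,
)
issues = lint_config(risky)
```

A check like this fits naturally at the start of a batch runner, where rejecting a bad config is cheaper than discovering blurry or oversaturated output after the run.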

## 🧪 Verification / duplicates
- Verified (diffusers 0.30+, ComfyUI 2026-04, FLUX.1 model card).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — image gen pipeline + control modalities |