---
id: wiki-2026-0508-이미지-생성-및-제어-파이프라인
title: Image Generation and Control Pipeline
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Image Generation Pipeline, Controlled Diffusion Pipeline, ControlNet Pipeline]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: [diffusion, image-gen, controlnet, flux, comfyui]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: PyTorch/diffusers/ComfyUI
---
# Image Generation and Control Pipeline
## One-line summary
> **"Control is a stack of conditioning."** A 2026 image generation pipeline is layered conditioning: base model (FLUX.1 / SDXL / SD3.5) → control adapter (ControlNet / IP-Adapter / T2I-Adapter) → LoRA → refiner. ComfyUI makes this stack explicit as a node graph, while diffusers abstracts it behind pipeline classes.
## Key points
### Pipeline stages
- **Prompt encoding**: T5 + CLIP text encoders for dual conditioning (FLUX.1, SD3.5); SDXL uses two CLIP encoders
- **Latent init**: random noise, or an encoded source image for img2img
- **Conditioning injection**: ControlNet (structure), IP-Adapter (style reference), LoRA (concept)
- **Sampling**: Euler or DPM-Solver++ for SDXL, flow-matching Euler for FLUX.1 / SD3.5; typically 20-50 steps
- **Decoding**: VAE decode to pixel space, followed by an optional refiner pass
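The sampling stage can be illustrated with a toy flow-matching Euler loop. This is a minimal sketch under stated assumptions, not how any library implements its scheduler: `oracle_velocity` is a hypothetical stand-in for the denoising network and returns the exact straight-line velocity `noise - data`, and the sigma schedule is a plain linear ramp from 1 (pure noise) to 0 (clean sample).

```python
def euler_flow_sample(noise, velocity_fn, num_steps=28):
    """Integrate x from sigma=1 (noise) down to sigma=0 with Euler steps."""
    # Linear sigma schedule with num_steps + 1 points: 1.0, ..., 0.0
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x = list(noise)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)
        # Euler update: x <- x + (sigma_next - sigma) * v
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical toy data: a 3-value "latent" and the noise it was paired with.
data = [0.5, -1.0, 2.0]
noise = [1.3, 0.2, -0.7]


def oracle_velocity(x, sigma):
    # A trained model would predict this; here it is known exactly.
    return [n - d for n, d in zip(noise, data)]


sample = euler_flow_sample(noise, oracle_velocity)
```

With an exact velocity the loop lands back on `data` regardless of step count. In a real pipeline the velocity is only approximated by the network, which is why step count and scheduler choice affect quality.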
### Control modalities
- **Structure**: canny, depth, pose, segmentation (spatial constraints)
- **Identity**: IP-Adapter Face, InstantID, PuLID (face preservation)
- **Style**: IP-Adapter Style, style LoRA (reference style)
- **Concept**: textual inversion, custom LoRA (a specific subject)
### Applications
1. Batch generation for product photography (SKU × pose × background).
2. Game asset pipelines: keeping concept → portrait → animation pose consistent.
3. UI/UX prototyping: wireframe-to-mockup conversion.
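The SKU × pose × background batch from the first application can be sketched as a job grid. This is an illustrative sketch only: `build_jobs`, the prompt template, and the sample values are hypothetical, and deriving each seed from a SHA-256 of the job key is one possible way to keep reruns reproducible.

```python
import hashlib
from itertools import product


def build_jobs(skus, poses, backgrounds, base_seed=0):
    """Expand SKU x pose x background into jobs with reproducible seeds."""
    jobs = []
    for sku, pose, bg in product(skus, poses, backgrounds):
        key = f"{sku}|{pose}|{bg}"
        # Stable per-job seed derived from the key (Python's hash() is salted per run)
        digest = hashlib.sha256(f"{base_seed}|{key}".encode()).digest()
        seed = int.from_bytes(digest[:4], "big")
        jobs.append({
            "key": key,
            "prompt": f"product photo of {sku}, {pose}, {bg} background",
            "seed": seed,
        })
    return jobs


jobs = build_jobs(
    ["mug", "lamp"],
    ["front view", "three-quarter view"],
    ["white", "studio gray"],
)
```

Each job carries its own seed, so a single failed image can be regenerated without rerunning the whole grid.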
## 💻 Patterns
### diffusers FLUX + ControlNet
```python
import torch
from diffusers import FluxControlNetPipeline, FluxControlNetModel
from diffusers.utils import load_image

# Canny ControlNet trained for FLUX.1-dev
controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny",
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Pre-computed canny edge map; the file name is a placeholder
canny_image = load_image("canny_edges.png")

image = pipe(
    prompt="cyberpunk samurai, neon rain",
    control_image=canny_image,
    controlnet_conditioning_scale=0.7,  # how strongly the edges constrain layout
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
```
### Multi-ControlNet stacking
```python
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

# Load both ControlNets in the pipeline's dtype to avoid a float32/float16 mismatch
cn_pose = ControlNetModel.from_pretrained(
    "xinsir/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
)
cn_depth = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[cn_pose, cn_depth],  # order must match `image` and the scales below
    torch_dtype=torch.float16,
).to("cuda")

# pose_img / depth_img: pre-computed PIL control images of the same size
result = pipe(
    prompt="warrior pose, mountain backdrop",
    image=[pose_img, depth_img],
    controlnet_conditioning_scale=[0.8, 0.5],
    num_inference_steps=30,
).images[0]
```
### IP-Adapter style transfer
```python
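# `pipe` is assumed to be an SDXL text-to-image pipeline already on the GPU
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors",
    # the *_vit-h weights need the ViT-H image encoder stored under models/
    image_encoder_folder="models/image_encoder",
)
pipe.set_ip_adapter_scale(0.6)  # 0 ignores the reference, 1 follows it closely
out = pipe(
    prompt="portrait of a knight",
    ip_adapter_image=style_reference,  # PIL image carrying the target style
    num_inference_steps=30,
).images[0]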
```
### LoRA composition
```python
# Requires the PEFT backend; `lora_pack/` is a local directory of LoRA files
pipe.load_lora_weights("lora_pack/", weight_name="anime_style.safetensors", adapter_name="anime")
pipe.load_lora_weights("lora_pack/", weight_name="my_character.safetensors", adapter_name="char")
# Activate both adapters with per-adapter strengths
pipe.set_adapters(["anime", "char"], adapter_weights=[0.7, 0.9])
img = pipe(prompt="my character in anime style, school uniform").images[0]
```
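What the adapter weights do can be sketched numerically. In the standard LoRA formulation each adapter contributes a low-rank update `B @ A` to a frozen weight `W`, and the per-adapter weight scales that update. The snippet below is a toy illustration in pure Python with made-up 2×2 values and rank-1 factors; it ignores the `alpha / r` scaling and does not reflect how diffusers or PEFT store adapters internally.

```python
def outer(b, a):
    """Rank-1 update B @ A for a column vector b and a row vector a."""
    return [[bi * aj for aj in a] for bi in b]


def compose(base, adapters, weights):
    """Return W + sum_i weight_i * (B_i @ A_i) without mutating `base`."""
    merged = [row[:] for row in base]
    for (b, a), w in zip(adapters, weights):
        delta = outer(b, a)
        for i, row in enumerate(delta):
            for j, d in enumerate(row):
                merged[i][j] += w * d
    return merged


W = [[1.0, 0.0], [0.0, 1.0]]        # frozen base weight (toy values)
anime = ([1.0, 0.0], [0.0, 2.0])    # hypothetical "anime" factors (b, a)
char = ([0.0, 1.0], [1.0, 0.0])     # hypothetical "char" factors (b, a)

merged = compose(W, [anime, char], weights=[0.7, 0.9])
```

Because the updates are summed into the same weight, two adapters that push the same entries in different directions interfere, which is why the weights usually need tuning per combination.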
### Img2img refinement
```python
import torch
from diffusers import AutoPipelineForImage2Image

refiner = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# prompt / base_image come from the base generation pass
refined = refiner(
    prompt=prompt,
    image=base_image,
    strength=0.3,  # low strength keeps composition and only touches detail
    num_inference_steps=20,
).images[0]
```
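How `strength` interacts with `num_inference_steps` is easy to misjudge. The helper below is a sketch of the timestep arithmetic that diffusers img2img pipelines apply as far as I understand it; `img2img_schedule` is a hypothetical name, and the exact behavior should be checked against the pipeline source for the installed version.

```python
def img2img_schedule(num_inference_steps: int, strength: float):
    """Return (start_index, steps_actually_run) for an img2img pass."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Only the last `strength` fraction of the schedule is executed
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return t_start, num_inference_steps - t_start


start, steps_run = img2img_schedule(num_inference_steps=20, strength=0.3)
```

Under this arithmetic, 20 steps at strength 0.3 run only 6 denoising steps, so raising `num_inference_steps` is the way to get more refinement iterations without also increasing how far the image drifts.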
### ComfyUI API workflow
```python
import json
import urllib.request

# Workflow exported from ComfyUI in API format
with open("workflows/portrait_pipeline.json") as f:
    workflow = json.load(f)

# Node ids ("6", "12") are specific to this workflow export
workflow["6"]["inputs"]["text"] = "cyberpunk samurai"
workflow["12"]["inputs"]["seed"] = 12345

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # response includes the queued prompt_id
```
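Queueing a prompt only returns an id; the generated images are typically looked up afterwards through `/history/<prompt_id>` and downloaded from `/view`. The sketch below assumes the history payload shape used in ComfyUI's bundled API example scripts (`outputs` → node id → `images`). `image_urls` is a hypothetical helper and `sample_history` is made-up illustrative data, not a captured response.

```python
from urllib.parse import urlencode


def image_urls(history: dict, prompt_id: str, server: str = "http://127.0.0.1:8188"):
    """Collect /view URLs for every image recorded under one prompt id."""
    urls = []
    outputs = history.get(prompt_id, {}).get("outputs", {})
    for node_output in outputs.values():
        for img in node_output.get("images", []):
            query = urlencode({
                "filename": img["filename"],
                "subfolder": img.get("subfolder", ""),
                "type": img.get("type", "output"),
            })
            urls.append(f"{server}/view?{query}")
    return urls


# Hypothetical payload shaped like a GET /history/<prompt_id> response
sample_history = {
    "abc-123": {
        "outputs": {
            "9": {"images": [
                {"filename": "portrait_00001_.png", "subfolder": "", "type": "output"}
            ]}
        }
    }
}
urls = image_urls(sample_history, "abc-123")
```

An empty list means the prompt is still queued or running, so a caller would poll the history endpoint until the id appears.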
### Batch pipeline with caching
```python
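from functools import lru_cache

import torch

# `pipe` is assumed to be the FluxControlNetPipeline built above
@lru_cache(maxsize=8)
def encode_prompt(prompt: str):
    # Returns (prompt_embeds, pooled_prompt_embeds, text_ids) for FLUX pipelines
    with torch.no_grad():
        return pipe.encode_prompt(prompt, prompt_2=None, device="cuda")

def generate_batch(prompts: list[str], control_imgs: list, seeds: list[int]):
    results = []
    for p, c, s in zip(prompts, control_imgs, seeds):
        embeds = encode_prompt(p)  # cache hit when a prompt repeats
        gen = torch.Generator("cuda").manual_seed(s)  # per-image seed for reproducibility
        img = pipe(
            prompt_embeds=embeds[0],
            pooled_prompt_embeds=embeds[1],
            control_image=c,
            generator=gen,
        ).images[0]
        results.append(img)
    return results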
```
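The effect of the `lru_cache` wrapper can be checked without a GPU. In this sketch `encode_prompt_stub` is a hypothetical stand-in for the real text-encoder call; it only records how often the uncached path runs.

```python
from functools import lru_cache

calls = []  # records every real (uncached) encode


@lru_cache(maxsize=8)
def encode_prompt_stub(prompt: str):
    # Stand-in for pipe.encode_prompt: returns a cheap tuple instead of tensors
    calls.append(prompt)
    return (f"embeds:{prompt}", f"pooled:{prompt}")


prompts = ["red mug, studio", "blue mug, studio", "red mug, studio", "red mug, studio"]
for p in prompts:
    encode_prompt_stub(p)

info = encode_prompt_stub.cache_info()
```

Repeated prompts are served from the cache, so only distinct prompts reach the encoder. With `maxsize=8` the least recently used entry is evicted once a ninth distinct prompt arrives, so batches with many distinct prompts benefit from grouping identical prompts together.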
## Decision criteria
| Situation | Pipeline |
|---|---|
| Highest fidelity, slow | FLUX.1-dev + ControlNet + refiner |
| Real-time / interactive | SDXL Turbo / FLUX Schnell, 1-4 steps |
| Face consistency | InstantID / PuLID + IP-Adapter Face |
| Style consistency batch | Style-LoRA + fixed seed offset |
| Local-only (Apple Silicon) | MLX + SDXL or DrawThings, FLUX.1 quantized |
**Default**: FLUX.1-dev + 1 ControlNet (canny/depth) + IP-Adapter, 28 steps, guidance 3.5.
## 🔗 Graph
- Parent: [[AI 이미지 생성 (AI Image Generation)]] · [[Diffusion_Models]]
- Variants: [[초상화 및 애니메이션 스타일 제어]] · [[ComfyUI]]
- Applications: [[AI 이미지 생성 및 편집 워크플로우 (AI Image Generation & Editing Workflow)]] · [[AI 이미지 품질 최적화 및 디버깅 (Image Quality Optimization & Debugging)]]
- Adjacent: [[ControlNet]] · [[LoRA]] · [[FLUX]]
## 🤖 LLM usage
**When to use**: prompt rewriting, extracting captions from control images, generating workflow JSON, error diagnosis.
**When not to use**: the inner forward pass of the VAE/UNet, which is deterministic and not a job for an LLM.
## ❌ Anti-patterns
- **Conditioning over-stack**: running 5+ controls at once causes conflicts and blurry output.
- **CFG too high (>7 on FLUX)**: oversaturated, plastic-looking results.
- **LoRA stacking without weight tuning**: incompatible concepts blend together.
- **Missing seed control**: every batch is random, so reproducibility is lost.
- **VAE mismatch**: using a VAE other than the model's own causes color shift.
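Several of these anti-patterns can be caught before any GPU time is spent. The checker below is a minimal sketch: `lint_generation_config` and the config keys (`controls`, `loras`, `guidance_scale`, `seed`, `base_model`) are hypothetical names chosen for illustration, the thresholds are taken from the list above, and VAE mismatch is not covered.

```python
def lint_generation_config(cfg: dict) -> list[str]:
    """Return one warning per anti-pattern found in a generation config."""
    warnings = []
    if len(cfg.get("controls", [])) >= 5:
        warnings.append("conditioning over-stack: 5+ controls active")
    is_flux = cfg.get("base_model", "").lower().startswith("flux")
    if is_flux and cfg.get("guidance_scale", 0) > 7:
        warnings.append("guidance too high for FLUX (> 7)")
    loras = cfg.get("loras", [])
    if len(loras) > 1 and any("weight" not in lora for lora in loras):
        warnings.append("multiple LoRAs without explicit weights")
    if cfg.get("seed") is None:
        warnings.append("no seed set: batch is not reproducible")
    return warnings


# Hypothetical config that trips every check
bad = {
    "base_model": "FLUX.1-dev",
    "guidance_scale": 9.0,
    "controls": ["canny", "depth", "pose", "seg", "tile"],
    "loras": [{"name": "anime"}, {"name": "char", "weight": 0.9}],
    "seed": None,
}
good = {
    "base_model": "FLUX.1-dev",
    "guidance_scale": 3.5,
    "controls": ["canny"],
    "loras": [],
    "seed": 42,
}

bad_warnings = lint_generation_config(bad)
good_warnings = lint_generation_config(good)
```

Running a check like this at job-submission time keeps a misconfigured batch from consuming a full render queue.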
## 🧪 Verification / duplicates
- Verified (diffusers 0.30+, ComfyUI 2026-04, FLUX.1 model card).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — image gen pipeline + control modalities |