---
id: wiki-2026-0508-이미지-생성-및-제어-파이프라인
title: Image Generation and Control Pipeline
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Image Generation Pipeline, Controlled Diffusion Pipeline, ControlNet Pipeline]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: [diffusion, image-gen, controlnet, flux, comfyui]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: PyTorch/diffusers/ComfyUI
---

# Image Generation and Control Pipeline

## One-line summary

> **"Control is a stack of conditioning."** A 2026 image generation pipeline layers its conditioning: base model (FLUX.1 / SDXL / SD3.5) → control adapter (ControlNet / IP-Adapter / T2I-Adapter) → LoRA → refiner. ComfyUI makes each layer explicit as a node in a graph; diffusers abstracts the stack behind pipeline classes.
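
## Key points

### Pipeline stages

- **Prompt encoding**: the text encoders produce the conditioning. FLUX.1 and SD3.5 combine CLIP with T5, while SDXL uses two CLIP encoders.
- **Latent init**: pure noise (txt2img) or the VAE-encoded input image with partial noise added (img2img).
- **Conditioning injection**: ControlNet (structure), IP-Adapter (style reference), LoRA (concept).
- **Sampling**: Euler / DPM-Solver++, or Euler on a flow-matching schedule for FLUX.1 and SD3.5; typically 20-50 steps.
- **Decoding**: the VAE maps latents back to pixel space, optionally followed by a refiner pass.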
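
The sampling stage can be illustrated without a real model. The toy below is a sketch, not diffusers code. It assumes the rectified-flow convention used by flow-matching models such as FLUX.1 and SD3.5: a latent at noise level σ is `x = σ·noise + (1 − σ)·data`, and the network predicts the velocity `noise − data`. A hypothetical `predict_velocity` oracle stands in for the denoiser. With it, plain Euler steps `x ← x + (σ_next − σ)·v` walk from pure noise (σ = 1) to the clean sample (σ = 0).

```python
import random


def predict_velocity(x: list[float], sigma: float, data: list[float]) -> list[float]:
    """Hypothetical oracle standing in for the denoiser network.

    On the straight path x = sigma * noise + (1 - sigma) * data, the velocity
    dx/dsigma is noise - data, which equals (x - data) / sigma.
    """
    return [(xi - di) / sigma for xi, di in zip(x, data)]


def euler_flow_sample(noise: list[float], data: list[float], steps: int) -> list[float]:
    # Linear sigma schedule from 1.0 (pure noise) down to 0.0 (clean sample).
    sigmas = [1.0 - i / steps for i in range(steps + 1)]
    x = list(noise)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = predict_velocity(x, sigma, data)
        # Euler step along the predicted velocity; sigma_next - sigma is negative.
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
target = [0.5, -0.25, 1.0, 0.0]  # stand-in for a clean latent
sample = euler_flow_sample(noise, target, steps=28)
```

A trained network only approximates the velocity, so more steps reduce the integration error. That is the speed/quality trade-off behind the 20-50 step range above.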

### Control modalities

- **Structure**: canny, depth, pose, segmentation (spatial constraints).
- **Identity**: IP-Adapter Face, InstantID, PuLID (face preservation).
- **Style**: IP-Adapter Style, style LoRA (a reference style).
- **Concept**: textual inversion, custom LoRA (a specific subject).
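
A conditioning stack can be kept as plain data and split into what diffusers' multi-ControlNet pipelines need: the ControlNet checkpoints to load, plus the parallel `image=[...]` and `controlnet_conditioning_scale=[...]` call arguments. `ControlLayer` and `split_stack` are hypothetical helpers, and the image paths are placeholders. Only structure layers are routed to ControlNet here, on the assumption that identity and style layers go through IP-Adapter or LoRA instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlLayer:
    """One layer of a hypothetical conditioning stack."""

    modality: str  # "structure", "identity", "style", or "concept"
    source: str  # checkpoint repo id or adapter name
    image: str  # conditioning image path (placeholder)
    scale: float  # per-layer strength


def split_stack(stack: list[ControlLayer]) -> tuple[list[str], dict]:
    """Return ControlNet ids to load and the list-valued kwargs for the pipeline call."""
    structure = [layer for layer in stack if layer.modality == "structure"]
    controlnet_ids = [layer.source for layer in structure]
    call_kwargs = {
        "image": [layer.image for layer in structure],
        "controlnet_conditioning_scale": [layer.scale for layer in structure],
    }
    return controlnet_ids, call_kwargs


stack = [
    ControlLayer("structure", "xinsir/controlnet-openpose-sdxl-1.0", "pose.png", 0.8),
    ControlLayer("structure", "diffusers/controlnet-depth-sdxl-1.0", "depth.png", 0.5),
    ControlLayer("style", "h94/IP-Adapter", "style_ref.png", 0.6),
]
controlnet_ids, call_kwargs = split_stack(stack)
```

Deriving all three lists from one source keeps their order aligned. That alignment is easy to break when scales are edited by hand.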

### Applications

1. Batch product photography (SKU × pose × background).
2. Game asset pipelines: keeping a character consistent from concept art to portrait to animation poses.
3. UI/UX prototyping: converting wireframes into mockups.
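
In the product-photography case, the SKU × pose × background grid is a Cartesian product. Giving each cell a seed derived from its position keeps reruns reproducible. This is a sketch: the SKU, pose and background values, the prompt template, and the `base_seed + index` rule are illustrative assumptions, not a prescribed scheme.

```python
from itertools import product


def build_jobs(skus, poses, backgrounds, base_seed=1000):
    """Expand the SKU x pose x background grid into one job dict per image."""
    jobs = []
    for index, (sku, pose, background) in enumerate(product(skus, poses, backgrounds)):
        jobs.append({
            "sku": sku,
            "prompt": f"product photo of {sku}, {pose}, {background} background",
            "seed": base_seed + index,  # fixed offset per cell, so reruns are reproducible
        })
    return jobs


jobs = build_jobs(
    skus=["sneaker-red", "sneaker-blue"],
    poses=["front view", "side view"],
    backgrounds=["white studio", "street"],
)
```

Each job's seed can be passed to `torch.Generator("cuda").manual_seed(...)` when the image is generated.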

## 💻 Patterns

### diffusers FLUX + ControlNet

```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

# Canny ControlNet trained for FLUX.1-dev
controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny",
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Placeholder path: a precomputed canny edge map of the reference image
canny_image = load_image("canny_edges.png")

image = pipe(
    prompt="cyberpunk samurai, neon rain",
    control_image=canny_image,
    controlnet_conditioning_scale=0.7,  # how strongly the edge map constrains the layout
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
```

### Multi-ControlNet stacking

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Load the ControlNets in the pipeline's dtype to avoid a float32/float16 mismatch.
cn_pose = ControlNetModel.from_pretrained(
    "xinsir/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
)
cn_depth = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[cn_pose, cn_depth],  # a list is wrapped into a multi-ControlNet
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder paths: a pose skeleton and a depth map, in the same order as the ControlNets
pose_img = load_image("pose.png")
depth_img = load_image("depth.png")

result = pipe(
    prompt="warrior pose, mountain backdrop",
    image=[pose_img, depth_img],
    controlnet_conditioning_scale=[0.8, 0.5],  # one scale per ControlNet
    num_inference_steps=30,
).images[0]
```
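
### IP-Adapter style transfer

```python
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# The "vit-h" SDXL weights expect the ViT-H image encoder from models/image_encoder,
# not the default ViT-bigG encoder under sdxl_models/image_encoder.
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors",
    image_encoder_folder="models/image_encoder",
)
pipe.set_ip_adapter_scale(0.6)  # lower values follow the reference image less closely

style_reference = load_image("style_ref.png")  # placeholder reference image
out = pipe(
    prompt="portrait of a knight",
    ip_adapter_image=style_reference,
    num_inference_steps=30,
).images[0]
```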
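
### LoRA composition

```python
# Requires the `peft` package. `pipe` is a text-to-image pipeline whose base model
# matches the LoRAs (e.g. SDXL LoRAs on an SDXL pipeline).
# "lora_pack/" and the .safetensors names are placeholder local files.
pipe.load_lora_weights("lora_pack/", weight_name="anime_style.safetensors", adapter_name="anime")
pipe.load_lora_weights("lora_pack/", weight_name="my_character.safetensors", adapter_name="char")
# Activate both adapters at once, each with its own strength.
pipe.set_adapters(["anime", "char"], adapter_weights=[0.7, 0.9])

img = pipe(prompt="my character in anime style, school uniform").images[0]
```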

### Img2img refinement

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

refiner = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Placeholders: the prompt and output image of the base SDXL pass
prompt = "portrait of a knight"
base_image = load_image("base.png")

refined = refiner(
    prompt=prompt,
    image=base_image,
    strength=0.3,  # low strength keeps the composition and only refines detail
    num_inference_steps=20,
).images[0]
```
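
`strength` decides how much of the schedule the refiner actually runs. In diffusers' img2img pipelines, the number of executed denoising steps is about `int(num_inference_steps * strength)`, so `strength=0.3` with 20 steps runs roughly 6 of them. The hypothetical `effective_steps` helper below only mirrors that arithmetic. Exact rounding can differ between diffusers versions and schedulers.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an img2img call runs (hypothetical helper)."""
    # strength sets how far back into the noise schedule the input image is pushed;
    # only that last fraction of the steps is then denoised.
    return min(int(num_inference_steps * strength), num_inference_steps)


for steps, strength in [(20, 0.3), (30, 0.5), (28, 1.0)]:
    print(f"steps={steps} strength={strength} -> {effective_steps(steps, strength)} denoising steps")
```

This is why a low-strength refiner pass is cheap. To give it more denoising steps without changing how much of the image it may alter, raise `num_inference_steps` and leave `strength` as it is.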

### ComfyUI API workflow

```python
import json
import urllib.request

# The workflow file must be exported in ComfyUI's API format, not the UI format.
with open("workflows/portrait_pipeline.json", encoding="utf-8") as f:
    workflow = json.load(f)

# Node ids "6" and "12" are specific to this workflow file.
workflow["6"]["inputs"]["text"] = "cyberpunk samurai"
workflow["12"]["inputs"]["seed"] = 12345

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # The server queues the job and replies with JSON that includes a prompt_id.
    print(json.loads(resp.read()))
```
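
Hard-coded node ids such as `"6"` and `"12"` break as soon as the graph is edited and re-exported. In API-format workflow JSON, every node is keyed by its id and carries a `class_type` and an `inputs` dict, so a helper can locate nodes by type instead. `find_nodes` and `set_unique_input` are hypothetical helpers. `KSampler`, `CLIPTextEncode` and `CheckpointLoaderSimple` are stock ComfyUI node types. A graph with several nodes of one type (for example positive and negative prompts) still needs its own disambiguation rule, so this sketch refuses to guess.

```python
def find_nodes(workflow: dict, class_type: str) -> list[str]:
    """Return ids of all nodes of the given class_type in an API-format workflow."""
    return sorted(node_id for node_id, node in workflow.items() if node.get("class_type") == class_type)


def set_unique_input(workflow: dict, class_type: str, name: str, value) -> str:
    """Set inputs[name] on the single node of class_type; fail loudly if ambiguous."""
    matches = find_nodes(workflow, class_type)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one {class_type} node, found {matches}")
    workflow[matches[0]]["inputs"][name] = value
    return matches[0]


# Minimal illustrative API-format graph; links are [source_node_id, output_index].
workflow = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 20, "model": ["4", 0]}},
    "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "model.safetensors"}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 1]}},
}
sampler_id = set_unique_input(workflow, "KSampler", "seed", 12345)
text_node_id = set_unique_input(workflow, "CLIPTextEncode", "text", "cyberpunk samurai")
```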

### Batch pipeline with caching

```python
from functools import lru_cache

import torch

# `pipe` is the FluxControlNetPipeline from the first example.


@lru_cache(maxsize=8)
def encode_prompt(prompt: str):
    # FLUX's encode_prompt takes prompt_2 (None reuses prompt) and returns
    # (prompt_embeds, pooled_prompt_embeds, text_ids). no_grad keeps the cached
    # tensors from holding an autograd graph in GPU memory.
    with torch.no_grad():
        return pipe.encode_prompt(prompt=prompt, prompt_2=None, device="cuda")


def generate_batch(prompts: list[str], control_imgs: list, seeds: list[int]):
    results = []
    for p, c, s in zip(prompts, control_imgs, seeds):
        prompt_embeds, pooled_prompt_embeds, _ = encode_prompt(p)  # cached per prompt
        gen = torch.Generator("cuda").manual_seed(s)  # per-image seed for reproducibility
        img = pipe(
            prompt_embeds=prompt_embeds,
            pooled_prompt_embeds=pooled_prompt_embeds,
            control_image=c,
            generator=gen,
        ).images[0]
        results.append(img)
    return results
```

## Decision criteria

| Situation | Pipeline |
|---|---|
| Highest fidelity, slow | FLUX.1-dev + ControlNet + refiner |
| Real-time / interactive | SDXL Turbo / FLUX.1-schnell, 1-4 steps |
| Face consistency | InstantID / PuLID + IP-Adapter Face |
| Style consistency across a batch | Style LoRA + fixed seed offset |
| Local-only (Apple Silicon) | MLX + SDXL, or Draw Things with quantized FLUX.1 |

**Default**: FLUX.1-dev + 1 ControlNet (canny/depth) + IP-Adapter, 28 steps, guidance 3.5.

## 🔗 Graph

- Parent: [[AI 이미지 생성 (AI Image Generation)]] · [[Diffusion_Models]]
- Variants: [[초상화 및 애니메이션 스타일 제어|Portrait and Animation Style Control]] · [[ComfyUI]]
- Applications: [[AI 이미지 생성 및 편집 워크플로우 (AI Image Generation & Editing Workflow)]] · [[AI 이미지 품질 최적화 및 디버깅 (Image Quality Optimization & Debugging)]]
- Adjacent: [[ControlNet]] · [[LoRA]] · [[FLUX]]

## 🤖 LLM usage

**When**: rewriting prompts, captioning control images, generating workflow JSON, diagnosing errors.

**When not**: the forward pass of the VAE or denoiser (UNet or transformer) itself. That is deterministic numerical computation, not an LLM task.
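
When an LLM drafts workflow JSON, a cheap structural check catches dangling links before the graph reaches the server. In API format, an input wired to another node is a two-element list `[source_node_id, output_index]`. The sketch below assumes that convention and only checks that every such link points at an existing node. It does not validate node types or input names; that still requires the server's `/object_info` schema. `find_dangling_links` is a hypothetical helper, and the JSON is an illustrative fragment.

```python
import json


def find_dangling_links(workflow: dict) -> list[tuple[str, str, str]]:
    """Return (node_id, input_name, missing_source_id) for every broken link."""
    problems = []
    for node_id, node in workflow.items():
        for name, value in node.get("inputs", {}).items():
            # Links are encoded as [source_node_id, output_index].
            is_link = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
                and isinstance(value[1], int)
            )
            if is_link and value[0] not in workflow:
                problems.append((node_id, name, value[0]))
    return problems


llm_output = """
{
  "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "model.safetensors"}},
  "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a cat", "clip": ["4", 1]}},
  "3": {"class_type": "KSampler", "inputs": {"model": ["4", 0], "positive": ["6", 0], "latent_image": ["5", 0]}}
}
"""
workflow = json.loads(llm_output)
print(find_dangling_links(workflow))
```

Feeding the reported problems back to the LLM gives it a concrete error to fix, which is more reliable than asking it to re-check its own output.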

## ❌ Anti-patterns

- **Conditioning over-stack**: 5+ simultaneous controls conflict with one another and produce blurry output.
- **CFG/guidance too high (>7 on FLUX)**: oversaturated, plastic-looking images; FLUX.1-dev is usually run around guidance 3.5.
- **LoRA stacking without weight tuning**: incompatible concepts blend together.
- **Missing seed control**: every batch is random, so results cannot be reproduced.
- **VAE mismatch**: using a VAE other than the one the model expects causes color shifts.
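
The anti-patterns above are mechanical enough to check before a batch is queued. The sketch below encodes them as a lint over a plain settings dict. The dict keys and the `lint_generation_settings` name are hypothetical. The thresholds (5+ controls, guidance above 7 on FLUX) are this list's rules of thumb, not hard limits.

```python
def lint_generation_settings(settings: dict) -> list[str]:
    """Return human-readable warnings for the anti-patterns listed above."""
    warnings = []
    if len(settings.get("controls", [])) >= 5:
        warnings.append("conditioning over-stack: 5+ simultaneous controls tend to conflict")
    if settings.get("model_family") == "flux" and settings.get("guidance_scale", 0) > 7:
        warnings.append("guidance too high for FLUX: expect oversaturated, plastic output")
    loras = settings.get("loras", [])
    if len(loras) > 1 and any("weight" not in lora for lora in loras):
        warnings.append("LoRA stack without explicit per-adapter weights")
    if settings.get("seed") is None:
        warnings.append("no seed set: batch results will not be reproducible")
    vae, model_vae = settings.get("vae"), settings.get("model_vae")
    if vae is not None and model_vae is not None and vae != model_vae:
        warnings.append(f"VAE mismatch ({vae} vs {model_vae}): risk of color shift")
    return warnings


settings = {
    "model_family": "flux",
    "guidance_scale": 8.0,
    "controls": ["canny"],
    "loras": [{"name": "anime", "weight": 0.7}, {"name": "char"}],
    "seed": None,
}
for warning in lint_generation_settings(settings):
    print("WARN:", warning)
```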

## 🧪 Verification / duplicates

- Verified against diffusers 0.30+, ComfyUI (2026-04), and the FLUX.1 model card.
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: image gen pipeline + control modalities |