| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-이미지-생성-및-제어-파이프라인 | 이미지 생성 및 제어 파이프라인 | 10_Wiki/Topics | verified | self | | none | A | 0.92 | applied | | | 2026-05-10 | pending | |
Image Generation and Control Pipeline
One-liner
"Every control is one more layer in the conditioning stack." A 2026 image-generation pipeline stacks its conditioning in layers: base model (FLUX.1 / SDXL / SD3.5) → control adapter (ControlNet / IP-Adapter / T2I-Adapter) → LoRA → refiner. ComfyUI makes these layers explicit as a node graph; diffusers abstracts them behind pipeline classes.
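Key points
Pipeline stages
- Prompt encoding: T5 + CLIP text encoders provide dual conditioning (FLUX.1, SD3.5); SDXL uses two CLIP encoders instead of T5
- Latent init: pure noise, or an encoded and partially noised input image for img2img
- Conditioning injection: ControlNet (structure), IP-Adapter (style/image reference), LoRA (concept)
- Sampling: Euler / DPM-Solver++, or flow-matching Euler for FLUX.1 / SD3.5; typically 20-50 steps
- Decoding: the VAE maps latents back to pixel space, followed by an optional refiner pass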
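FLUX.1 and SD3.5 are trained with flow matching. At sampling time they integrate a velocity field from pure noise at t = 1 down to the clean latent at t = 0.

The toy sketch below shows that sampling stage in NumPy. It rests on several assumptions:
- The path is a straight (rectified-flow) path, x_t = (1 - t)·x0 + t·ε.
- An oracle velocity that already knows x0 stands in for the model.
- A real pipeline replaces `make_oracle_velocity` (a hypothetical helper) with the transformer's prediction and typically uses a shifted timestep schedule.

```python
import numpy as np


def euler_flow_sample(noise, velocity_fn, num_steps):
    """Integrate dx/dt = v(x, t) from t=1 (noise) to t=0 (data) with plain Euler steps."""
    timesteps = np.linspace(1.0, 0.0, num_steps + 1)
    x = noise.copy()
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = velocity_fn(x, t)
        x = x + (t_next - t) * v  # t_next < t, so each step moves toward the data
    return x


def make_oracle_velocity(x0):
    """Toy stand-in for the denoiser: on x_t = (1-t)*x0 + t*eps, v = eps - x0 = (x_t - x0) / t."""
    def velocity(x, t):
        return (x - x0) / t  # t never reaches 0 inside the loop
    return velocity


rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8, 8))     # pretend "clean latent"
noise = rng.normal(size=x0.shape)   # latent init: pure noise
sample = euler_flow_sample(noise, make_oracle_velocity(x0), num_steps=28)
print(np.abs(sample - x0).max())
```

On a perfectly straight path, Euler is exact even in one step. A learned velocity is imperfect and the path curves, which is roughly why real samplers spend tens of steps.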
Control modalities
- Structure: canny, depth, pose, segmentation (spatial constraints)
- Identity: IP-Adapter Face, InstantID, PuLID (face preservation)
- Style: IP-Adapter Style, style-LoRA (transfer a reference style)
- Concept: textual inversion, custom LoRA (a specific subject)
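The modality list above can be read as a routing table:
- Structure maps feed ControlNet. Several at once become a Multi-ControlNet list.
- Identity, style and concept controls go to adapter or LoRA slots.

The sketch below is hypothetical. The slot names and adapter labels are illustrative strings, not diffusers identifiers, and `plan_controls` is an assumed helper.

```python
# Hypothetical routing table distilled from the modality list: modality -> (slot, typical adapter)
MODALITY_SLOTS = {
    "canny": ("structure", "ControlNet"),
    "depth": ("structure", "ControlNet"),
    "pose": ("structure", "ControlNet"),
    "segmentation": ("structure", "ControlNet"),
    "face": ("identity", "InstantID / PuLID / IP-Adapter Face"),
    "style": ("style", "IP-Adapter Style / style-LoRA"),
    "subject": ("concept", "textual inversion / custom LoRA"),
}


def plan_controls(requested: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Group requested modalities by slot; the structure entries map onto a Multi-ControlNet list."""
    plan: dict[str, list[tuple[str, str]]] = {"structure": [], "identity": [], "style": [], "concept": []}
    for modality in requested:
        if modality not in MODALITY_SLOTS:
            raise ValueError(f"unknown control modality: {modality!r}")
        slot, adapter = MODALITY_SLOTS[modality]
        plan[slot].append((modality, adapter))
    return plan


plan = plan_controls(["pose", "depth", "style"])
print(plan["structure"])  # [('pose', 'ControlNet'), ('depth', 'ControlNet')]
```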
Applications
- Product photography: batch generation over every SKU × pose × background combination.
- Game asset pipelines: keeping a character consistent from concept art to portrait to animation poses.
- UI/UX prototyping: converting wireframes into mockups.
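The SKU × pose × background grid maps directly onto a Cartesian product of jobs. The sketch below uses hypothetical names (`Job`, `stable_seed`, `build_jobs`) and derives each job's seed from its fields via SHA-256, so re-running the batch reproduces the same images. Python's built-in `hash()` is unsuitable here because string hashing is randomized per process.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    sku: str
    pose: str
    background: str
    seed: int


def stable_seed(*parts: str) -> int:
    """Reproducible 32-bit seed from job fields (unlike hash(), stable across processes)."""
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_jobs(skus, poses, backgrounds):
    """One job per SKU x pose x background combination, each with its own fixed seed."""
    return [
        Job(sku, pose, bg, stable_seed(sku, pose, bg))
        for sku, pose, bg in itertools.product(skus, poses, backgrounds)
    ]


jobs = build_jobs(["SKU-001", "SKU-002"], ["front", "side"], ["studio", "beach", "street"])
print(len(jobs))  # 12 = 2 SKUs * 2 poses * 3 backgrounds
print(jobs[0])
```

Each `Job.seed` can then feed `torch.Generator(...).manual_seed(job.seed)` in the batch loop. Adding a new SKU then leaves the seeds of existing combinations unchanged.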
💻 Patterns
diffusers FLUX + ControlNet
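```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import FluxControlNetPipeline, FluxControlNetModel
from diffusers.utils import load_image

# Canny edge map as the structural control image ("input.png" is a placeholder path)
source = np.array(load_image("input.png"))
edges = cv2.Canny(source, 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 1-channel edges -> RGB

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny",
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="cyberpunk samurai, neon rain",
    control_image=canny_image,
    controlnet_conditioning_scale=0.7,  # how strongly the edges constrain the layout
    num_inference_steps=28,
    guidance_scale=3.5,  # FLUX.1-dev distilled guidance
).images[0]
```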
Multi-ControlNet stacking
```python
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

# Load the ControlNets in the same dtype as the pipeline
cn_pose = ControlNetModel.from_pretrained("xinsir/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)
cn_depth = ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[cn_pose, cn_depth],
    torch_dtype=torch.float16,
).to("cuda")
# pose_img / depth_img: precomputed control images (OpenPose skeleton, depth map),
# listed in the same order as the ControlNets and their scales
result = pipe(
    prompt="warrior pose, mountain backdrop",
    image=[pose_img, depth_img],
    controlnet_conditioning_scale=[0.8, 0.5],
    num_inference_steps=30,
).images[0]
```
IP-Adapter style transfer
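```python
import torch
from diffusers import AutoPipelineForText2Image
from transformers import CLIPVisionModelWithProjection

# "vit-h" SDXL adapters need the ViT-H encoder (models/image_encoder), not the bigG one in sdxl_models/
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
)
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", image_encoder=image_encoder, torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter-plus_sdxl_vit-h.safetensors"
)
pipe.set_ip_adapter_scale(0.6)  # higher = follow the reference image more closely
out = pipe(
    prompt="portrait of a knight",
    ip_adapter_image=style_reference,  # PIL image supplying the style
    num_inference_steps=30,
).images[0]
```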
LoRA composition
```python
# Assumes `pipe` is a plain text-to-image pipeline for the base model the LoRAs were trained on
pipe.load_lora_weights("lora_pack/", weight_name="anime_style.safetensors", adapter_name="anime")
pipe.load_lora_weights("lora_pack/", weight_name="my_character.safetensors", adapter_name="char")
pipe.set_adapters(["anime", "char"], adapter_weights=[0.7, 0.9])  # per-adapter blend weights
img = pipe(prompt="my character in anime style, school uniform").images[0]
```
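Conceptually, the `set_adapters` weights scale each LoRA's low-rank update before the updates are summed onto the frozen layer weight.

The NumPy sketch below shows that arithmetic for a single linear layer, assuming the original LoRA alpha/rank scaling. diffusers applies LoRAs through PEFT at forward time rather than merging them, but for a linear layer the result is the same weighted sum. `lora_delta` and `compose` are hypothetical helpers.

```python
import numpy as np


def lora_delta(A, B, alpha):
    """Low-rank update (alpha / rank) * B @ A for one linear layer."""
    rank = A.shape[0]
    return (alpha / rank) * (B @ A)


def compose(W, adapters, weights):
    """Effective weight with several active LoRAs: W + sum_i w_i * delta_i."""
    out = W.copy()
    for (A, B, alpha), w in zip(adapters, weights):
        out += w * lora_delta(A, B, alpha)
    return out


rng = np.random.default_rng(0)
d_out, d_in, r = 16, 32, 4
W = rng.normal(size=(d_out, d_in))
anime = (rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)), 8.0)  # (A, B, alpha)
char = (rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)), 4.0)
W_eff = compose(W, [anime, char], [0.7, 0.9])

# Applying the adapters at runtime gives the same output as merging them into W
x = rng.normal(size=d_in)
y_merged = W_eff @ x
y_runtime = W @ x + 0.7 * lora_delta(*anime) @ x + 0.9 * lora_delta(*char) @ x
print(np.allclose(y_merged, y_runtime))
```

Because the updates simply add, two LoRAs that push the same layers in different directions blend rather than compose cleanly. That is why stacking without tuning `adapter_weights` tends to mix concepts.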
Img2img refinement
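```python
import torch
from diffusers import AutoPipelineForImage2Image

refiner = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
).to("cuda")
# base_image: output of the base pass; prompt: the prompt used for that pass
refined = refiner(
    prompt=prompt,
    image=base_image,
    strength=0.3,  # light re-noise: composition is kept, only detail changes
    num_inference_steps=20,
).images[0]
```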
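`strength` decides how much of the schedule the refiner actually runs. The input image is noised to an intermediate timestep, and only the remaining steps are denoised.

The sketch below is a simplified re-implementation of the arithmetic in diffusers' img2img `get_timesteps`, assuming no `denoising_start` is passed. It is not the library code itself, and `img2img_schedule` is a hypothetical name.

```python
def img2img_schedule(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (denoising steps actually run, index of the first scheduler timestep used)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start, t_start


print(img2img_schedule(20, 0.3))  # (6, 14)
```

So the `strength=0.3`, 20-step refinement above runs only 6 denoising steps. Low strength is therefore cheap and preserves the base composition.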
ComfyUI API workflow
```python
import json
import urllib.request

# Workflow must be exported in ComfyUI's API format; node ids "6" and "12" are specific to this file
with open("workflows/portrait_pipeline.json") as f:
    workflow = json.load(f)
workflow["6"]["inputs"]["text"] = "cyberpunk samurai"
workflow["12"]["inputs"]["seed"] = 12345
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # the response includes the queued prompt_id
```
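Hard-coded node ids like "6" and "12" only hold for one exported workflow, and they change when the graph is edited.

The alternative sketched below is hedged and rests on two assumptions: the workflow is an API-format export, and it contains exactly one `KSampler`. It locates nodes by `class_type` and follows the sampler's `positive` link, stored as `[source_node_id, output_index]`, to the prompt node. The helper names are hypothetical, and the workflow fragment is made up.

```python
import json


def find_nodes(workflow: dict, class_type: str) -> list[str]:
    """Ids of all nodes with the given class_type in an API-format workflow."""
    return [node_id for node_id, node in workflow.items() if node.get("class_type") == class_type]


def patch_prompt_and_seed(workflow: dict, text: str, seed: int) -> dict:
    (sampler_id,) = find_nodes(workflow, "KSampler")  # ValueError unless exactly one sampler
    sampler_inputs = workflow[sampler_id]["inputs"]
    sampler_inputs["seed"] = seed
    positive_id, _output_index = sampler_inputs["positive"]  # link = [source_node_id, output_index]
    workflow[positive_id]["inputs"]["text"] = text
    return workflow


# Made-up API-format fragment: "6" is the positive prompt node, "7" the negative one
workflow = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 0, "steps": 20, "positive": ["6", 0], "negative": ["7", 0]}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode", "inputs": {"text": "blurry", "clip": ["4", 1]}},
}
patch_prompt_and_seed(workflow, "cyberpunk samurai", 12345)
print(json.dumps(workflow["6"]["inputs"]))
```

Following the link also avoids guessing which of the two `CLIPTextEncode` nodes holds the positive prompt.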
Batch pipeline with caching
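```python
from functools import lru_cache

import torch

# Assumes `pipe` is the FLUX ControlNet pipeline from the first pattern.
@lru_cache(maxsize=8)
@torch.no_grad()
def encode_prompt(prompt: str):
    # Returns (prompt_embeds, pooled_prompt_embeds, text_ids); prompt_2=None reuses `prompt` for T5
    return pipe.encode_prompt(prompt, prompt_2=None, device="cuda")

def generate_batch(prompts: list[str], control_imgs: list, seeds: list[int]):
    results = []
    for p, c, s in zip(prompts, control_imgs, seeds):
        embeds = encode_prompt(p)  # cached: repeated prompts skip the text encoders
        gen = torch.Generator("cuda").manual_seed(s)  # per-image seed for reproducibility
        img = pipe(
            prompt_embeds=embeds[0],
            pooled_prompt_embeds=embeds[1],
            control_image=c,
            generator=gen,
        ).images[0]
        results.append(img)
    return results
```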
Decision criteria
| Situation | Pipeline |
|---|---|
| Highest fidelity, slow | FLUX.1-dev + ControlNet + refiner |
| Real-time / interactive | SDXL Turbo / FLUX.1 schnell, 1-4 steps |
| Face consistency | InstantID / PuLID + IP-Adapter Face |
| Style consistency batch | Style-LoRA + fixed seed offset |
| Local-only (Apple Silicon) | MLX + SDXL, Draw Things, or quantized FLUX.1 |
Default: FLUX.1-dev + 1 ControlNet (canny/depth) + IP-Adapter, 28 steps, guidance 3.5.
🔗 Graph
- Parent: AI 이미지 생성 (AI Image Generation) · Diffusion_Models
- Variants: 초상화 및 애니메이션 스타일 제어 (Portrait & Animation Style Control) · ComfyUI
- Applications: AI 이미지 생성 및 편집 워크플로우 (AI Image Generation & Editing Workflow) · AI 이미지 품질 최적화 및 디버깅 (Image Quality Optimization & Debugging)
- Adjacent: ControlNet · LoRA · FLUX
🤖 Using LLMs
When to use: rewriting prompts, captioning control images, generating workflow JSON, diagnosing errors. When not to: the inner VAE/UNet forward pass, which is deterministic tensor computation where an LLM adds nothing.
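When an LLM generates workflow JSON, a cheap structural check before POSTing to `/prompt` catches hallucinated node references.

The sketch below makes three assumptions:
- The input uses the API export format: node id → `{"class_type", "inputs"}`.
- Links are `[source_id, output_index]` pairs, detected heuristically.
- The check does not confirm that `class_type` names exist on the server.

`validate_api_workflow` is a hypothetical helper.

```python
import json


def validate_api_workflow(raw: str) -> list[str]:
    """Structural checks on an API-format workflow string; an empty list means nothing was found."""
    try:
        workflow = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(workflow, dict):
        return ["top level must be an object keyed by node id"]
    problems = []
    for node_id, node in workflow.items():
        if not isinstance(node, dict) or "class_type" not in node or not isinstance(node.get("inputs"), dict):
            problems.append(f"node {node_id}: needs class_type and an inputs object")
            continue
        for name, value in node["inputs"].items():
            # Heuristic: a two-element [source_id, int] list is treated as a link to another node
            is_link = isinstance(value, list) and len(value) == 2 and isinstance(value[1], int)
            if is_link and str(value[0]) not in workflow:
                problems.append(f"node {node_id}.{name}: links to missing node {value[0]}")
    return problems


good = json.dumps({
    "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "model.safetensors"}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "samurai", "clip": ["4", 1]}},
})
bad = json.dumps({"6": {"class_type": "CLIPTextEncode", "inputs": {"text": "samurai", "clip": ["99", 1]}}})
print(validate_api_workflow(good))  # []
print(validate_api_workflow(bad))   # ['node 6.clip: links to missing node 99']
```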
❌ Anti-patterns
- Conditioning over-stacking: 5+ controls at once conflict with each other and blur the output.
- CFG/guidance too high (>7 on FLUX, where about 3.5 is typical): oversaturated, plastic-looking images.
- LoRA stacking without weight tuning: incompatible concepts bleed into each other.
- Missing seed control: every batch is random, so results cannot be reproduced.
- VAE mismatch: using a VAE other than the model's own causes color shifts.
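These anti-patterns are mechanical enough to check before a batch is launched. The sketch below runs over a hypothetical `GenConfig` and uses the thresholds from the list (5 controls, guidance above 7 on FLUX). Treating "several LoRAs all left at weight 1.0" as untuned is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenConfig:
    # Hypothetical config shape, used only for this lint sketch
    model_family: str                                   # e.g. "flux" or "sdxl"
    guidance_scale: float
    controls: list = field(default_factory=list)        # control modalities in use
    lora_weights: dict = field(default_factory=dict)    # adapter name -> weight
    seed: Optional[int] = None
    vae_matches_model: bool = True


def lint(cfg: GenConfig) -> list[str]:
    """Return one warning per anti-pattern the config triggers."""
    warnings = []
    if len(cfg.controls) >= 5:
        warnings.append("conditioning over-stack: 5+ controls tend to conflict and blur")
    if cfg.model_family == "flux" and cfg.guidance_scale > 7:
        warnings.append("guidance > 7 on FLUX: expect oversaturated, plastic output")
    if len(cfg.lora_weights) > 1 and all(w == 1.0 for w in cfg.lora_weights.values()):
        warnings.append("several LoRAs all at weight 1.0: tune adapter_weights")
    if cfg.seed is None:
        warnings.append("no seed: batch results will not be reproducible")
    if not cfg.vae_matches_model:
        warnings.append("VAE mismatch: expect color shift")
    return warnings


print(lint(GenConfig(model_family="flux", guidance_scale=9.0, controls=["canny", "depth"])))
```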
🧪 Verification / Duplicates
- Verified (diffusers 0.30+, ComfyUI 2026-04, FLUX.1 model card).
- Trust level: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — image gen pipeline + control modalities |