id: wiki-2026-0508-이미지-생성-및-제어-파이프라인
title: Image Generation and Control Pipeline
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Image Generation Pipeline, Controlled Diffusion Pipeline, ControlNet Pipeline
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: diffusion, image-gen, controlnet, flux, comfyui
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: python; framework: PyTorch/diffusers/ComfyUI

Image Generation and Control Pipeline

In One Line

"Control is a stack of conditioning." A 2026 image generation pipeline layers its conditioning: base model (FLUX.1 / SDXL / SD3.5) → control adapter (ControlNet / IP-Adapter / T2I-Adapter) → LoRA → refiner. ComfyUI makes this stack explicit as a node graph, while diffusers abstracts it behind pipeline classes.

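Key Points

Pipeline stages

  • Prompt encoding: text encoders turn the prompt into conditioning. FLUX.1 and SD3.5 pair T5 with CLIP; SDXL uses two CLIP encoders.
  • Latent init: pure noise (text-to-image) or a noised img2img latent
  • Conditioning injection: ControlNet (structure), IP-Adapter (style reference), LoRA (concept)
  • Sampling: Euler or DPM-Solver++ for SDXL, flow-matching Euler for FLUX.1 and SD3.5; typically 20-50 steps
  • Decoding: the VAE decodes latents to pixel space, followed by an optional refiner pass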
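
The sampling stage can be illustrated without a real model. The sketch below is a toy, assuming a rectified-flow style straight path x_t = (1 - t) * data + t * noise, with an oracle function standing in for the learned velocity network. `euler_sample` and `oracle_velocity` are hypothetical names for illustration, not diffusers APIs.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    noise: Vector,
    velocity: Callable[[Vector, float], Vector],
    num_steps: int = 28,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=1 (noise) down to t=0 (data)."""
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        v = velocity(x, t)
        # One Euler step toward t=0: x_{t-dt} = x_t - dt * v
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


# Toy values: `target` plays the role of the clean latent, `start` the noise.
target = [0.5, -1.0, 2.0]
start = [1.5, 0.25, -0.75]


def oracle_velocity(x: Vector, t: float) -> Vector:
    # On a straight path the true velocity is constant: noise - data.
    return [n - d for n, d in zip(start, target)]


sample = euler_sample(start, oracle_velocity, num_steps=28)
```

With a constant velocity field the Euler steps land on `target` up to floating-point error. A real model only approximates the velocity, which is one reason the step count matters.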

Control modalities

  • Structure: canny, depth, pose, segmentation (spatial constraints)
  • Identity: IP-Adapter Face, InstantID, PuLID (face preservation)
  • Style: IP-Adapter Style, style-LoRA (reference style)
  • Concept: textual inversion, custom LoRA (a specific subject)

Applications

  1. Product photography: batch generation over sku × pose × background.
  2. Game asset pipeline: consistency from concept to portrait to animation pose.
  3. UI/UX prototyping: wireframe-to-mockup conversion.
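
The sku × pose × background grid from the first application can be enumerated up front, giving every render a stable seed. This is a minimal sketch using only the standard library. `RenderJob`, `build_jobs`, and the sample values are hypothetical, and the base seed plus offset scheme is one possible convention, not something the source prescribes.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class RenderJob:
    sku: str
    pose: str
    background: str
    seed: int


def build_jobs(
    skus: Sequence[str],
    poses: Sequence[str],
    backgrounds: Sequence[str],
    base_seed: int = 1000,
) -> List[RenderJob]:
    """Enumerate the full grid; each job gets base_seed + its grid index."""
    jobs = []
    for offset, (sku, pose, bg) in enumerate(product(skus, poses, backgrounds)):
        jobs.append(RenderJob(sku, pose, bg, base_seed + offset))
    return jobs


jobs = build_jobs(
    ["mug-01", "mug-02"],
    ["front", "side"],
    ["studio", "kitchen", "outdoor"],
)
```

Because the seed depends only on the grid position, a single failed render can be re-run later with the same seed.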

💻 Patterns

diffusers FLUX + ControlNet

import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

# Canny ControlNet trained for FLUX.1-dev
controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny",
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Precomputed canny edge map; the path is a placeholder
canny_image = load_image("canny_edges.png")

image = pipe(
    prompt="cyberpunk samurai, neon rain",
    control_image=canny_image,
    controlnet_conditioning_scale=0.7,  # how strongly the edges constrain the output
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]

Multi-ControlNet stacking

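import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# Load the ControlNets in the same dtype as the pipeline to avoid a dtype mismatch
cn_pose = ControlNetModel.from_pretrained(
    "xinsir/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
)
cn_depth = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[cn_pose, cn_depth],
    torch_dtype=torch.float16,
).to("cuda")

# pose_img and depth_img are preprocessed control images (PIL), prepared elsewhere.
# Images and scales are matched to the ControlNets by position.
result = pipe(
    prompt="warrior pose, mountain backdrop",
    image=[pose_img, depth_img],
    controlnet_conditioning_scale=[0.8, 0.5],
    num_inference_steps=30,
).images[0]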
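
Multi-ControlNet calls match images and scales to ControlNets by position, so a length mismatch is an easy mistake. The helper below is a small pre-flight check one might run before calling the pipeline. `normalize_scales` and `check_control_inputs` are hypothetical names, and broadcasting a single float to every ControlNet is an assumption modeled on how diffusers treats a scalar scale.

```python
from typing import List, Sequence, Union


def normalize_scales(
    scale: Union[float, Sequence[float]], num_controlnets: int
) -> List[float]:
    """Broadcast a scalar scale, or validate a per-ControlNet list."""
    if isinstance(scale, (int, float)):
        return [float(scale)] * num_controlnets
    scales = [float(s) for s in scale]
    if len(scales) != num_controlnets:
        raise ValueError(
            f"expected {num_controlnets} scales, got {len(scales)}"
        )
    return scales


def check_control_inputs(
    images: Sequence[object],
    scale: Union[float, Sequence[float]],
    num_controlnets: int,
) -> List[float]:
    """Return per-ControlNet scales after checking the image count."""
    if len(images) != num_controlnets:
        raise ValueError(
            f"expected {num_controlnets} control images, got {len(images)}"
        )
    return normalize_scales(scale, num_controlnets)


# Strings stand in for the actual control images here.
scales = check_control_inputs(["pose", "depth"], [0.8, 0.5], num_controlnets=2)
```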

IP-Adapter style transfer

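from diffusers.utils import load_image

# Assumes `pipe` is an SDXL pipeline. The "plus ... vit-h" weights need the ViT-H
# image encoder, which lives outside the sdxl_models subfolder.
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors",
    image_encoder_folder="models/image_encoder",
)
pipe.set_ip_adapter_scale(0.6)

# Style reference image; the path is a placeholder
style_reference = load_image("style_reference.png")

# With the multi-ControlNet pipeline above, also pass image=[pose_img, depth_img]
out = pipe(
    prompt="portrait of a knight",
    ip_adapter_image=style_reference,
    num_inference_steps=30,
).images[0]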

LoRA composition

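# Named adapters require the PEFT backend (pip install peft)
pipe.load_lora_weights("lora_pack/", weight_name="anime_style.safetensors", adapter_name="anime")
pipe.load_lora_weights("lora_pack/", weight_name="my_character.safetensors", adapter_name="char")
# Activate both adapters with per-adapter weights
pipe.set_adapters(["anime", "char"], adapter_weights=[0.7, 0.9])

# With a ControlNet pipeline, also pass its control image argument
img = pipe(prompt="my character in anime style, school uniform").images[0]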
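
To see what the adapter weights do, it can help to look at the linear-combination view of LoRA: each adapter contributes a low-rank update B @ A, scaled by its weight, on top of the frozen base matrix. The sketch below is a toy with 2x2 matrices in pure Python. `merge_loras` is a hypothetical helper, the alpha/rank scaling factor is left out for brevity, and the real diffusers/PEFT code path may apply adapters differently internally.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_loras(
    base: Matrix, loras: Sequence[Tuple[Matrix, Matrix, float]]
) -> Matrix:
    """Return base + sum_i weight_i * (B_i @ A_i)."""
    merged = [row[:] for row in base]
    for b_mat, a_mat, weight in loras:
        delta = matmul(b_mat, a_mat)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += weight * value
    return merged


base = [[1.0, 0.0], [0.0, 1.0]]
# Each entry: (B with shape 2x1, A with shape 1x2, adapter weight)
anime = ([[1.0], [0.0]], [[0.0, 2.0]], 0.7)
char = ([[0.0], [1.0]], [[1.0, 0.0]], 0.9)
merged = merge_loras(base, [anime, char])
```

Both updates land in the same weight matrix, which is why two LoRAs trained on unrelated concepts can interfere unless their weights are tuned together.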

Img2img refinement

import torch
from diffusers import AutoPipelineForImage2Image

refiner = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# `prompt` and `base_image` come from the base generation step.
# A low strength keeps the composition and only refines detail.
refined = refiner(
    prompt=prompt,
    image=base_image,
    strength=0.3,
    num_inference_steps=20,
).images[0]
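
In img2img, `strength` also decides how many of the requested steps actually run. The function below is a sketch of that arithmetic, assuming the truncation rule used by diffusers img2img pipelines when no explicit denoising start is given. `effective_img2img_steps` is a hypothetical name.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps an img2img call actually executes."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # The input image is noised to a point `strength` of the way into the
    # schedule, and only the remaining steps are denoised.
    return min(int(num_inference_steps * strength), num_inference_steps)


refiner_steps = effective_img2img_steps(num_inference_steps=20, strength=0.3)
```

Under this rule the refinement call above runs about 6 of its 20 scheduled steps, so raising `num_inference_steps` is the way to get more refinement passes at a fixed strength.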

ComfyUI API workflow

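import json
import urllib.request

# Workflow exported from ComfyUI in API format
with open("workflows/portrait_pipeline.json", encoding="utf-8") as f:
    workflow = json.load(f)

# Node IDs "6" and "12" are specific to this workflow file
workflow["6"]["inputs"]["text"] = "cyberpunk samurai"
workflow["12"]["inputs"]["seed"] = 12345

# Passing `data` makes this a POST that queues the workflow
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))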
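
Hard-coded node IDs break as soon as the workflow is re-exported. The sketch below patches nodes by `class_type` instead and pulls output filenames out of a history response. Both helpers are hypothetical, and the sample dictionaries reflect an assumed shape of the API-format workflow and of a `/history/<prompt_id>` response, so check them against your ComfyUI version.

```python
import copy
from typing import Any, Dict, List

Workflow = Dict[str, Dict[str, Any]]


def set_inputs_by_class(workflow: Workflow, class_type: str, **inputs: Any) -> Workflow:
    """Return a copy with `inputs` applied to every node of the given class_type."""
    patched = copy.deepcopy(workflow)
    hits = 0
    for node in patched.values():
        if node.get("class_type") == class_type:
            node.setdefault("inputs", {}).update(inputs)
            hits += 1
    if hits == 0:
        raise KeyError(f"no node with class_type={class_type!r}")
    return patched


def output_filenames(history: Dict[str, Any], prompt_id: str) -> List[str]:
    """Collect image filenames from an (assumed) history response."""
    outputs = history.get(prompt_id, {}).get("outputs", {})
    return [
        img["filename"]
        for node in outputs.values()
        for img in node.get("images", [])
    ]


# Sample data for illustration only
sample_workflow: Workflow = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "old prompt"}},
    "12": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 28}},
}
patched = set_inputs_by_class(sample_workflow, "KSampler", seed=12345)

sample_history = {
    "abc": {"outputs": {"9": {"images": [{"filename": "ComfyUI_00001_.png"}]}}}
}
files = output_filenames(sample_history, "abc")
```

Patching by class works well for a single sampler. Workflows usually contain two `CLIPTextEncode` nodes (positive and negative prompt), so the prompt text is still safer to set by node ID.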

Batch pipeline with caching

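from functools import lru_cache

import torch

# Assumes `pipe` is the FluxControlNetPipeline built above, whose encode_prompt
# returns (prompt_embeds, pooled_prompt_embeds, text_ids). SDXL pipelines return
# a different tuple, so the indices below would not apply there.
@lru_cache(maxsize=8)
def encode_prompt(prompt: str):
    # no_grad avoids keeping an autograd graph alive inside the cache
    with torch.no_grad():
        return pipe.encode_prompt(prompt, prompt_2=None, device="cuda")

def generate_batch(prompts: list[str], control_imgs: list, seeds: list[int]):
    results = []
    for p, c, s in zip(prompts, control_imgs, seeds):
        embeds = encode_prompt(p)  # repeated prompts hit the cache
        gen = torch.Generator("cuda").manual_seed(s)  # fixed seed per image
        img = pipe(
            prompt_embeds=embeds[0],
            pooled_prompt_embeds=embeds[1],
            control_image=c,
            generator=gen,
        ).images[0]
        results.append(img)
    return results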

Decision criteria

| Situation | Pipeline |
| --- | --- |
| Highest fidelity, slow | FLUX.1-dev + ControlNet + refiner |
| Real-time / interactive | SDXL Turbo / FLUX Schnell, 4-8 steps |
| Face consistency | InstantID / PuLID + IP-Adapter Face |
| Style consistency batch | Style-LoRA + fixed seed offset |
| Local-only (Apple Silicon) | MLX + SDXL or DrawThings, FLUX.1 quantized |

Default: FLUX.1-dev + 1 ControlNet (canny/depth) + IP-Adapter, 28 steps, guidance 3.5.
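
The decision table can also live in code so that batch scripts pick a pipeline consistently. This is a minimal sketch: the dictionary keys are hypothetical labels for the table rows, and the values are copied from the table rather than resolved to real pipeline objects.

```python
from typing import Dict

# Keys are hypothetical labels for the rows of the decision table.
RECOMMENDATIONS: Dict[str, str] = {
    "highest_fidelity": "FLUX.1-dev + ControlNet + refiner",
    "real_time": "SDXL Turbo / FLUX Schnell, 4-8 steps",
    "face_consistency": "InstantID / PuLID + IP-Adapter Face",
    "style_batch": "Style-LoRA + fixed seed offset",
    "apple_silicon_local": "MLX + SDXL or DrawThings, FLUX.1 quantized",
}

DEFAULT = "FLUX.1-dev + 1 ControlNet (canny/depth) + IP-Adapter, 28 steps, guidance 3.5"


def recommend(situation: str) -> str:
    """Look up a table row, falling back to the documented default."""
    return RECOMMENDATIONS.get(situation, DEFAULT)


choice = recommend("real_time")
fallback = recommend("something_else")
```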

🔗 Graph

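🤖 LLM Usage

When to use: prompt rewriting, caption extraction from control images, workflow JSON generation, error diagnosis. When not to use: the inner forward pass of the VAE/UNet, which is deterministic numerical work and not a job for an LLM.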

Anti-patterns

  • Conditioning over-stack: 5+ controls at once conflict with each other and produce blurry output.
  • Guidance scale too high (>7 on FLUX): oversaturated, plastic-looking output.
  • LoRA stacking without weight tuning: incompatible concepts blend into each other.
  • Missing seed control: every batch is random, so reproducibility is lost.
  • VAE mismatch: using a VAE other than the one the model was trained with causes color shift.
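
Several of these anti-patterns can be caught before any GPU time is spent. The sketch below is a hypothetical pre-flight lint. `GenerationConfig` and `lint_config` are illustrative names, the thresholds are taken from the list above, and VAE mismatch is not covered because it depends on the model files themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GenerationConfig:
    base_model: str
    controls: List[str] = field(default_factory=list)
    # LoRA name -> explicit weight, or None if the weight was never tuned
    loras: Dict[str, Optional[float]] = field(default_factory=dict)
    guidance_scale: float = 3.5
    seed: Optional[int] = None


def lint_config(cfg: GenerationConfig) -> List[str]:
    """Return one warning per anti-pattern found in the config."""
    warnings: List[str] = []
    if len(cfg.controls) >= 5:
        warnings.append("conditioning over-stack: 5+ controls")
    if "flux" in cfg.base_model.lower() and cfg.guidance_scale > 7:
        warnings.append("guidance too high for FLUX (>7)")
    if len(cfg.loras) > 1 and any(w is None for w in cfg.loras.values()):
        warnings.append("stacked LoRAs without explicit weights")
    if cfg.seed is None:
        warnings.append("no seed set: run is not reproducible")
    return warnings


good = GenerationConfig("FLUX.1-dev", controls=["canny"], loras={"anime": 0.7}, seed=42)
bad = GenerationConfig(
    "FLUX.1-dev",
    controls=["canny", "depth", "pose", "seg", "ip-adapter"],
    loras={"anime": None, "char": 0.9},
    guidance_scale=9.0,
)
good_warnings = lint_config(good)
bad_warnings = lint_config(bad)
```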

🧪 Verification / Duplicates

  • Verified (diffusers 0.30+, ComfyUI 2026-04, FLUX.1 model card).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: image gen pipeline + control modalities |