Antigravity Agent 504fd5fb42 [G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00

6.8 KiB

Image Generation and Control Pipeline

In one line

"Control is a stack of conditioning." A 2026 image generation pipeline is layered conditioning: base model (FLUX.1 / SDXL / SD3.5) → control adapter (ControlNet / IP-Adapter / T2I-Adapter) → LoRA → refiner. ComfyUI makes this stack explicit as a node graph; diffusers abstracts it behind pipeline classes.

Key points

Pipeline stages

Prompt encoding: T5 + CLIP text encoders, dual conditioning (FLUX.1, SD3.5; SDXL uses two CLIP encoders instead)
Latent init: pure noise, or an encoded input image for img2img
Conditioning injection: ControlNet (structure), IP-Adapter (style reference), LoRA (concept)
Sampling: Euler / DPM-Solver++ / flow-matching schedulers, 20-50 steps
Decoding: VAE back to pixel space, optional refiner
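The layered ordering from the one-line summary (base model, then control adapter, then LoRA, then refiner) can be written down as plain data before any model is loaded. The sketch below is illustrative only: `Layer`, `ConditioningStack`, and `STAGE_ORDER` are hypothetical names, not part of diffusers or ComfyUI.

```python
from dataclasses import dataclass, field

# Hypothetical stage names, in the order the note applies them.
STAGE_ORDER = ("control", "lora", "refiner")


@dataclass(frozen=True)
class Layer:
    stage: str          # one of STAGE_ORDER
    name: str           # e.g. "canny", "anime"
    weight: float = 1.0


@dataclass
class ConditioningStack:
    base_model: str
    layers: list[Layer] = field(default_factory=list)

    def push(self, stage: str, name: str, weight: float = 1.0) -> "ConditioningStack":
        if stage not in STAGE_ORDER:
            raise ValueError(f"unknown stage: {stage}")
        self.layers.append(Layer(stage, name, weight))
        return self

    def ordered(self) -> list[Layer]:
        # Stable sort: layers within a stage keep their insertion order.
        return sorted(self.layers, key=lambda layer: STAGE_ORDER.index(layer.stage))

    def describe(self) -> str:
        parts = [self.base_model]
        parts += [f"{l.stage}:{l.name}@{l.weight}" for l in self.ordered()]
        return " -> ".join(parts)


stack = ConditioningStack("FLUX.1-dev").push("lora", "anime", 0.7).push("control", "canny", 0.7)
print(stack.describe())
```

Keeping the stack as data makes it easy to log, diff, or validate a configuration before spending GPU time on it.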

Control modalities

Structure: canny, depth, pose, segmentation (spatial constraints)
Identity: IP-Adapter Face, InstantID, PuLID (face preservation)
Style: IP-Adapter Style, style-LoRA (reference style)
Concept: textual inversion, custom LoRA (a specific subject)

Applications

Batch generation for product photography (SKU × pose × background).
Game asset pipelines: consistency from concept to portrait to animation pose.
UI/UX prototyping: wireframe-to-mockup conversion.
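The SKU × pose × background batch can be enumerated as a job grid with one deterministic seed per job, so any single image can be regenerated later. This is a minimal sketch; `Job`, `build_jobs`, and the `base_seed` offset scheme are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Job:
    sku: str
    pose: str
    background: str
    seed: int


def build_jobs(skus, poses, backgrounds, base_seed=1000):
    """Enumerate every combination and give each one a fixed seed offset."""
    combos = product(skus, poses, backgrounds)
    return [Job(s, p, b, base_seed + i) for i, (s, p, b) in enumerate(combos)]


jobs = build_jobs(["mug", "lamp"], ["front", "side"], ["studio"])
for job in jobs:
    print(job)
```

Because the seed depends only on the job's position in the grid, rerunning with the same lists reproduces the same seeds.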

💻 Patterns

diffusers FLUX + ControlNet

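import torch
from diffusers import FluxControlNetPipeline, FluxControlNetModel
from diffusers.utils import load_image

# Canny ControlNet for FLUX.1-dev, loaded in bfloat16 to match the base model.
controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny",
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Precomputed Canny edge map; the path is a placeholder.
canny_image = load_image("canny_edges.png")

image = pipe(
    prompt="cyberpunk samurai, neon rain",
    control_image=canny_image,
    controlnet_conditioning_scale=0.7,  # how strongly the edges constrain layout
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]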

Multi-ControlNet stacking

import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

# Load each ControlNet in the same dtype as the pipeline to avoid dtype mismatches.
cn_pose = ControlNetModel.from_pretrained(
    "xinsir/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
)
cn_depth = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[cn_pose, cn_depth],
    torch_dtype=torch.float16,
).to("cuda")

# pose_img / depth_img: preprocessed control images, in the same order as the ControlNets.
result = pipe(
    prompt="warrior pose, mountain backdrop",
    image=[pose_img, depth_img],
    controlnet_conditioning_scale=[0.8, 0.5],  # one scale per ControlNet
    num_inference_steps=30,
).images[0]
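As a rough mental model of what the per-ControlNet scales do, assuming the usual residual formulation in which each ControlNet adds a correction to the base model's intermediate features, stacking can be written as a weighted sum. The symbols here are introduced for illustration: $h_\ell$ is the base feature at block $\ell$, $r_\ell^{(k)}$ is the residual from ControlNet $k$ given its control image $I_k$, and $s_k$ is the matching entry of `controlnet_conditioning_scale`.

```latex
h_\ell' \;=\; h_\ell \;+\; \sum_{k=1}^{K} s_k \, r_\ell^{(k)}\!\left(x_t,\, t,\, c,\, I_k\right)
```

With the scales `[0.8, 0.5]` used above, the pose residual contributes with weight 0.8 and the depth residual with weight 0.5, so lowering one scale weakens only that constraint.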

IP-Adapter style transfer

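# pipe: an SDXL pipeline that is already loaded and on the GPU.
# The "vit-h" weights expect the ViT-H image encoder, which lives under
# models/image_encoder in the h94/IP-Adapter repo rather than sdxl_models/.
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors",
    image_encoder_folder="models/image_encoder",
)
pipe.set_ip_adapter_scale(0.6)  # lower values let the text prompt dominate

# style_reference: a PIL image whose style should carry over.
out = pipe(
    prompt="portrait of a knight",
    ip_adapter_image=style_reference,
    num_inference_steps=30,
).images[0]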

LoRA composition

# Load two LoRAs under separate adapter names so they can be weighted independently.
pipe.load_lora_weights("lora_pack/", weight_name="anime_style.safetensors", adapter_name="anime")
pipe.load_lora_weights("lora_pack/", weight_name="my_character.safetensors", adapter_name="char")
# Activate both; weights are per adapter, in the same order as the names.
pipe.set_adapters(["anime", "char"], adapter_weights=[0.7, 0.9])

img = pipe(prompt="my character in anime style, school uniform").images[0]
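Under the standard LoRA parameterization, composing adapters amounts to adding several scaled low-rank updates to each affected weight matrix. The formula below is a sketch under that assumption: $W_0$ is the frozen base weight, $B_i A_i$ is the low-rank update of adapter $i$ with rank $r_i$ and scaling $\alpha_i$, and $w_i$ is the entry of `adapter_weights` for that adapter. How a given loader applies $w_i$ internally is assumed here, not taken from the source.

```latex
W' \;=\; W_0 \;+\; \sum_{i} w_i \,\frac{\alpha_i}{r_i}\, B_i A_i
```

With weights `[0.7, 0.9]`, the style update is applied at 70% strength and the character update at 90%. Because the updates are simply added, two adapters that push the same weights in conflicting directions can interfere, which is why the weights need tuning.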

Img2img refinement

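import torch
from diffusers import AutoPipelineForImage2Image

refiner = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# prompt / base_image: the prompt and output image of the base generation pass.
refined = refiner(
    prompt=prompt,
    image=base_image,
    strength=0.3,  # low strength keeps composition and only sharpens detail
    num_inference_steps=20,
).images[0]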

ComfyUI API workflow

import json
import urllib.request

# Workflow exported from ComfyUI in API format; node ids "6" and "12" are specific to this file.
with open("workflows/portrait_pipeline.json") as f:
    workflow = json.load(f)
workflow["6"]["inputs"]["text"] = "cyberpunk samurai"
workflow["12"]["inputs"]["seed"] = 12345

# Queue the workflow on a locally running ComfyUI server.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())
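Hard-coded node ids such as "6" and "12" break as soon as the workflow is re-exported. A hedged alternative is to locate nodes by `class_type` and follow the sampler's `positive` link. The helper name `patch_workflow` is hypothetical, and the sketch assumes an API-format export with exactly one `KSampler` whose positive input links directly to a `CLIPTextEncode` node; workflows that route conditioning through other nodes would need extra traversal.

```python
import copy


def patch_workflow(workflow: dict, prompt_text: str, seed: int) -> dict:
    """Return a copy of an API-format workflow with the prompt and seed replaced."""
    patched = copy.deepcopy(workflow)  # never mutate the loaded template
    samplers = [n for n in patched.values() if n.get("class_type") == "KSampler"]
    if len(samplers) != 1:
        raise ValueError(f"expected exactly one KSampler, found {len(samplers)}")
    sampler = samplers[0]
    sampler["inputs"]["seed"] = seed

    # Links are stored as [source_node_id, output_index].
    positive_id = sampler["inputs"]["positive"][0]
    text_node = patched[positive_id]
    if text_node.get("class_type") != "CLIPTextEncode":
        raise ValueError("positive input is not wired directly to a CLIPTextEncode node")
    text_node["inputs"]["text"] = prompt_text
    return patched


# Tiny stand-in for an exported workflow (most inputs omitted).
demo = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "positive": ["6", 0], "negative": ["7", 0]}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "old prompt"}},
    "7": {"class_type": "CLIPTextEncode", "inputs": {"text": "blurry"}},
}
patched = patch_workflow(demo, "cyberpunk samurai", 12345)
print(patched["6"]["inputs"]["text"], patched["3"]["inputs"]["seed"])
```

Following the link rather than matching on `CLIPTextEncode` alone keeps the negative prompt node untouched.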

Batch pipeline with caching

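import torch
from functools import lru_cache

# pipe: a FLUX ControlNet pipeline (it accepts control_image and pooled_prompt_embeds).

@lru_cache(maxsize=8)
def encode_prompt(prompt: str):
    # Cache text-encoder output so repeated prompts skip the T5/CLIP forward pass.
    # prompt_2 is passed explicitly because some diffusers releases declare it
    # without a default on the Flux pipelines; None reuses `prompt`.
    with torch.no_grad():
        return pipe.encode_prompt(prompt, prompt_2=None, device="cuda")

def generate_batch(prompts: list[str], control_imgs: list, seeds: list[int]):
    results = []
    for p, c, s in zip(prompts, control_imgs, seeds):
        embeds = encode_prompt(p)
        # A per-image generator makes each result reproducible from its seed.
        gen = torch.Generator("cuda").manual_seed(s)
        img = pipe(
            prompt_embeds=embeds[0],
            pooled_prompt_embeds=embeds[1],
            control_image=c,
            generator=gen,
        ).images[0]
        results.append(img)
    return results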

Decision criteria

Situation	Pipeline
Highest fidelity, slow	FLUX.1-dev + ControlNet + refiner
Real-time / interactive	SDXL Turbo / FLUX.1 [schnell], 1-4 steps
Face consistency	InstantID / PuLID + IP-Adapter Face
Style consistency across a batch	Style-LoRA + fixed seed offset
Local-only (Apple Silicon)	MLX + SDXL or Draw Things, quantized FLUX.1

Default: FLUX.1-dev + one ControlNet (canny/depth) + IP-Adapter, 28 steps, guidance 3.5.

🔗 Graph

Parent: AI Image Generation · Diffusion_Models
Variants: Portrait and Animation Style Control · ComfyUI
Applications: AI Image Generation & Editing Workflow · AI Image Quality Optimization & Debugging
Adjacent: ControlNet · LoRA · FLUX

🤖 LLM usage

When to use: prompt rewriting, extracting captions from control images, generating workflow JSON, diagnosing errors. When not to use: the inner forward pass of the VAE/UNet, which is deterministic and not a job for an LLM.

❌ Anti-patterns

Conditioning over-stack: five or more controls at once conflict with each other and blur the output.
Guidance scale too high (>7 on FLUX): oversaturated, plastic-looking results.
LoRA stacking without weight tuning: incompatible concepts blend together.
Missing seed control: every batch is random, so reproducibility is lost.
VAE mismatch: using a VAE other than the model's own causes a color shift.
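Several of these anti-patterns can be caught before a run with a small configuration check. The sketch below is an assumption-laden illustration: `GenConfig` and `lint` are hypothetical names, the thresholds are the ones quoted in the list above, and VAE mismatch is left out because it cannot be detected from these fields alone.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenConfig:
    base_model: str
    guidance_scale: float
    seed: Optional[int] = None
    controls: list[str] = field(default_factory=list)
    lora_weights: dict[str, float] = field(default_factory=dict)


def lint(cfg: GenConfig) -> list[str]:
    """Return a warning for each anti-pattern found in the configuration."""
    warnings = []
    if len(cfg.controls) >= 5:
        warnings.append("conditioning over-stack: 5+ controls")
    if "flux" in cfg.base_model.lower() and cfg.guidance_scale > 7:
        warnings.append("guidance scale above 7 on FLUX")
    # Several LoRAs all left at 1.0 suggests the weights were never tuned.
    if len(cfg.lora_weights) > 1 and all(w == 1.0 for w in cfg.lora_weights.values()):
        warnings.append("multiple LoRAs with untuned weights")
    if cfg.seed is None:
        warnings.append("no seed set: output is not reproducible")
    return warnings


default = GenConfig("FLUX.1-dev", guidance_scale=3.5, seed=12345, controls=["canny"])
print(lint(default))
```

The check is advisory: an empty list does not guarantee a good image, only that none of the listed mistakes is present in the configuration.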

🧪 Verification / Duplicates

Verified (diffusers 0.30+, ComfyUI 2026-04, FLUX.1 model card).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — image gen pipeline + control modalities
