---
id: wiki-2026-0508-스테이블-디퓨전의-가중치-및-제어-시스템
title: Stable Diffusion Weight and Control Systems
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [SD Weight Control, Stable Diffusion Weights, Prompt Weighting, ControlNet]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [stable-diffusion, weights, controlnet, image-generation, prompt-engineering]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Diffusers/ComfyUI
---
# Stable Diffusion Weight and Control Systems
## In One Line
> **"Fine-grained control comes from four layers: prompt weights, LoRA, ControlNet, and IP-Adapter."** Stable Diffusion has moved from raw prompt-only generation (2022) to the multi-modal conditioning stacks of modern models (SD3.5, FLUX.1, SDXL Lightning). As of 2026, a typical production pipeline is a ComfyUI graph with FLUX dev, LoRA stacking, IP-Adapter (face/style), and ControlNet (pose/depth/canny).
## Core Concepts
### Control layers
- **Prompt weight**: `(token:1.3)`, an attention multiplier on a text token.
- **CFG scale**: 1-15, the strength of text conditioning (FLUX dev uses distilled guidance rather than true CFG).
- **LoRA**: a rank-decomposed weight delta; `<lora:name:0.8>` sets its strength.
- **ControlNet**: structural conditioning (pose, depth, canny, scribble).
- **IP-Adapter**: image prompting, where a reference image supplies the face or style.
- **Regional prompting**: spatial masks assign different prompts to different regions.
### LoRA parameters
- **Rank (r)**: 4-128, the adapter's capacity (a higher rank captures more detail but overfits more easily).
- **Alpha (α)**: scaling factor; effective_weight = (alpha / rank) × ΔW.
- **Stacking**: multiple LoRAs can be active at once; their weight deltas add up, so too many cause saturation.
- **DoRA / LoHa**: LoRA variants that aim for better quality at low rank.
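The alpha/rank scaling and the additive nature of stacking can be illustrated with a small NumPy sketch. This is a toy model, not how any particular library merges adapters: the matrix shapes, the helper names `lora_delta` and `apply_loras`, and the random weights are all assumptions made for illustration.

```python
import numpy as np


def lora_delta(A, B, alpha):
    """Return the scaled low-rank update (alpha / r) * B @ A."""
    rank = A.shape[0]  # A: (r, in_features), B: (out_features, r)
    return (alpha / rank) * (B @ A)


def apply_loras(W, loras, strengths):
    """Add each LoRA delta to the base weight W, scaled by its user strength."""
    W_out = W.copy()
    for (A, B, alpha), strength in zip(loras, strengths):
        W_out += strength * lora_delta(A, B, alpha)
    return W_out


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # toy base weight
style = (rng.normal(size=(4, 8)), rng.normal(size=(8, 4)), 4.0)  # r=4, alpha=4
char = (rng.normal(size=(2, 8)), rng.normal(size=(8, 2)), 1.0)   # r=2, alpha=1

merged = apply_loras(W, [style, char], strengths=[0.7, 0.9])
print(np.linalg.norm(merged - W))  # size of the combined update
```

Because every adapter adds its own delta to the same base weight, the norm of the combined update grows as more LoRAs are stacked, which is one way to picture the saturation mentioned above.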
### Applications
1. Character consistency — IP-Adapter face + LoRA.
2. Style transfer — style LoRA + style reference IP-Adapter.
3. Pose control — OpenPose ControlNet.
4. Inpainting / outpainting — mask + ControlNet.
## 💻 Patterns
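### Prompt weighting (A1111 syntax)
```
# Increase weight
(beautiful:1.3) sunset, (highly detailed:1.5)
# Decrease weight: use parentheses with a value below 1.0
# (in A1111, square brackets with a number trigger prompt editing, not weighting)
(blurry:0.7) background
# Nested: the inner parentheses multiply by 1.1 on top of the explicit 1.2
((cinematic lighting):1.2) photo of a (crowd:0.8)
# compel uses a different weighting syntax; check its documentation before reusing these prompts
```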
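To make the `(text:weight)` convention concrete, here is a minimal parser sketch. It only handles flat, non-nested `(text:weight)` groups and is an assumption-laden simplification, not the parser used by A1111 or compel. The name `parse_weights` is hypothetical.

```python
import re

# Matches "(some text:1.3)" without nested parentheses
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt):
    """Split a prompt into (text, weight) chunks; unweighted text gets 1.0."""
    chunks, pos = [], 0
    for match in WEIGHTED.finditer(prompt):
        plain = prompt[pos:match.start()].strip(" ,")
        if plain:
            chunks.append((plain, 1.0))
        chunks.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        chunks.append((tail, 1.0))
    return chunks


print(parse_weights("(beautiful:1.3) sunset, (highly detailed:1.5)"))
```

A real implementation would then scale each chunk's contribution to the text conditioning by its weight.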
### Diffusers + LoRA + ControlNet (2026)
```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

# FLUX.1-dev + ControlNet (a ControlNet requires FluxControlNetPipeline, not FluxPipeline)
controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Union",
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

# LoRA stacking: load each adapter under a name, then set per-adapter weights
pipe.load_lora_weights("./loras/anime_style.safetensors", adapter_name="style")
pipe.load_lora_weights("./loras/character.safetensors", adapter_name="char")
pipe.set_adapters(["style", "char"], adapter_weights=[0.7, 0.9])

control_image = load_image("./pose.png")
image = pipe(
    prompt="a knight in shining armor, cinematic lighting",
    control_image=control_image,
    control_mode=4,  # Union models take a mode index; 4 is assumed to be pose, check the model card
    controlnet_conditioning_scale=0.6,
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
```
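### IP-Adapter for face consistency
```python
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image
from transformers import CLIPVisionModelWithProjection

# The "vit-h" SDXL adapters expect the ViT-H image encoder stored under models/image_encoder
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    image_encoder=image_encoder,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus-face_sdxl_vit-h.safetensors",
)
pipe.set_ip_adapter_scale(0.7)  # face strength

face_image = load_image("./reference_face.jpg")
result = pipe(
    prompt="cyberpunk warrior in neon city",
    ip_adapter_image=face_image,
    num_inference_steps=30,
).images[0]
```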
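The role of the IP-Adapter scale can be pictured with a toy version of decoupled cross-attention: the image tokens get their own attention branch, and its output is added to the text branch after multiplying by the scale. The sketch below is single-head NumPy with the learned projections omitted and random stand-in tensors. It is an illustration under those assumptions, not the Diffusers implementation.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def decoupled_cross_attention(q, text_kv, image_kv, scale):
    """Text attention plus a scaled, separate attention over image tokens."""
    text_out = attention(q, *text_kv)
    image_out = attention(q, *image_kv)
    return text_out + scale * image_out


rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))                                      # 3 latent queries
text_kv = (rng.normal(size=(5, 4)), rng.normal(size=(5, 4)))     # 5 text tokens
image_kv = (rng.normal(size=(2, 4)), rng.normal(size=(2, 4)))    # 2 image tokens

out = decoupled_cross_attention(q, text_kv, image_kv, scale=0.7)
print(out.shape)
```

With `scale=0.0` the result reduces to plain text cross-attention, which is why lowering the scale lets the prompt dominate over the reference face.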
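### LoRA training (PEFT, rank-16)
```python
from diffusers import StableDiffusionXLPipeline
from peft import LoraConfig

# Load the base model whose UNet will receive the adapter
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,  # alpha / r = 1.0
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
    init_lora_weights="gaussian",
)
pipe.unet.add_adapter(lora_config)
# Train only the LoRA parameters; 1000-3000 steps is usually sufficient for a character LoRA
```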
### ComfyUI workflow (schematic JSON)
```json
{
  "nodes": [
    {"id": 1, "type": "CheckpointLoader", "model": "flux1-dev.safetensors"},
    {"id": 2, "type": "LoraLoader", "lora": "char.safetensors", "strength": 0.9, "input": 1},
    {"id": 3, "type": "ControlNetLoader", "model": "flux-controlnet-union.safetensors"},
    {"id": 4, "type": "OpenPosePreprocessor", "image": "pose.png"},
    {"id": 5, "type": "KSampler", "steps": 28, "cfg": 3.5, "sampler": "euler"}
  ]
}
```
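### Regional prompting (mask-based)
```python
# Left half: portrait; right half: landscape
# NOTE: `diffusers_regional` and `RegionalPipeline` are illustrative names, not a published
# package; treat this block as pseudocode for a mask-based regional pipeline.
import numpy as np
from diffusers_regional import RegionalPipeline

pipe = RegionalPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")

# Binary masks over an assumed 1024x1024 canvas
left_mask = np.zeros((1024, 1024), dtype=np.uint8)
left_mask[:, :512] = 1
right_mask = 1 - left_mask

masks = [
    {"mask": left_mask, "prompt": "portrait of a woman, oil painting"},
    {"mask": right_mask, "prompt": "mountain landscape, sunset"},
]
image = pipe(masks=masks, base_prompt="cinematic, detailed").images[0]
```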
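One common way to realize mask-based regional prompting is to compute one prediction per regional prompt and blend them per pixel with the masks. The NumPy sketch below shows only that blending step, with constant arrays standing in for real model predictions. The function name `blend_regions` and the tiny 4x8 grid are assumptions for illustration.

```python
import numpy as np


def blend_regions(predictions, masks):
    """Per-pixel weighted average of per-region predictions."""
    masks = np.stack(masks).astype(float)        # (n, H, W)
    preds = np.stack(predictions).astype(float)  # (n, H, W)
    total = masks.sum(axis=0)
    total = np.where(total == 0, 1.0, total)     # avoid division by zero
    return (masks * preds).sum(axis=0) / total


H, W = 4, 8
left_mask = np.zeros((H, W))
left_mask[:, : W // 2] = 1
right_mask = 1 - left_mask

portrait_pred = np.full((H, W), 1.0)    # stand-in for the "portrait" prediction
landscape_pred = np.full((H, W), -1.0)  # stand-in for the "landscape" prediction

blended = blend_regions([portrait_pred, landscape_pred], [left_mask, right_mask])
print(blended)
```

Soft (non-binary) masks work with the same function and give a gradual transition between regions where the masks overlap.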
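### CFG scale tuning
```python
# FLUX dev: guidance-distilled; guidance_scale 3-5 works well
# SDXL: 6-9 works well
# Too high: oversaturated, "baked-in" look
# Too low: the image ignores the prompt
# Assumes `pipe` is an already loaded text-to-image pipeline (e.g. SDXL)
p = "a knight in shining armor, cinematic lighting"
for cfg in [2.0, 3.5, 5.0, 7.5, 10.0]:
    img = pipe(prompt=p, guidance_scale=cfg).images[0]
    img.save(f"cfg_{cfg}.png")
```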
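For models that use true classifier-free guidance (SD 1.x, SDXL), the scale enters through a simple extrapolation between the unconditional and conditional noise predictions. The sketch below uses two-element arrays as stand-ins for those predictions; `cfg_combine` is a hypothetical helper name. FLUX dev is guidance-distilled and takes the guidance value as a model input instead, so this formula does not describe its default inference path.

```python
import numpy as np


def cfg_combine(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional toward the conditional prediction."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)


uncond = np.array([0.0, 1.0])  # toy unconditional prediction
cond = np.array([1.0, 1.0])    # toy prompt-conditioned prediction

for scale in [1.0, 3.5, 7.5]:
    print(scale, cfg_combine(uncond, cond, scale))
```

A scale of 1.0 returns the conditional prediction unchanged. Larger values push further along the direction the prompt suggests, which is where oversaturation at high scales comes from.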
## Decision Criteria
| Situation | Approach |
|---|---|
| Character consistency | IP-Adapter face + character LoRA |
| Pose / composition control | ControlNet (OpenPose, Depth, Canny) |
| Style transfer | Style LoRA OR IP-Adapter style |
| Fine detail emphasis | Prompt weight `(token:1.3)` |
| Production pipeline | ComfyUI graph (versionable, reproducible) |
| Quick iteration | Diffusers Python API |
**Default**: FLUX.1-dev + ControlNet Union + LoRA (style+char) + IP-Adapter face, assembled as a ComfyUI workflow.
## 🔗 Graph
- Parent: [[Diffusion_Models]] · [[Stable_Diffusion]]
- Variants: [[FLUX_1]] · [[SDXL]] · [[SD3_5]]
- Applications: [[ControlNet]] · [[IP_Adapter]] · [[LoRA_Training]]
- Adjacent: [[ComfyUI]] · [[Prompt_Engineering]] · [[Image_Generation_Workflow]]
## 🤖 LLM Usage
**When to use**: prompt scaffolding, explaining ComfyUI nodes, generating LoRA training scripts.
**When not to use**: judging visual quality (this needs human evaluation), recommending specific LoRAs (rely on CivitAI ratings instead).
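## ❌ Anti-patterns
- **Over-weighted token (`(x:2.0)`)**: attention collapses and artifacts appear.
- **Too many LoRAs stacked**: weights saturate and the output turns muddy (typically at 4 or more).
- **High CFG on FLUX**: FLUX dev is guidance-distilled, so SDXL-style CFG values do not carry over.
- **ControlNet at 1.0**: too strict; 0.4-0.7 usually works better.
- **Negative prompt on FLUX dev**: generally has no effect, because the distilled model does not run a separate unconditional pass by default.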
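These rules of thumb can be turned into a small pre-flight check. The sketch below encodes only the thresholds stated in this note (token weight 2.0, four or more LoRAs, ControlNet scale 1.0, FLUX guidance above the 3-5 range). The function `lint_settings` and the model label `"flux-dev"` are hypothetical names, and the thresholds are heuristics, not hard limits.

```python
import re

# Matches flat "(text:weight)" groups
TOKEN_WEIGHT = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def lint_settings(prompt, lora_count, controlnet_scale, model, guidance_scale,
                  negative_prompt=""):
    """Return a list of warnings for the anti-patterns listed above."""
    warnings = []
    for text, weight in TOKEN_WEIGHT.findall(prompt):
        if float(weight) >= 2.0:
            warnings.append(f"token '{text}' has weight {weight}: risk of artifacts")
    if lora_count >= 4:
        warnings.append(f"{lora_count} LoRAs stacked: risk of weight saturation")
    if controlnet_scale >= 1.0:
        warnings.append("ControlNet scale >= 1.0: try 0.4-0.7")
    if model == "flux-dev":  # hypothetical label for a guidance-distilled model
        if guidance_scale > 5.0:
            warnings.append("guidance_scale above 5 on FLUX dev: try 3-5")
        if negative_prompt:
            warnings.append("negative prompt set on FLUX dev: likely ignored")
    return warnings


for message in lint_settings(
    prompt="(knight:2.0) in shining armor",
    lora_count=4,
    controlnet_scale=1.0,
    model="flux-dev",
    guidance_scale=7.5,
    negative_prompt="blurry",
):
    print(message)
```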
## 🧪 Verification / Duplicates
- Verified (Diffusers docs, ComfyUI repo, Black Forest Labs FLUX paper, Stability AI release notes).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup of the SD weight and control system note |