---
id: wiki-2026-0508-일관된-캐릭터-및-스타일-구축
title: Building Consistent Characters and Styles
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Consistent Character, Brand Consistency Maintenance, Character Sheet, Style Lock]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [ai, image-generation, character-consistency, lora, style]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: diffusers-flux
---

# Building Consistent Characters and Styles

## One-liner

> **"Character/style consistency cannot be achieved with a single-shot prompt; it only becomes stable as the combined effect of a multi-stack (LoRA + IP-Adapter + ControlNet + reference latents)."** In 2026, production character pipelines run as a 5-step loop: character sheet → multi-view dataset → subject LoRA → generation-time stack → CLIP/face-similarity validation. A seed lock alone is not enough.

## Core

### Four dimensions of consistency

- **Identity**: face, body type, proportions (face/body).
- **Outfit**: clothing details, colors, accessories.
- **Style**: rendering, palette, line work/shading.
- **Pose/Expression**: the variation you deliberately control.
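
One way to keep the four dimensions explicit in code is a small per-character spec in which identity, outfit and style are fixed and only pose/expression (plus scene) varies per shot. A minimal sketch; `CharacterSpec` and its fields are hypothetical names, and the `<hero01>` / `brand_style` tokens follow the examples in the patterns below:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterSpec:
    trigger: str    # identity token learned by the subject LoRA, e.g. "<hero01>"
    outfit: str     # fixed outfit text: used for validation, deliberately NOT put in prompts
    style_tag: str  # trigger token of the style LoRA, e.g. "brand_style"

    def prompt(self, pose: str, scene: str = "") -> str:
        """Compose a shot prompt: fixed identity/style tokens + per-shot variation."""
        parts = [self.trigger, pose]
        if scene:
            parts.append(scene)
        parts.append(self.style_tag)
        return ", ".join(parts)


hero = CharacterSpec("<hero01>", "white hoodie, black cargo pants, red sneakers", "brand_style")
print(hero.prompt("drinking coffee", "in cafe"))
```

Keeping the outfit out of the prompt mirrors the captioning rule used for the subject LoRA: the outfit should come from the trigger, and the `outfit` field is only there so a CLIP check has something to compare against.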

### The stack (2026 best practice)

- **Subject LoRA**: 30-50 reference images; locks identity.
- **Style LoRA**: trained separately; can be stacked on top of the subject LoRA.
- **IP-Adapter Face / FaceID**: injects a face embedding.
- **PuLID / PhotoMaker**: zero-shot face injection (no training).
- **InstantID**: identity embedding + a facial-landmark ControlNet (IdentityNet).
- **Reference-only ControlNet**: guidance from the reference image's latents.

### Applications

1. Character series for webtoons / illustrated novels.
2. Cross-channel reuse of a brand mascot.
3. Procedural variation of game NPCs while preserving identity.

## 💻 Patterns

### Character sheet (training data)

```
data/hero/
├─ 01_front_neutral.png   "<hero> front view, neutral expression"
├─ 02_side_neutral.png    "<hero> side profile, neutral"
├─ 03_back.png            "<hero> back view"
├─ 04_3q_smile.png        "<hero> 3/4 view, smiling"
├─ 05_close_face.png      "<hero> close-up portrait"
├─ 06_full_body.png       "<hero> full body, T-pose"
├─ 07_action_run.png      "<hero> running pose"
...
30+ images, varied pose/expression, consistent outfit
```
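
Many LoRA trainers (kohya-ss sd-scripts, for example) read each caption from a `.txt` file with the same stem as the image. A minimal sketch, assuming that convention and a hypothetical `SHEET` dict mirroring the character sheet above, that checks every caption carries the trigger and then writes the sidecar files:

```python
from pathlib import Path

# Hypothetical caption table mirroring the character sheet (filename -> caption).
SHEET = {
    "01_front_neutral.png": "<hero> front view, neutral expression",
    "02_side_neutral.png": "<hero> side profile, neutral",
    "03_back.png": "<hero> back view",
}


def sidecar_plan(sheet, trigger="<hero>"):
    """Map each caption file name to its text; reject captions missing the trigger."""
    plan = {}
    for image_name, caption in sheet.items():
        if trigger not in caption:
            raise ValueError(f"{image_name}: caption lacks trigger {trigger!r}")
        plan[str(Path(image_name).with_suffix(".txt"))] = caption
    return plan


def write_sidecars(data_dir, sheet, trigger="<hero>", min_images=30):
    """Write <image>.txt next to each image after checking files exist and the set is large enough."""
    data_dir = Path(data_dir)
    missing = [name for name in sheet if not (data_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"images missing: {missing}")
    if len(sheet) < min_images:
        raise ValueError(f"only {len(sheet)} images; the sheet calls for {min_images}+")
    for txt_name, caption in sidecar_plan(sheet, trigger).items():
        (data_dir / txt_name).write_text(caption + "\n", encoding="utf-8")
```

Splitting the pure `sidecar_plan` from the file-writing step keeps the caption rules testable without touching the dataset directory.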

### Subject LoRA + caption

```python
# Captions emphasize the TRIGGER + the variation, NOT the outfit
# (so the outfit is learned implicitly as part of <hero01>)
captions = [
    "<hero01>, front view, neutral expression",
    "<hero01>, side profile",
    "<hero01>, smiling, 3/4 view",
    "<hero01>, in forest, full body",
]

# Training setup: LoRA rank 32, 2000 steps, lr 1e-4, base model FLUX.1-dev
```
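
The captioning rule above can be checked mechanically before training. A minimal sketch, assuming a hand-maintained outfit vocabulary (`OUTFIT_TERMS` is hypothetical; fill it with your character's actual outfit words), that flags captions which do not start with the trigger or that leak outfit details:

```python
import re

# Hypothetical outfit vocabulary for this character. If the outfit should be
# baked into the trigger, none of these words should appear in captions.
OUTFIT_TERMS = ["hoodie", "cargo pants", "sneakers"]


def lint_captions(captions, trigger="<hero01>", outfit_terms=OUTFIT_TERMS):
    """Return a list of (index, problem) tuples; an empty list means every caption passes."""
    problems = []
    for i, caption in enumerate(captions):
        if not caption.startswith(trigger):
            problems.append((i, f"does not start with trigger {trigger}"))
        for term in outfit_terms:
            # word-boundary match avoids hits inside longer words
            if re.search(rf"\b{re.escape(term)}\b", caption, flags=re.IGNORECASE):
                problems.append((i, f"mentions outfit term '{term}'"))
    return problems
```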

### Multi-LoRA at inference

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                    torch_dtype=torch.bfloat16).to("cuda")

# Load the identity LoRA and the style LoRA as separately named adapters
pipe.load_lora_weights("./loras/hero01.safetensors", adapter_name="char")
pipe.load_lora_weights("./loras/brand_style.safetensors", adapter_name="style")
# Identity weight kept above style weight so the style LoRA does not dominate the face
pipe.set_adapters(["char", "style"], adapter_weights=[0.95, 0.7])

img = pipe(
    "<hero01>, drinking coffee in cafe, brand_style",
    num_inference_steps=28, guidance_scale=3.5,
    generator=torch.Generator("cuda").manual_seed(42),  # seed lock for reproducibility
).images[0]
```

### PuLID (zero-shot face lock)

```python
# Illustrative pseudocode: PuLID is distributed as a research repo and ComfyUI nodes;
# the class/method names below sketch the inputs, not a published pip API.
from pulid import PuLIDPipeline

pl = PuLIDPipeline.from_pretrained("ByteDance/PuLID-FLUX")

img = pl.generate(
    prompt="<hero> hiking on mountain, golden hour",
    id_image="hero_face_ref.png",  # single face reference
    id_weight=0.85,                # identity strength
    seed=42, steps=20,
)
# No training: a single reference image is enough to preserve identity
```
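
### InstantID (face + head pose)

```python
# StableDiffusionXLInstantIDPipeline and draw_kps are not in core diffusers: they come from
# pipeline_stable_diffusion_xl_instantid.py in the InstantID repo (place it next to this script)
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
from insightface.app import FaceAnalysis
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline, draw_kps

app = FaceAnalysis(name="antelopev2"); app.prepare(ctx_id=0, det_size=(640, 640))

controlnet = ControlNetModel.from_pretrained("InstantX/InstantID", subfolder="ControlNetModel",
                                             torch_dtype=torch.float16)
pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter_instantid(hf_hub_download("InstantX/InstantID", "ip-adapter.bin"))

ref_img = load_image("hero_face_ref.png")
face = app.get(cv2.cvtColor(np.array(ref_img), cv2.COLOR_RGB2BGR))[0]  # InsightFace wants BGR
face_kps = draw_kps(ref_img, face["kps"])  # 5 facial landmarks, not OpenPose body keypoints
img = pipe(
    prompt="<hero> samurai in feudal japan",
    image_embeds=face["embedding"],  # identity embedding
    image=face_kps,                  # IdentityNet (ControlNet) condition
    controlnet_conditioning_scale=0.8,
    ip_adapter_scale=0.8,
).images[0]
```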
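
### Reference image guidance (IP-Adapter)

```python
# Assumes `pipe` is an SDXL pipeline (e.g. StableDiffusionXLPipeline):
# the sdxl_models weights below are for SDXL, not for FluxPipeline.
from diffusers.utils import load_image

pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                     weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.5)  # lower = prompt dominates, higher = follow the reference

hero_reference_img = load_image("hero_reference.png")  # placeholder path
img = pipe(
    prompt="<hero> sci-fi armor, cyberpunk city",
    ip_adapter_image=hero_reference_img,
).images[0]
```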

### Validation: face similarity

```python
import cv2
import numpy as np
from insightface.app import FaceAnalysis

face = FaceAnalysis(name="buffalo_l"); face.prepare(ctx_id=0)

def face_sim(img_a, img_b):
    # img_a / img_b: PIL images; InsightFace expects BGR uint8 arrays (OpenCV order)
    embs = []
    for im in (img_a, img_b):
        faces = face.get(cv2.cvtColor(np.array(im.convert("RGB")), cv2.COLOR_RGB2BGR))
        if not faces:
            raise ValueError("no face detected")
        embs.append(faces[0].normed_embedding)  # L2-normalized, so dot product = cosine
    return float(np.dot(embs[0], embs[1]))

cos_sim = face_sim(ref_img, generated_img)
assert cos_sim > 0.55, f"identity drift: {cos_sim:.3f}"
```

### Outfit consistency check (CLIP)

```python
import open_clip
import torch

# Without pretrained=..., open_clip returns randomly initialized weights
model, _, prep = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k")
model.eval()
tokenizer = open_clip.get_tokenizer("ViT-bigG-14")

outfit_prompt = "white hoodie, black cargo pants, red sneakers"
with torch.no_grad():
    txt_emb = model.encode_text(tokenizer([outfit_prompt]))
    img_emb = model.encode_image(prep(generated).unsqueeze(0))  # generated: PIL image
score = torch.cosine_similarity(txt_emb, img_emb).item()
alert = score < 0.27  # alert threshold
```

### Generation-loop with retry

```python
import torch

# Assumes `pipe`, `ref_img` and `face_sim` from the blocks above
def generate_consistent(prompt, max_retry=4, threshold=0.55):
    for i in range(max_retry):
        seed = 1000 + i * 7  # deterministic schedule, so a passing seed can be reused
        img = pipe(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]
        sim = face_sim(img, ref_img)
        if sim > threshold:
            return img, sim, seed
    raise RuntimeError("identity could not be preserved")
```
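
A variant keeps the best attempt instead of discarding all of them, and reports whether it cleared the gate. This is a minimal sketch; `generate` and `score` are hypothetical callables (for example, a wrapper around `pipe(...)` with a seeded generator, and `face_sim` against the reference), injected so the selection logic can be tested without a GPU:

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def best_of_seeds(
    generate: Callable[[int], Any],  # seed -> image (hypothetical pipeline wrapper)
    score: Callable[[Any], float],   # image -> identity similarity
    seeds: Iterable[int],
    threshold: float = 0.55,
    stop_early: bool = True,
) -> Tuple[Optional[Any], float, Optional[int], bool]:
    """Return (best_image, best_score, best_seed, passed_gate)."""
    best_img, best_score, best_seed = None, float("-inf"), None
    for seed in seeds:
        img = generate(seed)
        s = score(img)
        if s > best_score:
            best_img, best_score, best_seed = img, s, seed
        if stop_early and s > threshold:
            break  # good enough; skip the remaining seeds
    return best_img, best_score, best_seed, best_score > threshold


# Usage with the retry loop's seed schedule:
# best_of_seeds(lambda sd: render(prompt, sd), lambda im: face_sim(im, ref_img),
#               [1000 + i * 7 for i in range(4)])
```

With `stop_early=False` every seed is tried, which costs more GPU time but returns the strongest identity match rather than the first acceptable one.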
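
### Style transfer for series

```python
from diffusers import FluxImg2ImgPipeline

# Assumes `pipe` with the "char" and "style" adapters from the multi-LoRA block
# Step 1: generate the composition with only the character LoRA active
pipe.set_adapters(["char"], adapter_weights=[0.95])
base = pipe("<hero01> sitting on bench").images[0]

# Step 2: img2img with the style LoRA added; from_pipe reuses the loaded components,
# so the LoRA adapters on the shared transformer remain available
i2i_pipe = FluxImg2ImgPipeline.from_pipe(pipe)
i2i_pipe.set_adapters(["char", "style"], adapter_weights=[0.95, 0.7])
styled = i2i_pipe(
    prompt="<hero01> sitting on bench, brand_style",
    image=base, strength=0.4,  # low strength: keep the composition, change the rendering
).images[0]
```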

## Decision criteria

| Situation | Approach |
|---|---|
| 30+ reference images available | Train a subject LoRA |
| Only 1-3 reference images | PuLID / PhotoMaker |
| Identity + exact face pose (facial keypoints) | InstantID |
| Brand style kept separate | Separate style LoRA |
| Series of frames | Seed lock + the same LoRA stack |
| Validation gate | InsightFace cosine > 0.55 |

**Default**: subject LoRA + style LoRA + IP-Adapter face + face-similarity check in CI.
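
The table can be encoded as a small routing function, for example to pick a recipe per character in a batch job. This is a sketch using the table's own thresholds; `Needs`, `choose_stack` and the returned labels are hypothetical, and because the table is silent on 4-29 references, falling back to zero-shot injection there is an assumption:

```python
from dataclasses import dataclass


@dataclass
class Needs:
    n_refs: int                   # usable reference images of the character
    need_face_pose: bool = False  # exact face pose / layout required
    separate_style: bool = False  # brand style must stay independent of identity
    series: bool = False          # multiple frames of one sequence


def choose_stack(needs: Needs) -> list:
    """Map the decision table to an ordered list of techniques."""
    if needs.n_refs <= 0:
        raise ValueError("at least one reference image is required")
    stack = []
    if needs.n_refs >= 30:
        stack.append("subject LoRA")
    else:
        # table covers 1-3 refs; using zero-shot for 4-29 is an assumed fallback
        stack.append("PuLID / PhotoMaker")
    if needs.need_face_pose:
        stack.append("InstantID")
    if needs.separate_style:
        stack.append("style LoRA")
    if needs.series:
        stack.append("seed lock + same LoRA stack")
    stack.append("InsightFace cos > 0.55 gate")  # validation always runs
    return stack
```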

## 🔗 Graph

- Parent: [[AI 이미지 생성 (AI Image Generation)]]
- Variants: [[LoRA Fine-tuning]] · [[InstantID]]
- Applications: [[인공지능 시각 언어 생성 (AI Visual Language Generation)]] · [[오픈소스 이미지 모델 미세 조정 및 배포|Fine-tuning and Deploying Open-Source Image Models]]
- Adjacent: [[IP-Adapter]] · [[ControlNet]]

## 🤖 Using LLMs

**When**: authoring captions for the character dataset, generating prompt-variation lists, drafting validation rubrics.

**When not**: face-similarity scoring; deterministic InsightFace is the right tool.
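
## ❌ Anti-patterns

- **Single-reference overfit**: a LoRA trained on one image → mode collapse.
- **Mixing identities in the dataset**: the LoRA gets confused and blends identities.
- **Captioning outfit details when the outfit should stay fixed**: the outfit gets attributed to the caption words instead of the trigger, so the trigger alone no longer reproduces it. (Conversely, if outfits must be swappable, do caption them.)
- **No validation**: drift accumulates unnoticed.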
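
The first two anti-patterns can be caught before training by embedding every dataset image and flagging images that disagree with the rest. A minimal NumPy sketch, assuming the face embeddings were already extracted (for example InsightFace `normed_embedding`, one per image); the 0.5 cutoff and the leave-one-out centroid are illustrative choices, not values from this note:

```python
import numpy as np


def identity_outliers(embeddings, names, min_sim=0.5, min_images=30):
    """Flag dataset images whose face embedding disagrees with the rest of the set.

    embeddings: (N, D) array-like, one face embedding per image.
    Returns (outliers, warnings); outliers is a list of (name, similarity) pairs.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize each row
    n = len(emb)
    warnings = []
    if n < min_images:
        warnings.append(f"only {n} images; small sets risk overfitting / mode collapse")
    if n < 2:
        return [], warnings
    total = emb.sum(axis=0)
    outliers = []
    for i in range(n):
        # leave-one-out centroid, so an outlier cannot pull the reference toward itself
        centroid = (total - emb[i]) / (n - 1)
        sim = float(emb[i] @ centroid / np.linalg.norm(centroid))
        if sim < min_sim:
            outliers.append((names[i], sim))
    return outliers, warnings
```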

## 🧪 Verification / duplicates

- Verified against the PuLID paper (ByteDance, 2024), InstantID (InstantX, 2024), IP-Adapter (Tencent, 2023), and the diffusers docs.
- Confidence: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: character/style consistency multi-stack pipeline. |