| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-일관된-캐릭터-및-스타일-구축 | 일관된 캐릭터 및 스타일 구축 | 10_Wiki/Topics | verified | self | | none | A | 0.9 | applied | | | 2026-05-10 | pending | |
Building Consistent Characters and Styles
In one line
"Character/style consistency cannot be achieved with a single-shot prompt; it becomes stable only as the sum of a multi-part stack (LoRA + IP-Adapter + ControlNet + reference latents)." As of 2026, production character pipelines run as a five-step loop: character sheet → multi-view dataset → subject LoRA → generation-time stack → CLIP/face-similarity validation. Seed locking alone is not enough.
Key points
The four dimensions of consistency
- Identity: 얼굴, 체형, 비율 (face/body).
- Outfit: 의상 details, color, accessory.
- Style: rendering, palette, line/shading.
- Pose/Expression: the controllable variation.
Recommended stack (as of 2026)
- Subject LoRA: 30-50 ref images, identity lock.
- Style LoRA: trained separately, stackable with the subject LoRA.
- IP-Adapter Face / FaceID: face embedding.
- PuLID / Photomaker: zero-shot face injection.
- InstantID: identity + pose ControlNet.
- Reference-only ControlNet: latent reference.
Applications
- Character series for webtoons and illustrated novels.
- Cross-channel reuse of a brand mascot.
- Procedural variation of game NPCs with identity preserved.
💻 Patterns
Character sheet (training data)
data/hero/
├─ 01_front_neutral.png "<hero> front view, neutral expression"
├─ 02_side_neutral.png "<hero> side profile, neutral"
├─ 03_back.png "<hero> back view"
├─ 04_3q_smile.png "<hero> 3/4 view, smiling"
├─ 05_close_face.png "<hero> close-up portrait"
├─ 06_full_body.png "<hero> full body, T-pose"
├─ 07_action_run.png "<hero> running pose"
...
30+ images, varied pose/expression, consistent outfit
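The character sheet above pairs each image with a caption. A minimal sketch of writing those captions to disk, assuming the trainer reads a `.txt` sidecar file next to each image (a common convention, but not confirmed by this note). The function name `write_caption_sidecars` and the `trigger` check are hypothetical.

```python
from pathlib import Path


def write_caption_sidecars(dataset_dir, captions, trigger="<hero>"):
    """Write one .txt caption file next to each image name; return the paths written.

    captions maps image file names to caption strings (hypothetical layout).
    """
    dataset_dir = Path(dataset_dir)
    written = []
    for filename, caption in sorted(captions.items()):
        # Every caption should carry the trigger token so the LoRA can bind to it.
        if trigger not in caption:
            raise ValueError(f"caption for {filename} lacks trigger {trigger!r}")
        sidecar = (dataset_dir / filename).with_suffix(".txt")
        sidecar.write_text(caption + "\n", encoding="utf-8")
        written.append(sidecar)
    return written
```

Rejecting captions without the trigger token catches a mislabeled file before any training time is spent.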
Subject LoRA + caption
# Captions emphasize TRIGGER + variation, NOT outfit (so it's learned implicitly)
captions = [
    "<hero01>, front view, neutral expression",
    "<hero01>, side profile",
    "<hero01>, smiling, 3/4 view",
    "<hero01>, in forest, full body",
]
# Train LoRA rank 32, 2000 steps, lr 1e-4, FLUX.1-dev
Multi-LoRA at inference
from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
# Load the character and style LoRAs under separate adapter names
pipe.load_lora_weights("./loras/hero01.safetensors", adapter_name="char")
pipe.load_lora_weights("./loras/brand_style.safetensors", adapter_name="style")
# Activate both adapters, weighting identity above style
pipe.set_adapters(["char", "style"], adapter_weights=[0.95, 0.7])
img = pipe(
    "<hero01>, drinking coffee in cafe, brand_style",
    num_inference_steps=28, guidance_scale=3.5,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
PuLID (zero-shot face lock)
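# Schematic interface only: PuLIDPipeline.from_pretrained(...) and .generate(...)
# are illustrative names, not a confirmed API of the PuLID repository.
# Consult the repository's own inference scripts for the real entry points.
from pulid import PuLIDPipeline
pl = PuLIDPipeline.from_pretrained("ByteDance/PuLID-FLUX")
img = pl.generate(
    prompt="<hero> hiking on mountain, golden hour",
    id_image="hero_face_ref.png",
    id_weight=0.85,
    seed=42, steps=20,
)
# No training, single ref → identity preserved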
InstantID (face + pose)
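# StableDiffusionXLInstantIDPipeline is not part of diffusers itself; it ships as
# pipeline_stable_diffusion_xl_instantid.py in the InstantID repository.
import torch
from diffusers.models import ControlNetModel
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline

controlnet = ControlNetModel.from_pretrained(
    "InstantX/InstantID", subfolder="ControlNetModel", torch_dtype=torch.float16
)
pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, torch_dtype=torch.float16,
).to("cuda")
# Assumption: ip-adapter.bin was downloaded locally from the InstantX/InstantID model repo
pipe.load_ip_adapter_instantid("./checkpoints/ip-adapter.bin")
face_emb = extract_face_embedding(ref_img)  # hypothetical helper, e.g. an InsightFace embedding
img = pipe(
    prompt="<hero> samurai in feudal japan",
    image_embeds=face_emb,
    image=pose_kps_img,  # keypoint image; the repository's examples draw facial landmarks
    controlnet_conditioning_scale=0.8,
    ip_adapter_scale=0.8,
).images[0]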
Reference image guidance (IP-Adapter)
# pipe: an SDXL diffusers pipeline (these weights are the SDXL IP-Adapter)
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
)
pipe.set_ip_adapter_scale(0.5)
img = pipe(
    prompt="<hero> sci-fi armor, cyberpunk city",
    ip_adapter_image=hero_reference_img,
).images[0]
Validation: face similarity
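from insightface.app import FaceAnalysis
import numpy as np

face = FaceAnalysis(name="buffalo_l")
face.prepare(ctx_id=0)  # ctx_id=0 selects the first GPU
# face.get() expects a BGR numpy array (e.g. from cv2.imread) and returns a list of faces
ref_faces, gen_faces = face.get(ref_img), face.get(generated_img)
assert ref_faces and gen_faces, "no face detected in one of the images"
ref_emb, gen_emb = ref_faces[0].embedding, gen_faces[0].embedding
cos_sim = np.dot(ref_emb, gen_emb) / (np.linalg.norm(ref_emb) * np.linalg.norm(gen_emb))
assert cos_sim > 0.55, f"identity drift: {cos_sim:.3f}"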
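The gate itself is plain vector arithmetic, so it can be unit-tested without a face model. A minimal sketch using NumPy only; the helper names `cosine_similarity` and `passes_identity_gate` are hypothetical, and the 0.55 default simply mirrors the threshold used in this note.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity of two 1-D embeddings; rejects zero-norm vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("zero-norm embedding")
    return float(np.dot(a, b) / denom)


def passes_identity_gate(ref_emb, gen_emb, threshold=0.55):
    """True when the generated face stays close enough to the reference."""
    return cosine_similarity(ref_emb, gen_emb) > threshold
```

Keeping the threshold as a parameter makes it easy to recalibrate if a different embedding model is swapped in.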
Outfit consistency check (CLIP)
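import open_clip
import torch

# Without a pretrained tag, open_clip builds the model with random weights
model, _, prep = open_clip.create_model_and_transforms("ViT-bigG-14", pretrained="laion2b_s39b_b160k")
tokenizer = open_clip.get_tokenizer("ViT-bigG-14")
outfit_prompt = "white hoodie, black cargo pants, red sneakers"
with torch.no_grad():
    txt_emb = model.encode_text(tokenizer([outfit_prompt]))
    img_emb = model.encode_image(prep(generated).unsqueeze(0))  # generated: a PIL image
score = torch.cosine_similarity(txt_emb, img_emb).item()
# alert if score < 0.27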
Generation-loop with retry
def generate_consistent(prompt, max_retry=4):
    # face_sim(img, ref_img): helper returning the InsightFace cosine similarity
    for i in range(max_retry):
        seed = 1000 + i * 7  # a different seed on every attempt
        img = pipe(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]
        sim = face_sim(img, ref_img)
        if sim > 0.55:
            return img, sim, seed
    raise RuntimeError("identity could not be preserved")
Style transfer for series
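# Step 1: generate the composition with only the character LoRA active
pipe.set_adapters(["char"], adapter_weights=[0.95])
base = pipe("<hero> sitting on bench").images[0]
# Step 2: img2img pass with the character and style LoRAs stacked
# (i2i_pipe: an img2img pipeline assumed to have the same "char" and "style" adapters loaded)
i2i_pipe.set_adapters(["char", "style"], adapter_weights=[0.95, 0.7])
styled = i2i_pipe(
    prompt="<hero> sitting on bench, brand_style",
    image=base, strength=0.4,
).images[0]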
Decision criteria
| Situation | Approach |
|---|---|
| 30+ ref available | train subject LoRA |
| 1-3 ref only | PuLID / Photomaker |
| identity + exact pose | InstantID |
| brand style kept separate | separate Style LoRA |
| series of frames | seed lock + same LoRA stack |
| validation gate | InsightFace cos > 0.55 |
Default: subject LoRA + style LoRA + IP-Adapter face + face-sim CI.
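The first three rows of the table can be read as a small selection rule. A sketch under stated assumptions: the function name `choose_approach` is hypothetical, the exact-pose requirement is checked first (the table gives no priority order), and reference counts the table does not cover (4 to 29) return `None` rather than a guess.

```python
def choose_approach(num_refs, need_exact_pose=False):
    """Map reference count and pose needs to an approach named in the table.

    Returns None where the table gives no guidance (4-29 reference images).
    """
    if num_refs < 1:
        raise ValueError("at least one reference image is required")
    if need_exact_pose:
        # Assumption: an exact-pose requirement overrides the reference count.
        return "InstantID"
    if num_refs >= 30:
        return "train subject LoRA"
    if num_refs <= 3:
        return "PuLID / Photomaker"
    return None


print(choose_approach(40))                       # train subject LoRA
print(choose_approach(2))                        # PuLID / Photomaker
print(choose_approach(2, need_exact_pose=True))  # InstantID
```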
🔗 Graph
- Parent: AI Image Generation
- Variants: LoRA Fine-tuning · InstantID
- Applications: 인공지능 시각 언어 생성 (AI Visual Language Generation) · 오픈소스 이미지 모델 미세 조정 및 배포 (Open-Source Image Model Fine-Tuning and Deployment)
- Adjacent: IP-Adapter · ControlNet
🤖 Using LLMs
When to use: caption authoring for the character dataset, prompt variation lists, validation rubrics. When not to use: face similarity scoring, where deterministic InsightFace scoring is the right tool.
❌ Anti-patterns
- Single-reference overfit: a LoRA trained on one image leads to mode collapse.
- Mixing identities in the dataset: confuses the LoRA.
- Captions with outfit details: the outfit cannot be separated from the trigger, which makes outfit changes difficult.
- No validation: drift accumulates unnoticed.
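Some of these anti-patterns can be caught before training with a simple caption lint. A minimal sketch that follows this note's own rules (at least 30 captions, trigger present, no outfit terms). The name `lint_dataset` and the `OUTFIT_TERMS` list are hypothetical and would need tuning per character.

```python
# Hypothetical outfit vocabulary; extend it for the character at hand.
OUTFIT_TERMS = ("hoodie", "pants", "sneakers", "jacket", "dress")


def lint_dataset(captions, trigger, min_images=30, outfit_terms=OUTFIT_TERMS):
    """Return a list of human-readable problems found in the caption set."""
    problems = []
    if len(captions) < min_images:
        problems.append(f"only {len(captions)} captions; expected at least {min_images}")
    for i, caption in enumerate(captions):
        if trigger not in caption:
            problems.append(f"caption {i} lacks trigger {trigger}")
        # Plain substring match, so "pants" would also match inside a longer word.
        hits = [term for term in outfit_terms if term in caption.lower()]
        if hits:
            problems.append(f"caption {i} names outfit terms: {', '.join(hits)}")
    return problems
```

An empty return value means the captions pass these checks. It cannot detect mixed identities, which still needs a visual or embedding-based review.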
🧪 Verification / Duplicates
- Verified (PuLID paper 2024, InstantID paper 2024 by the InstantX team, IP-Adapter Tencent 2023, diffusers docs).
- Trust level: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: character/style consistency multi-stack pipeline. |