---
id: wiki-2026-0508-일관된-캐릭터-및-스타일-구축
title: 일관된 캐릭터 및 스타일 구축
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Consistent Character, Brand Consistency Maintenance, Character Sheet, Style Lock]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [ai, image-generation, character-consistency, lora, style]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: diffusers-flux
---

# Building Consistent Characters and Styles

## One-line summary

> **"Character and style consistency cannot be achieved with a single-shot prompt; it becomes stable only through a combined stack (LoRA + IP-Adapter + ControlNet + reference latents)."** In 2026, a production character pipeline runs as a five-step loop: character sheet → multi-view dataset → subject LoRA → generation-time stack → CLIP/face-similarity validation. A seed lock alone is not enough.

## Key points

### Four dimensions of consistency

- **Identity**: face, body shape, proportions (face/body).
- **Outfit**: clothing details, color, accessories.
- **Style**: rendering, palette, line/shading.
- **Pose/Expression**: the controllable variation.

### Stack (2026 best)

- **Subject LoRA**: 30-50 reference images, identity lock.
- **Style LoRA**: trained separately, stackable with the subject LoRA.
- **IP-Adapter Face / FaceID**: face embedding.
- **PuLID / PhotoMaker**: zero-shot face injection.
- **InstantID**: identity + pose ControlNet.
- **Reference-only ControlNet**: latent reference.

### Applications

1. Character series for webtoons and illustrated novels.
2. Cross-channel reuse of a brand mascot.
3. Procedural variation of game NPCs with a preserved identity.

## 💻 Patterns

### Character sheet (training data)

```
data/hero/
├─ 01_front_neutral.png   " front view, neutral expression"
├─ 02_side_neutral.png    " side profile, neutral"
├─ 03_back.png            " back view"
├─ 04_3q_smile.png        " 3/4 view, smiling"
├─ 05_close_face.png      " close-up portrait"
├─ 06_full_body.png       " full body, T-pose"
├─ 07_action_run.png      " running pose"
...
30+ images, varied pose/expression, consistent outfit
```

### Subject LoRA + caption

```python
# Captions emphasize TRIGGER + variation, NOT outfit (so it's learned implicitly)
captions = [
    ", front view, neutral expression",
    ", side profile",
    ", smiling, 3/4 view",
    ", in forest, full body",
]
# Train LoRA rank 32, 2000 steps, lr 1e-4, FLUX.1-dev
```

### Multi-LoRA at inference

```python
from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Load character and style LoRAs under separate adapter names so they can be weighted independently
pipe.load_lora_weights("./loras/hero01.safetensors", adapter_name="char")
pipe.load_lora_weights("./loras/brand_style.safetensors", adapter_name="style")
pipe.set_adapters(["char", "style"], adapter_weights=[0.95, 0.7])

img = pipe(
    ", drinking coffee in cafe, brand_style",
    num_inference_steps=28,
    guidance_scale=3.5,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
```

### PuLID (zero-shot face lock)

```python
# Illustrative wrapper interface: `PuLIDPipeline.from_pretrained` and `generate`
# are assumed here; adapt to the entry points of the PuLID code you actually use.
from pulid import PuLIDPipeline

pl = PuLIDPipeline.from_pretrained("ByteDance/PuLID-FLUX")
img = pl.generate(
    prompt=" hiking on mountain, golden hour",
    id_image="hero_face_ref.png",
    id_weight=0.85,
    seed=42,
    steps=20,
)
# No training, single ref → identity preserved
```

### InstantID (face + pose)

```python
import torch
from diffusers import ControlNetModel
from huggingface_hub import hf_hub_download
# Pipeline file shipped in the InstantID repository, not part of diffusers itself
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline

controlnet = ControlNetModel.from_pretrained(
    "InstantX/InstantID", subfolder="ControlNetModel", torch_dtype=torch.float16
)
pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter_instantid(hf_hub_download("InstantX/InstantID", "ip-adapter.bin"))

face_emb = extract_face_embedding(ref_img)  # user-supplied helper, e.g. an InsightFace embedding
img = pipe(
    prompt=" samurai in feudal japan",
    image_embeds=face_emb,
    image=pose_kps_img,  # keypoint image (InstantID uses facial landmarks drawn with its draw_kps helper)
    controlnet_conditioning_scale=0.8,
    ip_adapter_scale=0.8,
).images[0]
```

### Reference image guidance (IP-Adapter)

```python
# `pipe` here is an SDXL pipeline, matching the SDXL IP-Adapter weights below
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.5)
img = pipe(
    prompt=" sci-fi armor, cyberpunk city",
    ip_adapter_image=hero_reference_img,
).images[0]
```

### Validation: face similarity

```python
from insightface.app import FaceAnalysis
import numpy as np

face = FaceAnalysis(name="buffalo_l")
face.prepare(ctx_id=0)

# insightface expects BGR numpy arrays (as returned by cv2.imread)
ref_faces, gen_faces = face.get(ref_img), face.get(generated_img)
assert ref_faces and gen_faces, "no face detected"
ref_emb, gen_emb = ref_faces[0].embedding, gen_faces[0].embedding
cos_sim = np.dot(ref_emb, gen_emb) / (np.linalg.norm(ref_emb) * np.linalg.norm(gen_emb))
assert cos_sim > 0.55, f"identity drift: {cos_sim:.3f}"
```

### Outfit consistency check (CLIP)

```python
import open_clip
import torch

# Name the pretrained weights explicitly; without them the model is randomly initialized
model, _, prep = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k"
)
tokenizer = open_clip.get_tokenizer("ViT-bigG-14")
model.eval()

outfit_prompt = "white hoodie, black cargo pants, red sneakers"
with torch.no_grad():
    txt_emb = model.encode_text(tokenizer([outfit_prompt]))
    img_emb = model.encode_image(prep(generated).unsqueeze(0))
score = torch.cosine_similarity(txt_emb, img_emb).item()
# alert if score < 0.27
```

### Generation-loop with retry

```python
import torch

# `pipe`, `ref_img`, and `face_sim` (the cosine-similarity check from the
# validation step) are assumed to be defined elsewhere.
def generate_consistent(prompt, max_retry=4):
    for i in range(max_retry):
        seed = 1000 + i * 7  # deterministic, distinct seed per attempt
        img = pipe(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]
        sim = face_sim(img, ref_img)
        if sim > 0.55:
            return img, sim, seed
    raise RuntimeError("identity could not be preserved")
```

### Style transfer for series

```python
# Step 1: generate composition with only the character LoRA active
pipe.set_adapters(["char"], adapter_weights=[0.95])
base = pipe(" sitting on bench").images[0]

# Step 2: img2img with character + style LoRAs active
# `i2i_pipe` is assumed to be an img2img pipeline (e.g. FluxImg2ImgPipeline)
# with the same "char" and "style" adapters already loaded.
i2i_pipe.set_adapters(["char", "style"], adapter_weights=[0.95, 0.7])
styled = i2i_pipe(
    prompt=" sitting on bench, brand_style",
    image=base,
    strength=0.4,
).images[0]
```

## Decision criteria

| Situation | Approach |
|---|---|
| 30+ ref available | train subject LoRA |
| 1-3 ref only | PuLID / PhotoMaker |
| identity + exact pose | InstantID |
| separating brand style | separate Style LoRA |
| series of frames | seed lock + same LoRA stack |
| validation gate | InsightFace cos > 0.55 |

**Default**: subject LoRA + style LoRA + IP-Adapter face + face-sim CI.
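
The decision table above can be read as a small lookup. The sketch below is one possible encoding, not part of the original pipeline: `CharacterJob` and `choose_stack` are hypothetical names, and treating every reference count below 30 as the zero-shot case is an assumption, since the table only covers "30+" and "1-3".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharacterJob:
    """Hypothetical description of one generation job (illustrative only)."""
    num_refs: int                    # reference images available for the character
    needs_exact_pose: bool = False   # identity + exact pose required
    needs_brand_style: bool = False  # brand style must stay separable
    is_frame_series: bool = False    # output is a series of frames


def choose_stack(job: CharacterJob) -> List[str]:
    """Map the decision table to a list of techniques, in table order."""
    if job.num_refs < 1:
        raise ValueError("at least one reference image is required")
    stack = []
    # Assumption: anything under 30 references falls back to zero-shot injection.
    if job.num_refs >= 30:
        stack.append("subject LoRA")
    else:
        stack.append("PuLID / PhotoMaker")
    if job.needs_exact_pose:
        stack.append("InstantID")
    if job.needs_brand_style:
        stack.append("separate style LoRA")
    if job.is_frame_series:
        stack.append("seed lock + same LoRA stack")
    # The validation gate applies to every job.
    stack.append("validation gate: InsightFace cos > 0.55")
    return stack
```

For example, `choose_stack(CharacterJob(num_refs=2, needs_exact_pose=True))` selects zero-shot face injection plus InstantID and always ends with the validation gate.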

## 🔗 Graph

- Parent: [[AI Image Generation]] · [[Character Design]]
- Variants: [[LoRA Fine-tuning]] · [[InstantID]] · [[PuLID]]
- Applications: [[인공지능 시각 언어 생성 (AI Visual Language Generation)]] · [[오픈소스 이미지 모델 미세 조정 및 배포]]
- Adjacent: [[IP-Adapter]] · [[ControlNet]] · [[Brand Identity Generation]]

## 🤖 LLM usage

**When to use**: caption authoring for the character dataset, prompt variation lists, validation rubrics.

**When not to use**: face similarity scoring. Deterministic InsightFace is the right tool for that.

## ❌ Anti-patterns

- **Single ref overfit**: a LoRA trained on 1 image → mode collapse.
- **Mixing identities in dataset**: the LoRA gets confused.
- **Caption with outfit details**: the outfit does not separate from the trigger → changing the outfit later is difficult.
- **No validation**: drift accumulates unnoticed.

## 🧪 Verification / Duplicates

- Verified (PuLID paper, ByteDance 2024; InstantID, InstantX 2024; IP-Adapter, Tencent 2023; diffusers docs).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: character/style consistency multi-stack pipeline. |