---
id: wiki-2026-0508-초상화-및-애니메이션-스타일-제어
title: Portrait and Animation Style Control
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Portrait Style Control, Animation Style Control, Identity-Preserving Generation]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [diffusion, portrait, animation, identity, style-transfer]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: PyTorch/diffusers
---

# Portrait and Animation Style Control

## One-liner

> **"Orthogonal separation of identity preservation and style transformation."** This is the core challenge of the portrait/animation domain: the same person must look the same in every frame (identity), while style and pose remain freely controllable (control). The 2026 answer is a stack: InstantID / PuLID (identity) + IP-Adapter (style) + ControlNet pose (motion).

## Core ideas

### Three-axis separation

- **Identity axis**: lock it with a face embedding (ArcFace) via InstantID or PuLID
- **Style axis**: modulate it with a reference-image embedding via IP-Adapter
- **Motion axis**: fix the frame structure with pose / depth maps via OpenPose / DWPose

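One practical way to confirm that the identity axis really stays locked is to compare the ArcFace embedding of the reference face with the embedding of each generated face using cosine similarity. The sketch below assumes the embeddings were already extracted (for example with insightface's `FaceAnalysis`); `cosine_similarity`, `identity_drift_report` and the 0.6 threshold are hypothetical illustrations, not values taken from InstantID or PuLID, and random vectors stand in for real embeddings.

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_drift_report(ref_emb, generated_embs, threshold=0.6):
    """Score each generated face against the reference identity.

    threshold is a hypothetical cut-off; tune it on your own data.
    Returns a list of (index, similarity, passed) tuples.
    """
    report = []
    for i, emb in enumerate(generated_embs):
        sim = cosine_similarity(ref_emb, emb)
        report.append((i, sim, sim >= threshold))
    return report

rng = np.random.default_rng(0)
ref = rng.standard_normal(512)                 # stand-in for a 512-d ArcFace embedding
same = ref + 0.1 * rng.standard_normal(512)    # small perturbation: "same person"
other = rng.standard_normal(512)               # unrelated vector: "different person"
for idx, sim, ok in identity_drift_report(ref, [same, other]):
    print(idx, round(sim, 3), ok)
```

Running the same report over a whole batch gives a cheap signal for the face drift described under the anti-patterns, before anyone inspects images by eye.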
### Animation consistency techniques

- **Reference frame**: use the first frame as an anchor and apply IP-Adapter with it
- **Temporal LoRA**: inter-frame coherence via the AnimateDiff motion module
- **Latent warp**: warp the previous frame's latent with optical flow, then add noise
- **Cross-frame attention**: share attention keys/values across frames

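The latent-warp technique can be sketched with `torch.nn.functional.grid_sample`: backward-warp the previous frame's latent along a dense optical-flow field, then blend in fresh Gaussian noise so the denoiser can repair occluded or disoccluded regions. This is a minimal sketch under stated assumptions: the flow is already expressed in latent pixels (resized and divided by 8 for an SD-style VAE), and the simple `sqrt(a)·x + sqrt(1-a)·ε` mix stands in for a real scheduler's `add_noise` at a chosen timestep. `warp_latent` and `warp_and_renoise` are hypothetical helper names.

```python
import torch
import torch.nn.functional as F

def warp_latent(latent: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp a latent (B, C, H, W) with a flow field (B, 2, H, W).

    flow[:, 0] is dx and flow[:, 1] is dy, in latent pixels:
    output[y, x] = latent[y + dy, x + dx] (bilinear, border-clamped).
    """
    _, _, h, w = latent.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=latent.dtype, device=latent.device),
        torch.arange(w, dtype=latent.dtype, device=latent.device),
        indexing="ij",
    )
    x_src = xs.unsqueeze(0) + flow[:, 0]  # (B, H, W)
    y_src = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects (x, y) sample coordinates normalized to [-1, 1]
    grid = torch.stack(
        (2.0 * x_src / max(w - 1, 1) - 1.0, 2.0 * y_src / max(h - 1, 1) - 1.0), dim=-1
    )
    return F.grid_sample(latent, grid, mode="bilinear", padding_mode="border", align_corners=True)

def warp_and_renoise(prev_latent, flow, noise_strength=0.5, generator=None):
    """Warp the previous latent, then blend in Gaussian noise.

    noise_strength in [0, 1]: 0 keeps the warped latent unchanged, 1 is pure noise.
    """
    warped = warp_latent(prev_latent, flow)
    noise = torch.randn(warped.shape, generator=generator, dtype=warped.dtype, device=warped.device)
    keep = 1.0 - noise_strength
    return keep ** 0.5 * warped + (1.0 - keep) ** 0.5 * noise
```

The noise strength trades temporal stability against the model's freedom to fix warping artifacts: low values copy the previous frame closely, high values behave more like independent generation.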
### Applications

1. Avatar / VTuber pipeline: same face × multiple emotions × multiple outfits.
2. Character sheet generation: a turnaround (front/side/back).
3. Short animation: an 8-frame walk cycle for one character.

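The avatar/VTuber application is essentially a grid of jobs that share one identity. A minimal sketch of planning that grid, assuming each job becomes one pipeline call with the same face embedding: `AvatarJob`, `plan_avatar_jobs` and the prompt template are hypothetical, and every job reuses one seed so that only the emotion and outfit text change between renders.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class AvatarJob:
    emotion: str
    outfit: str
    prompt: str
    seed: int

def plan_avatar_jobs(emotions, outfits, base_prompt="portrait of the same character", seed=42):
    """Cross every emotion with every outfit; identity comes from the shared face embedding."""
    return [
        AvatarJob(emo, outfit, f"{base_prompt}, {emo} expression, wearing {outfit}", seed)
        for emo, outfit in product(emotions, outfits)
    ]

jobs = plan_avatar_jobs(["smiling", "angry"], ["school uniform", "hoodie"])
for job in jobs:
    print(job.prompt)
```

Keeping the plan as plain data makes it easy to resume a partially rendered batch or to hand the emotion/outfit lists to an LLM for expansion.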
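## 💻 Patterns

### InstantID portrait generation

```python
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel
from diffusers.utils import load_image
from insightface.app import FaceAnalysis
# Community pipeline file from the InstantID repo (not part of core diffusers)
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline, draw_kps

# antelopev2 weights are expected under ./models/antelopev2
face_app = FaceAnalysis(name="antelopev2", root="./", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
face_app.prepare(ctx_id=0, det_size=(640, 640))

face_image = load_image("ref.jpg")  # PIL, RGB
faces = face_app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))  # insightface expects BGR
face_info = max(faces, key=lambda f: (f["bbox"][2] - f["bbox"][0]) * (f["bbox"][3] - f["bbox"][1]))  # largest face
face_emb = face_info["embedding"]                  # ArcFace identity embedding
face_kps = draw_kps(face_image, face_info["kps"])  # facial keypoint image for IdentityNet

# IdentityNet ControlNet and face adapter, downloaded from the InstantX/InstantID repo into ./checkpoints
controlnet = ControlNetModel.from_pretrained("./checkpoints/ControlNetModel", torch_dtype=torch.float16)
pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_ip_adapter_instantid("./checkpoints/ip-adapter.bin")
pipe.set_ip_adapter_scale(0.8)  # identity embedding strength

out = pipe(
    prompt="anime portrait, school uniform",
    image_embeds=face_emb,
    image=face_kps,                     # face keypoints
    controlnet_conditioning_scale=0.8,  # IdentityNet strength
    num_inference_steps=30,
).images[0]
```
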
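### PuLID identity preservation

```python
# Assumption: argument names and order differ between PuLID releases; treat these calls as
# illustrative and check the repo's app_v1_1.py for the exact get_id_embedding / inference signatures.
from pulid.pipeline_v1_1 import PuLIDPipeline

pulid = PuLIDPipeline()
id_emb = pulid.get_id_embedding(["ref1.jpg", "ref2.jpg"])  # several references of the same person

img = pulid.inference(
    prompt="cyberpunk character, neon city",
    id_embedding=id_emb,
    id_scale=0.9,   # identity strength
    cfg_scale=1.2,  # few-step distilled bases expect CFG near 1
    steps=4,        # assumes a 4-step SDXL-Lightning base
)[0]
```
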
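### IP-Adapter style + face combined

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
from transformers import CLIPVisionModelWithProjection

# The *_vit-h SDXL adapters require the ViT-H image encoder from models/image_encoder
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16)
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", image_encoder=image_encoder,
    torch_dtype=torch.float16).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=["ip-adapter-plus-face_sdxl_vit-h.safetensors",
                 "ip-adapter-plus_sdxl_vit-h.safetensors"],
)
pipe.set_ip_adapter_scale([0.7, 0.4])  # face stronger than style (same order as weight_name)

face_ref, style_ref = load_image("face.jpg"), load_image("style.jpg")
img = pipe(
    prompt="portrait, watercolor style",
    ip_adapter_image=[face_ref, style_ref],  # one image per loaded adapter
).images[0]
```
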
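### AnimateDiff motion generation

```python
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")
# DDIM with linear betas, the scheduler setup used in the diffusers AnimateDiff examples
pipe.scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False,
    timestep_spacing="linspace", beta_schedule="linear", steps_offset=1,
)

frames = pipe(
    prompt="character walking, side view",
    num_frames=16,
    num_inference_steps=25,
    guidance_scale=7.5,
    generator=torch.Generator("cuda").manual_seed(42),  # fixed seed for reproducibility
).frames[0]  # list of PIL frames for the first prompt
export_to_gif(frames, "walk.gif")
```
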
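### Cross-frame attention sharing

```python
import torch
import torch.nn.functional as F

def cross_frame_attention(self, x, prev_kv=None):
    """Single-head sketch. `self` is an attention module exposing to_q/to_k/to_v/to_out
    projections; x is (batch, tokens, dim) for the current frame."""
    q = self.to_q(x)
    k, v = self.to_k(x), self.to_v(x)
    cur_kv = {"k": k, "v": v}  # cache only this frame's K/V so the cache does not grow every frame
    if prev_kv is not None:
        # Prepend the previous frame's keys/values so queries also attend to that frame
        k = torch.cat([prev_kv["k"], k], dim=1)
        v = torch.cat([prev_kv["v"], v], dim=1)
    out = F.scaled_dot_product_attention(q, k, v)
    return self.to_out(out), cur_kv
```
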
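To check shapes and the caching behaviour without loading a diffusion model, the same idea can be exercised on a toy module. `ToyCrossFrameAttention` and `run_frames` are hypothetical: a single-head attention built from plain `nn.Linear` layers (not a diffusers class) that processes frames in temporal order and lets each frame attend to its own tokens plus the previous frame's keys and values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCrossFrameAttention(nn.Module):
    """Single-head attention whose keys/values also include the previous frame's."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x, prev_kv=None):
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        cur_kv = {"k": k, "v": v}  # this frame's own K/V, handed to the next frame
        if prev_kv is not None:
            k = torch.cat([prev_kv["k"], k], dim=1)
            v = torch.cat([prev_kv["v"], v], dim=1)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out), cur_kv

def run_frames(attn, frames):
    """frames: list of (batch, tokens, dim) tensors, processed in temporal order."""
    outputs, kv = [], None
    for x in frames:
        y, kv = attn(x, kv)
        outputs.append(y)
    return outputs

torch.manual_seed(0)
attn = ToyCrossFrameAttention(dim=32)
frames = [torch.randn(1, 16, 32) for _ in range(4)]
outs = run_frames(attn, frames)
print([tuple(o.shape) for o in outs])
```

Because only the previous frame's keys/values are prepended, the attention cost per frame stays constant; anchoring every frame to the first frame's K/V instead is a common alternative when long-range drift matters more than local smoothness.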
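### Turnaround sheet (multi-pose)

```python
import torch

# Assumes `pipe` is the InstantID pipeline built with two ControlNets
# (controlnet=[IdentityNet, an OpenPose ControlNet]), and that `face_emb`, `face_kps`
# and a dict `pose_skeleton` of OpenPose images keyed by view are prepared beforehand.
poses = ["front view", "3/4 view", "side view", "back view"]
turnaround = []
for pose in poses:
    img = pipe(
        prompt=f"character portrait, {pose}, neutral expression",
        image_embeds=face_emb,
        image=[face_kps, pose_skeleton[pose]],     # one control image per ControlNet
        controlnet_conditioning_scale=[0.8, 0.9],  # IdentityNet, pose
        generator=torch.Generator("cuda").manual_seed(42),  # same seed for every view
    ).images[0]
    turnaround.append(img)
```
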
### Emotion variation with locked identity

```python
import torch

# Assumes `pipe`, `face_emb` and `face_kps` from the single-ControlNet InstantID pattern above
emotions = ["smiling", "angry", "surprised", "sad", "neutral"]
pipe.set_ip_adapter_scale(0.85)  # strong identity; set once on the pipeline, not per call
for emo in emotions:
    img = pipe(
        prompt=f"portrait, {emo} expression",
        image_embeds=face_emb,
        image=face_kps,  # IdentityNet keypoint image
        guidance_scale=4.5,
        generator=torch.Generator("cuda").manual_seed(7),  # same seed for every emotion
    ).images[0]
    img.save(f"emo_{emo}.png")
```

## Decision guide

| Goal | Combination |
|---|---|
| Highest face fidelity | PuLID + InstantID + IP-Adapter Face |
| Style transfer with face | IP-Adapter Face (0.8) + IP-Adapter Style (0.4) |
| Animation, single character | AnimateDiff + reference attention + IP-Adapter |
| Game character sheet | InstantID + ControlNet pose × 4 with shared seed |
| Real-time avatar | SDXL Lightning / FLUX Schnell + cached identity embedding |

**Default**: InstantID + IP-Adapter (style 0.4, face 0.7) + a fixed seed for every batch.

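Why the fixed seed matters: diffusers pipelines draw their initial latent noise from the `generator` they are given, so re-creating a generator with the same seed for each job starts every render from identical noise, and only the conditioning (prompt, pose, emotion) differs. A minimal check in plain PyTorch, where `initial_latents` is a hypothetical stand-in for the pipeline's internal latent preparation and the `(1, 4, 128, 128)` shape assumes an SDXL latent for a 1024×1024 image:

```python
import torch

def initial_latents(seed: int, shape=(1, 4, 128, 128)) -> torch.Tensor:
    """Mimic a pipeline's starting noise: a fresh CPU generator seeded per job."""
    generator = torch.Generator("cpu").manual_seed(seed)
    return torch.randn(shape, generator=generator)

fixed = [initial_latents(42) for _ in ["front view", "side view"]]
varied = [initial_latents(s) for s in (42, 43)]
print(torch.equal(fixed[0], fixed[1]))    # True
print(torch.equal(varied[0], varied[1]))  # False
```

CPU and CUDA generators produce different random streams for the same seed, so keep the generator device consistent across a batch.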
## 🔗 Graph

- Parents: [[이미지 생성 및 제어 파이프라인|Image Generation and Control Pipeline]] · [[AI 이미지 생성 (AI Image Generation)|AI Image Generation]]
- Variants: [[ComfyUI]] · [[InstantID]] · [[PuLID]]
- Applications: [[Avatar_Pipeline]] · [[AI 모델 사후 편집 도구 (Post-editing Tools)|AI Model Post-editing Tools]]
- Adjacent: [[AnimateDiff]] · [[IP-Adapter]] · [[ControlNet]]

## 🤖 LLM usage

**When**: generating emotion / pose prompt variations, drafting character-sheet plans, extracting style descriptions.

**When not**: reasoning about the internal space of face embeddings; that space is geometric, which is not an LLM task.

## ❌ Anti-patterns

- **No fixed seed in batch**: the face drifts from one turnaround view to the next.
- **IP-Adapter scale > 1.0**: the prompt is ignored and the reference is over-copied.
- **Identity + style conflict**: giving both the same weight blurs the identity.
- **Missing pose normalization**: the pose skeleton's scale does not match the prompt (e.g. a full-body skeleton with a close-up portrait prompt).
- **AnimateDiff without a reference**: frames lose consistency and flicker.

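Several of these anti-patterns can be caught before any GPU time is spent by linting the batch configuration. The sketch below is hypothetical: the `BatchConfig` fields and `lint_batch_config` rules simply restate the list above (a fixed seed, IP-Adapter scales at or below 1.0, face weight strictly above style weight, a reference image for AnimateDiff runs); pose normalization is not checked.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class BatchConfig:
    seed: Optional[int]
    face_scale: float
    style_scale: float
    uses_animatediff: bool = False
    reference_image: Optional[str] = None

def lint_batch_config(cfg: BatchConfig) -> List[str]:
    """Return human-readable warnings; an empty list means no rule fired."""
    warnings = []
    if cfg.seed is None:
        warnings.append("no fixed seed: faces will drift across the batch")
    for name, scale in (("face", cfg.face_scale), ("style", cfg.style_scale)):
        if scale > 1.0:
            warnings.append(f"{name} IP-Adapter scale {scale} > 1.0: prompt ignored, reference over-copied")
    if cfg.face_scale <= cfg.style_scale:
        warnings.append("face weight not above style weight: identity may blur")
    if cfg.uses_animatediff and cfg.reference_image is None:
        warnings.append("AnimateDiff without a reference image: expect frame flicker")
    return warnings

print(lint_batch_config(BatchConfig(seed=42, face_scale=0.7, style_scale=0.4)))
print(lint_batch_config(BatchConfig(seed=None, face_scale=1.2, style_scale=1.2, uses_animatediff=True)))
```

The first call mirrors the recommended default (face 0.7, style 0.4, fixed seed) and produces no warnings; the second trips every rule.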
## 🧪 Verification / duplicates

- Verified (InstantX InstantID paper 2024, PuLID v1.1 release notes 2025, AnimateDiff v3).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: portrait/animation identity+style control |