---
id: wiki-2026-0508-초상화-및-애니메이션-스타일-제어
title: Portrait and Animation Style Control
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Portrait Style Control, Animation Style Control, Identity-Preserving Generation]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [diffusion, portrait, animation, identity, style-transfer]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: PyTorch/diffusers
---
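# Portrait and Animation Style Control
## One-line summary
> **"Orthogonal separation of identity preservation and style transformation."** The core challenge in the portrait/animation domain: the same person must look the same in every frame (identity), while style and pose stay freely controllable (control). The 2026 answer is a stack of InstantID / PuLID (identity) + IP-Adapter (style) + ControlNet pose (motion).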
## Core concepts
### Three-axis separation
- **Identity axis**: locked with an ArcFace face embedding (InstantID, PuLID)
- **Style axis**: modulated by a reference-image embedding (IP-Adapter)
- **Motion axis**: frame structure driven by pose / depth (OpenPose / DWPose)
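One way to check that the identity axis stays locked is to compare the face embedding of each generated frame against the reference embedding. The following is a minimal sketch, assuming embeddings are plain float sequences (such as the ArcFace vectors returned by insightface). The function names and the 0.5 threshold are illustrative assumptions, not values taken from any library.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def drifted_frames(
    ref_emb: Sequence[float],
    frame_embs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off; tune per face model
) -> List[int]:
    """Return indices of frames whose similarity to the reference is below threshold."""
    return [
        i
        for i, emb in enumerate(frame_embs)
        if cosine_similarity(ref_emb, emb) < threshold
    ]
```

Frames flagged by `drifted_frames` are candidates for regeneration with a higher identity scale.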
### Animation consistency techniques
- **Reference frame**: feed the first frame to IP-Adapter as the anchor
- **Temporal LoRA**: inter-frame coherence via the AnimateDiff motion module
- **Latent warp**: warp the previous frame's latent with optical flow, then add noise
- **Cross-frame attention**: share attention keys/values across frames
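The latent-warp idea can be sketched with NumPy. This is an illustration under stated assumptions, not code from any pipeline: the latent is a `(C, H, W)` array, the flow is a backward field of shape `(2, H, W)` in latent pixels, and the variance-preserving noise blend is an assumed schedule. The name `warp_latent` and the `noise_strength` parameter are hypothetical. In a PyTorch pipeline the same resampling is usually done with `torch.nn.functional.grid_sample`.

```python
import numpy as np


def warp_latent(prev_latent, flow, noise_strength=0.2, rng=None):
    """Warp a (C, H, W) latent by a backward flow field (2, H, W), then re-noise.

    flow[0] and flow[1] give, for each target pixel, the x and y offset
    (in latent pixels) of its source location in the previous frame.
    """
    _, h, w = prev_latent.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Source coordinates, clamped to the latent border.
    src_x = np.clip(xs + flow[0], 0, w - 1)
    src_y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = src_x - x0
    wy = src_y - y0
    # Bilinear interpolation over the four neighbours, applied per channel.
    top = prev_latent[:, y0, x0] * (1 - wx) + prev_latent[:, y0, x1] * wx
    bottom = prev_latent[:, y1, x0] * (1 - wx) + prev_latent[:, y1, x1] * wx
    warped = top * (1 - wy) + bottom * wy
    # Blend with fresh Gaussian noise (assumed variance-preserving mix).
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.standard_normal(warped.shape)
    return np.sqrt(1.0 - noise_strength ** 2) * warped + noise_strength * noise
```

With `noise_strength=0.0` the function reduces to a pure warp, which is convenient for checking the flow direction before adding noise back in.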
### Applications
1. Avatar / VTuber pipeline: same face × multiple emotions × multiple outfits.
2. Character sheet generation: turnaround (front/side/back).
3. Short animation: an 8-frame walk cycle for a character.
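The "same face × multiple emotions × multiple outfits" grid from the avatar use case can be planned as a list of prompts before any image is generated. A minimal sketch, assuming a simple comma-separated prompt template; the function name and the template wording are illustrative only.

```python
from itertools import product
from typing import Dict, List, Sequence


def build_prompt_grid(
    base: str, emotions: Sequence[str], outfits: Sequence[str]
) -> List[Dict[str, str]]:
    """Return one prompt per (emotion, outfit) pair, in a stable order."""
    return [
        {
            "emotion": emo,
            "outfit": outfit,
            "prompt": f"{base}, {emo} expression, wearing {outfit}",
        }
        for emo, outfit in product(emotions, outfits)
    ]


grid = build_prompt_grid(
    "anime portrait",
    ["smiling", "angry"],
    ["school uniform", "hoodie"],
)
```

Each entry can then be passed to the identity-locked pipeline with the same face embedding and seed, so only the prompt varies between renders.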
## 💻 Patterns
### InstantID portrait generation
```python
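import cv2
import numpy as np
import torch
from diffusers.models import ControlNetModel
from diffusers.utils import load_image
from insightface.app import FaceAnalysis
# The pipeline class and draw_kps come from the InstantID repository file
# pipeline_stable_diffusion_xl_instantid.py, not from the diffusers package.
from pipeline_stable_diffusion_xl_instantid import (
    StableDiffusionXLInstantIDPipeline,
    draw_kps,
)

face_app = FaceAnalysis(name="antelopev2", providers=["CUDAExecutionProvider"])
face_app.prepare(ctx_id=0, det_size=(640, 640))

# Extract the identity embedding and the keypoint image from the reference.
face_img = load_image("ref.jpg")
face_info = face_app.get(cv2.cvtColor(np.array(face_img), cv2.COLOR_RGB2BGR))[0]
face_emb = face_info["embedding"]
face_kps = draw_kps(face_img, face_info["kps"])

# Assumed local path to the InstantID ControlNet weights.
instantid_controlnet = ControlNetModel.from_pretrained(
    "./checkpoints/ControlNetModel", torch_dtype=torch.float16
)
pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=instantid_controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_ip_adapter_instantid("instantid_ip-adapter.bin")

out = pipe(
    prompt="anime portrait, school uniform",
    image_embeds=face_emb,
    image=face_kps,  # face keypoints
    ip_adapter_scale=0.8,
    controlnet_conditioning_scale=0.8,
    num_inference_steps=30,
).images[0]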
```
### PuLID identity preservation
```python
# Import path as in the PuLID repository. Verify the method signatures
# against the installed version before relying on them.
from pulid.pipeline_v1_1 import PuLIDPipeline

pulid = PuLIDPipeline()
id_emb = pulid.get_id_embedding(["ref1.jpg", "ref2.jpg"])
img = pulid.inference(
    prompt="cyberpunk character, neon city",
    id_embedding=id_emb,
    id_scale=0.9,
    cfg_scale=1.2,
    steps=4,  # SDXL Lightning
)[0]
```
### IP-Adapter style + face combined
```python
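# Assumes `pipe` is an SDXL diffusers pipeline created with the ViT-H image
# encoder (h94/IP-Adapter, subfolder "models/image_encoder"), and that
# `face_ref` / `style_ref` are PIL images.
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=[
        "ip-adapter-plus-face_sdxl_vit-h.safetensors",
        "ip-adapter-plus_sdxl_vit-h.safetensors",
    ],
)
pipe.set_ip_adapter_scale([0.7, 0.4])  # face stronger than style
img = pipe(
    prompt="portrait, watercolor style",
    ip_adapter_image=[face_ref, style_ref],  # same order as weight_name
).images[0]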
```
### AnimateDiff motion generation
```python
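import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter

model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")
# Scheduler settings follow the diffusers AnimateDiff example.
pipe.scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False,
    timestep_spacing="linspace", beta_schedule="linear", steps_offset=1,
)
frames = pipe(
    prompt="character walking, side view",
    num_frames=16,
    num_inference_steps=25,
    guidance_scale=7.5,
).frames[0]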
```
### Cross-frame attention sharing
```python
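import torch
import torch.nn.functional as F

def cross_frame_attention(self, x, prev_kv=None):
    """Attention-module method: attend over current and previous-frame tokens."""
    q = self.to_q(x)
    k, v = self.to_k(x), self.to_v(x)
    # Cache only this frame's K/V so the cache does not grow with every frame.
    cache = {"k": k, "v": v}
    if prev_kv is not None:
        # Concatenate the previous frame's keys/values along the token axis.
        k = torch.cat([prev_kv["k"], k], dim=1)
        v = torch.cat([prev_kv["v"], v], dim=1)
    out = F.scaled_dot_product_attention(q, k, v)
    return self.to_out(out), cache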
```
### Turnaround sheet (multi-pose)
```python
# Assumes `pipe` and `face_emb` from the InstantID setup, `torch` imported,
# and a dict `pose_skeleton` mapping each view name to a conditioning image.
poses = ["front view", "3/4 view", "side view", "back view"]
turnaround = []
for pose in poses:
    img = pipe(
        prompt=f"character portrait, {pose}, neutral expression",
        image_embeds=face_emb,
        image=pose_skeleton[pose],
        controlnet_conditioning_scale=0.9,
        generator=torch.Generator("cuda").manual_seed(42),  # fixed seed
    ).images[0]
    turnaround.append(img)
```
### Emotion variation with locked identity
```python
# Assumes `pipe`, `face_emb`, and `face_kps` from the InstantID setup.
emotions = ["smiling", "angry", "surprised", "sad", "neutral"]
for emo in emotions:
    img = pipe(
        prompt=f"portrait, {emo} expression",
        image_embeds=face_emb,
        image=face_kps,  # the InstantID pipeline needs the keypoint image
        ip_adapter_scale=0.85,  # strong identity
        guidance_scale=4.5,
        generator=torch.Generator("cuda").manual_seed(7),
    ).images[0]
    img.save(f"emo_{emo}.png")
```
## Decision criteria
| Goal | Combination |
|---|---|
| Highest face fidelity | PuLID + InstantID + IP-Adapter Face |
| Style transfer with face | IP-Adapter Face (0.8) + IP-Adapter Style (0.4) |
| Animation, single character | AnimateDiff + reference attention + IP-Adapter |
| Game character sheet | InstantID + ControlNet pose × 4 with shared seed |
| Real-time avatar | SDXL Lightning / FLUX Schnell + cached identity emb |
**Default**: InstantID + IP-Adapter (style 0.4, face 0.7) + a fixed seed per batch.
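The default stack can be captured as a small config object with a sanity check against the weight and seed pitfalls this note lists under anti-patterns. This is a sketch only: `StackConfig`, `check_config`, and the seed value 42 are hypothetical names and values chosen for illustration, while the 0.7 / 0.4 scales come from the default above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StackConfig:
    face_scale: float = 0.7     # IP-Adapter face weight (default from this note)
    style_scale: float = 0.4    # IP-Adapter style weight (default from this note)
    seed: Optional[int] = 42    # fixed seed for the whole batch; hypothetical value


def check_config(cfg: StackConfig) -> List[str]:
    """Return human-readable warnings; an empty list means no known issue."""
    warnings = []
    for name, scale in (("face_scale", cfg.face_scale), ("style_scale", cfg.style_scale)):
        if scale > 1.0:
            warnings.append(f"{name} > 1.0: the reference tends to override the prompt")
    if cfg.face_scale == cfg.style_scale:
        warnings.append("face_scale equals style_scale: identity may blur")
    if cfg.seed is None:
        warnings.append("no fixed seed: faces may drift across the batch")
    return warnings
```

Running `check_config` before a batch render is cheap and catches a misconfigured scale or a missing seed before any GPU time is spent.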
## 🔗 Graph
- Parent: [[이미지 생성 및 제어 파이프라인]] · [[AI 이미지 생성 (AI Image Generation)]]
- Variants: [[ComfyUI]] · [[InstantID]] · [[PuLID]]
- Applications: [[Avatar_Pipeline]] · [[AI 모델 사후 편집 도구 (Post-editing Tools)]]
- Adjacent: [[AnimateDiff]] · [[IP-Adapter]] · [[ControlNet]]
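## 🤖 LLM usage
**When to use**: generating emotion / pose prompt variations, drafting a character sheet plan, extracting style descriptions.
**When not to use**: reasoning about the internal space of face embeddings. That space is geometric and outside what an LLM handles.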
## ❌ Anti-patterns
- **No fixed seed in batch**: the face drifts from one turnaround view to the next.
- **IP-Adapter scale > 1.0**: the prompt is ignored and the reference is over-copied.
- **Identity + Style conflict**: giving both the same weight blurs the identity.
- **Missing pose normalization**: the pose skeleton's scale does not match the prompt.
- **AnimateDiff w/o reference**: flicker, with no frame-to-frame consistency.
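Pose normalization can be sketched as centering the skeleton on the conditioning canvas and scaling it to a fixed fraction of the frame, so that every view shows the character at a comparable size. The function name, the `(x, y)` tuple format, and the 0.8 fill fraction are assumptions made for illustration; real OpenPose / DWPose outputs carry extra fields such as confidence scores.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_keypoints(
    keypoints: Sequence[Point],
    canvas_size: Tuple[int, int],
    fill: float = 0.8,  # hypothetical: fraction of the canvas the skeleton covers
) -> List[Point]:
    """Center (x, y) keypoints on the canvas and scale them uniformly."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    width, height = canvas_size
    # Guard against a degenerate skeleton with zero extent on one axis.
    span_x = max(max_x - min_x, 1e-6)
    span_y = max(max_y - min_y, 1e-6)
    # One scale for both axes keeps the skeleton's proportions intact.
    scale = min(fill * width / span_x, fill * height / span_y)
    cx, cy = (min_x + max_x) / 2, (min_y + max_y) / 2
    return [
        ((x - cx) * scale + width / 2, (y - cy) * scale + height / 2)
        for x, y in keypoints
    ]
```

Applying the same `fill` to every view in a turnaround keeps framing consistent, which makes prompt terms such as "portrait" or "full body" easier to match to the skeleton.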
## 🧪 Verification / duplicates
- Verified (InstantX InstantID paper 2024, PuLID v1.1 release notes 2025, AnimateDiff v3).
- Trust level A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — portrait/animation identity+style control |