---
id: wiki-2026-0508-style-transfer
title: Style Transfer
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Neural Style Transfer, NST, Style Transfer in AI]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [style-transfer, neural-network, image-generation, diffusion]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pytorch
---

# Style Transfer

## In one line

> **"The art of separating content from style."** Gatys et al. (2015) extracted style as Gram matrices over the VGG feature space and transferred it onto a content image; this is the seminal work. As of 2026 the mainstream is diffusion-based (IP-Adapter, ControlNet-style conditioning) together with Midjourney `--sref`.

## Key points

### Origin (Gatys 2015)

- Conv-layer activations of VGG-19 serve as the feature representation.
- **Content loss**: MSE between features at a high-level layer (conv4_2).
- **Style loss**: MSE between Gram matrices (feature correlations) at multiple layers.
- Optimization-based: gradient descent runs on the image pixels themselves (slow, ~minutes per image).

### Evolution

- **Fast NST** (Johnson 2016): a feedforward network that stylizes in a single forward pass.
- **AdaIN** (Huang 2017): Adaptive Instance Normalization; arbitrary styles in real time.
- **Diffusion-based** (2023+): IP-Adapter, ControlNet; zero-shot from a prompt plus a reference image.

### Applications

1. Artistic image generation (Prisma, DeepArt; historical).
2. Midjourney `--sref` / `--cref`: the mainstream creative tool.
3. Video stylization (Runway, Kaiber).
4. Domain adaptation (synthetic → real).
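
### Layer activations (sketch)

The content and style losses in the Gatys formulation both need activations from specific VGG-19 layers. Below is a minimal sketch of a feature extractor, assuming torchvision's standard `vgg19().features` layer ordering. The names `VGG19_LAYERS` and `extract_features`, and the convention of storing conv4_2 under the key `content`, are illustrative choices and not part of any library.

```python
import torch
import torch.nn as nn

# Hypothetical mapping: index into torchvision's vgg19().features -> key in the result.
# conv4_2 (index 21) is stored as "content"; the rest are typical style layers.
VGG19_LAYERS = {
    0: "conv1_1",
    5: "conv2_1",
    10: "conv3_1",
    19: "conv4_1",
    21: "content",
    28: "conv5_1",
}


def extract_features(image, model, layers=None):
    """Run `image` through a sequential `model`, keeping the selected activations.

    image:  tensor of shape (B, C, H, W)
    model:  an nn.Sequential (for example vgg19(...).features)
    layers: dict {layer index: key}; defaults to VGG19_LAYERS
    """
    layers = VGG19_LAYERS if layers is None else layers
    last = max(layers)          # no need to run layers past the deepest one we keep
    feats = {}
    x = image
    for idx, layer in enumerate(model):
        x = layer(x)
        if idx in layers:
            feats[layers[idx]] = x
        if idx >= last:
            break
    return feats
```

With `vgg19(...).features.eval()` passed as `model`, the returned dict has a `content` entry plus one entry per style layer. torchvision's VGG applies ReLU in place, so stored activations other than the last one end up holding post-ReLU values, a simplification many implementations accept.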
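## 💻 Patterns

### Gram matrix (style representation)

```python
import torch

def gram_matrix(features):
    """Channel-by-channel feature correlations, normalized by tensor size."""
    b, c, h, w = features.shape
    feat = features.reshape(b, c, h * w)            # flatten the spatial dims
    gram = torch.bmm(feat, feat.transpose(1, 2))    # (b, c, c)
    return gram / (c * h * w)
```

### AdaIN

```python
def adain(content_feat, style_feat, eps=1e-5):
    """Re-scale content features to the per-channel mean/std of the style features."""
    # Statistics are taken over the spatial dims of (B, C, H, W) tensors.
    c_mean = content_feat.mean(dim=[2, 3], keepdim=True)
    c_std = content_feat.std(dim=[2, 3], keepdim=True) + eps
    s_mean = style_feat.mean(dim=[2, 3], keepdim=True)
    s_std = style_feat.std(dim=[2, 3], keepdim=True) + eps
    normalized = (content_feat - c_mean) / c_std
    return normalized * s_std + s_mean
```

### Gatys optimization (full)

```python
import torch
import torch.nn.functional as F
import torch.optim as optim
from torchvision.models import vgg19, VGG19_Weights  # weights enum needs torchvision >= 0.13

# Assumed to be defined elsewhere (not shown in this note):
#   content_img, style_img : ImageNet-normalized 1x3xHxW tensors on the GPU
#   extract_features(img, model) -> dict with a 'content' entry plus one per style layer
#   style_layers : list of the style-layer keys in that dict
# gram_matrix is the function from the Gram matrix block above.
vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval().cuda()
for p in vgg.parameters():
    p.requires_grad_(False)  # only the image is optimized

# The targets are fixed, so compute them once without tracking gradients.
with torch.no_grad():
    content_feats = extract_features(content_img, vgg)
    style_feats = extract_features(style_img, vgg)
    style_grams = {l: gram_matrix(style_feats[l]) for l in style_layers}

target = content_img.clone().requires_grad_(True)
optimizer = optim.LBFGS([target])

def closure():
    optimizer.zero_grad()
    feats = extract_features(target, vgg)
    c_loss = F.mse_loss(feats['content'], content_feats['content'])
    s_loss = sum(F.mse_loss(gram_matrix(feats[l]), style_grams[l])
                 for l in style_layers)
    loss = c_loss + 1e6 * s_loss  # 1e6 is the style weight
    loss.backward()
    return loss

# Each LBFGS step may evaluate the closure several times (max_iter defaults to 20).
for _ in range(300):
    optimizer.step(closure)
```

### IP-Adapter (diffusion-based, 2023+)

```python
from diffusers import StableDiffusionPipeline
from ip_adapter import IPAdapter  # package from the tencent-ailab/IP-Adapter repository
from PIL import Image

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

# Arguments: pipeline, image encoder path, adapter checkpoint, device.
# Both paths are local and assume the weights from the h94/IP-Adapter
# Hugging Face repo were downloaded into ./models.
ip_model = IPAdapter(pipe, "models/image_encoder/", "models/ip-adapter_sd15.bin", "cuda")

style_ref = Image.open("vangogh.jpg")
# scale trades off the reference image against the text prompt.
images = ip_model.generate(pil_image=style_ref, prompt="a cat",
                           num_samples=4, scale=0.7)
```

### Midjourney --sref (2026 mainstream)

```
/imagine prompt: a serene lake at sunset --sref https://example.com/style.jpg --sw 100
```

## Decision criteria

| Situation | Approach |
|---|---|
| One-off art experiment | Gatys (simple to implement; slowness is acceptable) |
| Real-time / video | AdaIN, fast NST |
| Production creative work | IP-Adapter + SDXL / FLUX |
| Non-coder creative work | Midjourney `--sref` |
| Controllable structure + style | ControlNet + IP-Adapter combo |

**Default**: in 2026, IP-Adapter (open) or Midjourney `--sref` (closed).

## 🔗 Graph

- Parent: [[Generative-AI]] · [[Computer Vision|Computer-Vision]]
- Variants: [[Style_Reference_(--sref)]] · [[ControlNet]] · [[IP-Adapter]]
- Applications: [[AI 이미지 생성 (AI Image Generation)|Image-Generation]]
- Adjacent: [[Diffusion-Models]]

## 🤖 LLM usage

**When**: keeping style consistent across a creative pipeline, generating brand asset variants, visual exploration for mood boards.

**When not**: photo retouching (use Lightroom), strict color grading (use LUTs), face identity preservation (unstable).

## ❌ Anti-patterns

- **Raising the style weight without limit**: the content disappears. Balance is essential (1e6 is typical).
- **Single VGG layer**: multi-scale style is lost. Aggregate over multiple layers.
- **Diffusion ignoring the prompt**: if the IP-Adapter scale is too high, the prompt is ignored. The sweet spot is a scale of 0.5-0.8.

## 🧪 Verification / duplicates

- Verified (Gatys et al. 2015 "A Neural Algorithm of Artistic Style"; Huang & Belongie 2017 AdaIN; Ye et al. 2023 IP-Adapter).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Gatys → AdaIN → diffusion (IP-Adapter, --sref) coverage |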