---
id: wiki-2026-0508-인공지능-시각-언어-생성-ai-visual-language
title: 인공지능 시각 언어 생성 (AI Visual Language Generation)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI Visual Language, Visual Style Generation, 시각 언어 생성]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [ai, image-generation, visual-language, style, branding]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: diffusers-flux-sdxl
---

# AI Visual Language Generation

## In one line
> **"A visual language is not just a style but a systematic grammar."** As of 2026, AI image generation has moved past the one-off prompt phase: a brand-grade visual grammar (color, composition, motif, lighting) is now generated with a trained LoRA stack, style transfer, and ControlNet. FLUX, SDXL, and Imagen 4 have become the backbone of production-grade visual identity.

## Core

### Components of a visual language
- **Color palette**: oklch tokens, dominant/accent ratio.
- **Composition rules**: rule of thirds, negative space, symmetry/asymmetry.
- **Motif vocabulary**: recurring shape, icon, texture.
- **Lighting model**: rim/key/fill, time of day, mood.
- **Material/finish**: matte/glossy, organic/synthetic.

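The five components above can be pinned down as data so that prompts, captions, and validation all read from one source. The sketch below is only an illustration: the `VisualLanguage` class, its field names, and the sample brand values are hypothetical and not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualLanguage:
    """One brand's visual grammar, grouped by the five components."""

    palette_oklch: tuple   # (L, C, h) triples, dominant color first
    dominant_ratio: float  # share of the frame given to the dominant color
    composition: tuple     # e.g. ("rule of thirds", "negative space")
    motifs: tuple          # recurring shapes, icons, textures
    lighting: str          # e.g. "rim light, golden hour"
    finish: str            # e.g. "matte"

    def __post_init__(self):
        if not 0.0 < self.dominant_ratio < 1.0:
            raise ValueError("dominant_ratio must be between 0 and 1")

    def prompt_fragment(self) -> str:
        """Join the non-color rules into a reusable prompt suffix."""
        parts = [*self.composition, *self.motifs, self.lighting, self.finish]
        return ", ".join(parts)


# Hypothetical sample values for illustration only.
brand = VisualLanguage(
    palette_oklch=((0.72, 0.15, 250.0), (0.85, 0.12, 90.0)),
    dominant_ratio=0.7,
    composition=("rule of thirds", "negative space"),
    motifs=("hexagon grid",),
    lighting="rim light, golden hour",
    finish="matte",
)
print(brand.prompt_fragment())
```

Keeping the palette separate from the prompt fragment is deliberate in this sketch: color is usually easier to enforce in post-processing than through prompt wording.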
### Generation stack (2026)
- **Base model**: FLUX.1-dev / SDXL / Imagen 4.
- **Style LoRA**: fine-tuned on 30-100 reference images.
- **Subject LoRA**: character/object identity.
- **ControlNet**: pose, depth, edge, normal.
- **IP-Adapter**: reference image guidance.
- **Regional prompting**: a distinct style per region.

### Applications
1. Automatic generation of marketing assets for a brand identity.
2. Concept art exploration for game art direction.
3. Series consistency for editorial illustration.

## 💻 Patterns

### Style LoRA training (FLUX)
```python
import torch
from diffusers import FluxPipeline
from peft import LoraConfig

# 1. Curate 50-100 reference images that share one visual language.
# 2. Caption every image with the same trigger token.
captions = ["<myStyle> a serene landscape, oil painting feel, ..."]

# 3. Attach LoRA adapters to the attention projections of the FLUX transformer.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
pipe.transformer.add_adapter(lora_config)

# 4. Training loop (not shown): 1500-3000 steps at lr=1e-4.
```

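Step 2 is easier to keep consistent when captions are generated rather than typed by hand. The following is a minimal sketch, assuming captions are stored one JSON object per line in a `metadata.jsonl` file with a `file_name` key (the layout Hugging Face `datasets` image folders read) and an assumed `text` caption column. The helper names and sample file names are hypothetical.

```python
import json
from pathlib import Path

TRIGGER = "<myStyle>"  # the trigger token used throughout this page


def build_metadata(descriptions, trigger=TRIGGER):
    """Return caption rows, prefixing the trigger token where it is missing."""
    rows = []
    for file_name, text in sorted(descriptions.items()):
        text = text.strip()
        if not text:
            raise ValueError(f"empty caption for {file_name}")
        if not text.startswith(trigger):
            text = f"{trigger} {text}"
        rows.append({"file_name": file_name, "text": text})
    return rows


def write_metadata(rows, path):
    """Write one JSON object per line (JSON Lines)."""
    lines = "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)
    Path(path).write_text(lines, encoding="utf-8")


# Hypothetical sample captions for illustration only.
rows = build_metadata({
    "b.png": "misty harbor, oil painting feel",
    "a.png": "<myStyle> a serene landscape, oil painting feel",
})
for row in rows:
    print(row)
```

Raising on an empty caption is a design choice here: an image without a caption would train without the trigger token, which is the "skipping captions" failure listed under anti-patterns.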
### Multi-LoRA stacking
```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Stack two adapters: brand style + character identity.
pipe.load_lora_weights("./styles/brand_v3.safetensors", adapter_name="style")
pipe.load_lora_weights("./chars/hero.safetensors", adapter_name="char")
pipe.set_adapters(["style", "char"], adapter_weights=[0.8, 0.9])

# Both trigger tokens must appear in the prompt.
img = pipe(
    "<myStyle> <hero> standing on cliff at golden hour",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
```

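The weights 0.8 and 0.9 above are one point in a small search space. One way to explore it, sketched here under the assumption that each combination is rendered and compared by eye or by a similarity score, is to enumerate a coarse grid. The `weight_grid` helper and the candidate values are hypothetical.

```python
from itertools import product


def weight_grid(adapters, values):
    """Yield one {adapter_name: weight} dict per combination of values."""
    for combo in product(values, repeat=len(adapters)):
        yield dict(zip(adapters, combo))


grid = list(weight_grid(["style", "char"], [0.6, 0.8, 1.0]))
for weights in grid:
    print(weights)
```

Each dict can then be passed to the pipeline as `pipe.set_adapters(list(weights), adapter_weights=list(weights.values()))` before generating with a fixed seed.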
### ControlNet + IP-Adapter
```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0")
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                     weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.6)

# Hypothetical input paths: a precomputed depth map and one style reference.
depth_map = load_image("./refs/depth.png")
style_ref = load_image("./refs/style.png")

img = pipe(
    prompt="cyberpunk skyline, our brand visual language",
    image=depth_map,             # ControlNet conditioning (layout)
    ip_adapter_image=style_ref,  # IP-Adapter conditioning (style)
    num_inference_steps=30,
).images[0]
```

### Regional prompting (mask-based)
```python
# Pseudocode for a Stable Diffusion Forge / ComfyUI regional-prompting workflow.
# `regional_pipe`, `top_half_mask`, and `bottom_half_mask` are placeholders
# for whatever the chosen workflow exposes, not a real library API.
regions = [
    {"mask": top_half_mask, "prompt": "<myStyle> dramatic sky, golden clouds"},
    {"mask": bottom_half_mask, "prompt": "<myStyle> reflective ocean, calm waves"},
]
img = regional_pipe(regions, base_prompt="<myStyle> seascape")
```

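### Visual grammar validation
```python
# CLIP similarity against the mean embedding of the reference images.
import open_clip
import torch
import torch.nn.functional as F

# The pretrained tag is an assumption; without one the weights are random.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k")
model.eval()

def embed(img):  # PIL image -> L2-normalized CLIP embedding, shape (1, d)
    with torch.no_grad():
        return F.normalize(model.encode_image(preprocess(img).unsqueeze(0)), dim=-1)

# ref_images (list of PIL images) and generated (PIL image) come from the caller.
ref_lang_vector = torch.cat([embed(r) for r in ref_images]).mean(dim=0, keepdim=True)
gen_vec = embed(generated)
similarity = F.cosine_similarity(ref_lang_vector, gen_vec).item()
assert similarity > 0.78, "style drift"
```
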
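The thresholding logic does not depend on CLIP itself. A dependency-free sketch on plain Python lists makes the arithmetic explicit: the helper names and the toy 2-D vectors are illustrative, and 0.78 is the threshold used above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def check_style(ref_vectors, gen_vector, threshold=0.78):
    """Return (similarity, passed) against the mean reference direction."""
    reference = mean_vector([l2_normalize(v) for v in ref_vectors])
    similarity = cosine(reference, gen_vector)
    return similarity, similarity > threshold


refs = [[1.0, 0.0], [0.0, 1.0]]
print(check_style(refs, [1.0, 1.0]))  # aligned with the mean direction
print(check_style(refs, [1.0, 0.0]))  # matches only one of the references
```

Normalizing each reference before averaging keeps any single image with a large embedding norm from dominating the mean.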
### Palette enforcement post-process
```python
import numpy as np

def quantize_to_palette(img, palette_oklch):
    # img: (H, W, 3) uint8 RGB array.
    # oklch_to_rgb is an assumed helper returning K sRGB triples in 0-255.
    pixels = img.reshape(-1, 3).astype(np.float32)
    palette_rgb = np.asarray(oklch_to_rgb(palette_oklch), dtype=np.float32)
    # Snap each pixel to the nearest palette color (Euclidean distance in RGB).
    dists = np.linalg.norm(pixels[:, None, :] - palette_rgb[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    return np.rint(palette_rgb[nearest]).reshape(img.shape).astype(np.uint8)
```

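The `oklch_to_rgb` helper is not defined on this page. One possible implementation, sketched here with the standard library only, converts OKLCH to OKLab, applies the published OKLab-to-linear-sRGB matrices, and then applies the sRGB transfer function. The assumptions are that hue is given in degrees, that out-of-gamut values are simply clipped, and that the private helper names are free to choose.

```python
import math


def _oklab_to_linear_srgb(L, a, b):
    """OKLab -> linear sRGB using the published OKLab matrices."""
    l_ = L + 0.3963377774 * a + 0.2158037573 * b
    m_ = L - 0.1055613458 * a - 0.0638541728 * b
    s_ = L - 0.0894841775 * a - 1.2914855480 * b
    l, m, s = l_ ** 3, m_ ** 3, s_ ** 3
    return (
        4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s,
        -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s,
        -0.0041960863 * l - 0.7034186147 * m + 1.7076147010 * s,
    )


def _encode_srgb(c):
    """Linear channel in [0, 1] -> 8-bit sRGB value, clipping out-of-gamut input."""
    c = min(max(c, 0.0), 1.0)
    v = 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
    return round(255 * v)


def oklch_to_rgb(palette_oklch):
    """Convert (L, C, h_degrees) triples to a list of (R, G, B) tuples in 0-255."""
    result = []
    for L, C, h in palette_oklch:
        a = C * math.cos(math.radians(h))
        b = C * math.sin(math.radians(h))
        result.append(tuple(_encode_srgb(x) for x in _oklab_to_linear_srgb(L, a, b)))
    return result


print(oklch_to_rgb([(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]))  # white, then black
```

The result is a plain list of tuples, so it needs to be wrapped in `np.array(...)` before NumPy-style indexing. Clipping is the simplest gamut strategy; a palette defined near the sRGB boundary may shift slightly in hue as a result.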
## Decision criteria
| Situation | Approach |
|---|---|
| Brand consistency is the priority | LoRA + palette enforcement |
| Concept exploration | Base model + prompt only |
| Character and style together | Multi-LoRA stacking |
| Exact layout | ControlNet (depth/canny) |
| Only one reference image | IP-Adapter |
| Separate styles per region | Regional prompting |

**Default**: FLUX.1-dev + style LoRA + palette post-process.

## 🔗 Graph
- Parent: [[AI 이미지 생성 (AI Image Generation)]]
- Variants: [[Style Transfer]] · [[LoRA Fine-tuning]]
- Adjacent: [[ControlNet]] · [[IP-Adapter]] · [[일관된 캐릭터 및 스타일 구축]]

## 🤖 LLM usage
**When**: prompt expansion from a visual brief, batch authoring of LoRA captions, palette extraction from references.
**When not**: precise composition that demands photographic accuracy; a manual photo shoot is the right answer.

## ❌ Anti-patterns
- **Random prompt soup**: different keywords on every generation, so no language emerges.
- **Single-image LoRA**: overfitting, mode collapse.
- **Skipping captions**: without a trigger token the LoRA is always on.
- **Relying on negative prompts alone**: defining the positive vocabulary comes first.

## 🧪 Verification / Duplicates
- Verified (Black Forest Labs FLUX docs 2025, diffusers library, IP-Adapter paper).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: LoRA + ControlNet stack for visual language generation. |