2nd/10_Wiki/Topics/AI_and_ML/인공지능 시각 언어 생성 (AI Visual Language Generation).md
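koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)
Replaced [[wikilinks]] that differed only in spelling (notation variants) with the target document's canonical title,
reconnecting 1,200 broken links. Only normalized title/filename matches were applied; alias matching was
excluded because of the risk of over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs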

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00


id: wiki-2026-0508-인공지능-시각-언어-생성-ai-visual-language
title: 인공지능 시각 언어 생성 (AI Visual Language Generation)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: AI Visual Language, Visual Style Generation, 시각 언어 생성
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: ai, image-generation, visual-language, style, branding
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: diffusers-flux-sdxl

인공지능 시각 언어 생성 (AI Visual Language Generation)

One-liner

"A visual language is a systematic grammar, not merely a style." In 2026, AI image generation has moved past one-off prompting into a stage where brand-grade visual grammar (color, composition, motif, lighting) is generated with trained LoRA stacks, style transfer, and ControlNet. FLUX, SDXL, and Imagen 4 have become the backbone of production-grade visual identity work.

Core

Components of a visual language

  • Color palette: oklch tokens, dominant/accent ratio.
  • Composition rules: rule of thirds, negative space, symmetry/asymmetry.
  • Motif vocabulary: recurring shape, icon, texture.
  • Lighting model: rim/key/fill, time of day, mood.
  • Material/finish: matte/glossy, organic/synthetic.
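The five components above can be pinned down as one spec object, so every prompt reuses the same vocabulary instead of ad-hoc keywords, and the dominant/accent ratio becomes explicit target area shares. A minimal sketch; the `VisualLanguage` class, its field names, and the `to_prompt` ordering are hypothetical conventions, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualLanguage:
    """Hypothetical spec bundling the five components of a visual language."""
    trigger: str                  # LoRA trigger token, e.g. "<myStyle>"
    palette: dict[str, float]     # OKLCH token name -> target area share
    composition: tuple[str, ...]  # e.g. ("rule of thirds", "negative space")
    motifs: tuple[str, ...]       # recurring shapes / icons / textures
    lighting: str                 # e.g. "warm rim light, golden hour"
    finish: str                   # e.g. "matte, organic"

    def __post_init__(self) -> None:
        # Dominant + accent shares must together cover the whole image.
        total = sum(self.palette.values())
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"palette shares sum to {total}, expected 1.0")

    def dominant(self) -> str:
        # The token with the largest target share is the dominant color.
        return max(self.palette, key=self.palette.get)

    def to_prompt(self, subject: str) -> str:
        # Fixed order: trigger + subject, then the shared grammar terms.
        grammar = [*self.composition, *self.motifs, self.lighting, self.finish]
        return f"{self.trigger} {subject}, " + ", ".join(g for g in grammar if g)
```

Because the grammar terms come from the spec rather than from each prompt author, two prompts for different subjects differ only in the subject phrase.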

Generation stack (2026)

  • Base model: FLUX.1-dev / SDXL / Imagen 4.
  • Style LoRA: fine-tuned on 30-100 reference images.
  • Subject LoRA: character/object identity.
  • ControlNet: pose, depth, edge, normal.
  • IP-Adapter: reference image guidance.
  • Regional prompting: per-region distinct style.

Applications

  1. Automatic generation of marketing assets for a brand identity.
  2. Concept art exploration for game art direction.
  3. Series consistency in editorial illustration.

💻 Patterns

Style LoRA training (FLUX)

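from diffusers import FluxPipeline
import torch
from peft import LoraConfig

# 1. Curate 50-100 reference images that share the visual language
# 2. Caption every image with the same trigger token
captions = ["<myStyle> a serene landscape, oil painting feel, ..."]

# 3. Train a LoRA on the FLUX transformer's attention projections
lora_config = LoraConfig(
    r=32, lora_alpha=32,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                    torch_dtype=torch.bfloat16)
pipe.transformer.requires_grad_(False)     # freeze the base weights
pipe.transformer.add_adapter(lora_config)  # only LoRA params are trainable
# Training loop (e.g. diffusers' train_dreambooth_lora_flux.py): 1500-3000 steps, lr=1e-4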
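Step 2 (consistent trigger-token captions) is easy to automate. A minimal sketch, assuming the Hugging Face `imagefolder` dataset convention of a `metadata.jsonl` file with `file_name` and `text` columns; the `write_captions` helper and this directory layout are assumptions to adapt to whichever training script is used.

```python
import json
from pathlib import Path


def write_captions(image_dir: str, descriptions: dict[str, str],
                   trigger: str = "<myStyle>") -> Path:
    """Write metadata.jsonl pairing each image with a trigger-prefixed caption."""
    root = Path(image_dir)
    out = root / "metadata.jsonl"
    with out.open("w", encoding="utf-8") as f:
        for file_name, desc in sorted(descriptions.items()):
            if not (root / file_name).is_file():
                raise FileNotFoundError(root / file_name)
            # Prefix the trigger once, so the style binds to that token.
            caption = desc if desc.startswith(trigger) else f"{trigger} {desc}"
            row = {"file_name": file_name, "text": caption}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return out
```

Sorting by file name keeps the metadata file stable across runs, which makes caption diffs easy to review.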

Multi-LoRA stacking

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                    torch_dtype=torch.bfloat16).to("cuda")

# Stack: style + character
pipe.load_lora_weights("./styles/brand_v3.safetensors", adapter_name="style")
pipe.load_lora_weights("./chars/hero.safetensors",     adapter_name="char")
pipe.set_adapters(["style","char"], adapter_weights=[0.8, 0.9])

img = pipe(
    "<myStyle> <hero> standing on cliff at golden hour",
    num_inference_steps=28, guidance_scale=3.5
).images[0]

ControlNet + IP-Adapter

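import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0",
                                             torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                     weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.6)  # lower = more text-driven; 1.0 = mostly image-driven

# Hypothetical paths: a precomputed depth map (layout) and a brand style reference
depth_map = load_image("./refs/depth.png")
style_ref = load_image("./refs/brand_style.png")

img = pipe(
    prompt="cyberpunk skyline, our brand visual language",
    image=depth_map, ip_adapter_image=style_ref,
    num_inference_steps=30
).images[0]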
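ControlNet needs a conditioning image such as `depth_map`. Depth maps normally come from a depth estimator, and the diffusers canny examples use OpenCV's `cv2.Canny`. As a dependency-light stand-in, the sketch below builds a thresholded gradient-magnitude edge map with NumPy only. It is simpler than Canny (no non-maximum suppression or hysteresis), it would pair with an edge/canny ControlNet checkpoint rather than the depth one loaded above, and `edge_control_image` and its default threshold are hypothetical.

```python
import numpy as np


def edge_control_image(rgb: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Approximate edge map for an edge/canny-style ControlNet (not true Canny).

    rgb: HxWx3 uint8 image. Returns HxWx3 uint8 with edges = 255, else 0.
    """
    gray = rgb.astype(np.float32).mean(axis=2) / 255.0
    # Central-difference gradients along rows (y) and columns (x).
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)
    if magnitude.max() > 0:
        magnitude /= magnitude.max()  # normalise to [0, 1]
    edges = (magnitude >= threshold).astype(np.uint8) * 255
    # ControlNet pipelines expect a 3-channel conditioning image.
    return np.repeat(edges[:, :, None], 3, axis=2)
```

To pass the result to a pipeline, wrap it with `PIL.Image.fromarray(...)`.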

Regional prompting (mask-based)

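# Pseudocode for a Stable Diffusion WebUI Forge / ComfyUI regional workflow;
# regional_pipe is a placeholder, not a diffusers API.
import numpy as np
H, W = 1024, 1024  # assumed output resolution
top_half_mask = np.zeros((H, W), dtype=np.float32)
top_half_mask[: H // 2] = 1.0
bottom_half_mask = 1.0 - top_half_mask
regions = [
  {"mask": top_half_mask,    "prompt": "<myStyle> dramatic sky, golden clouds"},
  {"mask": bottom_half_mask, "prompt": "<myStyle> reflective ocean, calm waves"},
]
img = regional_pipe(regions, base_prompt="<myStyle> seascape")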
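One common way to implement mask-based regional prompting (the MultiDiffusion-style fusion idea) is to run the denoiser once per regional prompt at each step and fuse the per-region predictions as a mask-weighted average. A minimal NumPy sketch of only that fusion step, assuming masks already resized to the latent resolution; `blend_region_predictions` is a hypothetical helper, not a Forge or ComfyUI API.

```python
import numpy as np


def blend_region_predictions(preds: list[np.ndarray],
                             masks: list[np.ndarray],
                             eps: float = 1e-8) -> np.ndarray:
    """Fuse per-region denoiser outputs into one prediction.

    preds: arrays shaped (C, H, W), one per regional prompt.
    masks: (H, W) weights in [0, 1], same order as preds.
    Overlapping regions are averaged in proportion to their mask weights.
    """
    if len(preds) != len(masks):
        raise ValueError("need exactly one mask per prediction")
    num = np.zeros_like(preds[0], dtype=np.float64)
    den = np.zeros(preds[0].shape[1:], dtype=np.float64)
    for pred, mask in zip(preds, masks):
        num += pred * mask[None, :, :]  # broadcast mask over channels
        den += mask
    # Uncovered pixels stay ~0; in practice include the base-prompt
    # prediction with an all-ones mask so every pixel is covered.
    return num / (den[None, :, :] + eps)
```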

Visual grammar validation

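# CLIP score against the reference language vector (ref_images / generated: PIL images)
import torch
import torch.nn.functional as F
import open_clip
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k")
model.eval()

@torch.no_grad()
def embed(img):  # PIL image -> L2-normalised (1, D) embedding
    return F.normalize(model.encode_image(preprocess(img).unsqueeze(0)), dim=-1)

refs = torch.cat([embed(r) for r in ref_images])
ref_lang_vector = F.normalize(refs.mean(0, keepdim=True), dim=-1)
similarity = (ref_lang_vector @ embed(generated).T).item()  # cosine similarity
assert similarity > 0.78, "style drift"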
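The fixed 0.78 cutoff is a project-specific choice. One hedged alternative is to derive the threshold from the reference set itself: normalise the reference embeddings, measure how close each one sits to their mean direction, and flag generations that fall below a low percentile of those scores. A NumPy-only sketch on embedding arrays (for example CLIP features converted with `.cpu().numpy()`); `style_threshold` and `is_on_style` are hypothetical names.

```python
import numpy as np


def _unit(x: np.ndarray) -> np.ndarray:
    # L2-normalise along the last axis so dot products are cosines.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def style_threshold(ref_embs: np.ndarray,
                    percentile: float = 5.0) -> tuple[np.ndarray, float]:
    """Return (centroid, threshold) from an (N, D) array of reference embeddings."""
    refs = _unit(ref_embs)
    centroid = _unit(refs.mean(axis=0))
    sims = refs @ centroid  # cosine of each reference to the centroid
    return centroid, float(np.percentile(sims, percentile))


def is_on_style(gen_emb: np.ndarray, centroid: np.ndarray, threshold: float) -> bool:
    return float(_unit(gen_emb) @ centroid) >= threshold
```

Each reference contributes to the centroid it is scored against, so this threshold is slightly optimistic; a leave-one-out centroid would be stricter.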

Palette enforcement post-process

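import numpy as np

def quantize_to_palette(img, palette_oklch):
    """Snap every pixel of an HxWx3 uint8 RGB image to its nearest palette color."""
    # Work in float so uint8 subtraction cannot wrap around
    pixels = img.reshape(-1, 3).astype(np.float32)
    # oklch_to_rgb: OKLCH -> sRGB (0..255) helper, not shown here
    palette_rgb = np.asarray(oklch_to_rgb(palette_oklch), dtype=np.float32)
    # Euclidean RGB distance from every pixel to every palette color (N x P)
    dists = np.linalg.norm(pixels[:, None, :] - palette_rgb[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    return palette_rgb[nearest].reshape(img.shape).astype(np.uint8)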
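`quantize_to_palette` calls an `oklch_to_rgb` helper that the snippet does not define. A sketch of one, assuming palette tokens are (L, C, h) rows with L in [0, 1], chroma C, and hue h in degrees, using Björn Ottosson's published OKLab to linear-sRGB matrices followed by the standard sRGB transfer function. Out-of-gamut colors are simply clipped here; a design-token pipeline may prefer proper gamut mapping.

```python
import numpy as np

# OKLab -> cube-root LMS, then LMS -> linear sRGB (Ottosson's OKLab matrices).
_OKLAB_TO_LMS_CBRT = np.array([
    [1.0,  0.3963377774,  0.2158037573],
    [1.0, -0.1055613458, -0.0638541728],
    [1.0, -0.0894841775, -1.2914855480],
])
_LMS_TO_LINEAR_SRGB = np.array([
    [ 4.0767416621, -3.3077115913,  0.2309699292],
    [-1.2684380046,  2.6097574011, -0.3413193965],
    [-0.0041960863, -0.7034186147,  1.7076147010],
])


def oklch_to_rgb(palette_oklch) -> np.ndarray:
    """Convert (N, 3) OKLCH rows (L, C, hue in degrees) to (N, 3) sRGB in 0..255."""
    lch = np.atleast_2d(np.asarray(palette_oklch, dtype=np.float64))
    L, C, h = lch[:, 0], lch[:, 1], np.deg2rad(lch[:, 2])
    lab = np.stack([L, C * np.cos(h), C * np.sin(h)], axis=1)  # OKLCH -> OKLab
    lms = (lab @ _OKLAB_TO_LMS_CBRT.T) ** 3                     # undo the cube root
    lin = np.clip(lms @ _LMS_TO_LINEAR_SRGB.T, 0.0, 1.0)        # clip out-of-gamut
    # Linear light -> gamma-encoded sRGB.
    srgb = np.where(lin <= 0.0031308, 12.92 * lin,
                    1.055 * np.power(lin, 1 / 2.4) - 0.055)
    return np.round(srgb * 255.0)
```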

Decision criteria

Situation | Approach
Brand consistency is the priority | LoRA + palette enforcement
Concept exploration | Base model + prompt only
Character and style together | Multi-LoRA stacking
Exact layout | ControlNet (depth/canny)
Only one reference image | IP-Adapter
Different styles per region | Regional prompting

Default: FLUX.1-dev + style LoRA + palette post-process.

🔗 Graph

🤖 LLM usage

When to use: expanding visual briefs into prompts, batch-authoring LoRA captions, extracting palettes from references. When not to: precise composition that demands photographic accuracy; a manual photo shoot is the right answer there.

Anti-patterns

  • Random prompt soup: different keywords in every generation, so no consistent language emerges.
  • Single-image LoRA: overfitting and mode collapse.
  • Skipping captions: without a trigger token, the LoRA is always on.
  • Relying only on negative prompts: defining the positive vocabulary comes first.
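Two of these anti-patterns (single-image LoRA, missing trigger tokens) can be caught before training starts. A minimal lint sketch; the default minimum of 30 images mirrors the 30-100 range given for style LoRAs above, and `lint_lora_dataset` is a hypothetical helper, not part of any training script.

```python
def lint_lora_dataset(captions: dict[str, str], trigger: str,
                      min_images: int = 30) -> list[str]:
    """Return human-readable problems found in a {file_name: caption} mapping."""
    problems = []
    if len(captions) < min_images:
        # Too few references: the LoRA memorises them (overfit / mode collapse).
        problems.append(f"{len(captions)} image(s) found; need at least {min_images}")
    missing = sorted(name for name, text in captions.items() if trigger not in text)
    if missing:
        # Captions without the trigger leave the style unbound to any token.
        problems.append(f"{len(missing)} caption(s) missing {trigger}: {', '.join(missing)}")
    return problems
```

An empty list means the dataset passes both checks; anything else should block the training run.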

🧪 Verification / duplicates

  • Verified (Black Forest Labs FLUX docs 2025, diffusers library, IP-Adapter paper).
  • Confidence: A.

🕓 Changelog

Date | Change
2026-05-08 | Phase 1
2026-05-10 | Manual cleanup: LoRA + ControlNet stack for visual language generation.