2nd/10_Wiki/Topics/AI_and_ML/오픈소스 이미지 모델 미세 조정 및 배포.md
2026-05-10 22:08:15 +09:00


id: wiki-2026-0508-오픈소스-이미지-모델-미세-조정-및-배포
title: Open-Source Image Model Fine-tuning and Deployment
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Open-Source Image Model Fine-tuning, OSS Image Model Deploy, FLUX/SDXL Fine-tune
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: ai, fine-tuning, lora, flux, sdxl, deployment
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: {language: python, framework: diffusers-peft-vllm}

Open-Source Image Model Fine-tuning and Deployment

One-liner

"LoRA in 30 minutes + ComfyUI/vLLM deployment = production image generation." The 2026 OSS image stack (FLUX.1, SDXL, SD3.5) runs on a single GPU thanks to LoRA/DoRA-based PEFT, FP8/INT4 quantization, and deployment through ComfyUI, Automatic1111, or Replicate. Compared with closed APIs it costs roughly one-tenth as much and gives full control, an overwhelming advantage.

Core

Choosing an OSS model (2026)

  • FLUX.1-dev: 12B params, strongest photorealism and prompt fidelity.
  • FLUX.1-schnell: fast 4-step inference, Apache 2.0.
  • SDXL 1.0 + Turbo: rich ecosystem, best LoRA compatibility.
  • SD 3.5 Large: MMDiT, 8B params.
  • AuraFlow / PixArt-Sigma: lighter alternatives.

Fine-tuning methods

  • LoRA: rank 16-64, targeting the attention layers.
  • DoRA: weight-decomposed LoRA, more stable training.
  • Full fine-tune: rare, needs 80GB+ VRAM.
  • Textual Inversion: trains the token embedding only.
  • Dreambooth: subject-driven, usually combined with LoRA.
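The rank trade-off can be made concrete. LoRA keeps a linear layer's weight W (d_out × d_in) frozen and trains two small matrices, A (r × d_in) and B (d_out × r), adding (alpha / r) · B·A·x to the layer output. The stdlib sketch below counts the added parameters and applies that update on a toy example; the 3072-wide layer in the demo is an illustrative size chosen for the printout, not a claim about any specific model. Adapter size grows linearly with r, so rank 256 stores 8× the parameters of rank 32 on every adapted layer.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


def lora_scale(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update B @ A before it is added to the frozen output."""
    return alpha / rank


def matvec(M: list[list[float]], v: list[float]) -> list[float]:
    """Plain-Python matrix-vector product."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha: float, rank: int) -> list[float]:
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    s = lora_scale(alpha, rank)
    return [b + s * u for b, u in zip(base, update)]


if __name__ == "__main__":
    d = 3072  # illustrative square projection width, not tied to a specific model
    full = d * d
    for r in (16, 32, 64, 256):
        p = lora_params(d, d, r)
        print(f"rank {r:>3}: {p:,} added params per layer ({p / full:.1%} of the frozen weight)")
```

In the ai-toolkit config in the patterns section, `linear` and `linear_alpha` appear to play the roles of r and alpha, so `linear: 32` with `linear_alpha: 32` corresponds to a scale of 1.0.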

Applications

  1. In-house pipelines for brand asset generation.
  2. LoRA libraries for character consistency.
  3. Quantized deployment for on-device generation.

💻 Patterns

LoRA training (FLUX, ai-toolkit)

# config/flux_lora.yaml
job: extension
config:
  name: hero_character_v1
  process:
    - type: sd_trainer
      training_folder: ./output
      device: cuda:0
      network:
        type: lora
        linear: 32      # rank
        linear_alpha: 32
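      train:
        batch_size: 1
        steps: 2000
        gradient_accumulation_steps: 4
        train_unet: true
        train_text_encoder: false
        gradient_checkpointing: true   # trade compute for VRAM on 24GB cards
        noise_scheduler: flowmatch     # FLUX is a flow-matching model
        lr: 1e-4
        optimizer: adamw8bit
        dtype: bf16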
      datasets:
        - folder_path: ./data/hero
          caption_ext: txt
          resolution: [512, 768, 1024]
      model:
        name_or_path: black-forest-labs/FLUX.1-dev
        is_flux: true
        quantize: true   # 8-bit base for fit

Run training

git clone https://github.com/ostris/ai-toolkit && cd ai-toolkit
pip install -r requirements.txt
huggingface-cli login   # FLUX.1-dev is gated: accept its license on Hugging Face first
python run.py config/flux_lora.yaml
# 30 min on RTX 4090, output: hero_character_v1.safetensors

Quantization (FP8 / INT4)

from optimum.quanto import quantize, qfloat8, freeze
from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                    torch_dtype=torch.bfloat16)
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
quantize(pipe.text_encoder_2, weights=qfloat8)
freeze(pipe.text_encoder_2)
pipe.to("cuda")
# VRAM: 24GB → 14GB
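Why FP8 (and INT4) lets FLUX fit on a single GPU comes down to bytes per parameter. The sketch below is a rough weights-only estimate using the parameter counts from the model list; the real footprint is higher because of activations, the text encoders, and the VAE, and the INT4 column is arithmetic only (the code above quantizes to FP8).

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(n_params: float, dtype: str) -> float:
    """Weights-only memory in GB (1e9 bytes); ignores activations, text encoders, and VAE."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Parameter counts taken from the model list in this note
    models = {"FLUX.1-dev": 12e9, "SD 3.5 Large": 8e9}
    for name, n in models.items():
        row = ", ".join(f"{dt} {weight_gb(n, dt):.1f} GB" for dt in BYTES_PER_PARAM)
        print(f"{name}: {row}")
```

For the 12B FLUX transformer this gives 24 GB in bf16 versus 12 GB in FP8, which is why FP8 is the line between "needs a datacenter card" and "fits a 24GB consumer GPU".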

Inference server (FastAPI + diffusers)

import base64
import io
import threading

import torch
from diffusers import FluxPipeline
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                    torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("./loras/hero.safetensors")
# FastAPI runs sync endpoints in a thread pool; serialize access to the single GPU pipeline
pipe_lock = threading.Lock()

class Req(BaseModel):
    prompt: str
    steps: int = 28
    guidance: float = 3.5
    seed: int | None = None

@app.post("/generate")
def generate(r: Req):
    # `is not None` so that seed=0 is honored (a truthiness check would silently drop it)
    gen = torch.Generator("cuda").manual_seed(r.seed) if r.seed is not None else None
    with pipe_lock:
        img = pipe(r.prompt, num_inference_steps=r.steps,
                   guidance_scale=r.guidance, generator=gen).images[0]
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return {"image_b64": base64.b64encode(buf.getvalue()).decode()}
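A matching client can stay in the standard library. This sketch assumes the server is started with uvicorn on its default port 8000 (for example `uvicorn server:app`, where the module name `server` is a placeholder); the helper names `build_request`, `decode_image`, and `generate_to_file` are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_request(prompt: str, url: str = "http://localhost:8000/generate",
                  steps: int = 28, guidance: float = 3.5,
                  seed: Optional[int] = None) -> urllib.request.Request:
    """POST request whose JSON body mirrors the server's Req model."""
    body = {"prompt": prompt, "steps": steps, "guidance": guidance, "seed": seed}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_image(payload: dict) -> bytes:
    """Turn the server's {"image_b64": ...} reply back into raw PNG bytes."""
    return base64.b64decode(payload["image_b64"])


def generate_to_file(prompt: str, path: str, timeout: float = 300.0, **params) -> None:
    """Call /generate (network I/O happens here) and write the returned PNG to `path`."""
    req = build_request(prompt, **params)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        png = decode_image(json.loads(resp.read()))
    with open(path, "wb") as f:
        f.write(png)
```

Once the server is up, `generate_to_file("<hero> in a forest, cinematic", "hero.png", seed=42)` writes one image; the long default timeout reflects that a 28-step FLUX generation is not instantaneous.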

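ComfyUI workflow (JSON node graph)

A runnable API-format graph also needs a negative prompt, an empty latent, and decode/save nodes. For FLUX, guidance is applied through FluxGuidance with KSampler cfg left at 1.0, and CheckpointLoaderSimple expects an all-in-one checkpoint such as flux1-dev-fp8.safetensors (the transformer-only flux1-dev.safetensors is loaded with separate diffusion-model, CLIP, and VAE loaders instead).

{
  "1": {"class_type":"CheckpointLoaderSimple",
        "inputs":{"ckpt_name":"flux1-dev-fp8.safetensors"}},
  "2": {"class_type":"LoraLoader",
        "inputs":{"model":["1",0],"clip":["1",1],
                  "lora_name":"hero.safetensors",
                  "strength_model":0.9,"strength_clip":0.9}},
  "3": {"class_type":"CLIPTextEncode",
        "inputs":{"clip":["2",1],"text":"<hero> in a forest, cinematic"}},
  "4": {"class_type":"FluxGuidance",
        "inputs":{"conditioning":["3",0],"guidance":3.5}},
  "5": {"class_type":"CLIPTextEncode",
        "inputs":{"clip":["2",1],"text":""}},
  "6": {"class_type":"EmptySD3LatentImage",
        "inputs":{"width":1024,"height":1024,"batch_size":1}},
  "7": {"class_type":"KSampler",
        "inputs":{"model":["2",0],"positive":["4",0],"negative":["5",0],
                  "latent_image":["6",0],"seed":42,"steps":28,"cfg":1.0,
                  "sampler_name":"euler","scheduler":"simple","denoise":1.0}},
  "8": {"class_type":"VAEDecode","inputs":{"samples":["7",0],"vae":["1",2]}},
  "9": {"class_type":"SaveImage","inputs":{"images":["8",0],"filename_prefix":"hero"}}
}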
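The graph above is in ComfyUI's API format (node id mapped to `class_type` plus `inputs`, with links written as `["node_id", output_index]`), which a running ComfyUI server accepts as `{"prompt": graph}` on its `/prompt` endpoint. The sketch below checks for links to missing node ids before queueing. The link check is a simple heuristic (any two-element `[str, int]` input counts as a link), and the default address assumes ComfyUI's usual local port 8188; both function names are hypothetical.

```python
import json
import urllib.request


def dangling_links(workflow: dict) -> list[tuple[str, str, str]]:
    """Return (node_id, input_name, target_id) for links that point at a node id not in the graph.

    Heuristic: any input shaped like ["<node id>", <output index>] is treated as a link.
    """
    missing = []
    for node_id, node in workflow.items():
        for name, value in node.get("inputs", {}).items():
            is_link = (isinstance(value, list) and len(value) == 2
                       and isinstance(value[0], str) and isinstance(value[1], int))
            if is_link and value[0] not in workflow:
                missing.append((node_id, name, value[0]))
    return missing


def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """Validate links, then POST {"prompt": workflow} to ComfyUI's /prompt endpoint."""
    bad = dangling_links(workflow)
    if bad:
        raise ValueError(f"dangling node links: {bad}")
    req = urllib.request.Request(
        f"{server}/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Typical use: save the graph to a file, `json.load` it, and call `queue_prompt(graph)`; ComfyUI replies with JSON that includes a `prompt_id` for tracking the job.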

Replicate / Modal deploy

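# Modal (sketch: the LoRA path and secret name below are placeholders)
import io
import modal

app = modal.App("flux-lora")
image = (
    modal.Image.debian_slim()
    .pip_install("diffusers", "torch", "accelerate", "peft",
                 "transformers", "sentencepiece", "protobuf")
    # Copy the trained LoRA into the container; keep add_local_* calls last
    .add_local_file("hero.safetensors", "/root/hero.safetensors")
)

# FLUX.1-dev is gated on Hugging Face: expose HF_TOKEN through a Modal secret
@app.cls(gpu="A100-40GB", image=image,
         secrets=[modal.Secret.from_name("huggingface-secret")])
class Generator:
    @modal.enter()
    def load(self):
        import torch
        from diffusers import FluxPipeline
        # bf16 weights; the fp32 default would not fit FLUX.1-dev in 40GB
        self.pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
        self.pipe.load_lora_weights("/root/hero.safetensors")

    @modal.method()
    def generate(self, prompt: str) -> bytes:
        img = self.pipe(prompt).images[0]
        buf = io.BytesIO()
        img.save(buf, format="PNG")  # return PNG bytes, which serialize cleanly
        return buf.getvalue()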

Caption automation (BLIP/Florence-2)

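from pathlib import Path
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Florence-2 ships custom modeling code, so trust_remote_code=True is required
proc  = AutoProcessor.from_pretrained("microsoft/Florence-2-large", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large",
                                             trust_remote_code=True).cuda()

def caption(img: Image.Image) -> str:
    inputs = proc(text="<MORE_DETAILED_CAPTION>", images=img, return_tensors="pt").to("cuda")
    out = model.generate(input_ids=inputs["input_ids"], pixel_values=inputs["pixel_values"],
                         max_new_tokens=256, num_beams=3)
    return proc.batch_decode(out, skip_special_tokens=True)[0]

# Batch-caption the training set: one .txt per image, prefixed with the trigger token
dataset_folder = Path("./data/hero")  # same folder as datasets.folder_path in the YAML
for f in dataset_folder.glob("*.png"):
    img = Image.open(f).convert("RGB")
    f.with_suffix(".txt").write_text("<hero> " + caption(img))  # trigger must match inference prompts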
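Before training, it is worth auditing the captions against the caption anti-pattern listed below. This stdlib sketch flags images with no caption file, captions that lack the trigger token, and captions shared verbatim by several images. The set of image extensions and the name `audit_captions` are assumptions for illustration.

```python
from collections import Counter
from pathlib import Path

# Extensions treated as training images (assumption; extend as needed)
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def audit_captions(folder: Path, trigger: str) -> dict:
    """Flag missing captions, captions without the trigger token, and verbatim-duplicate captions."""
    images = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_EXTS)
    missing, no_trigger, texts = [], [], {}
    for img in images:
        txt = img.with_suffix(".txt")
        if not txt.exists():
            missing.append(img.name)
            continue
        text = txt.read_text(encoding="utf-8").strip()
        texts[img.name] = text
        if trigger not in text:
            no_trigger.append(img.name)
    counts = Counter(texts.values())
    duplicated = sorted(name for name, t in texts.items() if counts[t] > 1)
    return {
        "images": len(images),
        "missing_caption": missing,
        "missing_trigger": no_trigger,
        "duplicate_caption": duplicated,
    }
```

For the dataset in this note that would be `audit_captions(Path("./data/hero"), "<hero>")`, where `<hero>` is the trigger used in the ComfyUI prompt above; any non-empty list in the result is worth fixing before spending GPU time.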

Decision criteria

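| Situation | Approach |
| --- | --- |
| Photorealism + prompt fidelity | FLUX.1-dev |
| Fast prototyping | FLUX.1-schnell |
| Custom subject | LoRA + Dreambooth |
| GPU with less than 24GB VRAM | FP8 quantization |
| No-code workflow | ComfyUI |
| Serverless scale | Modal / Replicate |
| Commercial use | Check the license: FLUX.1-schnell is Apache 2.0, FLUX.1-dev ships under a non-commercial license, SD 3.5 under the Stability AI Community License |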

Default: FLUX.1-dev + LoRA rank 32, with ComfyUI for prototyping and Modal for production.

🔗 Graph

🤖 Using LLMs

When to use: drafting training captions, planning hyperparameter sweeps, debugging pipelines. When not to: judging visual aesthetics, which needs human evaluation.

Anti-patterns

  • No caption strategy: the same caption on every image, so the model learns to ignore the trigger token.
  • Rank too high: rank 256 leads to overfitting and a huge adapter file.
  • Skipping the validation set: watching train loss only, with no FID/CLIP score.
  • License blindness: ignoring commercial-use restrictions.
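For the validation-set anti-pattern, the first step is simply holding data out. A minimal sketch, assuming a hash-based split (one reasonable choice, not something the tools above prescribe): hashing each file name makes the assignment deterministic, so adding images later does not reshuffle existing ones. Computing FID or CLIP score on the held-out set is left to a metrics library; `is_validation` and `split` are hypothetical names.

```python
import hashlib


def is_validation(name: str, fraction: float = 0.1, salt: str = "val") -> bool:
    """Deterministically send about `fraction` of names to the held-out set."""
    digest = hashlib.sha256(f"{salt}:{name}".encode("utf-8")).digest()
    # 48 bits keeps the value exactly representable as a float strictly below 1.0
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < fraction


def split(names: list[str], fraction: float = 0.1) -> tuple[list[str], list[str]]:
    """Return (train, validation) lists; a given name always lands on the same side."""
    train, val = [], []
    for n in names:
        (val if is_validation(n, fraction) else train).append(n)
    return train, val
```

Changing `salt` produces a different but equally stable split, which is handy when comparing LoRA runs on more than one held-out set.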

🧪 Verification / Duplicates

  • Verified (BFL FLUX docs 2025, ai-toolkit repo, diffusers 0.32+, Modal docs).
  • Confidence: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: OSS image model fine-tune + deploy stack. |