---
id: wiki-2026-0508-오픈소스-이미지-모델-미세-조정-및-배포
title: Fine-Tuning and Deploying Open-Source Image Models
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Open-Source Image Model Fine-tuning, OSS Image Model Deploy, FLUX/SDXL Fine-tune]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [ai, fine-tuning, lora, flux, sdxl, deployment]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: diffusers-peft-vllm
---

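# Fine-Tuning and Deploying Open-Source Image Models

## One-Line Summary
> **"A 30-minute LoRA + ComfyUI/vLLM deployment = production image generation."** The 2026 OSS image stack (FLUX.1, SDXL, SD3.5) runs on a single GPU through LoRA/DoRA-based PEFT, FP8/INT4 quantization, and deployment via ComfyUI, Automatic1111, or Replicate. Compared with closed APIs, it offers roughly one-tenth the cost (a rough figure that depends on GPU pricing and utilization) and full control over the pipeline.
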
## Core Concepts

### OSS model selection (2026)
- **FLUX.1-dev**: 12B parameters; strongest photorealism and prompt fidelity.
- **FLUX.1-schnell**: fast 4-step inference, Apache 2.0.
- **SDXL 1.0 + Turbo**: rich ecosystem, best LoRA compatibility.
- **SD 3.5 Large**: MMDiT architecture, 8B parameters.
- **AuraFlow / PixArt-Sigma**: lightweight alternatives.

### Fine-tuning methods
- **LoRA**: rank 16-64, targeting the attention layers.
- **DoRA**: weight-decomposed LoRA; generally more stable to train.
- **Full fine-tune**: rarely needed; requires 80 GB+ of VRAM.
- **Textual Inversion**: trains a token embedding only.
- **Dreambooth**: subject-driven; usually combined with LoRA.

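
To make the rank range above concrete, the following is a minimal sketch of how many trainable parameters LoRA adds to a single linear layer. The layer width `HIDDEN` is an illustrative assumption, not a value taken from any model card, and the helper names are hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA learns a low-rank update B @ A, with A of shape (rank, d_in)
    and B of shape (d_out, rank), so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix itself (bias ignored)."""
    return d_in * d_out


# Hypothetical layer width, chosen only for illustration.
HIDDEN = 3072

for rank in (16, 32, 64, 256):
    added = lora_param_count(HIDDEN, HIDDEN, rank)
    ratio = added / full_param_count(HIDDEN, HIDDEN)
    print(f"rank={rank:>3}: {added:>9,} trainable params per layer ({ratio:.1%} of full)")
```

Because the count grows linearly with rank, rank 256 produces an adapter eight times larger than rank 32, which is one reason very high ranks yield large files.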
### Applications
1. In-house pipeline for brand asset generation.
2. LoRA library for character consistency.
3. Quantized deployment for on-device generation.

## 💻 Patterns

### LoRA training (FLUX, ai-toolkit)
```yaml
# config/flux_lora.yaml
job: extension
config:
  name: hero_character_v1
  process:
    - type: sd_trainer
      training_folder: ./output
      device: cuda:0
      network:
        type: lora
        linear: 32        # rank
        linear_alpha: 32
      train:
        batch_size: 1
        steps: 2000
        gradient_accumulation_steps: 4
        train_unet: true
        train_text_encoder: false
        lr: 1e-4
        optimizer: adamw8bit
      datasets:
        - folder_path: ./data/hero
          caption_ext: txt
          resolution: [512, 768, 1024]
      model:
        name_or_path: black-forest-labs/FLUX.1-dev
        is_flux: true
        quantize: true    # 8-bit base model so it fits in memory
```

### Running the training
```bash
git clone https://github.com/ostris/ai-toolkit && cd ai-toolkit
pip install -r requirements.txt
python run.py config/flux_lora.yaml
# Roughly 30 min on an RTX 4090 (varies with dataset size, resolution, and step count)
# Output: hero_character_v1.safetensors
```

### Quantization (FP8 / INT4)
```python
from optimum.quanto import quantize, qfloat8, freeze
from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                    torch_dtype=torch.bfloat16)
# Quantize the two largest components to FP8, then freeze to materialize the weights.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
quantize(pipe.text_encoder_2, weights=qfloat8)
freeze(pipe.text_encoder_2)
pipe.to("cuda")
# VRAM: about 24GB → 14GB (approximate; depends on resolution and batch size)
```

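The VRAM figures above follow from simple arithmetic on the weight precision. As a rough sketch, counting the weights only and taking the 12B parameter count from the model list as the size of the FLUX.1-dev transformer (an assumption made here for illustration):

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: the "12B params" figure refers to the transformer weights.
FLUX_TRANSFORMER_PARAMS = 12e9

for label, bits in (("bf16", 16), ("fp8", 8), ("int4", 4)):
    gb = weight_memory_gb(FLUX_TRANSFORMER_PARAMS, bits)
    print(f"{label}: {gb:.0f} GB for the transformer weights")
```

This estimate ignores activations, the text encoders, and the VAE, so real usage sits above the weights-only number. That is consistent with FP8 landing near 14 GB rather than 12 GB.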
### Inference server (FastAPI + diffusers)
```python
from fastapi import FastAPI
from pydantic import BaseModel
from diffusers import FluxPipeline
import torch, io, base64

app = FastAPI()
# Load the pipeline once at startup so every request reuses the same weights.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                    torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("./loras/hero.safetensors")

class Req(BaseModel):
    prompt: str
    steps: int = 28
    guidance: float = 3.5
    seed: int | None = None

@app.post("/generate")
def generate(r: Req):
    # Compare against None so that seed=0 is still honored.
    gen = torch.Generator("cuda").manual_seed(r.seed) if r.seed is not None else None
    img = pipe(r.prompt, num_inference_steps=r.steps,
               guidance_scale=r.guidance, generator=gen).images[0]
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return {"image_b64": base64.b64encode(buf.getvalue()).decode()}
```

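For completeness, here is a minimal client sketch for the `/generate` endpoint using only the standard library. The function names are hypothetical, and the base URL in the usage note is an assumption about where the server is running.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_generate_request(base_url: str, prompt: str, steps: int = 28,
                           guidance: float = 3.5,
                           seed: Optional[int] = None) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the Req model."""
    body = json.dumps({"prompt": prompt, "steps": steps,
                       "guidance": guidance, "seed": seed}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate", data=body,
        headers={"Content-Type": "application/json"}, method="POST")


def decode_image(response_json: dict) -> bytes:
    """Turn the {"image_b64": ...} response into raw PNG bytes."""
    return base64.b64decode(response_json["image_b64"])


def generate_png(base_url: str, prompt: str, **kwargs) -> bytes:
    """Send the request and return PNG bytes (needs a running server)."""
    req = build_generate_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        return decode_image(json.load(resp))
```

With the server started locally (for example on port 8000, an assumption), `generate_png("http://localhost:8000", "<hero> in a forest", seed=0)` would return bytes that can be written straight to a `.png` file.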
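### ComfyUI workflow (JSON node graph)
The graph is written in ComfyUI's API format and assumes an all-in-one checkpoint that bundles the text encoders and VAE. FLUX.1-dev takes its guidance value through a `FluxGuidance` node, so the sampler `cfg` stays at 1.0.
```json
{
  "1": {"class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "flux1-dev.safetensors"}},
  "2": {"class_type": "LoraLoader",
        "inputs": {"model": ["1", 0], "clip": ["1", 1], "lora_name": "hero.safetensors",
                   "strength_model": 0.9, "strength_clip": 0.9}},
  "3": {"class_type": "CLIPTextEncode",
        "inputs": {"clip": ["2", 1], "text": "<hero> in a forest, cinematic"}},
  "4": {"class_type": "KSampler",
        "inputs": {"model": ["2", 0], "seed": 0, "steps": 28, "cfg": 1.0,
                   "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0,
                   "positive": ["5", 0], "negative": ["6", 0], "latent_image": ["7", 0]}},
  "5": {"class_type": "FluxGuidance",
        "inputs": {"conditioning": ["3", 0], "guidance": 3.5}},
  "6": {"class_type": "CLIPTextEncode",
        "inputs": {"clip": ["2", 1], "text": ""}},
  "7": {"class_type": "EmptySD3LatentImage",
        "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
  "8": {"class_type": "VAEDecode",
        "inputs": {"samples": ["4", 0], "vae": ["1", 2]}},
  "9": {"class_type": "SaveImage",
        "inputs": {"images": ["8", 0], "filename_prefix": "hero"}}
}
```
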
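A node graph in this form can be queued programmatically. The sketch below assumes a locally running ComfyUI instance that accepts API-format graphs via `POST /prompt` on its default address. The address and the helper names are assumptions, so adjust them to your setup.

```python
import json
import urllib.request
import uuid
from typing import Optional

COMFYUI_URL = "http://127.0.0.1:8188"  # assumption: default local ComfyUI address


def build_prompt_payload(workflow: dict, client_id: Optional[str] = None) -> bytes:
    """Wrap an API-format node graph in the body expected by POST /prompt."""
    payload = {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}
    return json.dumps(payload).encode("utf-8")


def queue_prompt(workflow: dict, base_url: str = COMFYUI_URL) -> dict:
    """Queue the workflow and return the server's JSON reply (needs a running server)."""
    req = urllib.request.Request(
        f"{base_url}/prompt", data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Loading the graph with `json.load(open("workflow.json"))` and passing it to `queue_prompt` should enqueue one generation. The file name here is illustrative.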
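### Replicate / Modal deploy
```python
# Modal
import io
import modal

app = modal.App("flux-lora")
image = modal.Image.debian_slim().pip_install(
    "diffusers", "torch", "accelerate", "peft",
    "transformers", "sentencepiece", "protobuf",  # needed by the FLUX text encoders
)

@app.cls(gpu="A100-40GB", image=image)
class Generator:
    @modal.enter()
    def load(self):
        import torch
        from diffusers import FluxPipeline
        # FLUX.1-dev is a gated Hugging Face repo, so the container needs an HF token.
        # bfloat16 keeps the 12B transformer within 40 GB; float32 would not fit.
        self.pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
        ).to("cuda")
        # The LoRA file must exist inside the container (baked into the image or mounted).
        self.pipe.load_lora_weights("./hero.safetensors")

    @modal.method()
    def generate(self, prompt: str) -> bytes:
        img = self.pipe(prompt).images[0]
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        return buf.getvalue()  # PNG bytes serialize cleanly back to the caller
```
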
### Caption automation (BLIP/Florence-2)
```python
from pathlib import Path
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Florence-2 ships custom modeling code, so trust_remote_code=True is required.
proc = AutoProcessor.from_pretrained("microsoft/Florence-2-large", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large",
                                             trust_remote_code=True).cuda()

def caption(img):
    inputs = proc(text="<MORE_DETAILED_CAPTION>", images=img, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=256, num_beams=3)
    return proc.batch_decode(out, skip_special_tokens=True)[0]

# Batch caption training data
dataset_folder = Path("./data/hero")  # assumed: same folder as in the training config
for f in dataset_folder.glob("*.png"):
    img = Image.open(f).convert("RGB")
    # Prefix every caption with the trigger token, then write it next to the image.
    f.with_suffix(".txt").write_text("<myStyle> " + caption(img))
```

## Decision Criteria
| Situation | Approach |
|---|---|
| Photorealism + prompt fidelity | FLUX.1-dev |
| Fast prototyping | FLUX.1-schnell |
| Custom subject | LoRA + Dreambooth |
| GPU with less than 24 GB VRAM | FP8 quantization |
| No-code workflow | ComfyUI |
| Serverless scale | Modal / Replicate |
| Commercial use | Check the license: FLUX.1-schnell is Apache 2.0, while FLUX.1-dev and Stability AI models carry their own terms |

**Default**: FLUX.1-dev + LoRA rank 32 + ComfyUI for prototyping, Modal for production.

## 🔗 Graph
- Parent: [[AI 이미지 생성 (AI Image Generation)]]
- Variant: [[LoRA Fine-tuning]]
- Applications: [[인공지능 시각 언어 생성 (AI Visual Language Generation)]] · [[일관된 캐릭터 및 스타일 구축]]
- Adjacent: [[LLM_Optimization_and_Deployment_Strategies|Quantization]] · [[ComfyUI]]

## 🤖 LLM Usage
**When to use**: training caption authoring, hyperparameter sweep planning, pipeline debugging.
**When not to use**: visual aesthetic judgment, which requires human evaluation.

## ❌ Anti-patterns
- **No caption strategy**: using the same caption for every image, which leads the model to ignore the trigger token.
- **Rank too high**: rank 256 leads to overfitting and a huge file.
- **Skipping the validation set**: tracking train loss only, with no FID/CLIP score.
- **License blindness**: ignoring commercial-use restrictions.

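
Two of the anti-patterns above, identical captions and a missing validation set, can be caught with a quick dataset audit before training. The following is a minimal sketch using only the standard library. The function names and the 10% hold-out fraction are assumptions, and it expects each image to have a sibling `.txt` caption file as in the training config.

```python
import random
from collections import Counter
from pathlib import Path


def audit_captions(folder: Path, image_ext: str = ".png") -> dict:
    """Report images with no caption file and captions shared by several images."""
    missing, captions = [], Counter()
    for img in sorted(folder.glob(f"*{image_ext}")):
        txt = img.with_suffix(".txt")
        if not txt.exists():
            missing.append(img.name)
        else:
            captions[txt.read_text().strip()] += 1
    duplicated = {text: n for text, n in captions.items() if n > 1}
    return {"missing": missing, "duplicated": duplicated}


def split_train_val(items: list, val_fraction: float = 0.1, seed: int = 0):
    """Deterministically hold out a validation subset (at least one item)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

A non-empty `duplicated` entry signals that several images share one caption. The held-out list gives a fixed set of images for computing validation metrics such as CLIP score.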
## 🧪 Verification / Duplicates
- Verified (BFL FLUX docs 2025, ai-toolkit repo, diffusers 0.32+, Modal docs).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: OSS image model fine-tune + deploy stack. |