---
id: wiki-2026-0508-오픈소스-이미지-모델-미세-조정-및-배포
title: Open-Source Image Model Fine-Tuning and Deployment
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Open-Source Image Model Fine-tuning, OSS Image Model Deploy, FLUX/SDXL Fine-tune]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [ai, fine-tuning, lora, flux, sdxl, deployment]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: diffusers-peft-vllm
---

# Open-Source Image Model Fine-Tuning and Deployment

## One-line summary

> **"LoRA in 30 minutes + ComfyUI/vLLM deployment = production image gen"**. The 2026 OSS image stack (FLUX.1, SDXL, SD3.5) can run on a single GPU through LoRA/DoRA-based PEFT, FP8/INT4 quantization, and deployment via ComfyUI, Automatic1111, or Replicate. Compared with closed APIs, the main advantages are cost (about one tenth, by this note's estimate) and full control.

## Key points

### OSS model selection (2026)

- **FLUX.1-dev**: 12B params; strongest photorealism and prompt fidelity.
- **FLUX.1-schnell**: fast 4-step inference, Apache 2.0.
- **SDXL 1.0 + Turbo**: rich ecosystem, best LoRA compatibility.
- **SD 3.5 Large**: MMDiT, 8B params.
- **AuraFlow / PixArt-Sigma**: lightweight alternatives.

### Fine-tuning methods

- **LoRA**: rank 16-64, targeting the attention layers.
- **DoRA**: weight-decomposed LoRA; tends to train more stably.
- **Full fine-tune**: rarely needed; requires 80 GB+ VRAM.
- **Textual Inversion**: trains only a token embedding.
- **Dreambooth**: subject-driven; usually combined with LoRA.

### Applications

1. In-house pipeline for brand asset generation.
2. LoRA library for character consistency.
3. Quantized deployment for on-device generation.

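
### LoRA rank vs. adapter size

The rank range listed under the fine-tuning methods trades capacity against adapter size. The sketch below is a back-of-the-envelope estimate rather than a measurement: one LoRA adapter on a `d_in × d_out` linear layer adds `rank × (d_in + d_out)` parameters. The helper names, the layer width (`HIDDEN`), and the number of adapted layers (`N_ADAPTED`) are illustrative assumptions and do not describe any specific checkpoint.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_size_mb(n_layers: int, d_in: int, d_out: int, rank: int,
                    bytes_per_param: int = 2) -> float:
    """Approximate adapter file size in MB, assuming 16-bit weights by default."""
    total_params = n_layers * lora_params(d_in, d_out, rank)
    return total_params * bytes_per_param / 1e6


HIDDEN = 3072     # assumed width of each adapted projection (illustrative)
N_ADAPTED = 200   # assumed number of adapted linear layers (illustrative)

if __name__ == "__main__":
    for rank in (16, 32, 64, 256):
        size = adapter_size_mb(N_ADAPTED, HIDDEN, HIDDEN, rank)
        print(f"rank={rank:>3}  approx. {size:.1f} MB")
```

Because the size grows linearly with rank, moving from rank 16 to rank 256 multiplies the adapter file by 16 under these assumptions.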
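## 💻 Patterns

### LoRA training (FLUX, ai-toolkit)

```yaml
# config/flux_lora.yaml
job: extension
config:
  name: hero_character_v1
  process:
    - type: sd_trainer
      training_folder: ./output
      device: cuda:0
      network:
        type: lora
        linear: 32        # rank
        linear_alpha: 32
      train:
        batch_size: 1
        steps: 2000
        gradient_accumulation_steps: 4
        train_unet: true
        train_text_encoder: false
        gradient_checkpointing: true   # trades speed for lower VRAM use
        noise_scheduler: flowmatch     # FLUX trains with flow matching
        lr: 1e-4
        optimizer: adamw8bit
        dtype: bf16
      datasets:
        - folder_path: ./data/hero
          caption_ext: txt
          resolution: [512, 768, 1024]
      model:
        name_or_path: black-forest-labs/FLUX.1-dev
        is_flux: true
        quantize: true    # 8-bit base model so training fits in memory
```

### Running the training

```bash
git clone https://github.com/ostris/ai-toolkit && cd ai-toolkit
pip install -r requirements.txt
# FLUX.1-dev is gated on Hugging Face: accept the license and log in first
# (for example with `huggingface-cli login`).
python run.py config/flux_lora.yaml
# ~30 min on an RTX 4090 (per this note), output: hero_character_v1.safetensors
```

### Quantization (FP8 / INT4)

```python
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# Quantize the two largest components to FP8 weights, then freeze them.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
quantize(pipe.text_encoder_2, weights=qfloat8)  # T5 text encoder
freeze(pipe.text_encoder_2)
# For INT4, pass weights=qint4 (also exported by optimum.quanto) instead of qfloat8.
pipe.to("cuda")
# VRAM: 24GB → 14GB (figure from this note)
```

### Inference server (FastAPI + diffusers)

```python
import base64
import io
import threading

import torch
from diffusers import FluxPipeline
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("./loras/hero.safetensors")
# FastAPI runs sync endpoints in a thread pool; serialize access to the one pipeline.
gpu_lock = threading.Lock()


class Req(BaseModel):
    prompt: str
    steps: int = 28
    guidance: float = 3.5
    seed: int | None = None  # `int | None` needs Python 3.10+


@app.post("/generate")
def generate(r: Req):
    # Compare against None so that seed=0 is honored.
    gen = torch.Generator("cuda").manual_seed(r.seed) if r.seed is not None else None
    with gpu_lock:
        img = pipe(r.prompt, num_inference_steps=r.steps,
                   guidance_scale=r.guidance, generator=gen).images[0]
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return {"image_b64": base64.b64encode(buf.getvalue()).decode()}
```

### ComfyUI workflow (JSON node graph)

```json
{
  "1": {"class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "flux1-dev.safetensors"}},
  "2": {"class_type": "LoraLoader",
        "inputs": {"model": ["1", 0], "clip": ["1", 1],
                   "lora_name": "hero.safetensors",
                   "strength_model": 0.9, "strength_clip": 0.9}},
  "3": {"class_type": "CLIPTextEncode",
        "inputs": {"clip": ["2", 1], "text": " in a forest, cinematic"}},
  "4": {"class_type": "KSampler",
        "inputs": {"model": ["2", 0], "positive": ["3", 0], "negative": ["5", 0],
                   "latent_image": ["6", 0], "seed": 0, "steps": 28, "cfg": 3.5,
                   "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0}},
  "5": {"class_type": "CLIPTextEncode",
        "inputs": {"clip": ["2", 1], "text": ""}},
  "6": {"class_type": "EmptyLatentImage",
        "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
  "7": {"class_type": "VAEDecode",
        "inputs": {"samples": ["4", 0], "vae": ["1", 2]}},
  "8": {"class_type": "SaveImage",
        "inputs": {"images": ["7", 0], "filename_prefix": "hero"}}
}
```

Nodes 5 to 8 use placeholder values (empty negative prompt, 1024×1024 latent, seed 0). `CheckpointLoaderSimple` assumes a single-file checkpoint that bundles the text encoders and VAE. For FLUX.1-dev, ComfyUI's reference workflows typically set the KSampler `cfg` to 1.0 and apply the 3.5 guidance through a `FluxGuidance` node instead.

### Replicate / Modal deploy

```python
# Modal
import modal

app = modal.App("flux-lora")
image = (
    modal.Image.debian_slim()
    # FluxPipeline also needs transformers and sentencepiece for its text encoders.
    .pip_install("diffusers", "torch", "accelerate", "peft",
                 "transformers", "sentencepiece", "protobuf")
    # Ship the LoRA file into the container (the remote path is an assumption).
    .add_local_file("./hero.safetensors", "/root/hero.safetensors")
)

# FLUX.1-dev is gated; "huggingface-secret" is an assumed Modal secret holding HF_TOKEN.
@app.cls(gpu="A100-40GB", image=image,
         secrets=[modal.Secret.from_name("huggingface-secret")])
class Generator:
    @modal.enter()
    def load(self):
        import torch
        from diffusers import FluxPipeline
        # Load in bfloat16; the float32 default needs roughly twice the memory.
        self.pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
        ).to("cuda")
        self.pipe.load_lora_weights("/root/hero.safetensors")

    @modal.method()
    def generate(self, prompt: str):
        return self.pipe(prompt).images[0]
```

### Caption automation (BLIP/Florence-2)

```python
from pathlib import Path

from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Florence-2-large"
# Florence-2 ships custom modeling code, so trust_remote_code=True is required.
proc = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True).cuda()

# Florence-2 needs a task prompt; the source leaves it empty, so
# "<DETAILED_CAPTION>" is an assumed choice among its captioning tasks.
TASK = "<DETAILED_CAPTION>"


def caption(img):
    inputs = proc(text=TASK, images=img, return_tensors="pt").to("cuda")
    out = model.generate(input_ids=inputs["input_ids"],
                         pixel_values=inputs["pixel_values"],
                         max_new_tokens=256, num_beams=3)
    return proc.batch_decode(out, skip_special_tokens=True)[0].strip()


# Batch-caption the training data: one .txt file next to each image.
dataset_folder = Path("./data/hero")  # same folder as folder_path in the training config
for f in dataset_folder.glob("*.png"):
    f.with_suffix(".txt").write_text(" " + caption(Image.open(f).convert("RGB")))
```

## Decision criteria

| Situation | Approach |
|---|---|
| Photorealism + prompt fidelity | FLUX.1-dev |
| Fast prototyping | FLUX.1-schnell |
| Custom subject | LoRA + Dreambooth |
| GPU with under 24 GB VRAM | FP8 quantization |
| No-code workflow | ComfyUI |
| Serverless scale | Modal / Replicate |
| Commercial use | Check the Apache/SAI license terms |

**Default**: FLUX.1-dev + LoRA rank 32 + ComfyUI for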
prototyping, Modal for production.

## 🔗 Graph

- Parent: [[AI Image Generation]] · [[Model Fine-tuning]]
- Variants: [[LoRA Fine-tuning]] · [[Dreambooth]]
- Applications: [[인공지능 시각 언어 생성 (AI Visual Language Generation)]] · [[일관된 캐릭터 및 스타일 구축]]
- Adjacent: [[Quantization]] · [[ComfyUI]] · [[Diffusers Library]]

## 🤖 LLM usage

**When**: training caption authoring, hyperparameter sweep planning, pipeline debugging.

**When not**: visual aesthetic judgment, which needs human evaluation.

## ❌ Anti-patterns

- **No caption strategy**: using the same caption for every image, so the model ignores the trigger token.
- **Rank too high**: rank 256 tends to overfit and produces a very large adapter file.
- **Skipping validation set**: tracking train loss only, with no FID or CLIP score.
- **License blindness**: ignoring commercial-use restrictions.

## 🧪 Verification / Duplicates

- Verified (BFL FLUX docs 2025, ai-toolkit repo, diffusers 0.32+, Modal docs).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: OSS image model fine-tune + deploy stack. |