
---
id: wiki-2026-0508-peft-parameter-efficient-fine-tu
title: PEFT (Parameter-Efficient Fine-Tuning)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [PEFT, Parameter-Efficient Fine-Tuning, LoRA fine-tuning]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [peft, lora, qlora, fine-tuning, llm]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: peft, transformers, bitsandbytes
---
# PEFT (Parameter-Efficient Fine-Tuning)
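## One-line summary
> **"Frozen base + tiny trainable delta."** PEFT trains only about 0.1-1% of the parameters that full fine-tuning updates. The 2026 standard is LoRA / QLoRA: with a 4-bit base, even a 70B model can be fine-tuned on a single 48GB GPU. The Hugging Face `peft` library is the de facto standard implementation.
## Core concepts
### Motivation
- Full FT: a 70B model takes 280GB for fp32 weights alone; gradients and Adam optimizer state bring the total to about 1.1TB, so a multi-A100 cluster is required.
- Storage: saving a full checkpoint per task makes storage costs explode.
- Catastrophic forgetting: full FT can damage the base model's capabilities.
- PEFT: freeze the base and train only the delta → 1 base + N tiny adapters.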
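
As a rough illustration of the memory argument above, the sketch below counts bytes per parameter. It assumes Adam with two moment buffers, fp32 everywhere for full FT, a frozen bf16 base for LoRA, and it ignores activations. The function names and the 100M adapter size are hypothetical, chosen only for illustration.

```python
def full_ft_bytes(n_params, bytes_per_value=4):
    """Weights + gradients + two Adam moment buffers, all at the same precision."""
    return n_params * bytes_per_value * 4


def lora_ft_bytes(n_params, n_adapter_params, base_bytes=2, adapter_bytes=4):
    """Frozen base weights, plus weights/gradients/Adam moments for the adapter only."""
    return n_params * base_bytes + n_adapter_params * adapter_bytes * 4


GB = 10**9
n_base = 70_000_000_000      # 70B-parameter base model
n_adapter = 100_000_000      # hypothetical adapter size (assumption)

print(f"full FT (fp32 + Adam): {full_ft_bytes(n_base) / GB:.0f} GB")
print(f"LoRA (bf16 frozen base): {lora_ft_bytes(n_base, n_adapter) / GB:.1f} GB")
```

Under these assumptions full FT needs about 1,120GB before activations, while the LoRA setup needs about 141.6GB, almost all of it the frozen base. Quantizing that base to 4 bits is the extra step QLoRA takes.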
### Method family
- **LoRA** (Hu et al. 2021): low-rank decomposition `ΔW = BA`, rank r=4-64.
- **QLoRA** (Dettmers et al. 2023): 4-bit NF4 quantized base + LoRA adapters.
- **Prefix Tuning** (Li & Liang 2021): learnable prefix tokens prepended to keys/values.
- **Prompt Tuning** (Lester et al. 2021): learnable soft prompts at input.
- **IA³** (Liu et al. 2022): scale activations via learned vectors (multiply, not add).
- **Adapters** (Houlsby et al. 2019): small bottleneck MLPs inserted between layers.
- **DoRA** (Liu et al. 2024): decomposes each weight into magnitude and direction; reported to outperform LoRA at comparable rank.
### LoRA math
- `W' = W + αBA/r` where `W ∈ R^{d×k}`, `B ∈ R^{d×r}`, `A ∈ R^{r×k}`, r ≪ min(d,k).
- Trainable: `r(d+k)` params (`2dr` when d=k) instead of `dk`. With d=k=4096, r=8: 65,536 vs 16.8M (256× fewer).
- Inference: merge `W ← W + αBA/r` → zero added latency.
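
The parameter count and the merge identity above can be checked numerically. This is a minimal pure-Python sketch on tiny random matrices, not the `peft` implementation; all helper names are hypothetical. Real LoRA initializes `B` to zero, while here `B` is random so that the equivalence check is non-trivial.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_param_counts(d, k, r):
    """Return (full, lora) trainable parameter counts for one d x k weight."""
    return d * k, r * (d + k)


def merge(W, B, A, alpha, r):
    """Fold the adapter into the base weight: W + (alpha / r) * B @ A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(W, BA)]


def forward_unmerged(x, W, B, A, alpha, r):
    """Training-time path: base output plus the scaled low-rank branch."""
    base_out = matvec(W, x)
    low_rank_out = matvec(B, matvec(A, x))
    return [b + (alpha / r) * l for b, l in zip(base_out, low_rank_out)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, k, r, alpha = 6, 5, 2, 4
W, B, A = rand_matrix(d, k, rng), rand_matrix(d, r, rng), rand_matrix(r, k, rng)
x = [rng.uniform(-1, 1) for _ in range(k)]

y_adapter = forward_unmerged(x, W, B, A, alpha, r)
y_merged = matvec(merge(W, B, A, alpha, r), x)
max_diff = max(abs(a - b) for a, b in zip(y_adapter, y_merged))
print("max difference between unmerged and merged outputs:", max_diff)

full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)
```

The two forward paths agree up to floating-point rounding, which is why a merged model adds no inference cost.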
### Applications
1. Domain adaptation (legal, medical LLM).
2. Instruction tuning (Alpaca-style).
3. Style transfer (FLUX LoRA for art style).
4. Multi-tenant serving (1 base + N customer LoRAs).
## 💻 Patterns
### LoRA with peft library
```python
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM

# Load the frozen base model.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Attach rank-16 adapters to all four attention projections.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # ~0.17% trainable with this config
```
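
The trainable fraction that `print_trainable_parameters()` reports can be estimated by hand. The sketch below assumes the Llama-3.1-8B shapes as I understand them (hidden size 4096, 32 layers, grouped-query attention with a 1024-wide key/value projection, about 8.03B base parameters); treat these constants as assumptions and check them against the model config.

```python
def lora_params(in_features, out_features, r):
    """Parameters in one LoRA pair: A is r x in_features, B is out_features x r."""
    return r * (in_features + out_features)


# Assumed Llama-3.1-8B dimensions (verify against the model's config.json).
HIDDEN, KV_DIM, LAYERS, R = 4096, 1024, 32, 16
BASE_PARAMS = 8_030_261_248

per_layer = (
    lora_params(HIDDEN, HIDDEN, R)     # q_proj
    + lora_params(HIDDEN, KV_DIM, R)   # k_proj
    + lora_params(HIDDEN, KV_DIM, R)   # v_proj
    + lora_params(HIDDEN, HIDDEN, R)   # o_proj
)
adapter_params = per_layer * LAYERS
fraction = adapter_params / (BASE_PARAMS + adapter_params)

print(f"adapter params: {adapter_params:,}")
print(f"trainable fraction: {100 * fraction:.2f}%")
```

With r=16 on the four attention projections this comes to roughly 13.6M adapter parameters, or about 0.17% of the model. Adding the MLP projections to `target_modules` raises the fraction.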
### QLoRA (4-bit base + LoRA)
```python
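import torch
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Store the base in 4-bit NF4; run matmuls in bf16.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B", quantization_config=bnb, device_map="auto",
)
base = prepare_model_for_kbit_training(base)  # standard peft prep step for k-bit training

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base, lora_config)  # 70B on a single 48GB GPU (tight fit)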
```
### Save / load adapter only
```python
from peft import PeftModel

model.save_pretrained("./my-lora")  # adapter weights only: tens of MB, not a full checkpoint
loaded = PeftModel.from_pretrained(base, "./my-lora")  # re-attach to the same base model
```
### Merge for inference
```python
merged = model.merge_and_unload() # W ← W + αBA/r
merged.save_pretrained("./merged-model") # standard HF model, no peft dep
```
### Multi-LoRA serving (vLLM)
```python
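# vLLM 0.6+ supports dynamic LoRA loading
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-3.1-8B", enable_lora=True, max_loras=8)
prompts = ["Summarize the customer's last support ticket."]  # example input
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)
# Route this request through one tenant's adapter: (name, unique int id, path).
out = llm.generate(
    prompts, sampling_params,
    lora_request=LoRARequest("customer-42", 1, "./customer-42-lora"),
)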
```
### DoRA (2024)
```python
from peft import LoraConfig
config = LoraConfig(r=16, lora_alpha=32, use_dora=True,  # peft >= 0.10
                    target_modules=["q_proj", "v_proj"])
```
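
To make the magnitude + direction idea concrete, here is a toy sketch of the decomposition as described in the DoRA paper: each column of the updated weight is normalized to unit length and then rescaled by a learned magnitude. It is not the `peft` implementation; the helper names are hypothetical, and `delta` stands in for the low-rank product `BA`.

```python
import math


def column_norms(W):
    """Euclidean norm of each column of W (list of rows)."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*W)]


def dora_weight(W0, delta, m):
    """Normalize each column of (W0 + delta), then scale column j by magnitude m[j]."""
    V = [[w + dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(W0, delta)]
    norms = column_norms(V)  # assumes no all-zero column
    return [[m[j] * v / norms[j] for j, v in enumerate(row)] for row in V]


W0 = [[3.0, 0.0], [4.0, 2.0]]
m = column_norms(W0)                      # magnitudes initialized from the base weight
zero_delta = [[0.0, 0.0], [0.0, 0.0]]
delta = [[0.5, -1.0], [0.25, 0.5]]        # stands in for the low-rank update B @ A

W_start = dora_weight(W0, zero_delta, m)  # reproduces W0 at initialization
W_tuned = dora_weight(W0, delta, m)       # direction changes, column norms stay at m

print(W_start)
print(column_norms(W_tuned), m)
```

In this formulation the low-rank update only steers direction, while the per-column magnitude `m` is trained as a separate small vector.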
### Prompt tuning
```python
from peft import PromptTuningConfig, PromptTuningInit, TaskType

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,  # initialize from the text below
    num_virtual_tokens=20,
    prompt_tuning_init_text="Classify sentiment:",
    tokenizer_name_or_path="meta-llama/Llama-3.1-8B",
)
```
## Decision criteria
| Situation | Approach |
|---|---|
| 1 GPU, large base (70B) | QLoRA |
| Multi-task, single base | LoRA + multi-adapter serving |
| Tiny VRAM, frozen base OK | Prompt tuning |
| Best quality, less compute saving | DoRA |
| Diffusion model style | LoRA (rank 4-32) |
| Production accuracy critical | Full FT (if feasible) |
**Default**: QLoRA (4-bit NF4 + r=16 LoRA on q/k/v/o projections).
## 🔗 Graph
- Parent: [[Fine-Tuning]]
- Variants: [[LoRA]] · [[QLoRA]] · [[DoRA]]
- Applications: [[Fine-tuning|Instruction-Tuning]] · [[Domain-Adaptation]]
- Adjacent: [[LLM_Optimization_and_Deployment_Strategies|Quantization]] · [[RLHF]]
## 🤖 LLM usage
**When to use**: fine-tuning a large model on a single GPU, multi-tenant LoRA serving, rapid task iteration.
**When not to use**: the base model's fundamental capabilities need to change (continued pretraining → full FT or full pretraining).
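## ❌ Anti-patterns
- **Rank too low**: r=1-2 → underfitting. Use r=8-32 as a starting point.
- **Wrong target modules**: adapting only `q_proj`/`v_proj` and skipping the rest can degrade quality. Targeting all attention + MLP linear modules tends to work best.
- **Forgetting alpha**: ignoring the alpha=2r convention changes the effective update scale `α/r` and can make training unstable.
- **Saving full model**: `model.save_pretrained()` on a `PeftModel` saves only the adapter. Don't merge unnecessarily.
- **QLoRA + bf16 base**: with QLoRA the base weights are already stored in NF4, so also loading a bf16 copy of the base is redundant. Pick one compute dtype (fp16 or bf16) and use it consistently.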
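
The interaction between rank and alpha follows from the update rule `W' = W + αBA/r`: the low-rank branch is multiplied by `α/r`. A small sketch (the helper name `lora_scale` is hypothetical) shows how a fixed alpha shrinks that multiplier as rank grows, while tying alpha to `2r` keeps it constant.

```python
def lora_scale(alpha, r):
    """Multiplier applied to the low-rank update B @ A."""
    return alpha / r


ranks = (4, 8, 16, 32, 64)
fixed_alpha = {r: lora_scale(32, r) for r in ranks}     # alpha held at 32
tied_alpha = {r: lora_scale(2 * r, r) for r in ranks}   # alpha = 2r convention

for r in ranks:
    print(f"r={r:>2}  alpha=32 -> scale {fixed_alpha[r]:.2f}   alpha=2r -> scale {tied_alpha[r]:.2f}")
```

When sweeping rank, this is one reason to change alpha together with `r`, or to retune the learning rate, rather than changing `r` alone.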
## 🧪 Verification / duplicates
- Verified (HuggingFace `peft` docs, Hu et al. 2021 LoRA, Dettmers et al. 2023 QLoRA).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: PEFT family, LoRA/QLoRA patterns, decision matrix |