"매 frozen base + tiny trainable delta". 매 full fine-tuning 의 ~0.1-1% parameter 만 학습. 2026 standard: LoRA / QLoRA — 매 70B model 도 single 24GB GPU 에서 fine-tune 가능. HuggingFace peft library 의 사실상 표준.
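Key points
Motivation
Full FT: a 70B model needs 280GB for the fp32 weights alone, with gradients and optimizer state on top → requires a multi-A100 cluster.
Storage: saving a full checkpoint for every task makes storage costs explode.
Catastrophic forgetting: full FT can damage the base model's capabilities.
PEFT: the base stays frozen and only the delta is trained → 1 base + N tiny adapters.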
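The full fine-tuning figure above can be checked with some quick arithmetic. The sketch below assumes plain fp32 training with Adam (one gradient and two moment buffers per weight) and ignores activations, so it is a lower bound rather than a measurement; `full_ft_memory_gb` is a hypothetical helper written for this note.

```python
# Back-of-the-envelope memory for full fine-tuning with Adam in fp32.
# Assumption: activations and framework overhead are ignored, so real usage is higher.
GB = 10**9


def full_ft_memory_gb(n_params, bytes_per_value=4):
    """Return the memory (in GB) of weights, gradients, and Adam state."""
    weights = n_params * bytes_per_value      # model weights
    grads = n_params * bytes_per_value        # one gradient per weight
    adam = 2 * n_params * bytes_per_value     # first and second moment buffers
    return {
        "weights": weights / GB,
        "grads": grads / GB,
        "adam": adam / GB,
        "total": (weights + grads + adam) / GB,
    }


mem = full_ft_memory_gb(70 * 10**9)
for name, gb in mem.items():
    print(f"{name:>8}: {gb:7.1f} GB")
```

Under these assumptions the weights account for 280 GB and the total is four times that, which is why full fine-tuning of a 70B model does not fit on a single accelerator.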
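import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA: store the frozen base in 4-bit NF4 (with double quantization), compute in bf16
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",
    quantization_config=bnb,
    device_map="auto",
)
# lora_config was not defined in the original snippet; these values are illustrative assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)  # 70B on single 48GB GPU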
Save / load adapter only
from peft import PeftModel

model.save_pretrained("./my-lora")  # ~50MB, not 140GB
loaded = PeftModel.from_pretrained(base, "./my-lora")  # attach the saved adapter to a loaded base
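To see where an adapter size in the tens of megabytes comes from, the sketch below counts LoRA parameters by hand. The layer dimensions are assumptions based on a Llama-style 70B architecture (hidden size 8192, grouped-query K/V width 1024, MLP width 28672, 80 layers), and `lora_param_count` and `adapter_params` are hypothetical helpers, not part of peft.

```python
# Each adapted linear layer (d_in -> d_out) gains A (r x d_in) and B (d_out x r).
def lora_param_count(d_in, d_out, r):
    return r * (d_in + d_out)


# Assumed Llama-style 70B dimensions; check the model config before relying on them.
HIDDEN, KV_DIM, MLP_DIM, LAYERS = 8192, 1024, 28672, 80

SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def adapter_params(target_modules, r):
    """Total trainable LoRA parameters across all layers."""
    per_layer = sum(lora_param_count(*SHAPES[m], r) for m in target_modules)
    return per_layer * LAYERS


small = adapter_params(["q_proj", "v_proj"], r=8)
large = adapter_params(list(SHAPES), r=16)

for label, n in [("q/v only, r=8", small), ("all linear, r=16", large)]:
    # 2 bytes per value assumes the adapter is stored in fp16/bf16
    print(f"{label}: {n:,} params, about {n * 2 / 1e6:.0f} MB in 16-bit")
```

The adapter size scales with both the rank and the number of targeted modules, so a figure like "~50MB" corresponds to a small configuration; adapting every linear layer at a higher rank gives a larger file that is still far smaller than a full checkpoint.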
Merge for inference
merged = model.merge_and_unload()  # W ← W + αBA/r
merged.save_pretrained("./merged-model")  # standard HF model, no peft dep
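The merge rule W ← W + αBA/r can be checked numerically: the output of the frozen weight plus the scaled adapter path should match the output of the merged weight. This is a minimal pure-Python sketch with tiny random matrices, not peft's implementation. Real LoRA initializes B to zero; B is random here so the check is non-trivial.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scale(X, s):
    return [[s * a for a in row] for row in X]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_out, d_in, r, alpha = 4, 6, 2, 4
W = rand_matrix(d_out, d_in, rng)   # frozen base weight
B = rand_matrix(d_out, r, rng)      # LoRA up-projection
A = rand_matrix(r, d_in, rng)       # LoRA down-projection
x = rand_matrix(d_in, 1, rng)       # one input column vector
scaling = alpha / r

# Adapter path: y = W x + (alpha / r) * B (A x)
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# Merged path: W' = W + (alpha / r) * B A, then y = W' x
W_merged = add(W, scale(matmul(B, A), scaling))
y_merged = matmul(W_merged, x)

max_diff = max(abs(a[0] - b[0]) for a, b in zip(y_adapter, y_merged))
print(f"max difference between adapter and merged outputs: {max_diff:.2e}")
```

Because the two paths are algebraically identical, merging removes the extra matrix multiplications at inference time without changing the outputs beyond floating-point rounding.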
When to use: fine-tuning a large model on a single GPU, multi-tenant LoRA serving, rapid task iteration.
When not to use: when the base model's fundamental capabilities need to change (continued pretraining → use full FT or full pretraining).
❌ Anti-patterns
Rank too low: r=1-2 → underfitting. Use r=8-32 as a starting point.
Wrong target modules: adapting only q_proj/v_proj and skipping the rest → degraded quality. Targeting all attention + MLP modules works best.
Forgetting alpha: ignoring the alpha=2r convention → training can become unstable.
Saving full model: model.save_pretrained() on a PeftModel saves only the adapter, so there is no need to write out the full model per task. Don't merge unnecessarily.
QLoRA + bf16 base: redundant with NF4 quantization. Pick one: an NF4-quantized base (QLoRA) or an fp16/bf16 base (plain LoRA).
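The rules of thumb above can be turned into a small pre-flight check. The sketch below is a hypothetical helper (`lint_lora_settings` is not part of peft) that encodes only the heuristics listed in this note, and the module names assume a Llama-style architecture.

```python
# Assumed Llama-style module names; other architectures name their layers differently.
ATTENTION = {"q_proj", "k_proj", "v_proj", "o_proj"}
MLP = {"gate_proj", "up_proj", "down_proj"}


def lint_lora_settings(r, alpha, target_modules):
    """Return a list of warnings based on the heuristics in this note."""
    warnings = []
    if r <= 2:
        warnings.append(f"r={r} is very low and may underfit; try r=8-32")
    if alpha != 2 * r:
        warnings.append(f"alpha={alpha} departs from the alpha=2r convention ({2 * r})")
    missing = (ATTENTION | MLP) - set(target_modules)
    if missing:
        warnings.append(f"target_modules skips {sorted(missing)}")
    return warnings


risky = lint_lora_settings(r=2, alpha=16, target_modules=["q_proj", "v_proj"])
safe = lint_lora_settings(r=16, alpha=32, target_modules=sorted(ATTENTION | MLP))

for message in risky:
    print("warning:", message)
print("safe config warnings:", len(safe))
```

These checks are conventions rather than hard limits, so a warning is a prompt to double-check the configuration, not proof that it is wrong.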
🧪 Verification / Duplicates
Verified (HuggingFace peft docs, Hu et al. 2021 LoRA, Dettmers et al. 2023 QLoRA).