"Rejecting inputs the model has never seen." OOD detection decides at inference time whether an input lies outside the training distribution, acting as a safety layer that prevents silent failures. As of 2026 the standard approach is KNN or Mahalanobis distance on foundation-model embeddings, or a logit-based energy score; classical ODIN serves as a baseline.
Key points
Score families
Softmax baseline (MSP): max softmax probability; a weak baseline.
ODIN (Liang 2018): temperature scaling + input gradient perturbation.
Energy (Liu 2020): -T * logsumexp(logits / T); free (computed from the existing logits, no retraining) and strong.
Mahalanobis (Lee 2018): class-conditional Gaussian on penultimate features.
KNN (Sun 2022): k-NN distance in feature space; simple and robust.
DOSE / ViM (2022-2024): residual + logit hybrid.
Foundation-model OOD (CLIP, DINOv2 features + KNN): the SOTA as of 2026.
Jailbreak / off-distribution prompt detection for LLMs.
Novel attack patterns in fraud detection.
💻 Patterns
Energy score (Liu 2020)
import torch
import torch.nn.functional as F

@torch.no_grad()
def energy_score(model, x, T=1.0):
    logits = model(x)
    # higher energy = OOD
    return -T * torch.logsumexp(logits / T, dim=-1)
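An energy score only becomes a rejection rule once a threshold is fixed. The following is a minimal sketch of one common way to do that, assuming the convention used here (higher score = OOD) and a held-out set of in-distribution scores. The helper names `threshold_at_tpr` and `is_ood` are hypothetical, the 95% operating point is a conventional but arbitrary choice, and the toy numbers are made up for illustration. Plain Python is used so the snippet runs without dependencies.

```python
import math

def threshold_at_tpr(id_scores, tpr=0.95):
    """Return a threshold t such that roughly `tpr` of the
    in-distribution scores satisfy score <= t.

    Convention: higher score = more OOD, so score > t means "reject".
    """
    if not id_scores:
        raise ValueError("need at least one in-distribution score")
    ordered = sorted(id_scores)
    # Index of the smallest score that covers the requested fraction.
    idx = min(len(ordered) - 1, max(0, math.ceil(tpr * len(ordered)) - 1))
    return ordered[idx]

def is_ood(score, threshold):
    """Flag an input whose score exceeds the calibrated threshold."""
    return score > threshold

# Toy in-distribution energies (assumed values, e.g. from a validation set).
val_energies = [-12.1, -11.4, -10.9, -10.2, -9.8, -9.5, -9.1, -8.7, -8.0, -6.3]
t = threshold_at_tpr(val_energies, tpr=0.95)
flag_typical = is_ood(-10.0, t)   # energy well inside the validation range
flag_extreme = is_ood(-2.0, t)    # energy far above anything seen in validation
```

The threshold is calibrated on in-distribution data only, so no OOD examples are needed at setup time; the price is that the false-negative rate on real OOD inputs is unknown until it is measured separately.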
Mahalanobis score (Lee 2018)
@torch.no_grad()
def fit_mahalanobis(features, labels, num_classes):
    # per-class means of the penultimate features
    means = []
    for c in range(num_classes):
        means.append(features[labels == c].mean(0))
    means = torch.stack(means)
    # one covariance shared by all classes, estimated from class-centered features
    centered = features - means[labels]
    cov = centered.T @ centered / len(features)
    inv = torch.linalg.pinv(cov)
    return means, inv

def maha_score(f, means, inv):
    diffs = f.unsqueeze(1) - means  # [N, C, D]
    d2 = torch.einsum("ncd,de,nce->nc", diffs, inv, diffs)
    return d2.min(-1).values  # smallest distance to any class
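The KNN score from the list of score families can be sketched in the same spirit. This is a minimal illustration, assuming L2-normalized features and Euclidean distance to the k-th nearest training feature, with higher distance meaning more likely OOD. The names `l2_normalize` and `knn_ood_score` and the toy 2-D vectors are hypothetical; a real feature bank would hold penultimate-layer or foundation-model embeddings and would be normalized once up front rather than on every call.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm (the zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def knn_ood_score(query, bank, k=1):
    """Distance from `query` to its k-th nearest neighbour in `bank`.

    Both the query and the bank entries are L2-normalized first.
    Higher score = farther from the training features = more likely OOD.
    """
    if not bank:
        raise ValueError("feature bank must not be empty")
    q = l2_normalize(query)
    # Normalizing the bank here keeps the sketch short; precompute in practice.
    dists = sorted(math.dist(q, l2_normalize(b)) for b in bank)
    return dists[min(k, len(dists)) - 1]

# Toy 2-D "features" standing in for training-set embeddings.
train_bank = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2]]
near_score = knn_ood_score([0.95, 0.05], train_bank, k=1)  # close to the bank
far_score = knn_ood_score([-1.0, 0.0], train_bank, k=1)    # opposite direction
```

Unlike the Mahalanobis score, this makes no Gaussian assumption about the feature distribution; the cost is keeping (a subset of) the training features around for the neighbour search.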