---
id: wiki-2026-0508-reranking
title: Reranking
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Cross-Encoder-Reranking, Re-Ranker, RAG-Reranking]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [rag, retrieval, reranking, search]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: sentence-transformers
---
# Reranking
## One-liner
> **"Retrieval is for recall, reranking is for precision."** Reranking takes the top-k candidates from first-stage retrieval (BM25/dense) and re-scores them with an expensive cross-encoder or LLM. It is the single biggest lever on RAG quality in 2026 (Cohere Rerank 4, BGE-Reranker-v2.5, Voyage rerank-3).
## Key ideas
### Why it is needed
- Bi-encoder (dense retrieval): encodes query and doc separately, then compares them by cosine similarity. Fast (doc embeddings are cached), but the query-doc interaction is shallow.
- Cross-encoder: encodes `[query, doc]` jointly and outputs a scalar score. Deep token-level attention typically yields roughly +10-30% NDCG.
- Trade-off: running a cross-encoder over all N docs is O(N) per query and too slow, so first-stage retrieval fetches the top-100 and the reranker narrows that to the top-5.
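The trade-off can be made concrete with a back-of-the-envelope latency estimate. This is only a sketch: `estimate_rerank_seconds` is a hypothetical helper, and the batch size and per-batch latency are assumed numbers, not measurements of any particular model.

```python
import math

def estimate_rerank_seconds(num_docs: int, batch_size: int = 32,
                            seconds_per_batch: float = 0.05) -> float:
    """Rough cross-encoder cost: one forward pass per (query, doc) pair, batched.

    batch_size and seconds_per_batch are assumed values for illustration.
    """
    num_batches = math.ceil(num_docs / batch_size)
    return num_batches * seconds_per_batch

full_corpus = estimate_rerank_seconds(1_000_000)  # score every doc for one query
top_100 = estimate_rerank_seconds(100)            # score only first-stage hits
print(f"full corpus: {full_corpus:.1f}s, top-100: {top_100:.2f}s")
```

Under these assumptions the full-corpus pass takes minutes per query while the top-100 pass stays well under a second, which is why the cross-encoder only ever sees first-stage candidates.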
### Architectures
- **Cross-encoder** (BERT-based): `[CLS] q [SEP] d [SEP]` → linear → score. BGE-Reranker-v2.5, Cohere Rerank 4, Voyage rerank-3.
- **ColBERT / late interaction**: token-level doc embeddings are precomputed, and each query token is scored by its max similarity (MaxSim) against them. Roughly 80% of cross-encoder quality at retrieval speed.
- **LLM-as-reranker**: prompt GPT-5/Claude to produce a listwise ranking. This is the RankGPT / RankZephyr paradigm: the highest quality, but also the most expensive.
- **RRF (Reciprocal Rank Fusion)**: cheap fusion of multiple rankers, `score(d) = Σ 1/(k+rank_i(d))`.
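The late-interaction scoring rule can be sketched in a few lines of plain Python. This is a minimal illustration, not ColBERT's actual implementation: `maxsim_score` is a hypothetical name, and the 2-d "token embeddings" are made-up values assumed to be L2-normalized.

```python
def maxsim_score(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """Late-interaction score: for each query token, take its best-matching
    doc token (max dot product), then sum over query tokens."""
    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy token embeddings (made-up values for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # has an exact match for each query token
doc_b = [[0.6, 0.8]]                           # only a partial match for each query token

print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

Because the doc-side vectors do not depend on the query, they can be computed once at index time. Only the cheap max-and-sum step runs per query.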
### Hybrid Search Stack (2026 standard)
1. BM25 (sparse) + Dense (e.g., BGE-M3) → parallel.
2. RRF fuse → top-100.
3. Cross-encoder rerank → top-10.
4. (Optional) LLM rerank → top-3 for high-stakes.
### Applications
1. Higher answer accuracy in RAG.
2. E-commerce search relevance.
3. Legal/medical document discovery (precision-critical).
4. Code search (semantic + lexical hybrid).
## 💻 Patterns
### Cross-encoder rerank (sentence-transformers)
```python
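from sentence_transformers import CrossEncoder

# Load once at startup; model loading is expensive.
# Note: this checkpoint is LLM-based. If it does not load through CrossEncoder,
# "BAAI/bge-reranker-v2-m3" is a cross-encoder checkpoint that should work here.
reranker = CrossEncoder("BAAI/bge-reranker-v2.5-gemma2-lightweight")

def rerank(query: str, candidates: list[str], top_k: int = 5):
    """Score each (query, doc) pair jointly; return the top_k (doc, score) tuples."""
    pairs = [[query, doc] for doc in candidates]
    scores = reranker.predict(pairs)  # numpy array of raw relevance scores
    ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
    return ranked[:top_k]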
```
### Cohere Rerank API
```python
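import cohere

co = cohere.Client()  # reads the API key from the CO_API_KEY environment variable

def cohere_rerank(query: str, docs: list[str], top_n: int = 5):
    """Return the top_n (doc, relevance_score) pairs from the hosted reranker."""
    resp = co.rerank(
        model="rerank-v4.0",
        query=query, documents=docs, top_n=top_n,
    )
    # r.index points back into the original docs list.
    return [(docs[r.index], r.relevance_score) for r in resp.results]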
```
### Reciprocal Rank Fusion
```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """rankings: list of ranked doc-id lists from different retrievers."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        # enumerate is 0-based, so rank + 1 is the 1-based rank in the RRF formula.
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```
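A small worked example may help show how the fusion behaves. The variant below, `rrf_scores`, is a hypothetical helper that applies the same rule as `rrf` but returns the fused scores so they can be inspected. The doc ids are made up.

```python
def rrf_scores(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Same fusion rule as rrf, but returns the per-doc fused scores."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based ranks
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

# Hypothetical doc ids from two retrievers.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

scores = rrf_scores([bm25_ranking, dense_ranking])
fused = sorted(scores, key=scores.get, reverse=True)
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Here `d1` scores 1/61 + 1/62 and `d3` scores 1/63 + 1/61, so docs found by both retrievers rise above docs found by only one. With `k = 60` the exact rank within each list matters comparatively little.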
### Hybrid retrieve + rerank pipeline
```python
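# Assumes `bm25` and `dense_index` are pre-built indexes whose search() returns
# ranked doc ids, and `load_doc` maps a doc id to its text (none defined here).
def hybrid_rag(query: str, k_first: int = 100, k_final: int = 5):
    bm25_hits = bm25.search(query, top_k=k_first)
    dense_hits = dense_index.search(query, top_k=k_first)
    fused = rrf([bm25_hits, dense_hits])[:k_first]  # fuse, then cap the candidate pool
    docs = [load_doc(d) for d in fused]
    return rerank(query, docs, top_k=k_final)  # cross-encoder pass on candidates only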
```
### LLM-as-reranker (listwise)
```python
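import re
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def llm_rerank(query: str, docs: list[str]) -> list[int]:
    """Ask the model for a listwise ranking; returns doc indices best to worst."""
    numbered = "\n".join(f"[{i}] {d[:300]}" for i, d in enumerate(docs))
    resp = client.messages.create(
        model="claude-opus-4-7", max_tokens=200,
        messages=[{"role": "user", "content":
            f"Query: {query}\nDocs:\n{numbered}\nReturn comma-separated indices best→worst."}],
    ).content[0].text
    # Extract integers instead of int()-casting raw tokens, which fails on "[3]" or stray text.
    return [int(x) for x in re.findall(r"\d+", resp)]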
```
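A model reply is not guaranteed to be a clean permutation: it may repeat an index, skip one, or invent one that is out of range. A minimal sketch of a defensive post-processing step follows. `normalize_ranking` is a hypothetical helper, and it assumes the reply contains no digits other than the indices themselves.

```python
import re

def normalize_ranking(text: str, n_docs: int) -> list[int]:
    """Turn a model reply into a valid permutation of range(n_docs).

    Keeps the first occurrence of each in-range index, drops everything else,
    and appends any indices the model omitted, in their original order.
    """
    seen: set[int] = set()
    order: list[int] = []
    for token in re.findall(r"\d+", text):
        idx = int(token)
        if idx < n_docs and idx not in seen:
            seen.add(idx)
            order.append(idx)
    order.extend(i for i in range(n_docs) if i not in seen)
    return order

print(normalize_ranking("[2], 0, 2, 7", n_docs=4))  # [2, 0, 1, 3]
```

Appending the omitted indices keeps the first-stage order as a fallback, so a truncated or partly malformed reply degrades the ranking rather than dropping documents.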
### ColBERT late-interaction (RAGatouille)
```python
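from ragatouille import RAGPretrainedModel

rag = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.5")
# `docs` is assumed to be a list[str] of passages defined elsewhere.
rag.index(collection=docs, index_name="my-index")  # precompute token-level embeddings
results = rag.search(query="foo", k=10)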
```
## Decision criteria
| Situation | Approach |
|---|---|
| Cost-sensitive RAG | BM25 + dense → RRF (no rerank) |
| Quality > latency | Hybrid + cross-encoder rerank |
| Highest quality | + LLM rerank top-20 → top-3 |
| Very large corpus (>10M docs) | ColBERT for second stage |
| Multilingual | BGE-Reranker-v2.5 / Cohere rerank-v4 |
**Default**: BM25 + BGE-M3 dense → RRF top-100 → BGE-Reranker-v2.5 top-5.
## 🔗 Graph
- Parents: [[Information-Retrieval]] · [[RAG]]
- Variants: [[ColBERT]] · [[RRF]]
- Applications: [[Semantic Search|Semantic-Search]] · [[Hybrid-Search]]
- Adjacent: [[BM25]] · [[Dense-Retrieval]] · [[Embeddings]]
## 🤖 LLM usage
**When to use**: high-stakes RAG (legal/medical/finance), small candidate sets, listwise ranking.
**When not to use**: latency budget < 100ms, large k (cost), simple FAQ chat (overkill).
## ❌ Anti-patterns
- **Rerank without first-stage filter**: O(N) on full corpus → cost explosion.
- **Cross-encoder for indexing**: a cross-encoder produces no cacheable doc embeddings, so every doc must be re-scored for every query.
- **Pointwise LLM rerank**: one separate call per doc is more expensive than listwise and gives inconsistent scores.
- **Ignoring score calibration**: cross-encoder scores are not probabilities, so any threshold needs dataset-specific tuning.
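One way to do the dataset-specific tuning mentioned in the last item is to sweep candidate cutoffs over a small labeled validation set and keep the one with the best F1. The sketch below assumes such labels exist. `best_threshold` is a hypothetical helper, the scores are made up, and F1 is only one possible selection criterion.

```python
def best_threshold(scores: list[float], labels: list[int]) -> float:
    """Pick the raw-score cutoff that maximizes F1 on labeled validation pairs.

    scores: raw reranker scores; labels: 1 = relevant, 0 = not relevant.
    """
    best_t, best_f1 = max(scores), -1.0
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

# Made-up validation data for illustration.
val_scores = [3.2, 0.1, -2.5, 1.5]
val_labels = [1, 0, 0, 1]
print(best_threshold(val_scores, val_labels))  # 1.5
```

A cutoff chosen this way only holds for the model and data distribution it was tuned on. Swapping the reranker or the corpus means re-running the sweep.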
## 🧪 Verification / duplicates
- Verified (Cohere docs, BGE paper, ColBERT v2.5, RankGPT/RankZephyr).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full rewrite as canonical for cross-encoder/ColBERT/RRF/LLM rerank |