2nd/10_Wiki/Topics/AI_and_ML/Reranking.md
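koriweb d8a80f6272 chore(wiki): canonical normalization of dangling links (768 files / 1,200 links)
Replaced [[wikilinks]] that differed only in name (spelling variants) with the target document's canonical title,
reconnecting 1,200 broken links. Only normalized title/filename matches were applied; alias matching was
excluded because of the risk of over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs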

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00


id: wiki-2026-0508-reranking
title: Reranking
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Cross-Encoder-Reranking, Re-Ranker, RAG-Reranking
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: rag, retrieval, reranking, search
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: python, framework: sentence-transformers

Reranking

In one line

"Retrieval is for recall, reranking is for precision." Reranking takes the top-k candidates from first-stage retrieval (BM25/dense) and re-scores them with a more expensive cross-encoder or LLM. It is arguably the single biggest lever for RAG quality in 2026 (Cohere Rerank 4, BGE-Reranker-v2.5, Voyage rerank-3).

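Key points

Why it is needed

  • Bi-encoder (dense retrieval): encodes the query and each doc separately, then compares them by cosine similarity. Fast, because doc embeddings are cached, but the query-doc interaction is shallow.
  • Cross-encoder: encodes [query, doc] jointly and outputs a scalar score. Deep token-level attention gives roughly +10-30% NDCG.
  • Trade-off: a cross-encoder needs one forward pass per doc, so scoring all N docs per query is too slow. The first stage retrieves the top-100 and the reranker narrows them to the top-5.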
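
The trade-off above can be sketched in plain Python. The toy 3-dimensional vectors and the helper names `cosine`, `bi_encoder_search`, and `cross_encoder_passes` are illustrative assumptions, not part of any library; a real system would use learned embeddings and an approximate nearest-neighbor index.

```python
import math
from typing import Optional

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Doc embeddings are computed once at index time and cached (toy values).
DOC_EMBEDDINGS = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.6, 0.8, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}

def bi_encoder_search(query_vec: list[float], top_k: int) -> list[str]:
    """First stage: one query encoding, then cheap vector comparisons."""
    ranked = sorted(
        DOC_EMBEDDINGS,
        key=lambda d: cosine(query_vec, DOC_EMBEDDINGS[d]),
        reverse=True,
    )
    return ranked[:top_k]

def cross_encoder_passes(corpus_size: int, first_stage_k: Optional[int]) -> int:
    """Forward passes per query: every doc without a first stage, else only the candidates."""
    return corpus_size if first_stage_k is None else min(first_stage_k, corpus_size)

candidates = bi_encoder_search([0.9, 0.1, 0.0], top_k=2)
```

With a one-million-doc corpus, `cross_encoder_passes(1_000_000, None)` is one million passes per query, while `cross_encoder_passes(1_000_000, 100)` is 100, which is the whole argument for a two-stage design.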

Architectures

  • Cross-encoder (BERT-based): [CLS] q [SEP] d [SEP] → linear → score. BGE-Reranker-v2.5, Cohere Rerank 4, Voyage rerank-3.
  • ColBERT / late interaction: token-level doc embeddings are precomputed, and each query token is scored by its max similarity (MaxSim) against the doc tokens. About 80% of cross-encoder quality at retrieval speed.
  • LLM-as-reranker: prompt GPT-5/Claude to produce a listwise ranking. This is the RankGPT / RankZephyr paradigm: the highest quality, but also the most expensive.
  • RRF (Reciprocal Rank Fusion): cheap fusion of multiple rankers — score(d) = Σ 1/(k+rank_i(d)).
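
As a rough illustration of the late-interaction bullet, MaxSim scoring can be written in a few lines. This is a minimal sketch assuming unit-normalized token embeddings and toy 2-dimensional vectors; `maxsim_score` is a hypothetical helper name, not the ColBERT library API.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens: list[list[float]], doc_tokens: list[list[float]]) -> float:
    """For each query token take its best-matching doc token, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy token embeddings: the query has two tokens pointing along different axes.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_full_match = [[1.0, 0.0], [0.0, 1.0]]     # covers both query tokens
doc_partial_match = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first query token

score_full = maxsim_score(query, doc_full_match)
score_partial = maxsim_score(query, doc_partial_match)
```

Because the doc-side token vectors do not depend on the query, they can be computed at index time; only the cheap max-and-sum step runs per query.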

Hybrid Search Stack (2026 standard)

  1. BM25 (sparse) + Dense (e.g., BGE-M3) → parallel.
  2. RRF fuse → top-100.
  3. Cross-encoder rerank → top-10.
  4. (Optional) LLM rerank → top-3 for high-stakes.

Applications

  1. Higher answer accuracy in RAG.
  2. E-commerce search relevance.
  3. Legal/medical document discovery (precision-critical).
  4. Code search (semantic + lexical hybrid).

💻 Patterns

Cross-encoder rerank (sentence-transformers)

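from sentence_transformers import CrossEncoder

# Checkpoint name as given in this note. If it does not load through CrossEncoder
# (LLM-based BGE rerankers may need the FlagEmbedding package instead),
# "BAAI/bge-reranker-v2-m3" is a cross-encoder checkpoint that does.
reranker = CrossEncoder("BAAI/bge-reranker-v2.5-gemma2-lightweight")

def rerank(query: str, candidates: list[str], top_k: int = 5):
    """Score each (query, doc) pair jointly and return the top_k (doc, score) tuples."""
    pairs = [[query, doc] for doc in candidates]
    scores = reranker.predict(pairs)  # numpy array, one relevance score per pair
    ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
    return ranked[:top_k]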

Cohere Rerank API

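import cohere
co = cohere.Client()  # reads the API key from the CO_API_KEY environment variable

def cohere_rerank(query: str, docs: list[str], top_n: int = 5):
    """Return the top_n (doc, relevance_score) pairs from the hosted reranker."""
    resp = co.rerank(
        model="rerank-v4.0",
        query=query, documents=docs, top_n=top_n,
    )
    # Each result carries the index of the doc in the input list plus its score.
    return [(docs[r.index], r.relevance_score) for r in resp.results]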

Reciprocal Rank Fusion

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """rankings: list of ranked doc-id lists from different retrievers."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
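
A small worked example may help show what the fusion does. The block below repeats the same `rrf` function so it runs on its own; the doc ids and the two input rankings are made-up toy data. Note that `enumerate` is 0-based, so `rank + 1` is the usual 1-based rank in the RRF formula.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion over several ranked doc-id lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["a", "b", "c"]   # toy sparse-retriever output
dense_ranking = ["c", "a", "d"]  # toy dense-retriever output

# a: 1/61 + 1/62 ≈ 0.03252   (ranked 1st and 2nd)
# c: 1/63 + 1/61 ≈ 0.03227   (ranked 3rd and 1st)
# b: 1/62        ≈ 0.01613   (only in the BM25 list)
# d: 1/63        ≈ 0.01587   (only in the dense list)
fused = rrf([bm25_ranking, dense_ranking])
print(fused)  # ['a', 'c', 'b', 'd']
```

Docs that appear in both lists rise to the top even when neither retriever ranked them first everywhere, and no score normalization across retrievers is needed because only ranks are used.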

Hybrid retrieve + rerank pipeline

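# Placeholders: bm25 and dense_index are first-stage retrievers whose search()
# returns ranked doc ids; load_doc maps a doc id to its text. None are defined in this note.
def hybrid_rag(query: str, k_first=100, k_final=5):
    bm25_hits = bm25.search(query, top_k=k_first)
    dense_hits = dense_index.search(query, top_k=k_first)
    fused = rrf([bm25_hits, dense_hits])[:k_first]  # fuse, then cap the candidate pool
    docs = [load_doc(d) for d in fused]
    return rerank(query, docs, top_k=k_final)  # cross-encoder rerank over the fused candidates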
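
To try the pipeline shape without an index or a model, the stages can be passed in as plain callables. This is a sketch under stated assumptions: the toy corpus, `keyword_retriever`, `fixed_dense_retriever`, and `overlap_scorer` are hypothetical stand-ins for BM25, a dense index, and a cross-encoder.

```python
from typing import Callable

Retriever = Callable[[str, int], list[str]]       # (query, top_k) -> ranked doc ids
Scorer = Callable[[str, list[str]], list[float]]  # (query, doc texts) -> scores

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_search(query: str, retrievers: list[Retriever], load_doc: Callable[[str], str],
                  score: Scorer, k_first: int = 100, k_final: int = 5) -> list[tuple[str, float]]:
    """Retrieve in each first stage, fuse with RRF, then rerank the fused pool."""
    fused = rrf([r(query, k_first) for r in retrievers])[:k_first]
    scores = score(query, [load_doc(d) for d in fused])
    ranked = sorted(zip(fused, scores), key=lambda p: p[1], reverse=True)
    return ranked[:k_final]

# --- toy stand-ins (assumptions for illustration only) ---
CORPUS = {
    "d1": "rerank with a cross encoder",
    "d2": "bm25 is a sparse retriever",
    "d3": "cooking pasta at home",
}

def keyword_retriever(query: str, top_k: int) -> list[str]:
    """Stand-in for BM25: rank by number of shared words."""
    words = set(query.split())
    return sorted(CORPUS, key=lambda d: -len(words & set(CORPUS[d].split())))[:top_k]

def fixed_dense_retriever(query: str, top_k: int) -> list[str]:
    """Stand-in for a dense index: a fixed ordering, ignoring the query."""
    return ["d2", "d1", "d3"][:top_k]

def overlap_scorer(query: str, texts: list[str]) -> list[float]:
    """Stand-in for a cross-encoder: fraction of query words present in the doc."""
    words = set(query.split())
    return [len(words & set(t.split())) / len(words) for t in texts]

result = hybrid_search("cross encoder rerank", [keyword_retriever, fixed_dense_retriever],
                       CORPUS.__getitem__, overlap_scorer, k_first=3, k_final=2)
```

Injecting the stages this way keeps the orchestration testable; swapping `overlap_scorer` for a real cross-encoder call changes one argument rather than the pipeline.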

LLM-as-reranker (listwise)

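import re
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def llm_rerank(query: str, docs: list[str]) -> list[int]:
    # Truncate each doc to 300 characters to bound the prompt size.
    numbered = "\n".join(f"[{i}] {d[:300]}" for i, d in enumerate(docs))
    resp = client.messages.create(
        model="claude-opus-4-7", max_tokens=200,
        messages=[{"role": "user", "content":
            f"Query: {query}\nDocs:\n{numbered}\nReturn comma-separated indices best→worst."}],
    ).content[0].text
    # Pull out integers with a regex: the reply may contain brackets or stray prose,
    # which would make a bare int(x) on split(",") raise ValueError.
    return [i for i in map(int, re.findall(r"\d+", resp)) if i < len(docs)]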
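
A model reply is free text, so the index list may repeat indices, include out-of-range values, or leave documents out. One way to handle that is to coerce the reply into a full permutation before using it. The helper below is a hypothetical sketch (`parse_ranking` is not part of any SDK), and appending unmentioned docs in their original order is a design assumption.

```python
import re

def parse_ranking(reply: str, n_docs: int) -> list[int]:
    """Turn a free-text index list into a valid permutation of range(n_docs)."""
    # dict.fromkeys de-duplicates while keeping first-seen order.
    seen = dict.fromkeys(int(tok) for tok in re.findall(r"\d+", reply))
    valid = [i for i in seen if i < n_docs]
    # Docs the model never mentioned keep their first-stage order at the tail.
    missing = [i for i in range(n_docs) if i not in seen]
    return valid + missing

order = parse_ranking("[2], [0], 2, 7", n_docs=4)
print(order)  # [2, 0, 1, 3]
```

Falling back to the first-stage order for unmentioned docs means a malformed reply degrades to "no rerank" rather than dropping candidates.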

ColBERT late-interaction (RAGatouille)

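from ragatouille import RAGPretrainedModel

# Checkpoint ID as written in this note; the ColBERTv2 checkpoint published
# under colbert-ir is "colbert-ir/colbertv2.0".
rag = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.5")
# docs: list[str] of passages to index (not defined in this note)
rag.index(collection=docs, index_name="my-index")
results = rag.search(query="foo", k=10)  # list of dicts with content, score, rank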

Decision criteria

| Situation | Approach |
| --- | --- |
| Cost-sensitive RAG | BM25 + dense → RRF (no rerank) |
| Quality > latency | Hybrid + cross-encoder rerank |
| Highest quality | + LLM rerank top-20 → top-3 |
| Huge corpus (>10M docs) | ColBERT for second stage |
| Multilingual | BGE-Reranker-v2.5 / Cohere rerank-v4 |

Default: BM25 + BGE-M3 dense → RRF top-100 → BGE-Reranker-v2.5 top-5.

🔗 Graph

🤖 Using LLMs

When: high-stakes RAG (legal/medical/finance), small candidate set, listwise ranking. When not: latency budget < 100ms, large k (cost), simple FAQ chat (overkill).

Anti-patterns

  • Rerank without first-stage filter: O(N) on full corpus → cost explosion.
  • Cross-encoder for indexing: doc embeddings cannot be cached, so every doc is recomputed for every query.
  • Pointwise LLM rerank: a separate call per doc is more expensive and less consistent than listwise ranking.
  • Ignoring score calibration: a cross-encoder score is not a probability, so any threshold needs dataset-specific tuning.
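
The calibration point can be made concrete with a small threshold search. This is a minimal sketch assuming a labeled dev set of (reranker score, relevant or not) pairs; the toy scores and the helper name `best_threshold` are illustrative. A sigmoid squashes raw scores into (0, 1) but does not by itself make them calibrated probabilities, so the cutoff is still chosen from data.

```python
import math

def sigmoid(x: float) -> float:
    """Monotonic squashing to (0, 1); ranking order is unchanged."""
    return 1.0 / (1.0 + math.exp(-x))

def best_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (threshold, F1) that maximizes F1 for the rule `score >= threshold`."""
    best = (float("inf"), 0.0)
    for t in sorted(set(scores)):  # only observed scores can change the decision
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best

# Toy dev set: raw reranker scores with relevance labels (1 = relevant).
dev_scores = [-2.0, 0.5, 1.5, 3.0]
dev_labels = [0, 0, 1, 1]

threshold, f1 = best_threshold(dev_scores, dev_labels)
print(threshold, f1)  # 1.5 1.0
```

A threshold found this way applies to the model and dataset it was tuned on; switching reranker checkpoints or domains calls for re-running the search.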

🧪 Verification / Duplicates

  • Verified (Cohere docs, BGE paper, ColBERT v2.5, RankGPT/RankZephyr).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full rewrite as canonical for cross-encoder/ColBERT/RRF/LLM rerank |