Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
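Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirects)
- Normalized link names: fixed broken links, pointed redirects directly at their targets, mapped concepts (~2,400 links)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections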

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-search-optimization
title: Search Optimization
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Search Tuning, Retrieval Optimization, Hybrid Search
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: search, retrieval, bm25, vector, hybrid, rag
raw_sources: (none listed)
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Elasticsearch + pgvector

Search Optimization

One-liner

Search quality comes from a stack, not from any single signal: lexical (BM25) and semantic (vector) retrieval combined as a hybrid, followed by a reranker. The lineage runs from 1970s tf-idf to BM25 (Robertson, 1994). The modern state is BM25F + dense vectors (ColBERT/E5/Cohere v3.5) + cross-encoder reranking, which together form the retrieval layer of RAG.

Key points

The modern search stack (2026)

  • Lexical: BM25 (Elasticsearch, OpenSearch, Tantivy), for exact terms, rare tokens, and code.
  • Dense vector: bi-encoders (E5-large, Cohere embed-v3.5, OpenAI text-embedding-3-large), for semantic matching.
  • Learned sparse: SPLADE, which keeps lexical term matching but learns the term weights.
  • Hybrid fusion: RRF (Reciprocal Rank Fusion) or weighted score sum.
  • Reranker: cross-encoder (Cohere rerank-3.5, BGE-reranker-v2), rescoring the top 50 down to the top 10.
  • Query understanding: LLM rewrite, HyDE, multi-query expansion.
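Dense retrieval ranks documents by the similarity between a query embedding and the document embeddings. Production systems approximate this with an ANN index (for example HNSW in pgvector or Elasticsearch). An exact brute-force version is still a useful reference, for instance when checking an ANN index's recall. The following is a minimal pure-Python sketch that assumes the embeddings are already computed; `cosine` and `dense_top_k` are hypothetical helper names.

```python
import heapq
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def dense_top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 10) -> list[tuple[int, float]]:
    """Exact (brute-force) cosine top-k: returns (doc_index, similarity) pairs, best first."""
    scored = ((i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

For example, `dense_top_k([1.0, 0.1], [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]], k=2)` ranks document 0 first and document 1 second.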

Applications

  1. Site search (e-commerce, docs).
  2. RAG retrieval.
  3. Code search (GitHub).
  4. Internal knowledge search.

💻 Patterns

BM25 (Elasticsearch 9, tuned). The title weight is applied at query time (title^3 in the hybrid query) instead of through a mapping-level boost, which Elasticsearch has deprecated.

PUT /products
{
  "settings": {
    "similarity": {
      "default": {
        "type": "BM25",
        "k1": 1.2,
        "b": 0.75
      }
    }
  },
  "mappings": {
    "properties": {
      "title":       { "type": "text" },
      "description": { "type": "text" },
      "tags":        { "type": "keyword" },
      "embedding":   { "type": "dense_vector", "dims": 1024, "similarity": "cosine" }
    }
  }
}

Hybrid query (RRF retriever, native in Elasticsearch 9)

GET /products/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        { "standard": {
            "query": { "multi_match": {
              "query": "wireless earbuds noise cancel",
              "fields": ["title^3", "description"]
            }}
        }},
        { "knn": {
            "field": "embedding",
            "query_vector_builder": {
              "text_embedding": {
                "model_id": "cohere-embed-v3-5",
                "model_text": "wireless earbuds noise cancel"
              }
            },
            "k": 50, "num_candidates": 200
        }}
      ],
      "rank_window_size": 100,
      "rank_constant": 60
    }
  },
  "size": 10
}
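The RRF retriever above looks only at ranks. The stack list also names a second fusion option, a weighted score sum. That option first has to put unbounded BM25 scores and cosine similarities on a common scale. Below is a minimal sketch that uses per-query min-max normalization. The weight `alpha` and the helper names are illustrative assumptions; tune `alpha` on a judgment set, the same way as the BM25 parameters.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one retriever's scores for a single query to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every returned doc as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def weighted_fusion(lexical: dict[str, float], semantic: dict[str, float],
                    alpha: float = 0.5, top_n: int = 10) -> list[tuple[str, float]]:
    """alpha weights the semantic side; a doc missing from one list contributes 0 there."""
    lex, sem = min_max(lexical), min_max(semantic)
    fused = {
        doc: (1 - alpha) * lex.get(doc, 0.0) + alpha * sem.get(doc, 0.0)
        for doc in lex.keys() | sem.keys()
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Unlike RRF, this approach is sensitive to outliers. A single very high BM25 score compresses every other lexical score toward 0.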

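BM25 tuning (k1/b per corpus)

# Short corpus (titles): k1=1.2, b=0.5  (weaker length normalization)
# Long docs (articles):  k1=1.5, b=0.75 (rank_bm25's default; Lucene/Elasticsearch default is k1=1.2, b=0.75)
# Code search:           k1=2.0, b=0.0  (ignore document length)
# Pick values by grid search on NDCG@10

from rank_bm25 import BM25Okapi
import numpy as np

# corpus: list of tokenized docs (list[list[str]]); queries: {qid: list[str]};
# judgments: {qid: {doc_index: graded relevance}}; ndcg() is defined in the Evaluation section.
def evaluate(bm25, queries, judgments, k=10):
    per_query = []
    for qid, tokens in queries.items():
        top = np.argsort(bm25.get_scores(tokens))[::-1][:k]
        per_query.append(ndcg([int(i) for i in top], judgments.get(qid, {}), k))
    return float(np.mean(per_query))

def grid_search(corpus, queries, judgments):
    best = (None, -1.0)
    for k1 in [0.8, 1.0, 1.2, 1.5, 2.0]:
        for b in [0.0, 0.25, 0.5, 0.75, 1.0]:
            bm25 = BM25Okapi(corpus, k1=k1, b=b)
            score = evaluate(bm25, queries, judgments)
            if score > best[1]:
                best = ((k1, b), score)
    return best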
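The k1/b comments above can be made concrete with a from-scratch Okapi BM25 scorer. It is an illustration only, with a Lucene-style idf. It is not rank_bm25's implementation, which computes idf differently for very common terms, so absolute scores will differ. `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter

def bm25_scores(query: list[str], corpus: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n
    df = Counter(term for doc in corpus for term in set(doc))  # document frequency
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        # b scales the length penalty: b=0 ignores length, b=1 fully normalizes by len/avgdl.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        s = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Lucene-style idf, which stays positive even for very common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # k1 controls term-frequency saturation: the tf factor approaches (k1 + 1).
            s += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(s)
    return scores
```

Here is the length effect behind the code-search setting. Take two documents that each contain a query term once. With b=0.0 they score identically. With b=0.75 the shorter one scores higher.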

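Cross-encoder rerank (Cohere rerank-v3.5)

import cohere
co = cohere.ClientV2()

# Stage 1: hybrid retrieval of the top 50 candidates
# (hybrid_search is a placeholder for either hybrid query in this section; query is the user's search string)
candidates = hybrid_search(query, k=50)

# Stage 2: rerank the 50 candidates down to the top 10
resp = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=[c.text for c in candidates],
    top_n=10,
)
# Each result carries the index of its input document, best first
top10 = [candidates[r.index] for r in resp.results]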
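The Cohere call above is one provider's API, but the second stage itself is provider-independent. The following sketch uses a pluggable pair scorer. In practice `score_pair` would wrap a cross-encoder such as a BGE reranker, which reads the query and document together. `overlap_score` here is only a toy stand-in so that the example runs without a model. Both helper names are hypothetical.

```python
from typing import Callable, Sequence

def rerank(query: str, candidates: Sequence[str],
           score_pair: Callable[[str, str], float], top_n: int = 10) -> list[tuple[int, float]]:
    """Stage 2 of retrieve-then-rerank: score each (query, doc) pair jointly, keep top_n.

    Returns (candidate_index, score) pairs, best first.
    """
    scored = [(i, score_pair(query, doc)) for i, doc in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query tokens present in the doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)
```

The reranker runs once per candidate. The candidate count (50 here) therefore sets the reranking latency and cost, which is why it stays small.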

HyDE (Hypothetical Document Embeddings)

import anthropic
client = anthropic.Anthropic()

def hyde_query(question: str) -> str:
    """Turn the question into a hypothetical answer; the answer, not the question, is embedded."""
    msg = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content":
            f"Write a 3-sentence hypothetical answer to: {question}"}],
    )
    return msg.content[0].text

# Embedding a document-shaped answer instead of the short question reduces query/document length asymmetry.
# embed() and vector_search() are placeholders for your encoder and vector index.
hypothetical = hyde_query("how does pgvector handle 1024-dim embeddings?")
emb = embed(hypothetical)
results = vector_search(emb)
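The HyDE paper (Gao et al., 2022) samples several hypothetical documents and averages their embeddings together with the embedding of the query itself. This dampens the effect of a single off-target generation. Below is a minimal sketch of that averaging step. `generate` and `embed` are caller-supplied placeholders; `generate` should sample with a nonzero temperature so that the n answers actually differ. `hyde_embedding` is a hypothetical name.

```python
import math
from typing import Callable

Vector = list[float]

def hyde_embedding(question: str,
                   generate: Callable[[str], str],   # e.g. wraps an LLM call like hyde_query()
                   embed: Callable[[str], Vector],   # the same encoder used for the documents
                   n: int = 4) -> Vector:
    """Average the embeddings of n hypothetical answers plus the question itself."""
    texts = [generate(question) for _ in range(n)] + [question]
    vecs = [embed(t) for t in texts]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    # Re-normalize so cosine / inner-product search behaves as with a single embedding.
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else mean
```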

Multi-query expansion (LLM)

def expand_query(q: str) -> list[str]:
    msg = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content":
            f"Generate 3 alternative phrasings for search:\n{q}\n"
            "Return one per line."}],
    )
    # Keep the original query first and drop blank lines from the model output
    alternatives = [line.strip() for line in msg.content[0].text.splitlines() if line.strip()]
    return [q] + alternatives

# Search with each phrasing, then merge the result lists with RRF
# (search() and rrf_merge() are placeholders)
queries = expand_query("how to ship a model fast")
all_hits = [search(q) for q in queries]
final = rrf_merge(all_hits)
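The snippet calls `rrf_merge` without defining it. A minimal version follows. It assumes each `search(q)` result has already been reduced to an ordered list of document IDs. The default k=60 matches the rank_constant in the Elasticsearch query and the constant in the SQL below.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Sequence

def rrf_merge(ranked_lists: Iterable[Sequence[Hashable]], k: int = 60,
              top_n: int = 10) -> list[tuple[Hashable, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), ranks starting at 1."""
    scores: dict[Hashable, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

A document that appears in several lists accumulates score from each of them. This is why RRF favors consensus across phrasings over a single strong hit.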

pgvector hybrid (Postgres 17)

-- BM25 via ParadeDB's pg_search extension + pgvector similarity, fused with RRF (k = 60)
WITH lexical AS (
  SELECT id, paradedb.score(id) AS s
  FROM docs
  WHERE id @@@ 'description:earbuds'
  ORDER BY s DESC LIMIT 50
),
semantic AS (
  SELECT id, 1 - (embedding <=> $1::vector) AS s
  FROM docs
  ORDER BY embedding <=> $1::vector LIMIT 50
)
SELECT id,
       COALESCE(1.0/(60 + l.rk), 0) + COALESCE(1.0/(60 + s.rk), 0) AS rrf_score
FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY s DESC) rk FROM lexical) l
FULL OUTER JOIN
     (SELECT id, ROW_NUMBER() OVER (ORDER BY s DESC) rk FROM semantic) s
USING (id)
ORDER BY rrf_score DESC LIMIT 10;

Evaluation (NDCG@10 against a judgment list)

import numpy as np

def dcg(rels):
    return sum(r / np.log2(i + 2) for i, r in enumerate(rels))

def ndcg(predicted_ids, judgments, k=10):
    rels = [judgments.get(pid, 0) for pid in predicted_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    return dcg(rels) / dcg(ideal) if dcg(ideal) > 0 else 0
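The code above uses linear gain and a zero-based loop index, so `np.log2(i + 2)` is the usual log2(rank + 1) discount. The formula is written out below. The worked example is one query whose top three results have graded relevance 2, 3, 0, with an ideal order of 3, 2, 0. Another common variant uses the gain 2^rel - 1 instead of rel; the code uses rel directly.

```latex
\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{\mathrm{rel}_i}{\log_2(i+1)}, \qquad
\mathrm{NDCG}@k = \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k}

\mathrm{DCG}@3  = \frac{2}{\log_2 2} + \frac{3}{\log_2 3} + 0 \approx 2 + 1.893 = 3.893
\mathrm{IDCG}@3 = \frac{3}{\log_2 2} + \frac{2}{\log_2 3} + 0 \approx 3 + 1.262 = 4.262
\mathrm{NDCG}@3 \approx 3.893 / 4.262 \approx 0.913
```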

Decision criteria

| Situation | Approach |
|---|---|
| Keyword-heavy queries (code, IDs) | BM25 dominant, vector secondary |
| Semantic queries (natural-language questions) | Vector dominant, with BM25 as a floor |
| Mixed (e-commerce) | Hybrid RRF + cross-encoder rerank |
| High precision in the top 3 | Hybrid retrieval → cross-encoder rerank |
| Short or ambiguous queries | LLM expansion + HyDE |
| Latency-critical (<50 ms) | BM25 only, or pre-computed embeddings |

Default: hybrid (BM25 + dense) retrieval, Cohere rerank-v3.5 down to the top 10, and LLM query expansion as an option.

🔗 Graph

🤖 Using LLMs

When to use: query expansion, HyDE, query rewriting; prompt-style reranking; result summarization (RAG). When not to: the retrieval step itself. Vector + BM25 retrieval is cheaper and faster, and the latency of an LLM-as-retriever is hard to justify.

Anti-patterns

  • Vector-only search: misses exact terms (UUIDs, error codes).
  • No reranker: noise in the top-50 retrieval degrades top-10 quality.
  • Default BM25 params: corpora differ, so tune k1 and b per corpus.
  • No eval set: tuning without judgments is vibe-driven.
  • Embedding drift: not reindexing after an embedding model upgrade.
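A mechanical guard against embedding drift is to record which model built the index and compare it at query time. The following is a minimal sketch; the class and function names are hypothetical, and it leaves open where the metadata is stored (index settings or a side table). The model ID is checked as well as the dimension, because two different models can share the same dimension.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VectorIndexMeta:
    name: str
    embedding_model: str  # model id used when the documents were embedded
    dims: int

class EmbeddingDriftError(RuntimeError):
    """Raised when the query encoder does not match the encoder that built the index."""

def check_query_encoder(meta: VectorIndexMeta, query_model: str, query_dims: int) -> None:
    """Refuse to search when the query encoder differs from the one that built the index."""
    if query_model != meta.embedding_model or query_dims != meta.dims:
        raise EmbeddingDriftError(
            f"index {meta.name!r} was built with {meta.embedding_model} ({meta.dims}d); "
            f"query uses {query_model} ({query_dims}d): reindex before switching models"
        )
```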

🧪 Verification / Duplicates

  • Verified (Robertson & Zaragoza "BM25 and Beyond" 2009, BEIR benchmark, Cohere/Anthropic 2026 docs, Pinecone "Hybrid Search").
  • Confidence: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: BM25 + vector hybrid + RRF + Cohere rerank-v3.5 + HyDE |