---
id: wiki-2026-0508-search-optimization
title: Search Optimization
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Search Tuning, Retrieval Optimization, Hybrid Search]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [search, retrieval, bm25, vector, hybrid, rag]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Elasticsearch + pgvector
---

# Search Optimization

## In One Line

> **"Search quality comes from a stack of lexical (BM25) + semantic (vector) hybrid retrieval plus a reranker, not from any single signal."** The lineage runs from 1970s tf-idf to BM25 in 1994 (Robertson); the modern state is BM25F + dense vectors (ColBERT/E5/Cohere v3.5) + cross-encoder reranking, which forms the retrieval layer of RAG.

## Core Concepts

### Search stack (modern, 2026)

- **Lexical**: BM25 (Elasticsearch, OpenSearch, Tantivy). Strong on exact terms, rare tokens, and code.
- **Dense vector**: bi-encoder (E5-large, Cohere embed-v3.5, OpenAI text-embedding-3-large). Captures semantic matches.
- **Sparse-learned**: SPLADE. Lexical matching with learned term weights.
- **Hybrid fusion**: RRF (Reciprocal Rank Fusion) or a weighted score sum.
- **Reranker**: cross-encoder (Cohere rerank-3.5, BGE-reranker-v2). Narrows the top 50 to the top 10.
- **Query understanding**: LLM rewrite, HyDE, multi-query expansion.

### Applications

1. Site search (e-commerce, docs).
2. RAG retrieval.
3. Code search (GitHub).
4. Internal knowledge search.

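The hybrid fusion step in the search stack above can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, assuming each retriever returns document IDs ordered best-first. The helper name `rrf_merge`, the default `k=60`, and the sample IDs are illustrative assumptions, not part of any library API.

```python
from collections import defaultdict


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first ID lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Hypothetical helper, for illustration only.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up IDs: "b" is ranked well by both retrievers, so it comes out on top.
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = rrf_merge([bm25_hits, vector_hits])
print([doc_id for doc_id, _ in fused])  # → ['b', 'a', 'd', 'c']
```

Because only ranks are used, BM25 scores and cosine similarities never have to be normalized onto a common scale, which is the main reason to prefer RRF over a weighted score sum when the two score distributions differ.
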
## 💻 Patterns

### BM25 (Elasticsearch 9, tuned)

```json
PUT /products
{
  "settings": {
    "similarity": {
      "default": { "type": "BM25", "k1": 1.2, "b": 0.75 }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" },
      "tags": { "type": "keyword" },
      "embedding": { "type": "dense_vector", "dims": 1024, "similarity": "cosine" }
    }
  }
}
```

Field boosts are applied at query time (`title^3` in the hybrid query); the index-time `boost` mapping parameter was removed in Elasticsearch 8.0.

### Hybrid query (RRF, ES 9 native)

```json
GET /products/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        { "standard": { "query": { "multi_match": {
            "query": "wireless earbuds noise cancel",
            "fields": ["title^3", "description"]
        }} }},
        { "knn": {
            "field": "embedding",
            "query_vector_builder": {
              "text_embedding": {
                "model_id": "cohere-embed-v3-5",
                "model_text": "wireless earbuds noise cancel"
              }
            },
            "k": 50,
            "num_candidates": 200
        }}
      ],
      "rank_window_size": 100,
      "rank_constant": 60
    }
  },
  "size": 10
}
```

### BM25 tuning (k1/b per corpus)

```python
# Short corpus (titles): k1=1.2, b=0.5  (weaker length penalty)
# Long docs (articles):  k1=1.5, b=0.75 (rank_bm25 defaults)
# Code search:           k1=2.0, b=0.0  (length-independent)
# Grid-search k1/b against NDCG@10
from rank_bm25 import BM25Okapi

def grid_search(corpus, queries, judgments, evaluate):
    """corpus: tokenized docs (list of token lists).
    evaluate: caller-supplied function returning mean NDCG@10 for a fitted model."""
    best = (None, -1.0)
    for k1 in [0.8, 1.0, 1.2, 1.5, 2.0]:
        for b in [0.0, 0.25, 0.5, 0.75, 1.0]:
            bm25 = BM25Okapi(corpus, k1=k1, b=b)
            score = evaluate(bm25, queries, judgments)
            if score > best[1]:
                best = ((k1, b), score)
    return best
```

### Cross-encoder rerank (Cohere v3.5)

```python
import cohere

co = cohere.ClientV2()  # reads the API key from the environment

# Stage 1: hybrid retrieval of the top 50
# (`hybrid_search` and `query` are placeholders for your own retrieval code)
candidates = hybrid_search(query, k=50)

# Stage 2: rerank down to the top 10
resp = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=[c.text for c in candidates],
    top_n=10,
)
top10 = [candidates[r.index] for r in resp.results]
```

### HyDE (Hypothetical Document Embedding)

```python
import anthropic

client = anthropic.Anthropic()

def hyde_query(question: str) -> str:
    """Turn the question into a hypothetical answer, which is then embedded."""
    msg = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{"role": "user",
                   "content": f"Write a 3-sentence hypothetical answer to: {question}"}],
    )
    return msg.content[0].text

# Improves query-embedding quality by reducing the query/document length asymmetry.
# `embed` and `vector_search` are placeholders for your embedding call and index query.
hypothetical = hyde_query("how does pgvector handle 1024-dim embeddings?")
emb = embed(hypothetical)
results = vector_search(emb)
```

### Multi-query expansion (LLM)

```python
def expand_query(q: str) -> list[str]:
    msg = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{"role": "user",
                   "content": f"Generate 3 alternative phrasings for search:\n{q}\n"
                              "Return one per line."}],
    )
    # Drop blank lines so empty strings are never sent to the search backend
    lines = [line.strip() for line in msg.content[0].text.splitlines()]
    return [q] + [line for line in lines if line]

# Search with every phrasing, then merge with RRF
# (`search` and `rrf_merge` are placeholders for your own helpers)
queries = expand_query("how to ship a model fast")
all_hits = [search(q) for q in queries]
final = rrf_merge(all_hits)
```

### pgvector hybrid (Postgres 17)

```sql
-- BM25 (pg_search extension) + vector hybrid, fused with RRF (constant 60)
WITH lexical AS (
  SELECT id, paradedb.score(id) AS score
  FROM docs
  WHERE id @@@ 'description:earbuds'
  ORDER BY score DESC
  LIMIT 50
),
semantic AS (
  SELECT id, 1 - (embedding <=> $1::vector) AS score
  FROM docs
  ORDER BY embedding <=> $1::vector
  LIMIT 50
)
SELECT id,
       COALESCE(1.0 / (60 + l.rk), 0) + COALESCE(1.0 / (60 + v.rk), 0) AS rrf_score
FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS rk FROM lexical) l
FULL OUTER JOIN
     (SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS rk FROM semantic) v
USING (id)
ORDER BY rrf_score DESC
LIMIT 10;
```

### Evaluation (NDCG@10, judgment list)

```python
import numpy as np

def dcg(rels):
    # Position i (0-based) is discounted by log2(i + 2)
    return sum(r / np.log2(i + 2) for i, r in enumerate(rels))

def ndcg(predicted_ids, judgments, k=10):
    """judgments maps doc ID -> graded relevance; unjudged docs count as 0."""
    rels = [judgments.get(pid, 0) for pid in predicted_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(rels) / ideal_dcg if ideal_dcg > 0 else 0.0
```

## Decision Criteria

| Situation | Approach |
|---|---|
| Keyword-heavy (code, IDs) | BM25 dominant, vector secondary |
| Semantic (natural-language questions) | vector dominant + BM25 floor |
| Mixed (e-commerce) | hybrid RRF + cross-encoder rerank |
| High-precision top-3 | hybrid → cross-encoder rerank |
| Short or ambiguous queries | LLM expansion + HyDE |
| Latency-critical (<50 ms) | BM25 only, or pre-computed embeddings |

**Default**: hybrid (BM25 + dense) + Cohere rerank-v3.5 down to the top 10, with LLM query expansion as an option.

## 🔗 Graph

- Parent: [[Information Retrieval]] · [[RAG]]
- Variants: [[BM25]] · [[Vector Search]] · [[Information-Retrieval-IR|Hybrid Search]] · [[Reranker]]
- Applications: [[Semantic Search]]
- Adjacent: [[Embeddings]] · [[ColBERT]]

## 🤖 LLM Usage

**When to use**: query expansion, HyDE, and query rewriting; prompt-style reranking; result summarization (RAG).

**When not to use**: retrieval itself. Vector + BM25 is cheaper and faster, and the latency of using an LLM as the retriever is hard to justify.

## ❌ Anti-patterns

- **Vector-only search**: misses exact terms (UUIDs, error codes).
- **No reranker**: noise in the top-50 retrieval degrades top-10 quality.
- **Default BM25 params**: every corpus is different, so tune them.
- **No eval set**: tuning without relevance judgments is vibe-driven.
- **Embedding drift**: failing to reindex after an embedding-model upgrade.

## 🧪 Verification / Duplicates

- Verified (Robertson & Zaragoza, "The Probabilistic Relevance Framework: BM25 and Beyond", 2009; BEIR benchmark; Cohere/Anthropic 2026 docs; Pinecone "Hybrid Search").
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: BM25 + vector hybrid + RRF + Cohere rerank-v3.5 + HyDE |