---
id: ai-hybrid-search-patterns
title: Hybrid Search — vector + BM25 + rerank
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, search, rag, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["Backend", "AI"] }
applied_in: []
aliases: [hybrid search, BM25, vector search, rerank, RRF, reciprocal rank fusion, sparse, dense]
---

# Hybrid Search

> Vector search alone handles meaning well but is weak on exact keywords. Combining **vector (dense) + BM25 (sparse) + a reranker** is the most robust setup. Fusion options: RRF, weighted score sum, cross-encoder reranking.

## 📖 Key Concepts

- Sparse (BM25): term matching; precise on exact keywords.
- Dense (vector): semantic matching; handles synonyms.
- Hybrid: both, fused with RRF or a weighted sum.
- Reranker: after retrieving the top-K, an LLM or cross-encoder re-sorts the candidates.

## 💻 Code Patterns

### BM25 (simple keyword)

```ts
// Alternatives: elasticlunr / lunr / minisearch (TS-native)
import MiniSearch from 'minisearch';

const ms = new MiniSearch({
  fields: ['title', 'body'], // fields to index
  storeFields: ['id'],       // fields returned with each hit
});
ms.addAll(documents); // each document needs a unique `id`
const results = ms.search('user authentication');
```

→ Tokenized keyword matching with BM25-style scoring (in recent MiniSearch versions). MiniSearch does not stem by default; stemming can be added through the `processTerm` option.

### Vector (Postgres pgvector)

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
  id text PRIMARY KEY,
  text text,
  embedding vector(1536)
);
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
```

```ts
const queryEmb = await embed(query);
// pgvector accepts the '[0.1, 0.2, ...]' text form; the cast assumes the
// SQL client would otherwise send a plain array
const vec = JSON.stringify(queryEmb);
const r = await sql`
  SELECT id, text, 1 - (embedding <=> ${vec}::vector) AS score
  FROM docs
  ORDER BY embedding <=> ${vec}::vector
  LIMIT 50
`;
```

### Hybrid (RRF, Reciprocal Rank Fusion)

```ts
function rrf<T extends { id: string }>(ranked: T[][], k: number = 60): T[] {
  const scores = new Map<string, number>();
  const docs = new Map<string, T>();
  for (const list of ranked) {
    list.forEach((doc, rank) => {
      // rank is 0-based, so the top hit contributes 1 / (k + 1)
      scores.set(doc.id, (scores.get(doc.id) ?? 0) + 1 / (k + rank + 1));
      docs.set(doc.id, doc);
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => docs.get(id)!);
}

// Usage
const bm25Results = await bm25Search(q, 50);
const vecResults = await vectorSearch(q, 50);
const fused = rrf([bm25Results, vecResults]).slice(0, 20);
```

→ Rank-based, so it does not matter that the retrievers score on different scales.
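
To make the RRF arithmetic above concrete, here is a minimal Python sketch of the same fusion on two tiny ranked lists. The function name `rrf_fuse` and the sample doc ids are hypothetical, chosen only for illustration; `k = 60` matches the default used above.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        # rank is 1-based here, so the top hit contributes 1 / (k + 1)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from the two retrievers
bm25_ids = ["a", "b", "c"]
vector_ids = ["c", "a", "d"]

fused = rrf_fuse([bm25_ids, vector_ids])
print(fused)
```

Docs that appear in both lists (`a`, `c`) accumulate two contributions and rise above docs found by only one retriever, which is the behavior hybrid search relies on.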

### Weighted hybrid (direct score sum)

```ts
type ScoredDoc = { id: string; score: number };

// Min-max normalization to [0, 1] (one common choice, assumed here)
function normalize(docs: ScoredDoc[]): ScoredDoc[] {
  const scores = docs.map(d => d.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min || 1; // avoid divide-by-zero
  return docs.map(d => ({ ...d, score: (d.score - min) / range }));
}

function weighted(bm25: ScoredDoc[], vec: ScoredDoc[], alpha: number = 0.5) {
  const normBM = normalize(bm25);
  const normVec = normalize(vec);
  const merged = new Map<string, number>();
  // alpha is the weight on the vector score, (1 - alpha) on BM25
  for (const d of normBM) merged.set(d.id, (merged.get(d.id) ?? 0) + (1 - alpha) * d.score);
  for (const d of normVec) merged.set(d.id, (merged.get(d.id) ?? 0) + alpha * d.score);
  return [...merged.entries()].sort((a, b) => b[1] - a[1]);
}
```

→ Tune alpha per dataset; 0.5 is the default starting point.

### Postgres hybrid

```sql
-- Assumes docs.tsv is a tsvector column; $1 = query text, $2 = query embedding.
-- ts_rank is Postgres's built-in text rank, not true BM25.
WITH bm25 AS (
  SELECT id, ts_rank(tsv, query) AS score
  FROM docs, plainto_tsquery('english', $1) query
  WHERE tsv @@ query
  ORDER BY score DESC
  LIMIT 50
),
vec AS (
  SELECT id, 1 - (embedding <=> $2::vector) AS score
  FROM docs
  ORDER BY embedding <=> $2::vector
  LIMIT 50
)
SELECT id,
  COALESCE(bm25.score, 0) * 0.4 + COALESCE(vec.score, 0) * 0.6 AS score
FROM bm25
FULL OUTER JOIN vec USING (id)
ORDER BY score DESC
LIMIT 20;
```

→ `ts_rank` and cosine similarity are on different scales, so normalize them or fuse by rank (RRF) before trusting the 0.4 / 0.6 weights.

### Reranker (cross-encoder)

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# hybrid_search is the retrieval step from the sections above
candidates = hybrid_search(query, k=50)
pairs = [(query, d.text) for d in candidates]
scores = reranker.predict(pairs)  # one relevance score per (query, doc) pair
reranked = sorted(zip(candidates, scores), key=lambda x: -x[1])[:10]
```

→ Cross-encoder = precise but expensive. Top-50 → top-10.

### Cohere rerank API

```ts
import { CohereClient } from 'cohere-ai';

const cohere = new CohereClient({ token });
const r = await cohere.rerank({
  query,
  documents: candidates.map(c => c.text),
  topN: 10,
  model: 'rerank-english-v3.0',
});
// r.results entries carry an index back into `candidates` plus a relevance score
```

→ Managed reranker.

### LLM rerank (small model)

```ts
// `llm.complete` stands for whatever LLM client is in use
const prompt = `
Rate each document's relevance to the query (0-10).

Query: ${query}

${candidates.map((c, i) => `[${i}] ${c.text}`).join('\n\n')}

Output JSON: {"scores": [...]}
`;
const r = await llm.complete({ prompt, model: 'haiku' });
const { scores } = JSON.parse(r.text);
const reranked = candidates.map((c, i) => ({ ...c, score: scores[i] }))
  .sort((a, b) => b.score - a.score);
```

→ A small LLM (haiku, gpt-4o-mini) makes a cheap reranker.

### Query expansion

```ts
// The LLM expands the query
const expanded = await llm.complete({
  prompt: `Generate 3 alternative phrasings: "${query}"`,
});
// One phrasing per line; drop blank lines
const alternatives = expanded.text.split('\n').map(s => s.trim()).filter(Boolean);
const queries = [query, ...alternatives];

// Search each query, then fuse
const all = await Promise.all(queries.map(q => search(q, 20)));
const fused = rrf(all);
```

→ "user signin" → "login" / "auth" / "sign in".

### HyDE (Hypothetical Document Embeddings)

```ts
// The LLM writes a hypothetical answer → embed it → search
const hypothetical = await llm.complete({
  prompt: `Generate a detailed answer for: ${query}`,
});
const emb = await embed(hypothetical.text);
// Here vectorSearch is assumed to accept a precomputed embedding
const results = await vectorSearch(emb, 20);
```

→ A hypothetical answer sits closer in embedding space to the real answer than the question does, so retrieval improves.

### Multi-vector (1 doc → multiple embeddings)

```ts
// Embed per section (or per sentence); assumes doc = { id, text }
const sections = doc.text.split(/\n\n/);
const embeds = await Promise.all(sections.map(s => embed(s)));
// Await the inserts so failures surface instead of being dropped
await Promise.all(embeds.map((emb, i) =>
  sql`INSERT INTO chunks (doc_id, idx, text, emb)
      VALUES (${doc.id}, ${i}, ${sections[i]}, ${JSON.stringify(emb)}::vector)`
));
```

→ If any one section of a doc hits, that doc is returned.

### Fusion in RAG pipeline

```
Query
 ├→ BM25 (sparse) top-50
 ├→ Vector (dense) top-50
 ├→ Optional: HyDE → vector top-50
 └→ RRF fuse → top-20
     └→ Reranker → top-5
         └→ LLM context
```

### Filtering (metadata)

```sql
SELECT * FROM docs
WHERE category = 'engineering'
  AND created_at > '2026-01-01'
ORDER BY embedding <=> $1
LIMIT 20;
```

→ Vector + filter (pre-filter or post-filter).

### Date / source weight

```ts
function dateBoost(score: number, daysOld: number): number {
  const decay = Math.exp(-daysOld / 365); // 1 for a new doc, toward 0 with age
  return score * (0.5 + 0.5 * decay);     // multiplier stays within [0.5, 1]
}
```

→ Favors recent docs.
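
To see how much the decay above shifts a ranking, here is a small Python sketch that ports `dateBoost` and re-sorts hits by the boosted score. The `Hit` shape, the helper `rerank_by_recency`, and the sample values are hypothetical; the 365-day time constant and the 0.5 floor are taken from the snippet above.

```python
import math
from dataclasses import dataclass

@dataclass
class Hit:
    id: str
    score: float     # relevance score from retrieval or reranking
    days_old: float  # document age in days

def date_boost(score: float, days_old: float) -> float:
    """Scale a score by a recency multiplier in [0.5, 1]."""
    decay = math.exp(-days_old / 365)
    return score * (0.5 + 0.5 * decay)

def rerank_by_recency(hits):
    """Sort hits by boosted score, highest first."""
    return sorted(hits, key=lambda h: date_boost(h.score, h.days_old), reverse=True)

# Hypothetical hits: an older doc with a slightly higher raw score
hits = [Hit("old", 0.90, 730), Hit("new", 0.80, 10)]
ordered = rerank_by_recency(hits)
print([h.id for h in ordered])
```

A brand-new doc keeps its full score and a very old one tends toward half of it, so recency can reorder close calls without fully overriding relevance.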

### A/B test

```ts
// Run the user query through both systems
const A = await search(q, 10);
const B = await searchHybrid(q, 10);

// Compare CTR / dwell time / satisfaction
log({ user, q, A_clicked: ..., B_clicked: ... });
```

### MTEB benchmark

```
Compare embedding model quality:
- BGE / e5 / Cohere embed-v3 / text-embedding-3 / Voyage
→ See the MTEB leaderboard.
```

### Search-as-a-service

```
- Algolia: managed keyword + vector hybrid
- Typesense: open source
- Meilisearch: simple
- Vespa: most powerful + most complex
- Weaviate: vector + hybrid
- Pinecone + reranker
- Elastic: BM25 + dense
```

### LLM-friendly answers

```ts
const prompt = `
Answer based ONLY on context. Cite [1], [2].

Context:
[1] ${docs[0].text}
[2] ${docs[1].text}

Question: ${query}
Answer:
`;
```

→ Hybrid + rerank removes most of the noise before it reaches the context.

### Eval

```python
import math

# Recall@K: share of relevant docs found in the top k
def recall_at_k(predicted, relevant, k):
    return len(set(predicted[:k]) & set(relevant)) / len(relevant)

# Reciprocal rank for one query; average over queries to get MRR
def mrr(predictions, relevant):
    for i, p in enumerate(predictions):
        if p in relevant:
            return 1 / (i + 1)
    return 0

# nDCG (the most standard metric); binary-relevance sketch
def ndcg_at_k(predicted, relevant, k):
    dcg = sum(1 / math.log2(i + 2) for i, p in enumerate(predicted[:k]) if p in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0
```

### Cost

```
BM25: cheap (in-DB).
Vector: $$ (embedding + index).
Reranker: $$$ per call.
→ Retrieve a small set (top-10) and rerank it.
```

## 🤔 Decision Criteria

| Situation | Recommendation |
|---|---|
| Small / simple search | BM25 only |
| Meaning / synonyms matter | Vector |
| General production | Hybrid (RRF) |
| Accuracy is the top priority | Hybrid + rerank |
| Long-form Q&A | HyDE + hybrid + rerank |
| Real-time | BM25 + cache |
| Code search | BM25 + vector + filter (lang) |

## ❌ Anti-patterns

- **Vector only**: weak on exact keywords (UUIDs, code).
- **BM25 only**: loses meaning (login = signin).
- **Reranking everything**: cost explodes; rerank only the top-50.
- **No score normalization**: a weighted sum becomes meaningless.
- **Large docs without chunking**: weak retrieval.
- **Filtering as post-processing**: inefficient.
- **No eval**: nothing to tune against.

## 🤖 LLM Usage Hints

- RRF is simple and independent of score scale.
- Reranker (cross-encoder / Cohere) = big quality jump.
- HyDE closes the Q→A gap with little effort.
- BM25 + Vector + Rerank = canonical.

## 🔗 Related Docs

- [[AI_RAG_Advanced]]
- [[DB_pgvector_Production]]
- [[DB_Full_Text_Search]]