2026-05-09 22:47:42 +09:00


| Field | Value |
|---|---|
| id | ai-hybrid-search-patterns |
| title | Hybrid Search — vector + BM25 + rerank |
| category | Coding |
| status | draft |
| source_trust_level | B |
| verification_status | conceptual |
| created_at | 2026-05-09 |
| updated_at | 2026-05-09 |
| tags | ai, search, rag, vibe-coding |
| tech_stack | language: TS / Python; applicable_to: Backend, AI |
| applied_in | |
| aliases | hybrid search, BM25, vector search, rerank, RRF, reciprocal rank fusion, sparse, dense |

Hybrid Search

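Vector-only search captures meaning but is weak at exact keywords. The most robust setup combines vector (dense) retrieval, BM25 (sparse) retrieval, and a reranker. Fusion options include RRF, weighted score sums, and cross-encoder reranking.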

📖 Core concepts

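  • Sparse (BM25): term matching, which is precise on exact words.
  • Dense (vector): meaning matching, which catches synonyms.
  • Hybrid: uses both, fused with RRF or a weighted score.
  • Reranker: after retrieving the top-K, an LLM or cross-encoder re-sorts them.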

💻 Code patterns

BM25 (simple keyword search)

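// elasticlunr / lunr / minisearch / TS-native
import MiniSearch from 'minisearch';

// MiniSearch expects every document to have an `id` field (the default idField)
const documents = [
  { id: '1', title: 'Login flow', body: 'User authentication with email and password' },
  { id: '2', title: 'Billing', body: 'Invoices and payment methods' },
];

const ms = new MiniSearch({
  fields: ['title', 'body'],   // fields indexed for full-text search
  storeFields: ['title'],      // extra fields returned with each result (id is always returned)
});

ms.addAll(documents);
const results = ms.search('user authentication'); // sorted by score, best first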

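→ MiniSearch uses BM25-style scoring (BM25+ in recent versions). It does not stem by default, so pass a stemmer through the `processTerm` option if you need stemming. lunr stems English out of the box.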
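To show what the BM25 score computes, here is a minimal from-scratch sketch of Okapi BM25. The tokenizer, the `bm25Scores` name, and the parameter defaults (k1 = 1.2, b = 0.75, with a +1-smoothed IDF) are illustrative assumptions. This is not how MiniSearch works internally.

```typescript
// Minimal Okapi BM25 over an in-memory corpus (illustrative sketch, not a library API).
type Doc = { id: string; text: string };

// Naive tokenizer: lowercase and split on non-alphanumerics. No stemming.
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 0);

function bm25Scores(docs: Doc[], query: string, k1 = 1.2, b = 0.75): Map<string, number> {
  const toks = docs.map(d => tokenize(d.text));
  const N = docs.length;
  const avgdl = toks.reduce((sum, t) => sum + t.length, 0) / Math.max(N, 1);

  // Document frequency: how many docs contain each term
  const df = new Map<string, number>();
  for (const t of toks) for (const term of new Set(t)) df.set(term, (df.get(term) ?? 0) + 1);

  const scores = new Map<string, number>();
  docs.forEach((doc, i) => {
    // Term frequency within this doc
    const tf = new Map<string, number>();
    for (const term of toks[i]) tf.set(term, (tf.get(term) ?? 0) + 1);

    let score = 0;
    for (const term of new Set(tokenize(query))) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue;
      const n = df.get(term) ?? 0;
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5)); // +1 keeps IDF positive
      // Saturating TF, normalized by doc length relative to the average
      score += (idf * (f * (k1 + 1))) / (f + k1 * (1 - b + (b * toks[i].length) / avgdl));
    }
    scores.set(doc.id, score);
  });
  return scores;
}

const corpus: Doc[] = [
  { id: 'a', text: 'User authentication with OAuth' },
  { id: 'b', text: 'Billing and invoices' },
];
console.log(bm25Scores(corpus, 'user authentication')); // 'a' scores > 0, 'b' scores 0
```

The length normalization (the `b` term) is what separates BM25 from plain tf-idf. Long documents do not win just by repeating a word.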

Vector (Postgres pgvector)

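-- Schema: the pgvector column size must match your embedding model (1536 dims here)
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
  id text PRIMARY KEY,
  text text,
  embedding vector(1536)
);

-- HNSW index for approximate cosine-distance search
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);

// Query from TypeScript (postgres.js-style `sql` tag; `embed` is your embedding helper)
const queryEmb = await embed(query);   // number[]
const vec = JSON.stringify(queryEmb);  // pgvector accepts the '[0.1,0.2,...]' text form
const r = await sql`
  SELECT id, text, 1 - (embedding <=> ${vec}::vector) AS score
  FROM docs
  ORDER BY embedding <=> ${vec}::vector
  LIMIT 50
`;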
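pgvector's `<=>` operator returns cosine distance, so the query above computes `1 - distance` to get a similarity. The sketch below computes both quantities in TypeScript, which is handy for sanity-checking scores by hand. Returning 0 for a zero vector is a convention of this sketch only.

```typescript
// Cosine similarity, and the cosine distance (1 - similarity) that pgvector's <=> returns.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0; // zero vector: treated as unrelated (sketch convention)
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const cosineDistance = (a: number[], b: number[]): number => 1 - cosineSimilarity(a, b);

// Same direction gives similarity 1 (distance 0). Orthogonal gives similarity 0 (distance 1).
console.log(cosineSimilarity([1, 0], [2, 0])); // 1
console.log(cosineDistance([1, 0], [0, 1]));   // 1
```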

Hybrid (RRF — Reciprocal Rank Fusion)

function rrf<T extends { id: string }>(
  ranked: T[][],
  k: number = 60
): T[] {
  const scores = new Map<string, number>();
  const docs = new Map<string, T>();
  
  for (const list of ranked) {
    list.forEach((doc, rank) => {
      scores.set(doc.id, (scores.get(doc.id) ?? 0) + 1 / (k + rank + 1));
      docs.set(doc.id, doc);
    });
  }
  
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => docs.get(id)!);
}

// Usage (bm25Search / vectorSearch are your own retrievers returning ranked { id, ... } lists)
const bm25Results = await bm25Search(q, 50);
const vecResults = await vectorSearch(q, 50);
const fused = rrf([bm25Results, vecResults]).slice(0, 20);

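→ Fusion looks only at ranks, so it does not matter that BM25 scores and cosine similarities live on different scales. k = 60 is the commonly used default. A larger k flattens the advantage of the top ranks.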
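Here is a worked example of the RRF arithmetic, score(d) = Σ 1 / (k + rank), with k = 60 and 1-based ranks. It uses two hypothetical ranked lists. The result shows that docs found by both retrievers rise above docs found by only one.

```typescript
// Worked RRF arithmetic: score(d) = sum over lists of 1 / (k + rank), rank starting at 1.
const k = 60;
const bm25Ids = ['d1', 'd2', 'd3']; // hypothetical BM25 ranking
const vecIds = ['d3', 'd1', 'd4'];  // hypothetical vector ranking

const rrfScore = new Map<string, number>();
for (const ids of [bm25Ids, vecIds]) {
  ids.forEach((id, i) => rrfScore.set(id, (rrfScore.get(id) ?? 0) + 1 / (k + i + 1)));
}

// d1: 1/61 + 1/62 ≈ 0.03252  (ranks 1 and 2)
// d3: 1/63 + 1/61 ≈ 0.03227  (ranks 3 and 1)
// d2: 1/62        ≈ 0.01613  (BM25 only)
// d4: 1/63        ≈ 0.01587  (vector only)
const order = [...rrfScore.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
console.log(order); // [ 'd1', 'd3', 'd2', 'd4' ]
```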

Weighted hybrid (summing scores directly)

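type ScoredDoc = { id: string; score: number };

// Min-max normalize scores into [0, 1] so BM25 and cosine scores become comparable
function normalize(docs: ScoredDoc[]): ScoredDoc[] {
  if (docs.length === 0) return [];
  const scores = docs.map(d => d.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min || 1; // avoid dividing by zero when all scores are equal
  return docs.map(d => ({ id: d.id, score: (d.score - min) / range }));
}

// alpha = weight of the vector score; (1 - alpha) = weight of the BM25 score
function weighted(bm25: ScoredDoc[], vec: ScoredDoc[], alpha: number = 0.5): [string, number][] {
  const merged = new Map<string, number>();
  for (const d of normalize(bm25)) merged.set(d.id, (merged.get(d.id) ?? 0) + (1 - alpha) * d.score);
  for (const d of normalize(vec)) merged.set(d.id, (merged.get(d.id) ?? 0) + alpha * d.score);

  return [...merged.entries()].sort((a, b) => b[1] - a[1]);
}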

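→ Tune alpha against an eval set; 0.5 is only a starting default. Min-max normalization is sensitive to outliers, and normalized scores are only comparable within a single query.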

Postgres hybrid

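-- Assumes docs has a full-text column, e.g.
--   tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', text)) STORED
-- Note: ts_rank is Postgres full-text ranking, not true BM25.
WITH bm25 AS (
  SELECT id, ts_rank(tsv, query, 32) AS score  -- normalization 32: rank / (rank + 1), in [0, 1)
  FROM docs, plainto_tsquery('english', $1) query
  WHERE tsv @@ query
  ORDER BY score DESC LIMIT 50
),
vec AS (
  SELECT id, 1 - (embedding <=> $2) AS score   -- cosine similarity
  FROM docs
  ORDER BY embedding <=> $2 LIMIT 50
)
SELECT id, COALESCE(bm25.score, 0) * 0.4 + COALESCE(vec.score, 0) * 0.6 AS score
FROM bm25 FULL OUTER JOIN vec USING (id)
ORDER BY score DESC LIMIT 20;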

Reranker (cross-encoder)

from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

candidates = hybrid_search(query, k=50)  # your own retriever; each candidate needs a .text attribute
pairs = [(query, d.text) for d in candidates]
scores = reranker.predict(pairs)

reranked = sorted(zip(candidates, scores), key=lambda x: -x[1])[:10]

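→ A cross-encoder reads the query and document together, so it is precise but costly (one model pass per pair). Rerank only the top-50 and keep the top-10.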
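The same "retrieve many, rerank few" step can be sketched in TypeScript with the scoring model injected. `scoreBatch` is a hypothetical stand-in for any cross-encoder or rerank API. The pool size, `topN`, and batch size are illustrative defaults.

```typescript
// Generic rerank step: only the first `rerankPool` candidates reach the expensive model.
type Candidate = { id: string; text: string };
// Hypothetical scorer: returns one relevance score per text, in input order.
type ScoreBatch = (query: string, texts: string[]) => Promise<number[]>;

async function rerankTopK(
  query: string,
  candidates: Candidate[],
  scoreBatch: ScoreBatch,
  rerankPool = 50, // how many candidates the model sees
  topN = 10,       // how many survive
  batchSize = 16,  // texts per model call
): Promise<Array<Candidate & { score: number }>> {
  const pool = candidates.slice(0, rerankPool);
  const scored: Array<Candidate & { score: number }> = [];
  for (let i = 0; i < pool.length; i += batchSize) {
    const batch = pool.slice(i, i + batchSize);
    const scores = await scoreBatch(query, batch.map(c => c.text));
    if (scores.length !== batch.length) throw new Error('scorer returned wrong number of scores');
    batch.forEach((c, j) => scored.push({ ...c, score: scores[j] }));
  }
  return scored.sort((a, b) => b.score - a.score).slice(0, topN);
}
```

Batching keeps each model call bounded, and the `rerankPool` cut-off is what keeps reranker cost proportional to 50 docs rather than the whole corpus.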

Cohere rerank API

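import { CohereClient } from 'cohere-ai';
const cohere = new CohereClient({ token: process.env.CO_API_KEY }); // API key from the environment

const r = await cohere.rerank({
  query,
  documents: candidates.map(c => c.text),
  topN: 10,
  model: 'rerank-english-v3.0',
});

// Each result carries an index into `documents` plus a relevance score
const reranked = r.results.map(res => ({ ...candidates[res.index], score: res.relevanceScore }));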

→ Managed reranker.

LLM rerank (small model)

const prompt = `
Rate each document's relevance to the query (0-10).

Query: ${query}

${candidates.map((c, i) => `[${i}] ${c.text}`).join('\n\n')}

Output JSON: {"scores": [...]}
`;

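// `llm` is a placeholder client; small models can return malformed JSON, so validate the reply
const r = await llm.complete({ prompt, model: 'haiku' });
const { scores } = JSON.parse(r.text);
if (!Array.isArray(scores) || scores.length !== candidates.length) throw new Error('bad rerank output');
const reranked = candidates.map((c, i) => ({ ...c, score: scores[i] }))
  .sort((a, b) => b.score - a.score);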

→ A small LLM (Haiku, gpt-4o-mini) can serve as a cheap reranker.
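Small models do not always return clean JSON. The sketch below parses the reply defensively and falls back to the retrieval order instead of throwing. The helper names and the clamp to the 0-10 scale are assumptions.

```typescript
// Extract {"scores": [...]} from a possibly chatty reply; null if unusable.
function parseScores(raw: string, expected: number): number[] | null {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) return null;
  try {
    const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
    const scores = (parsed as { scores?: unknown }).scores;
    if (!Array.isArray(scores) || scores.length !== expected) return null;
    if (!scores.every(s => typeof s === 'number' && Number.isFinite(s))) return null;
    return scores.map(s => Math.min(10, Math.max(0, s as number))); // clamp to the 0-10 scale
  } catch {
    return null; // not valid JSON
  }
}

// Reorder by LLM score, or keep the original retrieval order if parsing fails.
function applyScores<T>(candidates: T[], raw: string): T[] {
  const scores = parseScores(raw, candidates.length);
  if (!scores) return candidates;
  return candidates
    .map((c, i) => ({ c, s: scores[i] }))
    .sort((a, b) => b.s - a.s) // stable sort keeps retrieval order among ties
    .map(x => x.c);
}
```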

Query expansion

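// Ask the LLM to expand the query (`llm` / `search` are placeholders)
const expanded = await llm.complete({
  prompt: `Generate 3 alternative phrasings, one per line: "${query}"`,
});
const queries = [query, ...expanded.text.split('\n').map(s => s.trim()).filter(Boolean)];

// Search with every phrasing, then fuse the ranked lists with RRF
const all = await Promise.all(queries.map(q => search(q, 20)));
const fused = rrf(all);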

→ e.g. "user signin" expands to "login" / "auth" / "sign in".
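LLM replies often come back numbered, bulleted, or quoted. The sketch below turns the raw text into a clean, deduplicated list of query variants before searching. `parseVariants` is a hypothetical helper, and the cap of 3 variants is an arbitrary default.

```typescript
// Clean a free-form LLM reply into query variants: strip list markers and quotes,
// drop blanks and case-insensitive duplicates (including the original), cap the count.
function parseVariants(original: string, llmText: string, max = 3): string[] {
  const seen = new Set<string>([original.trim().toLowerCase()]);
  const variants: string[] = [];
  for (const line of llmText.split('\n')) {
    if (variants.length >= max) break;
    const v = line
      .trim()
      .replace(/^(?:\d+[.)]|[-*•])\s*/, '') // list markers: "1." "2)" "-" "*" "•"
      .replace(/^["']|["']$/g, '')         // surrounding quotes
      .trim();
    const key = v.toLowerCase();
    if (v.length === 0 || seen.has(key)) continue;
    seen.add(key);
    variants.push(v);
  }
  return [original, ...variants];
}

console.log(parseVariants('user signin', '1. "login"\n2. auth\n\n- sign in'));
// [ 'user signin', 'login', 'auth', 'sign in' ]
```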

HyDE (Hypothetical Document Embedding)

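// The LLM writes a hypothetical answer → embed it → search with that embedding
const hypothetical = await llm.complete({
  prompt: `Generate a detailed answer for: ${query}`,
});
const emb = await embed(hypothetical.text);
const results = await vectorSearch(emb, 20); // here vectorSearch takes an embedding, not a query string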

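→ The fake answer may be factually wrong, but it is phrased like a real answer. Its embedding therefore tends to land closer to relevant documents than the short question's embedding does.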
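A common variant generates several hypothetical answers and averages their embeddings, optionally together with the query's own embedding. The HyDE paper describes a recipe along these lines, but treat the exact details as an assumption. Below is a mean-pooling sketch. `meanEmbedding` is a hypothetical helper, and L2-normalizing the result is an assumption that suits cosine search.

```typescript
// Mean-pool several embeddings into one query vector, then L2-normalize it.
function meanEmbedding(vectors: number[][]): number[] {
  if (vectors.length === 0) throw new Error('no vectors');
  const dim = vectors[0].length;
  const mean = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    if (v.length !== dim) throw new Error('dimension mismatch');
    for (let i = 0; i < dim; i++) mean[i] += v[i] / vectors.length;
  }
  const norm = Math.sqrt(mean.reduce((s, x) => s + x * x, 0)) || 1; // guard the all-zero case
  return mean.map(x => x / norm);
}

// Usage sketch (llm / embed / vectorSearch are the same placeholder helpers as above):
// const fakes = await Promise.all([1, 2, 3].map(() => llm.complete({ prompt })));
// const vecs = await Promise.all([query, ...fakes.map(f => f.text)].map(t => embed(t)));
// const results = await vectorSearch(meanEmbedding(vecs), 20);
```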

Multi-vector (1 doc → multiple embeddings)

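// Embed each section (or sentence) separately
const sections = doc.split(/\n\s*\n/).filter(s => s.trim().length > 0);
const embeds = await Promise.all(sections.map(s => embed(s)));
// Await each INSERT so failures surface (a bare forEach would not wait for the queries)
for (let i = 0; i < sections.length; i++) {
  await sql`INSERT INTO chunks (doc_id, idx, text, emb)
            VALUES (${doc.id}, ${i}, ${sections[i]}, ${JSON.stringify(embeds[i])}::vector)`;
}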

→ If any one section of a doc is a hit, that doc is returned as a result (dedupe by doc_id).
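Searching chunks can return the same doc several times. One simple way to collapse chunk hits into doc results is max pooling, where a doc scores as well as its best chunk. The sketch below uses hypothetical field names that mirror the `chunks` table.

```typescript
// Collapse chunk-level hits into doc-level results (max pooling per doc).
// The winning chunk is kept so its text can go into the LLM context.
type ChunkHit = { docId: string; idx: number; text: string; score: number };

function aggregateByDoc(hits: ChunkHit[], topN = 10): ChunkHit[] {
  const best = new Map<string, ChunkHit>();
  for (const h of hits) {
    const cur = best.get(h.docId);
    if (!cur || h.score > cur.score) best.set(h.docId, h);
  }
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, topN);
}
```

Max pooling favors docs with one very relevant section. Summing chunk scores instead would favor docs that mention the topic often, so pick whichever matches your queries.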

Fusion in RAG pipeline

Query
  ├→ BM25 (sparse) top-50
  ├→ Vector (dense) top-50
  ├→ Optional: HyDE → vector top-50
  └→ RRF fuse → top-20
       └→ Reranker → top-5
            └→ LLM context
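The diagram maps onto a short orchestration function. In the sketch below, the retrievers and the reranker are injected as hypothetical function types, so only the fan-out, the RRF fusion, and the cut-offs (50 → 20 → 5) are concrete.

```typescript
// Pipeline skeleton: parallel retrievers → RRF fuse → rerank → context docs.
type Hit = { id: string; text: string };
type Retriever = (query: string, k: number) => Promise<Hit[]>;                 // hypothetical
type Reranker = (query: string, hits: Hit[], topN: number) => Promise<Hit[]>; // hypothetical

// Reciprocal rank fusion (0-based rank, hence the + 1)
function fuseRRF(lists: Hit[][], k = 60): Hit[] {
  const score = new Map<string, number>();
  const byId = new Map<string, Hit>();
  for (const list of lists) {
    list.forEach((h, rank) => {
      score.set(h.id, (score.get(h.id) ?? 0) + 1 / (k + rank + 1));
      byId.set(h.id, h);
    });
  }
  return [...score.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => byId.get(id) as Hit);
}

async function retrieveForRag(
  query: string,
  retrievers: Retriever[], // e.g. BM25, dense, optional HyDE → dense
  rerank: Reranker,
): Promise<Hit[]> {
  const lists = await Promise.all(retrievers.map(r => r(query, 50))); // each retriever: top-50
  const fused = fuseRRF(lists).slice(0, 20);                           // RRF → top-20
  return rerank(query, fused, 5);                                      // rerank → top-5
}
```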

Filtering (metadata)

SELECT * FROM docs
WHERE category = 'engineering'
  AND created_at > '2026-01-01'
ORDER BY embedding <=> $1
LIMIT 20;

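→ Vector search + metadata filter, either as a pre-filter in the WHERE clause or as a post-filter after retrieval. With an approximate index such as HNSW, pgvector applies the WHERE clause after the index scan, so a selective filter can return fewer rows than LIMIT.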
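When post-filtering, a common workaround is to over-fetch and then widen the fetch if too few rows survive the filter. In the sketch below, `AnnSearch` is a hypothetical wrapper around the vector query. The 4× starting factor and the 1000-row cap are arbitrary choices.

```typescript
// Post-filter with over-fetch: fetch more than needed, filter in app code,
// and double the fetch size until enough rows survive or the cap is reached.
type Row = { id: string; category: string; createdAt: Date };
type AnnSearch = (limit: number) => Promise<Row[]>; // hypothetical: nearest rows, best first

async function searchWithPostFilter(
  ann: AnnSearch,
  keep: (r: Row) => boolean,
  want = 20,
  maxFetch = 1000,
): Promise<Row[]> {
  let limit = want * 4;
  while (true) {
    const capped = Math.min(limit, maxFetch);
    const rows = await ann(capped);
    const kept = rows.filter(keep);
    // Stop when we have enough, hit the cap, or the index returned everything it has
    if (kept.length >= want || capped === maxFetch || rows.length < capped) {
      return kept.slice(0, want);
    }
    limit *= 2;
  }
}
```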

Date / source weight

function dateBoost(score: number, daysOld: number): number {
  const decay = Math.exp(-daysOld / 365);
  return score * (0.5 + 0.5 * decay);
}

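→ Favors recent docs. A brand-new doc keeps its full score, and the multiplier decays toward 0.5 as a doc ages (about 0.68 at one year).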

A/B test

// Send the user's query to both systems
const A = await search(q, 10);
const B = await searchHybrid(q, 10);

// Compare CTR / dwell time / satisfaction
log({ user, q, A_clicked: ..., B_clicked: ... });
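In a classic A/B test, each user is usually pinned to one arm so results stay consistent across sessions. The sketch below assigns users deterministically by hashing the user id with 32-bit FNV-1a. The salt format and the 50/50 default split are assumptions.

```typescript
// 32-bit FNV-1a over UTF-16 code units (not UTF-8 bytes).
function fnv1a32(s: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32 bits
  }
  return h >>> 0;
}

// Deterministic arm assignment: the same user always lands in the same arm for an
// experiment. Salting with the experiment name reshuffles users per experiment.
function assignArm(userId: string, experiment: string, shareB = 0.5): 'A' | 'B' {
  const u = fnv1a32(`${experiment}:${userId}`) / 2 ** 32; // in [0, 1)
  return u < shareB ? 'B' : 'A';
}
```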

MTEB benchmark

Compare embedding model quality:
- BGE / e5 / Cohere embed-v3 / text-embedding-3 / Voyage

→ Check the MTEB leaderboard, focusing on tasks close to your own (e.g. the Retrieval category).

Search-as-a-service

- Algolia: managed keyword search + vector hybrid
- Typesense: open source
- Meilisearch: simple
- Vespa: the most powerful, but also the most complex
- Weaviate: vector + hybrid
- Pinecone + reranker
- Elastic: BM25 + dense

LLM-friendly answers

const prompt = `
Answer based ONLY on context. Cite [1], [2].

Context:
[1] ${docs[0].text}
[2] ${docs[1].text}

Question: ${query}

Answer:
`;

→ Hybrid retrieval + rerank removes most of the noise before it reaches the prompt.
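The prompt above hard-codes two documents. The sketch below numbers any number of reranked docs for citation and stops at a rough size budget. Counting characters instead of tokens is a simplifying assumption, as is the optional `source` field.

```typescript
// Build a numbered, citation-ready context under a rough character budget.
// Docs are assumed to arrive best-first, so the tail is what gets dropped.
type CtxDoc = { text: string; source?: string };

function buildContext(docs: CtxDoc[], maxChars = 8000): { context: string; used: number } {
  const parts: string[] = [];
  let total = 0;
  for (let i = 0; i < docs.length; i++) {
    const d = docs[i];
    const entry = `[${i + 1}] ${d.text.trim()}${d.source ? ` (source: ${d.source})` : ''}`;
    const cost = entry.length + (parts.length > 0 ? 2 : 0); // "\n\n" separator between entries
    if (total + cost > maxChars) break;
    parts.push(entry);
    total += cost;
  }
  return { context: parts.join('\n\n'), used: parts.length };
}
```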

Eval

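# Recall@K: fraction of the relevant docs that appear in the top K
def recall_at_k(predicted, relevant, k):
    if not relevant:
        return 0.0
    return len(set(predicted[:k]) & set(relevant)) / len(relevant)

# Reciprocal rank for one query; MRR is its mean over all queries
def reciprocal_rank(predictions, relevant):
    for i, p in enumerate(predictions):
        if p in relevant:
            return 1 / (i + 1)
    return 0.0

def mrr(runs):
    # runs: list of (predictions, relevant) pairs, one per query
    return sum(reciprocal_rank(p, r) for p, r in runs) / len(runs) if runs else 0.0

# nDCG: the most standard metric; supports graded (not just binary) relevance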
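nDCG is only named here, so below is a minimal nDCG@k sketch in TypeScript. It uses the common exponential gain (2^rel - 1) with a log2 rank discount. Some implementations use linear gain instead, so check which one your benchmark reports.

```typescript
// DCG@k = sum over positions i (0-based) of (2^rel - 1) / log2(i + 2).
function dcg(gains: number[], k: number): number {
  return gains
    .slice(0, k)
    .reduce((sum, rel, i) => sum + (2 ** rel - 1) / Math.log2(i + 2), 0);
}

// nDCG@k = DCG of the predicted order / DCG of the ideal order, in [0, 1].
// `relevance` maps doc id → graded relevance (missing = 0).
function ndcgAtK(predicted: string[], relevance: Map<string, number>, k: number): number {
  const gains = predicted.map(id => relevance.get(id) ?? 0);
  const ideal = [...relevance.values()].sort((a, b) => b - a);
  const idcg = dcg(ideal, k);
  return idcg === 0 ? 0 : dcg(gains, k) / idcg;
}
```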

Cost

BM25: cheap (in-DB).
Vector: $$ (embedding + index).
Reranker: $$$ per call.

→ Rerank only a small candidate set, and pass only the top ~10 results onward.

🤔 Decision criteria

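| Situation | Recommendation |
|---|---|
| Small / simple search | BM25 only |
| Meaning / synonyms matter | Vector |
| General production | Hybrid (RRF) |
| Accuracy above all | Hybrid + rerank |
| Long-form Q&A | HyDE + hybrid + rerank |
| Real-time | BM25 + cache |
| Code search | BM25 + vector + filter (lang) |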

Anti-patterns

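  • Vector only: weak on exact keywords (UUIDs, code identifiers).
  • BM25 only: loses meaning (login vs. signin).
  • Reranking everything: cost explodes; rerank only the top-50.
  • Skipping score normalization: weighted fusion becomes meaningless.
  • Indexing large docs without chunking: retrieval quality drops.
  • Filtering only as post-processing: inefficient, and can return too few results.
  • No eval: nothing to tune against.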

🤖 LLM usage hints

  • RRF is simple and independent of score scale.
  • A reranker (cross-encoder / Cohere) gives a large quality jump.
  • HyDE cheaply closes the gap between short questions and answer-shaped documents.
  • BM25 + vector + rerank is the canonical stack.

🔗 Related documents