---
id: ai-rag-advanced
title: RAG Advanced - Hybrid / Rerank / Multi-modal / Graph
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, rag, advanced, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["Backend"] }
applied_in: []
aliases: [hybrid search, reranking, GraphRAG, multi-vector, contextual retrieval, query expansion]
---

# RAG Advanced

> Vector search alone has limits. The main upgrades: **Hybrid (vector + BM25), reranker, query rewrite, contextual chunking, GraphRAG**. Anthropic reports that Contextual Retrieval (contextual embeddings combined with contextual BM25) cut failed retrievals by 49%.

## 📖 Core Concepts

- Hybrid: combine vector and keyword (BM25) results, by weighting or rank fusion.
- Reranker: re-sort the top-K candidates with a cross-encoder.
- Query expansion: enrich a short query.
- Contextual chunking: attach document-level context to each chunk.
- GraphRAG: an entity / relationship graph.

## 💻 Code Patterns

### Hybrid search

```ts
async function hybridSearch(query: string, k = 20): Promise<Chunk[]> {
  // Over-fetch from both retrievers so the fused list still has k good candidates
  const [vectorHits, keywordHits] = await Promise.all([
    vectorSearch(query, k * 2),
    bm25Search(query, k * 2),
  ]);

  // RRF (Reciprocal Rank Fusion): 60 is the conventional constant, i is the 0-based rank
  const scores = new Map<Chunk['id'], number>();
  vectorHits.forEach((c, i) => scores.set(c.id, (scores.get(c.id) ?? 0) + 1 / (60 + i)));
  keywordHits.forEach((c, i) => scores.set(c.id, (scores.get(c.id) ?? 0) + 1 / (60 + i)));

  // Deduplicate by id, attach the fused score, and keep the best k
  const merged = [...new Set([...vectorHits, ...keywordHits].map(c => c.id))];
  return merged
    .map(id => ({ ...findChunk(id), score: scores.get(id)! }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

### Reranker (Cohere / Voyage / Jina)

```ts
import { CohereClient } from 'cohere-ai';

// `token` is your Cohere API key
const cohere = new CohereClient({ token });

async function rerank(query: string, chunks: Chunk[], topK = 5): Promise<Chunk[]> {
  const r = await cohere.rerank({
    model: 'rerank-multilingual-v3.0',
    query,
    documents: chunks.map(c => c.content),
    topN: topK,
  });
  // Results arrive sorted by relevance; `index` points back into the input array
  return r.results.map(res => chunks[res.index]);
}

// Pipeline
const candidates = await hybridSearch(query, 50);   // 50 candidates
const top = await rerank(query, candidates, 5);     // 5 after precise reranking
```

### Contextual Retrieval (Anthropic)

```ts
// Prepend document-level context to each chunk
async function buildContextualChunks(doc: Document) {
  const chunks = chunkText(doc.content, 1000);
  return await Promise.all(chunks.map(async (chunk, i) => {
    // Pass the full document so the model can situate the chunk within it
    const context = await llm.complete({
      system: 'Provide 2-3 sentences of context for this chunk in the document.',
      user: `Document: ${doc.title}\n${doc.content}\n\nChunk: ${chunk}`,
    });
    const contextualized = context + '\n\n' + chunk;
    return {
      content: contextualized,
      embedding: await embed(contextualized),
      docId: doc.id,
      chunkIdx: i,
    };
  }));
}
```

→ Retrieval accuracy improves even when the chunk on its own lacks the context needed to match the query.

### Query rewrite / expansion

```ts
async function expandQuery(query: string): Promise<string[]> {
  // The prompt asks for an object because `json_object` mode returns one and we read `.queries`
  const r = await llm.complete({
    system: 'Generate 3 alternative phrasings of the query for search. Output a JSON object of the form {"queries": ["...", "...", "..."]}.',
    user: query,
    response_format: { type: 'json_object' },
  });
  return [query, ...JSON.parse(r).queries];
}

// Search with each query, then merge the results (RRF)
const queries = await expandQuery(userQuery);
const allHits = await Promise.all(queries.map(q => hybridSearch(q)));
const merged = mergeRRF(allHits);
```

### HyDE (Hypothetical Document Embeddings)

```ts
// Generate a hypothetical answer first, then search with that answer's embedding
async function hyde(query: string): Promise<Chunk[]> {
  const hypoAnswer = await llm.complete({
    system: 'Write a concise hypothetical paragraph that would answer the question.',
    user: query,
  });
  const queryEmb = await embed(hypoAnswer);
  // Note: here vectorSearch receives an embedding, not the raw query string used in hybridSearch
  return await vectorSearch(queryEmb, 10);
}
```

→ Most effective when the query is short.

### Multi-vector (different aspects)

```ts
// One document = several embeddings (title, summary, body)
const chunks = [
  { type: 'title', content: doc.title, embedding: await embed(doc.title) },
  { type: 'summary', content: doc.summary, embedding: await embed(doc.summary) },
  ...sectionChunks,
];
// Weight each type according to what the query needs
```

### Late-interaction (ColBERT)

```
Embed the query and the document at the token level.
For each query token, take the max similarity over the document tokens, then sum over query tokens.
→ Accurate but expensive.
```

→ Use Vespa or a self-hosted ColBERT.

### GraphRAG (Microsoft)

```
1. Documents → extract entities / relationships (LLM)
2. Build the graph
3. Community detection (cluster entities)
4. Pre-generate a summary for each community
5. Query → community summaries + directly related entities
```

```ts
async function extractEntities(chunk: string) {
  const r = await llm.complete({
    system: 'Extract entities and relationships as JSON.',
    user: chunk,
    response_format: { type: 'json_object' },
  });
  return JSON.parse(r);  // { entities: [...], relationships: [...] }
}
```

→ Strong on large document collections and multi-hop reasoning.

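
Steps 2, 3, and 5 of the GraphRAG outline can be sketched in plain TypeScript. This is a minimal illustration under stated assumptions, not Microsoft's implementation: the `Extraction` and `Relationship` shapes are a guess at what `extractEntities` returns, all function names are hypothetical, and connected components stand in for a real community-detection algorithm such as Leiden.

```ts
// Assumed shape of the JSON returned by extractEntities (hypothetical field names).
type Relationship = { source: string; target: string; relation: string };
type Extraction = { entities: string[]; relationships: Relationship[] };
type Graph = Map<string, Set<string>>;

// Step 2: merge per-chunk extractions into one undirected adjacency list.
function buildGraph(extractions: Extraction[]): Graph {
  const graph: Graph = new Map();
  const edgesOf = (name: string): Set<string> => {
    let edges = graph.get(name);
    if (!edges) {
      edges = new Set<string>();
      graph.set(name, edges);
    }
    return edges;
  };
  extractions.forEach(({ entities, relationships }) => {
    entities.forEach(e => edgesOf(e));
    relationships.forEach(({ source, target }) => {
      edgesOf(source).add(target);
      edgesOf(target).add(source);
    });
  });
  return graph;
}

// Breadth-first walk from `start`: hop distance for every entity reachable within maxHops.
function hopDistances(graph: Graph, start: string, maxHops = Infinity): Map<string, number> {
  const dist = new Map<string, number>([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    const d = dist.get(current) as number;
    if (d >= maxHops) continue;
    (graph.get(current) ?? new Set<string>()).forEach(next => {
      if (!dist.has(next)) {
        dist.set(next, d + 1);
        queue.push(next);
      }
    });
  }
  return dist;
}

// Step 3 (simplified): each connected component is treated as one community.
function findCommunities(graph: Graph): string[][] {
  const seen = new Set<string>();
  const communities: string[][] = [];
  graph.forEach((_edges, entity) => {
    if (seen.has(entity)) return;
    const members = Array.from(hopDistances(graph, entity).keys()).sort();
    members.forEach(m => seen.add(m));
    communities.push(members);
  });
  return communities;
}

// Step 5 (partial): entities within `maxHops` relationships of a query entity.
function relatedEntities(graph: Graph, entity: string, maxHops: number): string[] {
  const dist = hopDistances(graph, entity, maxHops);
  dist.delete(entity);
  return Array.from(dist.keys()).sort();
}

// Usage with hand-written extractions in place of LLM output
const graph = buildGraph([
  { entities: ['Alice', 'Acme'], relationships: [{ source: 'Alice', target: 'Acme', relation: 'works_at' }] },
  { entities: ['Acme', 'Berlin'], relationships: [{ source: 'Acme', target: 'Berlin', relation: 'located_in' }] },
  { entities: ['Zeta'], relationships: [] },
]);
console.log(findCommunities(graph));            // [['Acme', 'Alice', 'Berlin'], ['Zeta']]
console.log(relatedEntities(graph, 'Alice', 2)); // ['Acme', 'Berlin']
```

The two-hop lookup is what gives GraphRAG its multi-hop strength: `Berlin` never co-occurs with `Alice` in a single chunk, yet it is reachable through `Acme`. A production system would keep relation labels on the edges and summarize each community with an LLM (step 4), which this sketch omits.
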
### Multi-modal RAG

```ts
// Images + text → same vector space (CLIP / Voyage Multimodal)
const queryEmb = await embedMultimodal({ text: query });
const r = await vectorSearch(queryEmb);
// Results = text or image chunks (imageChunks / textChunks below are r split by modality)

// Send images and text to the LLM together
const answer = await llm.chat({
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Based on the context...' },
      ...imageChunks.map(c => ({ type: 'image', source: c.url })),
      ...textChunks.map(c => ({ type: 'text', text: c.content })),
    ],
  }],
});
```

### Self-RAG (the model decides when to retrieve)

```
While answering, the LLM decides "I need more information" → search → answer again.
Implemented with function calling.
```

```ts
const tools = [{
  name: 'search_kb',
  description: 'Search internal knowledge base',
  input_schema: { ... },
}];
// Loop: the LLM decides when to call the tool
```

### Citation + verification

```ts
// Answer + sources + citation verification
const answer = await llm.complete({
  system: `Answer using context. After each claim, cite [chunk_N]. End with "VERIFIED" or "UNVERIFIED".`,
  user: `Context:\n${contextWithIds}\n\nQ: ${query}`,
});

// Optional: a second LLM checks the answer against the context
const ok = await verifyClaim(answer, retrievedChunks);
```

### Eval: Ragas

```python
from ragas import evaluate
from ragas.metrics import context_recall, context_precision, faithfulness, answer_relevancy

result = evaluate(
    dataset,
    metrics=[context_recall, context_precision, faithfulness, answer_relevancy],
)
```

### Pipeline summary

```
1. Query → expand / HyDE / rewrite
2. Hybrid search (vector + BM25) → top-50
3. Rerank → top-5
4. Build context (with citations)
5. LLM answer (force citations)
6. Optional: verify
```

## 🤔 Decision Criteria

| Situation | Technique |
|---|---|
| Short query | HyDE / expand |
| Large corpus + multi-hop | GraphRAG |
| Strict accuracy requirements | Hybrid + rerank |
| Multilingual | Cohere multilingual |
| Multi-modal (image/text) | CLIP / Voyage |
| Answer may lie outside the corpus | Self-RAG (tool decision) |

## ❌ Anti-patterns

- **Vector only**: weak at exact keyword matches.
- **Large top-K (50) passed to the LLM in full**: noise. Rerank first, then pass 5-10.
- **No citations**: hallucinations cannot be verified.
- **Static chunking only**: contextual chunking is stronger.
- **No eval**: no way to tell which changes are improvements.
- **Reranker on every query**: adds latency. Cache results.
- **GraphRAG on a small corpus**: overkill. Simple RAG is enough.

## 🤖 LLM Usage Hints

- Hybrid + reranker = sweet spot.
- Contextual chunks = large gain (Anthropic reports 49% fewer failed retrievals).
- Force citations in the system prompt.
- Evaluate with Ragas.

## 🔗 Related Documents

- [[AI_RAG_Pattern_Basics]]
- [[AI_Embeddings_Comparison]]
- [[AI_LLM_Eval_Patterns]]
- [[DB_pgvector_Production]]