---
id: ai-rag-advanced
title: RAG Advanced — Hybrid / Rerank / Multi-modal / Graph
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, rag, advanced, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["Backend"] }
applied_in: []
aliases: [hybrid search, reranking, GraphRAG, multi-vector, contextual retrieval, query expansion]
---

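# RAG Advanced

> Vector search alone hits a ceiling. This note covers **hybrid search (vector + BM25), rerankers, query rewriting, contextual chunking, and GraphRAG**. Anthropic reports that Contextual Retrieval (contextual embeddings plus contextual BM25) cut top-20 retrieval failures by 49%.
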
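## 📖 Core concepts
- Hybrid: combine vector and keyword (BM25) results, by weighting or rank fusion.
- Reranker: re-sort the top-K candidates with a cross-encoder.
- Query expansion: enrich short queries with alternative phrasings.
- Contextual chunking: prepend document-level context to each chunk.
- GraphRAG: retrieve over a graph of entities and relationships.
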
## 💻 Code patterns

### Hybrid search
```ts
// Assumes vectorSearch and bm25Search each return Chunk[] ordered best-first.
async function hybridSearch(query: string, k = 20): Promise<Chunk[]> {
  const [vectorHits, keywordHits] = await Promise.all([
    vectorSearch(query, k * 2),
    bm25Search(query, k * 2),
  ]);

  // RRF (Reciprocal Rank Fusion): score = sum of 1 / (60 + rank), rank starting at 1
  const RRF_K = 60;
  const scores = new Map<string, number>();
  const byId = new Map<string, Chunk>();
  for (const hits of [vectorHits, keywordHits]) {
    hits.forEach((c, i) => {
      byId.set(c.id, c);
      scores.set(c.id, (scores.get(c.id) ?? 0) + 1 / (RRF_K + i + 1));
    });
  }

  return [...byId.values()]
    .map(c => ({ ...c, score: scores.get(c.id)! }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

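RRF looks only at ranks. When both retrievers expose scores, a weighted blend (the "vector + keyword weights" idea from the core concepts) is an alternative to RRF. The following is a minimal sketch under stated assumptions: `ScoredHit`, `normalize`, `weightedFusion`, and `alpha` are hypothetical names, and min-max normalization is only one of several reasonable ways to make the two score scales comparable.

```ts
interface ScoredHit { id: string; score: number }

// Min-max normalize scores to [0, 1] so the two retrievers are comparable.
function normalize(hits: ScoredHit[]): Map<string, number> {
  const out = new Map<string, number>();
  if (hits.length === 0) return out;
  const scores = hits.map(h => h.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  for (const h of hits) {
    // If all scores are equal, treat every hit as a full match.
    out.set(h.id, max === min ? 1 : (h.score - min) / (max - min));
  }
  return out;
}

// alpha = 1 means vector only, alpha = 0 means keyword only (assumed convention).
function weightedFusion(vectorHits: ScoredHit[], keywordHits: ScoredHit[], alpha = 0.5): ScoredHit[] {
  const vec = normalize(vectorHits);
  const kw = normalize(keywordHits);
  const ids = new Set(Array.from(vec.keys()).concat(Array.from(kw.keys())));
  return Array.from(ids)
    .map(id => ({ id, score: alpha * (vec.get(id) ?? 0) + (1 - alpha) * (kw.get(id) ?? 0) }))
    .sort((a, b) => b.score - a.score);
}
```

Unlike RRF, this blend is sensitive to score distributions, so `alpha` would normally be tuned against an eval set.
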
### Reranker (Cohere / Voyage / Jina)
```ts
import { CohereClient } from 'cohere-ai';
// Assumes the API key is provided via the COHERE_API_KEY environment variable.
const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });

async function rerank(query: string, chunks: Chunk[], topK = 5): Promise<Chunk[]> {
  const r = await cohere.rerank({
    model: 'rerank-multilingual-v3.0',
    query,
    documents: chunks.map(c => c.content),
    topN: topK,
  });
  // Results arrive sorted by relevance; `index` points back into `chunks`.
  return r.results.map(res => chunks[res.index]);
}

// Pipeline
const candidates = await hybridSearch(query, 50); // 50 candidates (recall)
const top = await rerank(query, candidates, 5);   // top 5 (precision)
```

### Contextual Retrieval (Anthropic)
```ts
// Prepend document-level context to each chunk before embedding it
async function buildContextualChunks(doc: Document) {
  const chunks = chunkText(doc.content, 1000);
  // One LLM call per chunk; throttle or batch this for large documents.
  return await Promise.all(chunks.map(async (chunk, i) => {
    const context = await llm.complete({
      system: 'Provide 2-3 sentences of context that situate this chunk within the document.',
      // The model needs the document itself, not only its title, to write that context.
      user: `Document: ${doc.title}\n\n${doc.content}\n\nChunk: ${chunk}`,
    });
    const content = context + '\n\n' + chunk;
    return {
      content,
      embedding: await embed(content),
      docId: doc.id,
      chunkIdx: i,
    };
  }));
}
```

→ Retrieval accuracy improves even when the chunk text alone lacks the details the query mentions (for example, which document or entity it belongs to).

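The snippet above calls `chunkText` without defining it. One possible shape, as a sketch only: a character-window splitter with optional overlap. The `overlap` parameter is an assumption that does not appear in the original call, and a production chunker would more likely split on sentence or token boundaries.

```ts
// Hypothetical helper: split text into windows of `size` characters,
// each overlapping the previous one by `overlap` characters.
function chunkText(text: string, size: number, overlap = 0): string[] {
  if (size <= 0) throw new Error('size must be positive');
  if (overlap < 0 || overlap >= size) throw new Error('overlap must be in [0, size)');
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}
```

A small overlap reduces the chance that a sentence is cut in half at a chunk boundary, at the cost of some duplicated text in the index.
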
### Query rewrite / expansion
```ts
async function expandQuery(query: string): Promise<string[]> {
  const r = await llm.complete({
    // json_object mode returns an object, so ask for { "queries": [...] } rather than a bare array.
    system: 'Generate 3 alternative phrasings of the query for search. Output JSON of the form {"queries": ["..."]}.',
    user: query,
    response_format: { type: 'json_object' },
  });
  return [query, ...JSON.parse(r).queries];
}

// Search with each query, then merge the results (RRF)
const queries = await expandQuery(userQuery);
const allHits = await Promise.all(queries.map(q => hybridSearch(q)));
const merged = mergeRRF(allHits);
```

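`mergeRRF` is used above but not defined. A minimal sketch of what it could look like, assuming each inner list is ordered best-first and that chunks are identified by `id`. The `Chunk` shape here is a stand-in for the document's own type, and `k = 60` mirrors the constant used in the hybrid search example.

```ts
interface Chunk { id: string; content: string }

// Fuse several ranked lists: each list contributes 1 / (k + rank) per chunk, rank starting at 1.
function mergeRRF(rankings: Chunk[][], k = 60): Chunk[] {
  const scores = new Map<string, number>();
  const byId = new Map<string, Chunk>();
  for (const ranking of rankings) {
    ranking.forEach((chunk, i) => {
      byId.set(chunk.id, chunk);
      scores.set(chunk.id, (scores.get(chunk.id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return Array.from(byId.values())
    .sort((a, b) => (scores.get(b.id) ?? 0) - (scores.get(a.id) ?? 0));
}
```

Chunks that appear in several lists accumulate score, so a result found by multiple query phrasings rises above one found by a single phrasing.
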
### HyDE (Hypothetical Document Embeddings)
```ts
// Generate a hypothetical answer first, then search with that answer's embedding
async function hyde(query: string): Promise<Chunk[]> {
  const hypoAnswer = await llm.complete({
    system: 'Write a concise hypothetical paragraph that would answer the question.',
    user: query,
  });
  const queryEmb = await embed(hypoAnswer);
  // Assumes vectorSearch also accepts a precomputed embedding (it takes a string in hybridSearch above).
  return await vectorSearch(queryEmb, 10);
}
```

→ Most effective when the query is short.

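HyDE searches with an embedding rather than with the query text, so it helps to see what "search by embedding" means at its simplest. The sketch below is a brute-force in-memory version using cosine similarity. `EmbeddedChunk`, `cosineSimilarity`, and `searchByEmbedding` are hypothetical names, and a real deployment would delegate this to a vector index.

```ts
interface EmbeddedChunk { id: string; embedding: number[] }

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: define similarity as 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest neighbours; a vector database replaces this at scale.
function searchByEmbedding(queryEmb: number[], index: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return index
    .map(chunk => ({ chunk, sim: cosineSimilarity(queryEmb, chunk.embedding) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, k)
    .map(x => x.chunk);
}
```

The hypothetical answer tends to sit closer to real answer passages in embedding space than a terse question does, which is the intuition behind HyDE.
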
### Multi-vector (different aspects)
```ts
// One document = several embeddings (title, summary, body)
const chunks = [
  { type: 'title', content: doc.title, embedding: await embed(doc.title) },
  { type: 'summary', content: doc.summary, embedding: await embed(doc.summary) },
  ...sectionChunks,
];

// At query time, weight each type according to how well it suits the query
```

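The last comment above leaves the type weighting open. One way it could work, sketched under assumptions: each hit carries the vector type it matched, similarities are multiplied by a per-type weight, and each document keeps its best weighted hit. `TypedHit`, `scoreDocuments`, the `'section'` label, and the weight values are all illustrative.

```ts
type VectorType = 'title' | 'summary' | 'section';
interface TypedHit { docId: string; type: VectorType; similarity: number }

// Keep, per document, the best weighted similarity across its vectors.
function scoreDocuments(
  hits: TypedHit[],
  weights: Record<VectorType, number>,
): { docId: string; score: number }[] {
  const best = new Map<string, number>();
  for (const hit of hits) {
    const weighted = hit.similarity * weights[hit.type];
    best.set(hit.docId, Math.max(best.get(hit.docId) ?? -Infinity, weighted));
  }
  return Array.from(best, ([docId, score]) => ({ docId, score }))
    .sort((a, b) => b.score - a.score);
}
```

A navigational query ("find the onboarding guide") might favor title weights, while a detailed factual question might favor section weights. How the weights are chosen per query is left open here.
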
### Late-interaction (ColBERT)
```
Embed the query and the document at the token level.
For each query token, take its max similarity over the document tokens, then sum over query tokens.
→ Accurate but expensive.
```

→ Use Vespa or a self-hosted ColBERT.

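The scoring rule described above (often called MaxSim) fits in a few lines. This sketch assumes token embeddings are already computed and L2-normalized, so a dot product equals cosine similarity. `dot` and `maxSim` are illustrative names, not part of any ColBERT library.

```ts
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Late-interaction score: for each query token, take the best-matching document token, then sum.
function maxSim(queryTokens: number[][], docTokens: number[][]): number {
  if (docTokens.length === 0) return 0;
  let total = 0;
  for (const q of queryTokens) {
    let best = -Infinity;
    for (const d of docTokens) best = Math.max(best, dot(q, d));
    total += best;
  }
  return total;
}
```

The cost is visible in the nested loop: scoring is proportional to query tokens times document tokens per candidate, which is why late interaction is usually applied to a shortlist.
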
### GraphRAG (Microsoft)
```
1. Documents → extract entities / relationships (LLM)
2. Build the graph
3. Community detection (cluster the entities)
4. Pre-generate a summary for each community
5. Query → community summaries + directly related entities
```

```ts
async function extractEntities(chunk: string) {
  const r = await llm.complete({
    system: 'Extract entities and relationships as JSON.',
    user: chunk,
    response_format: { type: 'json_object' },
  });
  return JSON.parse(r); // { entities: [...], relationships: [...] }
}
```

→ Strong on large document collections and multi-hop reasoning.

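Step 2 (building the graph) and the "directly related entities" part of step 5 can be sketched with a plain adjacency list and a breadth-first search. The `Relationship` shape (`source`, `target`, `type`) is an assumption, since the original only shows `{ entities: [...], relationships: [...] }`. Community detection and summary generation are out of scope for this sketch.

```ts
interface Relationship { source: string; target: string; type: string }

// Build an undirected adjacency list from extracted relationships.
function buildGraph(relationships: Relationship[]): Map<string, Set<string>> {
  const graph = new Map<string, Set<string>>();
  const link = (a: string, b: string) => {
    if (!graph.has(a)) graph.set(a, new Set<string>());
    graph.get(a)!.add(b);
  };
  for (const r of relationships) {
    link(r.source, r.target);
    link(r.target, r.source);
  }
  return graph;
}

// Breadth-first search: entities reachable from `start` within `maxHops` edges.
function neighborsWithin(graph: Map<string, Set<string>>, start: string, maxHops: number): string[] {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      (graph.get(node) ?? new Set<string>()).forEach(neighbor => {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          next.push(neighbor);
        }
      });
    }
    frontier = next;
  }
  seen.delete(start);
  return Array.from(seen);
}
```

Multi-hop questions ("where is Alice's employer located?") map to `maxHops >= 2`: the answer entity is not adjacent to the entity named in the query.
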
### Multi-modal RAG
```ts
// Images + text → one shared vector space (CLIP / Voyage Multimodal)
const queryEmb = await embedMultimodal({ text: query });
const r = await vectorSearch(queryEmb);
// Results are text or image chunks (assumes each hit carries a `type` field)
const imageChunks = r.filter(c => c.type === 'image');
const textChunks = r.filter(c => c.type === 'text');

// Send images and text to the LLM together
const answer = await llm.chat({
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Based on the context...' },
      ...imageChunks.map(c => ({ type: 'image', source: c.url })),
      ...textChunks.map(c => ({ type: 'text', text: c.content })),
    ],
  }],
});
```

### Self-RAG (the model decides when to retrieve)
```
While answering, the LLM decides "I need more information" → search → answer again.
Implemented with function calling.
```

```ts
const tools = [{
  name: 'search_kb',
  description: 'Search internal knowledge base',
  input_schema: { ... },
}];

// Loop: the LLM decides whether to call the tool
```

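The loop itself is only hinted at above. A provider-agnostic sketch follows, with the model and the search tool injected as functions so the control flow can be read (and tested) without any SDK. `ModelTurn`, `Model`, `SearchTool`, and `answerWithRetrieval` are hypothetical names, and the transcript-of-strings format is a simplification of a real chat message list.

```ts
type ModelTurn =
  | { type: 'tool_call'; name: string; input: { query: string } }
  | { type: 'answer'; text: string };

type Model = (transcript: string[]) => Promise<ModelTurn>;
type SearchTool = (query: string) => Promise<string[]>;

// Keep calling the model until it answers or the step budget runs out.
async function answerWithRetrieval(
  question: string,
  model: Model,
  searchKb: SearchTool,
  maxSteps = 3,
): Promise<string> {
  const transcript = [`user: ${question}`];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(transcript);
    if (turn.type === 'answer') return turn.text;
    // The model asked for more information: run the search and feed the results back.
    const results = await searchKb(turn.input.query);
    transcript.push(`tool_call(${turn.name}): ${turn.input.query}`);
    transcript.push(`tool_result: ${results.join(' | ')}`);
  }
  return 'No answer within the step budget.';
}
```

The step budget matters: without it, a model that keeps requesting searches would loop indefinitely and run up cost.
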
### Citation + verification
```ts
// Answer + sources + citation check
const answer = await llm.complete({
  system: `Answer using context. After each claim, cite [chunk_N]. End with "VERIFIED" or "UNVERIFIED".`,
  user: `Context:\n${contextWithIds}\n\nQ: ${query}`,
});

// Optional: a second LLM call checks the answer against the context
const ok = await verifyClaim(answer, retrievedChunks);
```

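Before spending a second LLM call on verification, a cheap structural check can catch citations that point at chunks that were never provided. A minimal sketch, assuming chunks were numbered `1..chunkCount` in the prompt and cited as `[chunk_N]` (the format requested in the system prompt above). `extractCitations` and `checkCitations` are hypothetical helpers.

```ts
// Collect the N of every [chunk_N] marker in the answer, in order of first appearance.
function extractCitations(answer: string): number[] {
  const seen = new Set<number>();
  const pattern = /\[chunk_(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    seen.add(Number(match[1]));
  }
  return Array.from(seen);
}

// Every cited id must exist in the retrieved context, and at least one citation must be present.
function checkCitations(
  answer: string,
  chunkCount: number,
): { cited: number[]; invalid: number[]; ok: boolean } {
  const cited = extractCitations(answer);
  const invalid = cited.filter(n => n < 1 || n > chunkCount);
  return { cited, invalid, ok: cited.length > 0 && invalid.length === 0 };
}
```

This only confirms that the cited chunks exist. Whether a chunk actually supports the claim still needs the semantic check sketched as `verifyClaim` above.
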
### Eval: Ragas
```python
from ragas import evaluate
from ragas.metrics import context_recall, context_precision, faithfulness, answer_relevancy

result = evaluate(
    dataset,
    metrics=[context_recall, context_precision, faithfulness, answer_relevancy],
)
```

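For the retrieval stage alone, simple label-based metrics can complement the Ragas run when a hand-labeled set of relevant chunk ids is available. The sketch below is not a reimplementation of the Ragas metrics. It shows classic recall@k, precision@k, and reciprocal rank, with illustrative function names, and it assumes `retrieved` contains no duplicate ids.

```ts
// Fraction of the relevant ids that appear in the top-k results.
function recallAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = retrieved.slice(0, k).filter(id => relevant.has(id)).length;
  return hits / relevant.size;
}

// Fraction of the top-k results that are relevant.
function precisionAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  const top = retrieved.slice(0, k);
  if (top.length === 0) return 0;
  return top.filter(id => relevant.has(id)).length / top.length;
}

// Reciprocal rank of the first relevant result (0 if none was retrieved).
function reciprocalRank(retrieved: string[], relevant: Set<string>): number {
  const idx = retrieved.findIndex(id => relevant.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}
```

Tracking these before and after a change (for example, adding a reranker) shows whether the change helped retrieval itself, independent of answer quality.
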
### Pipeline summary
```
1. Query → expand / HyDE / rewrite
2. Hybrid search (vector + BM25) → top-50
3. Rerank → top-5
4. Build context (with citations)
5. LLM answer (force citations)
6. Optional: verify
```

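Step 4 can be as small as numbering the reranked chunks so the model has something to cite. A sketch of one way to produce the `contextWithIds` string used in the citation example. `buildContextWithIds` is a hypothetical helper, and numbering from 1 is an assumption chosen to match the `[chunk_N]` citation format.

```ts
interface Chunk { id: string; content: string }

// Number the chunks from 1 so the model can cite them as [chunk_N].
function buildContextWithIds(chunks: Chunk[]): string {
  return chunks
    .map((chunk, i) => `[chunk_${i + 1}] ${chunk.content.trim()}`)
    .join('\n\n');
}
```

Keeping the numbering positional (rather than using database ids) keeps the markers short and makes it easy to map a citation back to `chunks[N - 1]`.
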
## 🤔 Decision criteria
| Situation | Technique |
|---|---|
| Short queries | HyDE / query expansion |
| Large corpus + multi-hop | GraphRAG |
| Strict accuracy requirements | Hybrid + rerank |
| Multilingual | Cohere multilingual |
| Multi-modal (image/text) | CLIP / Voyage |
| Answers may lie outside the corpus | Self-RAG (model decides via tool) |

## ❌ Anti-patterns
- **Vector only**: weak at exact keyword matches.
- **Large top-K (50) passed straight to the LLM**: noise. Rerank first, then pass 5-10.
- **No citations**: hallucinations cannot be verified.
- **Static chunking only**: contextual chunking performs better.
- **No eval**: you cannot tell which changes are improvements.
- **Reranker on every query**: adds latency. Cache results.
- **GraphRAG on a small corpus**: overkill. Simple RAG is enough.

## 🤖 LLM usage hints
- Hybrid + reranker = sweet spot.
- Contextual chunks = large gain (49% fewer retrieval failures in Anthropic's report).
- Enforce citations in the system prompt.
- Evaluate with Ragas.

## 🔗 Related documents
- [[AI_RAG_Pattern_Basics]]
- [[AI_Embeddings_Comparison]]
- [[AI_LLM_Eval_Patterns]]
- [[DB_pgvector_Production]]