id: ai-rag-advanced
title: RAG Advanced - Hybrid / Rerank / Multi-modal / Graph
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: ai, rag, advanced, vibe-coding
tech_stack: language: TS / Python; applicable_to: Backend
applied_in:
aliases: hybrid search, reranking, GraphRAG, multi-vector, contextual retrieval, query expansion

RAG Advanced

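Plain vector search alone has limits. Advanced RAG adds hybrid search (vector + BM25), a reranker, query rewriting, contextual chunking, and GraphRAG. Anthropic reports that Contextual Retrieval (contextual embeddings plus contextual BM25) cut the top-20 retrieval failure rate by 49%.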

📖 Key concepts

  • Hybrid: combine vector and keyword scores with weights.
  • Reranker: re-order the top-K candidates with a cross-encoder.
  • Query expansion: enrich short queries.
  • Contextual chunking: attach document-level context to each chunk.
  • GraphRAG: a graph of entities and relationships.

💻 Code patterns

async function hybridSearch(query: string, k = 20): Promise<Chunk[]> {
  // Over-fetch from both retrievers so the fusion step has enough candidates.
  const [vectorHits, keywordHits] = await Promise.all([
    vectorSearch(query, k * 2),
    bm25Search(query, k * 2),
  ]);

  // RRF (Reciprocal Rank Fusion): each list adds 1 / (60 + rank), with rank starting at 1
  const RRF_K = 60;
  const scores = new Map<string, number>();
  const byId = new Map<string, Chunk>();
  for (const hits of [vectorHits, keywordHits]) {
    hits.forEach((c, i) => {
      byId.set(c.id, c); // keep one copy per id instead of a separate findChunk lookup
      scores.set(c.id, (scores.get(c.id) ?? 0) + 1 / (RRF_K + i + 1));
    });
  }

  return Array.from(byId.values())
    .map(c => ({ ...c, score: scores.get(c.id)! }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

Reranker (Cohere / Voyage / Jina)

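import { CohereClient } from 'cohere-ai';
// Assumes the API key is supplied through the COHERE_API_KEY environment variable.
const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });

async function rerank(query: string, chunks: Chunk[], topK = 5): Promise<Chunk[]> {
  if (chunks.length === 0) return [];
  const r = await cohere.rerank({
    model: 'rerank-multilingual-v3.0',
    query,
    documents: chunks.map(c => c.content),
    topN: topK,
  });
  // Results are sorted by relevance; `index` points back into the input array.
  return r.results.map(res => chunks[res.index]);
}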

// Pipeline
const candidates = await hybridSearch(query, 50);  // 50 candidates
const top = await rerank(query, candidates, 5);    // rerank down to the best 5

Contextual Retrieval (Anthropic)

// Prepend document-level context to each chunk before embedding
async function buildContextualChunks(doc: Document) {
  const chunks = chunkText(doc.content, 1000);
  // One LLM call per chunk runs concurrently here; limit concurrency for large documents.
  return await Promise.all(chunks.map(async (chunk, i) => {
    const context = await llm.complete({
      system: 'Provide 2-3 sentences of context for this chunk in the document.',
      // The model needs the document body, not only the title, to situate the chunk.
      user: `Document: ${doc.title}\n\n${doc.content}\n\nChunk: ${chunk}`,
    });
    const content = context + '\n\n' + chunk;
    return {
      content,
      embedding: await embed(content),
      docId: doc.id,
      chunkIdx: i,
    };
  }));
}

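→ Retrieval accuracy improves even when the chunk text alone does not contain what the query asks about, because the prepended context supplies it.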

Query rewrite / expansion

async function expandQuery(query: string): Promise<string[]> {
  const r = await llm.complete({
    // json_object mode returns an object, so request {"queries": [...]} rather than a bare array.
    system: 'Generate 3 alternative phrasings of the query for search. Output JSON: {"queries": ["...", "...", "..."]}.',
    user: query,
    response_format: { type: 'json_object' },
  });
  return [query, ...JSON.parse(r).queries];
}

// Search with every query, then merge the results (RRF)
const queries = await expandQuery(userQuery);
const allHits = await Promise.all(queries.map(q => hybridSearch(q)));
const merged = mergeRRF(allHits);
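
`mergeRRF` is not defined in this note. A minimal sketch of what it could look like, assuming each result list is already ordered best-first and every chunk has a unique `id`. The `Chunk` shape here is hypothetical, and the constant 60 simply mirrors the one used in `hybridSearch`.

```typescript
// Hypothetical minimal chunk shape; the real Chunk type is not shown in this note.
interface Chunk {
  id: string;
  content: string;
}

// Fuse several ranked lists with Reciprocal Rank Fusion.
// Each list contributes 1 / (k + rank) per chunk, with rank starting at 1.
function mergeRRF(lists: Chunk[][], topN = 20, k = 60): Chunk[] {
  const scores = new Map<string, number>();
  const byId = new Map<string, Chunk>();

  for (const list of lists) {
    list.forEach((chunk, i) => {
      byId.set(chunk.id, chunk);
      scores.set(chunk.id, (scores.get(chunk.id) ?? 0) + 1 / (k + i + 1));
    });
  }

  return Array.from(byId.values())
    .sort((x, y) => scores.get(y.id)! - scores.get(x.id)!)
    .slice(0, topN);
}

// Usage: "b" appears in all three lists, so it outranks chunks seen only once.
const chunkA: Chunk = { id: "a", content: "alpha" };
const chunkB: Chunk = { id: "b", content: "beta" };
const chunkC: Chunk = { id: "c", content: "gamma" };
const fused = mergeRRF([[chunkA, chunkB], [chunkB, chunkC], [chunkB]]);
console.log(fused.map(x => x.id));
```

Because RRF only looks at ranks, it can merge lists whose raw scores are not comparable (BM25 scores against cosine similarities, or results from differently phrased queries).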

HyDE (Hypothetical Document Embeddings)

// Generate a hypothetical answer first, then search with that answer's embedding
async function hyde(query: string): Promise<Chunk[]> {
  const hypoAnswer = await llm.complete({
    system: 'Write a concise hypothetical paragraph that would answer the question.',
    user: query,
  });
  const queryEmb = await embed(hypoAnswer);
  // Here vectorSearch receives an embedding rather than raw query text.
  return await vectorSearch(queryEmb, 10);
}

→ Effective when queries are short.

Multi-vector (different aspects)

// One document = several embeddings (title, summary, body)
const chunks = [
  { type: 'title', content: doc.title, embedding: await embed(doc.title) },
  { type: 'summary', content: doc.summary, embedding: await embed(doc.summary) },
  ...sectionChunks,
];

// Weight each type according to what suits the query
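
One way the per-type weighting could work: score every typed vector by cosine similarity, multiply by a weight for its type, and keep the best weighted score per document. This is a sketch under assumptions; the `TypedVector` shape, the `scoreDocs` helper, and the weight values are all hypothetical.

```typescript
type VectorType = "title" | "summary" | "section";

// Hypothetical record shape: one document contributes several typed vectors.
interface TypedVector {
  docId: string;
  type: VectorType;
  embedding: number[];
}

function cosine(u: number[], v: number[]): number {
  let dot = 0, nu = 0, nv = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    nu += u[i] * u[i];
    nv += v[i] * v[i];
  }
  return nu === 0 || nv === 0 ? 0 : dot / Math.sqrt(nu * nv);
}

// Score each document by its best weighted vector match.
function scoreDocs(
  queryEmb: number[],
  vectors: TypedVector[],
  weights: Record<VectorType, number>,
): Array<{ docId: string; score: number }> {
  const best = new Map<string, number>();
  for (const v of vectors) {
    const s = weights[v.type] * cosine(queryEmb, v.embedding);
    if (s > (best.get(v.docId) ?? -Infinity)) best.set(v.docId, s);
  }
  return Array.from(best, ([docId, score]) => ({ docId, score }))
    .sort((x, y) => y.score - x.score);
}

// Usage with toy 2-d vectors and a title-heavy weighting.
const ranked = scoreDocs(
  [1, 0],
  [
    { docId: "d1", type: "title", embedding: [1, 0] },
    { docId: "d1", type: "section", embedding: [0, 1] },
    { docId: "d2", type: "section", embedding: [1, 0] },
  ],
  { title: 1.0, summary: 0.8, section: 0.6 },
);
console.log(ranked);
```

Both documents contain a vector that matches the query exactly, but `d1` matches on its title, so the title-heavy weights rank it first. The weights could also be chosen per query, for example favoring `section` for long, detailed questions.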

Late-interaction (ColBERT)

Embed the query and the document at the token level.
For each query token, take its maximum similarity over all document tokens, then sum these maxima.
→ Accurate but expensive.

→ Use Vespa or a self-hosted ColBERT.
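
The scoring rule described above can be sketched in a few lines. This is only the scoring step on toy vectors, assuming token embeddings are already computed and L2-normalized; a real ColBERT setup also needs the token encoder and an index, which are out of scope here.

```typescript
// Dot product of two equal-length vectors (cosine similarity when both are L2-normalized).
function dot(u: number[], v: number[]): number {
  let s = 0;
  for (let i = 0; i < u.length; i++) s += u[i] * v[i];
  return s;
}

// Late-interaction score: for every query token take its best-matching
// document token, then sum those maxima over the query tokens.
function maxSim(queryTokens: number[][], docTokens: number[][]): number {
  if (docTokens.length === 0) return 0;
  let total = 0;
  for (const q of queryTokens) {
    let best = -Infinity;
    for (const d of docTokens) best = Math.max(best, dot(q, d));
    total += best;
  }
  return total;
}

// Toy example with 2-d token embeddings.
const queryToks = [[1, 0], [0, 1]];
const docA = [[1, 0], [0, 1], [0.6, 0.8]]; // has a close match for both query tokens
const docB = [[1, 0], [0.8, 0.6]];         // matches only the first query token well
console.log(maxSim(queryToks, docA), maxSim(queryToks, docB));
```

The nested loop makes the cost visible: every candidate document needs (query tokens × document tokens) comparisons, and the index stores one vector per token instead of one per chunk.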

GraphRAG (Microsoft)

1. Documents → extract entities and relationships (LLM)
2. Build the graph
3. Community detection (cluster the entities)
4. Pre-generate a summary for each community
5. Query → answer from community summaries plus direct lookup of related entities

async function extractEntities(chunk: string) {
  const r = await llm.complete({
    system: 'Extract entities and relationships as JSON.',
    user: chunk,
    response_format: { type: 'json_object' },
  });
  return JSON.parse(r); // { entities: [...], relationships: [...] }
}

→ Strong on large document collections and multi-hop reasoning.
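
To make the multi-hop part concrete, here is a sketch of step 2 (building the graph) and a simple neighborhood lookup. It assumes relationships arrive as `{ source, target, relation }` objects, which is a hypothetical shape for the extraction output. Community detection and summary generation are not covered.

```typescript
// Hypothetical output shape of the entity-extraction step.
interface Relationship {
  source: string;
  target: string;
  relation: string;
}

// Build an undirected adjacency list from extracted relationships.
function buildGraph(rels: Relationship[]): Map<string, Set<string>> {
  const graph = new Map<string, Set<string>>();
  const link = (from: string, to: string) => {
    if (!graph.has(from)) graph.set(from, new Set<string>());
    graph.get(from)!.add(to);
  };
  for (const r of rels) {
    link(r.source, r.target);
    link(r.target, r.source);
  }
  return graph;
}

// Breadth-first expansion: entities reachable from `start` within `maxHops` edges.
function neighborhood(graph: Map<string, Set<string>>, start: string, maxHops: number): string[] {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      graph.get(node)?.forEach(n => {
        if (!seen.has(n)) {
          seen.add(n);
          next.push(n);
        }
      });
    }
    frontier = next;
  }
  return Array.from(seen);
}

// Usage: "Where is Alice's employer located?" needs two hops from "Alice".
const graph = buildGraph([
  { source: "Alice", target: "Acme", relation: "works_at" },
  { source: "Acme", target: "Berlin", relation: "located_in" },
  { source: "Berlin", target: "Germany", relation: "part_of" },
]);
console.log(neighborhood(graph, "Alice", 2));
```

A plain vector search for "Alice" would likely surface only chunks that mention her directly; the graph lookup also pulls in "Berlin", which is connected to her only through "Acme".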

Multi-modal RAG

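// Images and text share one vector space (CLIP / Voyage Multimodal)
const queryEmb = await embedMultimodal({ text: query });
const r = await vectorSearch(queryEmb);
// Results are text or image chunks; a `type` field on each chunk is assumed here
const imageChunks = r.filter(c => c.type === 'image');
const textChunks = r.filter(c => c.type === 'text');

// Send images and text to the LLM together
const answer = await llm.chat({
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Based on the context...' },
      // The exact image block shape depends on the provider's API.
      ...imageChunks.map(c => ({ type: 'image', source: c.url })),
      ...textChunks.map(c => ({ type: 'text', text: c.content })),
    ],
  }],
});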

Self-RAG (the model decides when to retrieve)

While answering, the LLM decides "I need more information", runs a search, and then answers again.
Implement it with function calling.

const tools = [{
  name: 'search_kb',
  description: 'Search internal knowledge base',
  input_schema: { ... },
}];

// Loop: the LLM decides whether to use the tool
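
The loop itself might look like the sketch below. It is provider-neutral on purpose: `LlmTurn`, `Llm`, `Search`, and `answerWithRetrieval` are hypothetical names, and real SDKs use their own message and tool-call types. The point is the control flow: call the model, run the search if it asks for one, feed the result back, and cap the number of rounds.

```typescript
// Hypothetical, provider-neutral shapes; real SDK message and tool types differ.
type LlmTurn =
  | { type: "answer"; text: string }
  | { type: "tool_call"; name: "search_kb"; query: string };

type Llm = (transcript: string[]) => Promise<LlmTurn>;
type Search = (query: string) => Promise<string[]>;

// Let the model alternate between searching and answering, with a hard cap on retrieval rounds.
async function answerWithRetrieval(
  question: string,
  llm: Llm,
  searchKb: Search,
  maxRounds = 3,
): Promise<string> {
  const transcript = [`user: ${question}`];
  for (let round = 0; round < maxRounds; round++) {
    const turn = await llm(transcript);
    if (turn.type === "answer") return turn.text;
    // The model asked for more information: run the search and feed the results back.
    const hits = await searchKb(turn.query);
    transcript.push(`tool_call: search_kb(${JSON.stringify(turn.query)})`);
    transcript.push(`tool_result: ${hits.join("\n")}`);
  }
  // Cap reached: ask for a final answer. A further tool call is not executed.
  transcript.push("system: answer now with the information gathered so far.");
  const last = await llm(transcript);
  return last.type === "answer" ? last.text : "I could not find enough information.";
}

// Usage with stubs: the fake model searches once, then answers from the tool result.
const fakeLlm: Llm = async (transcript): Promise<LlmTurn> => {
  const hasResult = transcript.some(line => line.startsWith("tool_result:"));
  return hasResult
    ? { type: "answer", text: "Refunds take 5 days." }
    : { type: "tool_call", name: "search_kb", query: "refund policy" };
};
const fakeSearch: Search = async () => ["Refunds are processed within 5 days."];

answerWithRetrieval("How long do refunds take?", fakeLlm, fakeSearch).then(console.log);
```

The round cap matters in practice: without it, a model that keeps requesting searches would loop indefinitely and run up cost.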

Citation + verification

// Answer + sources + citation verification
const answer = await llm.complete({
  system: `Answer using context. After each claim, cite [chunk_N]. End with "VERIFIED" or "UNVERIFIED".`,
  user: `Context:\n${contextWithIds}\n\nQ: ${query}`,
});

// Optional: a second LLM checks the answer against the context
const ok = await verifyClaim(answer, retrievedChunks);
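
`contextWithIds` is used above without being defined. The sketch below shows one way to build it, plus a cheap structural check that can run before any LLM-based verification: every `[chunk_N]` in the answer must refer to a chunk that was actually supplied. The helper names and the `RetrievedChunk` shape are hypothetical.

```typescript
// Hypothetical minimal chunk shape.
interface RetrievedChunk {
  content: string;
}

// Label each chunk as [chunk_N] so the model can cite it (N starts at 1).
function buildContextWithIds(chunks: RetrievedChunk[]): string {
  return chunks.map((c, i) => `[chunk_${i + 1}] ${c.content}`).join("\n\n");
}

// Collect the chunk numbers cited in an answer, in order of first appearance.
function extractCitations(answer: string): number[] {
  const seen = new Set<number>();
  const re = /\[chunk_(\d+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(answer)) !== null) seen.add(Number(m[1]));
  return Array.from(seen);
}

// Every citation must point at a chunk that was provided,
// and an answer with no citation at all is flagged too.
function checkCitations(answer: string, chunkCount: number): { ok: boolean; invalid: number[] } {
  const cited = extractCitations(answer);
  const invalid = cited.filter(n => n < 1 || n > chunkCount);
  return { ok: cited.length > 0 && invalid.length === 0, invalid };
}

// Usage: the second claim cites a chunk that was never in the context.
const context = buildContextWithIds([
  { content: "Plan A costs $10." },
  { content: "Plan B costs $20." },
]);
console.log(context);
console.log(checkCitations("Plan A costs $10 [chunk_1]. Plan C is free [chunk_7].", 2));
```

This check only catches citations of chunks that do not exist. Whether a cited chunk really supports the claim still needs the second-pass verification.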

Eval — Ragas

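from ragas import evaluate
from ragas.metrics import context_recall, context_precision, faithfulness, answer_relevancy

# `dataset` is assumed to be prepared elsewhere with questions, generated answers,
# retrieved contexts, and reference answers. Column names and import paths vary
# between ragas releases, so check the installed version.
result = evaluate(
    dataset,
    metrics=[context_recall, context_precision, faithfulness, answer_relevancy],
)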
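
Ragas metrics rely on an LLM judge. For a quick retrieval-only check during development, simpler rank metrics can be computed directly from labeled chunk ids. The sketch below is a swapped-in technique (recall@k and mean reciprocal rank), not a reimplementation of any Ragas metric, and the `EvalCase` shape is hypothetical.

```typescript
// One labeled example: the chunk ids retrieval returned (best first)
// and the ids a human marked as relevant. Both field names are hypothetical.
interface EvalCase {
  retrieved: string[];
  relevant: string[];
}

// Fraction of relevant chunks that appear in the top k results.
function recallAtK(c: EvalCase, k: number): number {
  if (c.relevant.length === 0) return 0;
  const topK = new Set(c.retrieved.slice(0, k));
  return c.relevant.filter(id => topK.has(id)).length / c.relevant.length;
}

// Reciprocal rank of the first relevant result (0 if none was retrieved).
function reciprocalRank(c: EvalCase): number {
  for (let i = 0; i < c.retrieved.length; i++) {
    if (c.relevant.indexOf(c.retrieved[i]) !== -1) return 1 / (i + 1);
  }
  return 0;
}

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

const cases: EvalCase[] = [
  { retrieved: ["c3", "c1", "c9"], relevant: ["c1", "c2"] },
  { retrieved: ["c5", "c6", "c7"], relevant: ["c5"] },
];
const report = {
  recallAt3: mean(cases.map(c => recallAtK(c, 3))),
  mrr: mean(cases.map(c => reciprocalRank(c))),
};
console.log(report);
```

Running the same labeled cases before and after a change (new chunk size, added reranker, different fusion constant) shows whether retrieval itself improved, independent of answer quality.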

Pipeline summary

1. Query → expand / HyDE / rewrite
2. Hybrid search (vector + BM25) → top-50
3. Rerank → top-5
4. Build context (with citations)
5. LLM answer (force citations)
6. Optional: verify

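🤔 Decision criteria

| Situation | Technique |
| --- | --- |
| Short queries | HyDE / query expansion |
| Large corpus + multi-hop | GraphRAG |
| Strict accuracy requirements | Hybrid + rerank |
| Multilingual | Cohere multilingual |
| Multi-modal (image/text) | CLIP / Voyage |
| Answer may lie outside the corpus | Self-RAG (model decides via tool call) |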

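Anti-patterns

  • Vector only: weak on exact keyword matches.
  • Large top-K (50) passed straight to the LLM: noise. Rerank first, then pass 5-10.
  • No citations: hallucinations cannot be verified.
  • Static chunking only: contextual chunking is stronger.
  • No eval: you cannot tell which changes are improvements.
  • Reranker on every query: adds latency. Cache the results.
  • GraphRAG on a small corpus: overkill. Simple RAG is enough.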
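
For the reranker latency point, a small in-memory cache is often enough to start with. A minimal sketch, assuming the reranker is wrapped as a function that returns the indices of the selected documents; `RerankFn` and `withRerankCache` are hypothetical names, not part of any SDK.

```typescript
// Hypothetical reranker signature: returns indices into `docs`, best first.
type RerankFn = (query: string, docs: string[], topK: number) => Promise<number[]>;

// Wrap a reranker so identical (query, documents, topK) requests are served from memory.
function withRerankCache(rerank: RerankFn, maxEntries = 1000): RerankFn {
  const cache = new Map<string, Promise<number[]>>();
  return (query, docs, topK) => {
    const key = JSON.stringify([query.trim(), docs, topK]);
    const hit = cache.get(key);
    if (hit) return hit;
    const pending = rerank(query, docs, topK);
    // Do not keep failed calls in the cache.
    pending.catch(() => cache.delete(key));
    cache.set(key, pending);
    // Evict the oldest entry once the cache is full (Map keeps insertion order).
    if (cache.size > maxEntries) {
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    return pending;
  };
}

// Usage with a stub reranker that counts how often it is really called.
let rerankCalls = 0;
const stubRerank: RerankFn = async (_query, docs, topK) => {
  rerankCalls++;
  return docs.map((_doc, i) => i).slice(0, topK);
};
const cachedRerank = withRerankCache(stubRerank);

cachedRerank("refund policy", ["a", "b", "c"], 2)
  .then(() => cachedRerank("refund policy", ["a", "b", "c"], 2))
  .then(order => console.log(order, "calls:", rerankCalls));
```

Caching the promise rather than the resolved value means concurrent identical requests share one upstream call. Keying on full document text is simple but memory-hungry; keying on chunk ids would be lighter if chunks are immutable.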

🤖 LLM usage hints

  • Hybrid + reranker = sweet spot.
  • Contextual chunks give a large gain (Anthropic reports 49% fewer top-20 retrieval failures).
  • Enforce citations in the system prompt.
  • Evaluate with Ragas.

🔗 Related documents