Files
2nd/10_Wiki/Topics/Coding/AI_RAG_Pattern_Basics.md
T
2026-05-09 21:08:02 +09:00

5.3 KiB

id, title, category, status, source_trust_level, verification_status, created_at, updated_at, tags, tech_stack, applied_in, aliases
id title category status source_trust_level verification_status created_at updated_at tags tech_stack applied_in aliases
ai-rag-pattern-basics RAG — Retrieval Augmented Generation Coding draft B conceptual 2026-05-09 2026-05-09
ai
llm
rag
embedding
vector-db
vibe-coding
language applicable_to
TS / pgvector / OpenAI / Anthropic
Backend
RAG
embedding
vector search
chunking
hybrid search
BM25

RAG (Retrieval Augmented Generation)

  1. 문서 → chunk → embedding → vector DB.
  2. 쿼리 → embedding → top-K 검색 → context.
  3. LLM 에 context + question → 답. Hallucination 줄임 + 최신 데이터 + 출처 표시.

📖 핵심 개념

  • Embedding: 텍스트 → 벡터.
  • Vector DB: pgvector / Pinecone / Weaviate / Qdrant.
  • Chunking: 문서를 작게 나눔 (보통 500-1000 token).
  • Hybrid: vector + BM25 (keyword) 같이.
  • Reranker: top-50 → 작은 모델로 top-5 재선정.

💻 코드 패턴

1. Indexing (one-time)

import OpenAI from 'openai';

const openai = new OpenAI();

async function embed(text: string): Promise<number[]> {
  const r = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return r.data[0].embedding;
}

// chunk 하기
function chunkText(text: string, size = 1000, overlap = 100): string[] {
  // 단순한 sliding window — 실제로는 paragraph/sentence 경계
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

for (const doc of docs) {
  const chunks = chunkText(doc.content);
  for (const [i, c] of chunks.entries()) {
    const emb = await embed(c);
    await db.docs.insert({
      docId: doc.id, chunkIdx: i, content: c, embedding: emb,
    });
  }
}

2. pgvector schema

CREATE EXTENSION vector;

CREATE TABLE docs (
  id BIGSERIAL PRIMARY KEY,
  doc_id TEXT,
  chunk_idx INT,
  content TEXT,
  embedding VECTOR(1536)
);

-- HNSW (Postgres 16+)
CREATE INDEX docs_emb_hnsw ON docs USING hnsw (embedding vector_cosine_ops);
-- 또는 ivfflat
CREATE INDEX docs_emb_ivf ON docs USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

3. Retrieval

async function retrieve(query: string, k = 5): Promise<Chunk[]> {
  const qEmb = await embed(query);
  const r = await db.queryRaw<Chunk[]>`
    SELECT id, doc_id, content, 1 - (embedding <=> ${qEmb}::vector) AS score
    FROM docs
    ORDER BY embedding <=> ${qEmb}::vector
    LIMIT ${k}
  `;
  return r;
}

4. 답변 생성

async function answer(query: string): Promise<{ text: string; citations: string[] }> {
  const chunks = await retrieve(query, 5);
  const context = chunks.map((c, i) => `[${i + 1}] ${c.content}`).join('\n\n');

  const r = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: `Answer using ONLY the context. Cite [n]. If unknown, say "I don't know".` },
      { role: 'user', content: `Context:\n${context}\n\nQ: ${query}` },
    ],
  });

  return {
    text: r.choices[0].message.content!,
    citations: chunks.map(c => c.docId),
  };
}

Hybrid (vector + keyword)

WITH vector_hits AS (
  SELECT id, content, 1 - (embedding <=> $1::vector) AS v_score
  FROM docs ORDER BY embedding <=> $1::vector LIMIT 50
),
text_hits AS (
  SELECT id, content, ts_rank(to_tsvector(content), plainto_tsquery($2)) AS t_score
  FROM docs WHERE to_tsvector(content) @@ plainto_tsquery($2) LIMIT 50
)
SELECT id, content, COALESCE(v_score, 0) * 0.7 + COALESCE(t_score, 0) * 0.3 AS score
FROM vector_hits FULL OUTER JOIN text_hits USING (id)
ORDER BY score DESC LIMIT 10;

Rerank (Cohere / Voyage)

const reranked = await cohere.rerank({
  model: 'rerank-3',
  query, documents: chunks.map(c => c.content), topN: 5,
});
const top = reranked.results.map(r => chunks[r.index]);

Smarter chunking (semantic)

// markdown header 기준 split, then size 제한
function smartChunk(md: string, maxTokens = 800): string[] {
  const sections = md.split(/^##\s+/m);
  // ... section 너무 길면 더 split
}

메타데이터 필터

SELECT * FROM docs
WHERE doc_id IN ('manual-2026', 'faq')  -- 사용자 권한 / 필터
  AND lang = 'ko'
ORDER BY embedding <=> $1::vector LIMIT 5;

🤔 의사결정 기준

규모 DB
<1M chunks pgvector
1M-100M Qdrant / Weaviate / Pinecone
1B+ Vespa / Milvus
Hybrid 필요 Weaviate / Vespa
단순 OpenAI Vector Store / Anthropic file API
ZeroOps Pinecone / Vectorize

안티패턴

  • Chunk 너무 큼 (5000 token): relevance 낮음.
  • Chunk 너무 작음 (50 token): context 부족.
  • Overlap 0: 경계 정보 잃음.
  • Vector 만 — keyword 무시: 정확한 단어 검색 약함. hybrid.
  • Rerank 없음 + top-K 큼: 노이즈 많음.
  • 출처 표시 안 함: hallucination 검증 불가.
  • Embedding 모델 mix: 같은 인덱스 내 한 모델만.
  • Metadata 없음: 권한 / lang / date 필터 못 함.

🤖 LLM 활용 힌트

  • pgvector + HNSW + hybrid + rerank 가 강력한 baseline.
  • Chunk = 500-1000 token, overlap 10%.
  • Citation 강제 (system prompt).

🔗 관련 문서