[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -0,0 +1,376 @@
+---
+id: ai-rag-production
+title: RAG Production — chunking / re-rank / eval
+category: Coding
+status: draft
+source_trust_level: B
+verification_status: conceptual
+created_at: 2026-05-09
+updated_at: 2026-05-09
+tags: [ai, rag, production, vibe-coding]
+tech_stack: { language: "TS / Python", applicable_to: ["AI"] }
+applied_in: []
+aliases: [RAG production, document chunking, parent document, hybrid search, rerank, RAG eval]
+---
+
+# RAG Production
+
+> Demo RAG = simple. **Production = chunking strategy + hybrid search + reranker + eval + monitoring**.
+
+## 📖 Core concepts
+- Document → chunks → embed → vector store.
+- Query → retrieve → rerank → context.
+- Eval (recall, precision).
+- Continuous improvement (golden set).
+
+## 💻 Code patterns
+
+### Chunking strategy
+```python
+# 1. Fixed size (simple)
+def chunk_fixed(text, size=500, overlap=50):
+    return [text[i:i+size] for i in range(0, len(text), size - overlap)]
+
+# 2. Sentence-based
+import re
+def chunk_sentences(text, max_sentences=5):
+    sentences = re.split(r'(?<=[.!?])\s+', text)
+    return [' '.join(sentences[i:i+max_sentences]) for i in range(0, len(sentences), max_sentences)]
+
+# 3. Semantic (LLM-driven)
+# 4. Markdown headers
+# 5. Recursive (LangChain RecursiveCharacterTextSplitter)
+```
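
Strategy 4 (Markdown headers) is listed above without code. A minimal sketch, assuming ATX-style headers (`#` to `######`); `chunk_markdown` is a hypothetical helper name, and the sketch does not handle `#` lines inside fenced code blocks.

```python
import re

def chunk_markdown(text):
    """Split markdown into one chunk per header section (hypothetical helper)."""
    chunks, current = [], []
    for line in text.splitlines():
        # A new ATX header closes the section collected so far.
        if re.match(r'#{1,6}\s', line) and current:
            chunks.append('\n'.join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append('\n'.join(current).strip())
    return [c for c in chunks if c]

doc = "# Intro\nHello.\n\n## Setup\nInstall it.\n\n## Usage\nRun it."
sections = chunk_markdown(doc)
```

Each section keeps its header line, so the header text is embedded together with the body it describes.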
+
+### Recursive chunking (recommended default)
+```python
+# Current package; older LangChain releases expose this as langchain.text_splitter
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+
+splitter = RecursiveCharacterTextSplitter(
+    chunk_size=500,
+    chunk_overlap=50,
+    separators=['\n\n', '\n', '. ', ' ', ''],
+)
+chunks = splitter.split_text(text)
+```
+
+→ Preserves boundaries: tries paragraph breaks first, then sentences, then words.
+
+### Parent document retriever
+```python
+# Small chunk = embed (precision).
+# Big chunk (parent) = context (recall).
+
+# Search small → return parent.
+```
+
+```python
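+from langchain.retrievers import ParentDocumentRetriever
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+
+child = RecursiveCharacterTextSplitter(chunk_size=200)    # small chunks get embedded
+parent = RecursiveCharacterTextSplitter(chunk_size=2000)  # large chunks get returned
+
+retriever = ParentDocumentRetriever(
+    vectorstore=...,
+    docstore=...,
+    child_splitter=child,
+    parent_splitter=parent,
+)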
+```
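
To make the "search small, return parent" idea concrete without any framework, here is a rough sketch. The word-overlap score is only a stand-in for embedding similarity, and every function name here is hypothetical.

```python
def split(text, size):
    """Fixed-size character split (stand-in for a real splitter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(doc, parent_size=2000, child_size=200):
    """Return (parents, children); each child remembers its parent index."""
    parents = split(doc, parent_size)
    children = [(child, p_idx)
                for p_idx, parent in enumerate(parents)
                for child in split(parent, child_size)]
    return parents, children

def overlap_score(query, text):
    """Toy relevance: shared lowercase words (assumption; use embeddings in practice)."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve_parent(query, parents, children):
    """Search the small chunks, then return the parent of the best match."""
    _, p_idx = max(children, key=lambda c: overlap_score(query, c[0]))
    return parents[p_idx]

doc = "alpha beta gamma. " * 5 + "rerank models score pairs. " * 5
parents, children = build_index(doc, parent_size=90, child_size=30)
hit = retrieve_parent("rerank pairs", parents, children)
```

The match is decided on a 30-character child, but the caller receives the 90-character parent, which is the trade the section describes: precision from small chunks, context from large ones.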
+
+### Hybrid search
+```ts
+// BM25 + vector (RRF)
+const bm25Results = await bm25Search(query, 50);
+const vecResults = await vectorSearch(query, 50);
+const fused = rrf([bm25Results, vecResults]).slice(0, 20);
+```
+
+→ [[AI_Hybrid_Search_Patterns]].
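
The `rrf` helper used above is not defined in this note. A minimal Python sketch of Reciprocal Rank Fusion, assuming each input is a ranked list of document IDs and using the commonly cited constant `k=60`; the document IDs are made up.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_ids = ['d1', 'd2', 'd3']
vec_ids = ['d3', 'd1', 'd4']
fused = rrf([bm25_ids, vec_ids])
```

RRF only looks at ranks, so BM25 scores and cosine similarities never need to be normalized onto a shared scale.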
+
+### Reranker
+```python
+from sentence_transformers import CrossEncoder
+reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
+
+candidates = hybrid_search(query, k=50)
+pairs = [(query, c.text) for c in candidates]
+scores = reranker.predict(pairs)
+top = sorted(zip(candidates, scores), key=lambda x: -x[1])[:5]
+```
+
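+→ Rerank the top-50 candidates down to top-5. The cross-encoder scores each (query, passage) pair jointly, which usually improves precision.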
+
+### Cohere Rerank
+```ts
+const r = await cohere.rerank({
+  query, documents: candidates.map(c => c.text), topN: 5,
+  model: 'rerank-english-v3.0',
+});
+```
+
+→ Managed API: no model hosting on your side.
+
+### Query expansion
+```python
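+# The LLM rewrites the query (3 variants)
+expanded = llm.complete(f'Generate 3 alternative phrasings of: "{query}"')
+# Drop blank lines so empty strings are never sent to the search backend
+queries = [query, *(line.strip() for line in expanded.splitlines() if line.strip())]
+
+# Search with every query, then fuse with RRF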
+results = [vector_search(q, 20) for q in queries]
+fused = rrf(results)
+```
+
+### HyDE (Hypothetical Document Embedding)
+```python
+# Generate a hypothetical answer → embed it → search
+hypothetical = llm.complete(f'Detailed answer for: {query}')
+emb = embed(hypothetical)
+results = vector_search(emb, 20)
+```
+
+→ Queries are short, so the embedding of a hypothetical answer tends to sit closer to the relevant documents.
+
+### Multi-vector
+```python
+# Each section of a doc gets its own embedding.
+# A hit on any one section returns the whole doc.
+```
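
A minimal sketch of the multi-vector idea, assuming toy 2-d vectors and a max-over-sections aggregation. The note does not say which aggregation it intends, so "best section wins" is an assumption, and all names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search_docs(query_vec, section_index, top_k=2):
    """section_index: list of (doc_id, section_vec). A doc scores as its best section."""
    best = {}
    for doc_id, vec in section_index:
        score = cosine(query_vec, vec)
        if score > best.get(doc_id, -1.0):
            best[doc_id] = score
    return sorted(best, key=best.get, reverse=True)[:top_k]

index = [
    ('doc-a', [1.0, 0.0]), ('doc-a', [0.0, 1.0]),  # two sections of the same doc
    ('doc-b', [0.7, 0.7]),
    ('doc-c', [-1.0, 0.0]),
]
hits = search_docs([0.0, 1.0], index)
```

Here `doc-a` ranks first because one of its two sections matches the query exactly, even though its other section is unrelated.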
+
+### Metadata filter
+```sql
+SELECT * FROM docs
+WHERE category = $1 AND date > $2
+ORDER BY embedding <=> $3
+LIMIT 20;
+```
+
+→ Pre-filter (efficient).
+
+### Citation
+```python
+# Keep the source of every chunk.
+prompt = f'''
+Answer using ONLY:
+[1] {chunks[0].text} (source: {chunks[0].source})
+[2] {chunks[1].text} (source: {chunks[1].source})
+
+Question: {query}
+
+Cite [1], [2].
+'''
+```
+
+→ User trust ↑.
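
The prompt above hard-codes two chunks. A sketch that numbers any count of chunks, assuming each chunk is a dict with `text` and `source` keys; that shape and the name `build_cited_prompt` are assumptions for illustration.

```python
def build_cited_prompt(query, chunks):
    """chunks: list of dicts with 'text' and 'source' keys (assumed shape)."""
    context = '\n'.join(
        f"[{i}] {c['text']} (source: {c['source']})"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the numbered context below.\n"
        f"{context}\n\n"
        f"Question: {query}\n\n"
        "Cite the passages you used, e.g. [1], [2]."
    )

chunks = [
    {'text': 'RRF fuses ranked lists.', 'source': 'search.md'},
    {'text': 'Cross-encoders rerank candidates.', 'source': 'rerank.md'},
]
prompt = build_cited_prompt('How are results fused?', chunks)
```

Because the bracket numbers follow list order, a `[2]` in the model's answer maps straight back to `chunks[1]['source']`.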
+
+### Prompt template
+```python
+SYSTEM = '''
+Answer using ONLY the context. If unsure, say "I don't know".
+Cite sources [1], [2].
+'''
+
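+# Evaluate this f-string where `context` and `query` are already defined (e.g. inside the request handler)
+USER = f'''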
+Context:
+{context}
+
+Question: {query}
+
+Answer:
+'''
+```
+
+### Eval (recall@K)
+```python
+def recall_at_k(predicted_ids, gold_ids, k=5):
+    return len(set(predicted_ids[:k]) & set(gold_ids)) / len(gold_ids)
+
+# Golden set (curated)
+gold = [{'query': 'X', 'relevant_docs': ['doc1', 'doc5']}]
+results = [retrieve(q['query']) for q in gold]
+recalls = [recall_at_k(r, q['relevant_docs']) for r, q in zip(results, gold)]
+print(f'Avg recall: {sum(recalls)/len(recalls):.2f}')
+```
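
The core concepts list precision next to recall, but only recall@K is coded above. A sketch of precision@K and reciprocal rank in the same style; the document IDs are made up.

```python
def precision_at_k(predicted_ids, gold_ids, k=5):
    """Fraction of the top-k results that are relevant."""
    top = predicted_ids[:k]
    return len(set(top) & set(gold_ids)) / len(top) if top else 0.0

def reciprocal_rank(predicted_ids, gold_ids):
    """1 / rank of the first relevant result, 0.0 if none was retrieved."""
    gold = set(gold_ids)
    for rank, doc_id in enumerate(predicted_ids, start=1):
        if doc_id in gold:
            return 1.0 / rank
    return 0.0

predicted = ['doc9', 'doc1', 'doc3', 'doc5']
gold = ['doc1', 'doc5']
p_at_2 = precision_at_k(predicted, gold, k=2)
rr = reciprocal_rank(predicted, gold)
```

Recall shows whether the relevant documents were found at all, while precision and reciprocal rank show how much noise sits in front of them. That difference is what a reranker is meant to improve.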
+
+### LLM-judge eval
+```python
+# Promptfoo / RAGAS
+from ragas import evaluate
+from ragas.metrics import faithfulness, answer_relevancy, context_precision
+
+eval_dataset = [...]
+result = evaluate(eval_dataset, metrics=[faithfulness, answer_relevancy, context_precision])
+```
+
+→ Faithfulness = the answer is grounded in the retrieved context.
+
+### Monitoring (production)
+```python
+@trace
+def rag(query):
+    docs = retrieve(query)
+    answer = llm.complete(...)
+    log({'query': query, 'doc_count': len(docs), 'tokens': ..., 'latency': ...})
+    return answer
+```
+
+→ Helicone / LangSmith.
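
The `@trace` decorator above is left undefined. A minimal sketch of what one could look like, assuming an in-memory list as the sink; hosted tools such as the ones named above ship their own SDKs, which this does not imitate.

```python
import functools
import time

TRACES = []  # in-memory sink for the sketch; a real setup would send these to a backend

def trace(fn):
    """Record function name, latency and status for every call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = 'ok'
        try:
            return fn(*args, **kwargs)
        except Exception:
            status = 'error'
            raise
        finally:
            # Runs on both success and failure, so errors are traced too.
            TRACES.append({
                'name': fn.__name__,
                'latency_s': time.perf_counter() - start,
                'status': status,
            })
    return wrapper

@trace
def fake_rag(query):
    return f'answer to {query}'

result = fake_rag('what is RRF?')
```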
+
+### Cache
+```python
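+import hashlib
+
+def cached_rag(query):
+    # Same query = cached result.
+    key = hashlib.sha256(query.encode()).hexdigest()
+    cached = cache.get(key)
+    if cached is not None:
+        return cached
+    answer = rag(query)
+    cache.set(key, answer)  # `cache` is any get/set store (e.g. a Redis client)
+    return answer
+# Or use provider-side prompt caching (Anthropic / OpenAI).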
+```
+
+### Continuous improvement
+```
+1. Production query log.
+2. Bad answer = manual review.
+3. Add to golden set.
+4. Re-eval → improve.
+5. Re-deploy.
+```
+
+→ RAG quality improves over time.
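
Step 4 (re-eval) can be turned into a deploy gate. A rough sketch, assuming per-query recall values like those from the recall@K block; the baseline and tolerance numbers are made up.

```python
def regression_gate(current_recalls, baseline_avg, tolerance=0.02):
    """Return (passed, avg). Fails when average recall drops below baseline - tolerance."""
    avg = sum(current_recalls) / len(current_recalls)
    return avg >= baseline_avg - tolerance, avg

# Hypothetical numbers: four golden-set queries against a stored baseline of 0.80.
passed, avg = regression_gate([1.0, 0.5, 1.0, 0.5], baseline_avg=0.80)
```

Running this in CI on every change to chunking or retrieval is one way to catch the silent regressions that the anti-pattern list warns about.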
+
+### Choosing an embedding model
+```
+text-embedding-3-small (OpenAI): cheap, good quality.
+text-embedding-3-large: more accurate.
+voyage-3 / cohere embed-v3: SoTA.
+BGE / e5 (open): self-host.
+```
+
+→ See the MTEB leaderboard.
+
+### Re-embedding (model change)
+```
+A better model comes out → re-embed every doc.
+- Cost can be large (1M docs × tokens per doc × $0.02 / 1M tokens).
+- Time (several hours).
+```
+
+→ Needs a migration plan.
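
The cost line above omits tokens per document, so the total cannot be read off directly. A back-of-the-envelope sketch; the 1,000 tokens per doc and the throughput figure are assumptions, and only the 1M docs and $0.02 per 1M tokens come from the note.

```python
def reembed_cost_usd(num_docs, avg_tokens_per_doc, usd_per_million_tokens):
    """Total embedding spend for one full re-embed."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens

def reembed_hours(num_docs, avg_tokens_per_doc, tokens_per_second):
    """Wall-clock time at a sustained throughput (hypothetical figure)."""
    return num_docs * avg_tokens_per_doc / tokens_per_second / 3600

cost = reembed_cost_usd(1_000_000, 1_000, 0.02)   # assumed 1,000 tokens per doc
hours = reembed_hours(1_000_000, 1_000, 100_000)  # assumed 100k tokens/s
```

Both figures scale linearly with tokens per document, so long documents or a larger corpus multiply cost and time by the same factor.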
+
+### Choosing a vector DB
+```
+pgvector: simple, Postgres-friendly.
+Pinecone: managed, fast.
+Qdrant: open source, fast, hybrid built-in.
+Weaviate: large feature set.
+Milvus: large scale.
+ChromaDB: small / dev.
+```
+
+→ [[DB_pgvector_Production]].
+
+### Chunk metadata
+```json
+{
+  "id": "chunk-1",
+  "text": "...",
+  "embedding": [...],
+  "source": "doc.pdf",
+  "page": 3,
+  "section": "Introduction",
+  "category": "engineering",
+  "created_at": "2026-05-01"
+}
+```
+
+→ Makes filtering and citation easy.
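
A sketch of how those metadata fields support filtering and citation in application code. The field names follow the JSON above, the embedding is omitted for brevity, and `prefilter`, `cite` and the second sample record are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Chunk:
    id: str
    text: str
    source: str
    page: int
    section: str
    category: str
    created_at: date

def prefilter(chunks, category=None, created_after=None):
    """Narrow candidates by metadata before any vector scoring."""
    out = []
    for c in chunks:
        if category is not None and c.category != category:
            continue
        if created_after is not None and c.created_at <= created_after:
            continue
        out.append(c)
    return out

def cite(chunk):
    """Human-readable citation built from the stored metadata."""
    return f'{chunk.source}, p. {chunk.page} ({chunk.section})'

chunks = [
    Chunk('chunk-1', '...', 'doc.pdf', 3, 'Introduction', 'engineering', date(2026, 5, 1)),
    Chunk('chunk-2', '...', 'hr.pdf', 1, 'Policy', 'hr', date(2026, 4, 1)),
]
kept = prefilter(chunks, category='engineering', created_after=date(2026, 1, 1))
```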
+
+### Production architecture
+```
+Doc upload → Parse → Chunk → Embed → Vector DB.
+Query → Embed → Hybrid search → Rerank → LLM → Answer + Citation.
+
+→ Chunking + ranking are the biggest quality levers.
+```
+
+### Multi-modal RAG
+```
+Docs also contain images / tables.
+- Image embed (CLIP / Cohere multi-modal).
+- Table → markdown.
+- Combined search.
+```
+
+### Long context vs RAG
+```
+Long context (200k):
+- Simple, all in.
+- High cost / latency.
+
+RAG:
+- Top-K only.
+- Low cost / latency.
+- Needs tuning.
+
+→ Corpus < 50k tokens: long context.
+→ Corpus > 50k tokens: RAG.
+```
+
+### Cost / 1k queries
+```
+Small RAG (10 chunks, GPT-4o-mini): $0.50.
+Large RAG (50 chunks + rerank, GPT-4o): $50.
+Embedding storage: $.
+
+→ Each query can involve multiple LLM calls.
+```
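
The dollar figures above come without a formula. A rough sketch of the arithmetic behind a per-1k-queries estimate. Every price and token count in the example call is a placeholder rather than a real rate card, so plug in your provider's current pricing.

```python
def cost_per_1k_queries(chunks_per_query, tokens_per_chunk, output_tokens,
                        usd_per_m_input, usd_per_m_output, llm_calls_per_query=1):
    """Estimate LLM spend for 1,000 queries (excludes embedding and rerank fees)."""
    input_tokens = chunks_per_query * tokens_per_chunk
    per_call = (input_tokens * usd_per_m_input
                + output_tokens * usd_per_m_output) / 1_000_000
    return per_call * llm_calls_per_query * 1_000

# Placeholder inputs: 10 chunks of 500 tokens, 300 output tokens, $1 / $2 per 1M tokens.
estimate = cost_per_1k_queries(10, 500, 300, usd_per_m_input=1.0, usd_per_m_output=2.0)
```

`llm_calls_per_query` captures the point that query expansion, HyDE, or iterative retrieval multiply the per-query cost.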
+
+### Limitation
+```
+- Lost in the middle (long context).
+- Multi-hop reasoning (no single chunk holds the answer).
+- Negation ("things that are not X").
+- Recent data (cutoff).
+```
+
+→ Agentic / iterative RAG addresses these.
+
+### Iterative RAG
+```python
+def iterative_rag(query, max_steps=3):
+    context = ''
+    for step in range(max_steps):
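+        new_query = llm.complete(f'Q: {query}\nKnown: {context}\nWhat else is needed?')
+        docs = retrieve(new_query)
+        context += format_docs(docs)  # format_docs: your own chunk-to-text helper, not the builtin format()
+        # LLM replies rarely equal 'Y' exactly, so normalize before comparing
+        verdict = llm.complete(f'Is this context sufficient to answer the question? Reply Y or N.\nQ: {query}\n{context}')
+        if verdict.strip().upper().startswith('Y'):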
+            break
+    return llm.complete(f'Q: {query}\n{context}')
+```
+
+→ One answer to multi-hop questions.
+
+## 🤔 Decision criteria
+| Task | Recommendation |
+|---|---|
+| Document Q&A | RAG |
+| Code search | Hybrid + AST chunk |
+| Multi-hop | Agentic RAG |
+| Real-time | Cached prompts |
+| Production | Hybrid + rerank + eval |
+| Small / quick | LangChain default |
+
+## ❌ Anti-patterns
+- **Vector only**: weak on keyword matches.
+- **Fixed chunk**: breaks boundaries.
+- **No rerank**: noise.
+- **No citation**: no trust.
+- **No eval**: silent regression.
+- **Huge chunk**: noise.
+- **Tiny chunk**: loses context.
+
+## 🤖 LLM usage hints
+- Recursive chunking + hybrid + rerank is the baseline.
+- Citation + eval make it production-grade.
+- Iterative RAG for multi-hop.
+- Continuous golden set update.
+
+## 🔗 Related documents
+- [[AI_RAG_Pattern_Basics]]
+- [[AI_RAG_Advanced]]
+- [[AI_Hybrid_Search_Patterns]]