---
id: [[P-Reinforce|P-Reinforce]]-AUTO-VEC-001
category: AI_and_ML
confidence_score: 1.00
tags: [auto-reinforced, vector-db, rag, vector-search, storage]
last_reinforced: 2026-05-04
---

# [[Vector Database|Vector Database]]

## 📌 One-Line Insight (The Karpathy Summary)

> "A vast coordinate system for unstructured data: rather than merely storing text or images, it indexes them as high-dimensional arrays of numbers (vectors) so they can be searched at very high speed by 'semantic similarity'. It is the core storage layer of the AI era."

## 📖 Structured Knowledge (Synthesized Content)

A vector database is a special-purpose database system that represents and stores data as points in a high-dimensional vector space and is designed to search those points efficiently.

1. **Core Capabilities**:
    * **Storage & Indexing**: Stores high-dimensional vector embeddings in a scalable way and builds specialized indexes (HNSW, IVF, etc.) for [[Vector Search|Vector Search]].
    * **Similarity Search**: Finds the stored data that is "closest" to the user's query vector according to a mathematical distance or similarity measure (cosine similarity, etc.).
    * **Metadata Filtering**: Combines vector search with conventional metadata filters (date, category, etc.) to produce more precise results.

2. **Main Indexing Algorithms (ANN - Approximate Nearest Neighbor)**:
    * **[[HNSW (Hierarchical Navigable Small World)|HNSW]]**: A multi-layer graph structure that offers a strong balance between speed and accuracy.
    * **[[IVF (Inverted File Index)|IVF]]**: Partitions the vector space into clusters and searches only the clusters nearest the query, narrowing the search scope.
    * **[[PQ (Product Quantization)|PQ]]**: Compresses vectors to reduce memory usage substantially, at some cost in accuracy.

3. **Representative Solutions**:
    * **Open Source**: Milvus, Weaviate, Qdrant, Chroma, FAISS (Library)
    * **Managed/Cloud**: Pinecone, Zilliz

## ⚖️ Trade-offs & Caveats

* **Compute resources**: Computing similarities and maintaining high-dimensional indexes demands substantial CPU and memory, which raises resource costs.
* **Accuracy vs. speed**: Using [[ANN|ANN]] techniques for performance returns approximate results rather than 100% exact ones, so domains where precision is critical need careful tuning of the index settings.
* **Limited interpretability**: Beyond the mathematical distance itself, it is hard to give a logical reason why the system returned a particular result.

## 💻 Practical Implementation Code (Boilerplate)

An example of building a vector database with `ChromaDB` in a Python environment.

```python
import chromadb
from chromadb.utils import embedding_functions

# 1. Create the client and initialize the collection
client = chromadb.Client()  # in-memory client; data is lost when the process exits
# Requires the `sentence-transformers` package to be installed
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
# get_or_create_collection avoids an error if the collection already exists
collection = client.get_or_create_collection(
    name="antigravity_wiki",
    embedding_function=sentence_transformer_ef,
)

# 2. Add data (text + metadata); embeddings are computed by the embedding function
collection.add(
    documents=[
        "RAG stands for Retrieval-Augmented Generation.",
        "A vector DB stores high-dimensional data.",
    ],
    metadatas=[{"category": "AI"}, {"category": "Infrastructure"}],
    ids=["id1", "id2"],
)

# 3. Run a similarity search
results = collection.query(
    query_texts=["What is RAG?"],
    n_results=1,
)

# Results are nested per query: [query index][result index]
print(f"Top Result: {results['documents'][0][0]}")
# The returned value is a distance, not a confidence: lower means more similar
print(f"Distance (lower is closer): {results['distances'][0][0]}")
```

## 🔗 Knowledge Links (Graph)

* **Foundational technologies**: [[Vector Embedding|Vector Embedding]], [[Vector Search|Vector Search]], [[Semantic Search|Semantic Search]]
* **Application architectures**: [[Retrieval-Augmented Generation (RAG)|RAG]], [[Recommendation System|Recommendation System]]
* **Core algorithms**: [[ANN|ANN (Approximate Nearest Neighbor)]], [[HNSW|HNSW]], [[Cosine Similarity|Cosine Similarity]]

---
*Last updated: 2026-05-04*
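
As a companion to the similarity-search and accuracy-vs-speed points in this note, the sketch below shows the exact (brute-force) baseline that ANN indexes approximate: score every stored vector against the query with cosine similarity and keep the best `k`. The function names `cosine_similarity` and `exact_top_k` and the toy 3-dimensional vectors are hypothetical, made up for illustration; real embeddings have hundreds of dimensions, and this is not how any particular vector database implements search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k=1):
    """Brute-force search: score every stored vector, return the k best (id, score) pairs."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Toy "embeddings" (hypothetical values, for illustration only)
toy_vectors = {
    "id1": [1.0, 0.0, 0.0],
    "id2": [0.0, 1.0, 0.0],
    "id3": [0.7, 0.7, 0.0],
}

top = exact_top_k([0.9, 0.1, 0.0], toy_vectors, k=2)
print(top)  # ids in order: id1, then id3
```

This exact scan costs one similarity computation per stored vector for every query, which is why large collections rely on indexes such as HNSW or IVF that inspect only a fraction of the vectors and accept the chance of missing a true nearest neighbor.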