---
id: db-vector-db-scaling
title: Vector DB Scaling — Pinecone / Qdrant / Weaviate / Milvus
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [database, vector, scaling, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["Backend"] }
applied_in: []
aliases: [Pinecone, Qdrant, Weaviate, Milvus, Vespa, vector index, HNSW, IVF]
---

# Vector DB Scaling

> Under 1M vectors, pgvector is enough. **1M-100M: Qdrant / Weaviate. 100M-10B: Pinecone / Milvus / Vespa.** The main levers are index type, sharding, replicas, and hybrid search.

## 📖 Core Concepts

- HNSW: fast approximate nearest-neighbor (ANN) search.
- IVF: smaller memory footprint and index.
- Quantization: 8-bit / binary compression of vectors.
- Filtering: restricting results by metadata.

## 💻 Code Patterns

### Pinecone (managed, most popular)

```ts
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index('my-index');

// Upsert: each record has an id, the vector values, and optional metadata
await index.upsert([
  { id: 'doc1', values: embedding1, metadata: { lang: 'en', tag: 'intro' } },
  { id: 'doc2', values: embedding2, metadata: { lang: 'ko', tag: 'main' } },
]);

// Query: top 10 nearest neighbors, restricted to English docs
const r = await index.query({
  vector: queryEmbedding,
  topK: 10,
  includeMetadata: true,
  filter: { lang: 'en' },
});
```

### Qdrant (open-source, strong)

```ts
import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'http://qdrant:6333' });

await client.createCollection('docs', {
  vectors: { size: 1536, distance: 'Cosine' },
  hnsw_config: { m: 16, ef_construct: 100 },
});

// Qdrant point IDs must be unsigned integers or UUIDs,
// so the string document ID goes into the payload instead.
await client.upsert('docs', {
  wait: true,
  points: [
    { id: 1, vector: embedding1, payload: { doc_id: 'doc1', lang: 'en', tag: 'intro' } },
  ],
});

const r = await client.search('docs', {
  vector: queryEmbedding,
  limit: 10,
  filter: {
    must: [{ key: 'lang', match: { value: 'en' } }],
  },
});
```

→ Self-host or cloud. Powerful filtering.
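The `query` / `search` calls above hand the work to the engine's index. As a reference point, here is a minimal brute-force (flat) sketch of what a top-k cosine query with an equality metadata filter should return. `Doc`, `cosine`, and `flatSearch` are hypothetical names for illustration, not part of any SDK; a real engine replaces the full scan with an ANN index such as HNSW.

```typescript
// Hypothetical record shape for this sketch (not a Pinecone or Qdrant type).
interface Doc {
  id: string;
  vector: number[];
  metadata: Record<string, string>;
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exact (flat) top-k: keep docs whose metadata matches every filter key,
// score each remaining doc against the query, sort descending, take k.
function flatSearch(
  docs: Doc[],
  query: number[],
  topK: number,
  filter: Record<string, string> = {},
): { id: string; score: number }[] {
  return docs
    .filter((d) => Object.entries(filter).every(([k, v]) => d.metadata[k] === v))
    .map((d) => ({ id: d.id, score: cosine(d.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const docs: Doc[] = [
  { id: 'doc1', vector: [1, 0], metadata: { lang: 'en' } },
  { id: 'doc2', vector: [0.9, 0.1], metadata: { lang: 'ko' } },
  { id: 'doc3', vector: [0, 1], metadata: { lang: 'en' } },
];

// doc2 is the second-closest vector but is excluded by the lang filter.
console.log(flatSearch(docs, [1, 0], 2, { lang: 'en' }));
```

This is the "Flat (brute force)" option from the index comparison: 100% recall but O(N) per query, which is why it is also a handy ground truth when measuring an ANN index's recall.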
### Weaviate (semantic + hybrid)

```ts
import weaviate from 'weaviate-client';

// The v3 client talks to Weaviate over both HTTP and gRPC.
const client = await weaviate.connectToCustom({
  httpHost: 'weaviate',
  httpPort: 8080,
  httpSecure: false,
  grpcHost: 'weaviate',
  grpcPort: 50051,
  grpcSecure: false,
});
const collection = client.collections.get('Docs');

await collection.data.insertMany([
  { properties: { content: 'Hello', lang: 'en' }, vector: embedding1 },
  { properties: { content: '안녕', lang: 'ko' }, vector: embedding2 },
]);

const r = await collection.query.nearVector(queryEmbedding, {
  limit: 10,
  filters: collection.filter.byProperty('lang').equal('en'),
});
```

→ Built-in vectorizer modules (automatic embedding).

### Milvus (large scale)

```python
from pymilvus import connections, Collection

connections.connect(host='milvus', port='19530')

# Assumes an existing 'docs' collection with schema (id, embedding, lang VARCHAR)
# and an HNSW index on 'embedding'; column-based insert follows the schema order.
collection = Collection('docs')

collection.insert([
    [id1, id2],
    [embedding1, embedding2],
    ['en', 'ko'],
])

collection.load()  # a collection must be loaded into memory before searching

results = collection.search(
    data=[query_embedding],
    anns_field='embedding',
    param={'metric_type': 'COSINE', 'params': {'ef': 64}},
    limit=10,
    expr='lang == "en"',
)
```

→ 10B+ scale. Kubernetes-native (Milvus Operator).

### Vespa (large scale + hybrid)

```
schema docs {
  document docs {
    field id type string {}
    field content type string {
      indexing: index | summary
    }
    field lang type string {
      indexing: attribute
    }
    field embedding type tensor<float>(x[1536]) {
      indexing: attribute | index
      attribute {
        distance-metric: angular
      }
      index {
        hnsw {
          max-links-per-node: 16
          neighbors-to-explore-at-insert: 100
        }
      }
    }
  }
  rank-profile default {
    inputs {
      query(q) tensor<float>(x[1536])
    }
    first-phase {
      expression: closeness(field, embedding)
    }
  }
}
```

→ Used by Yahoo / Spotify for large-scale search. Steep learning curve.

### Index type comparison

```
HNSW (Hierarchical Navigable Small World):
  + Fastest search
  + High recall
  - Large memory footprint
  - Expensive to (re)build

IVF (Inverted File):
  + Smaller memory
  + Fast build
  - Somewhat slower than HNSW

Flat (brute force):
  + 100% recall
  - O(N) per query: small datasets only

PQ / SQ (Product / Scalar Quantization):
  + Much smaller memory (4-32x)
  + Suits large datasets
  - Slightly lower recall
```

→ HNSW is the default. Add PQ at large scale.
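To show where the "4x" for scalar quantization (SQ) comes from, here is a minimal int8 sketch: each float32 dimension (4 bytes) is mapped to one byte using the vector's min/max range. This is an illustration under a simplifying assumption (one min/max per vector); real engines such as Qdrant calibrate the range over the collection and score directly on the codes, so treat `quantize` / `dequantize` as hypothetical helpers, not any library's implementation.

```typescript
// Scalar (int8) quantization sketch: float32 -> one byte per dimension.
interface Quantized {
  codes: Uint8Array; // 1 byte per dimension (vs 4 bytes for float32)
  min: number;
  scale: number; // (max - min) / 255
}

function quantize(v: Float32Array): Quantized {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < v.length; i++) {
    if (v[i] < min) min = v[i];
    if (v[i] > max) max = v[i];
  }
  const scale = max > min ? (max - min) / 255 : 1;
  const codes = new Uint8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    // Map [min, max] onto the integer range [0, 255].
    codes[i] = Math.round((v[i] - min) / scale);
  }
  return { codes, min, scale };
}

function dequantize(q: Quantized): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < q.codes.length; i++) {
    out[i] = q.min + q.codes[i] * q.scale;
  }
  return out;
}

// Synthetic 1536-dim vector for the demo.
const v = new Float32Array(1536);
for (let i = 0; i < v.length; i++) v[i] = Math.sin(i);

const q = quantize(v);
console.log(v.byteLength, q.codes.byteLength); // 6144 1536 (4x smaller)

// Rounding error per dimension is bounded by half a quantization step.
const back = dequantize(q);
let maxErr = 0;
for (let i = 0; i < v.length; i++) {
  maxErr = Math.max(maxErr, Math.abs(v[i] - back[i]));
}
console.log(maxErr <= q.scale / 2 + 1e-6);
```

The small per-dimension error is what costs a little recall; binary quantization pushes the same trade further (1 bit per dimension, 32x smaller).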
### Hybrid (vector + keyword)

```ts
// Weaviate
const r = await collection.query.hybrid(query, {
  vector: queryEmbedding,
  alpha: 0.5, // 0 = pure keyword (BM25), 1 = pure vector
  limit: 10,
});
```

```sql
-- pgvector + tsvector: top 100 from each side, then a weighted merge
WITH v_hits AS (
  SELECT id, 1 - (embedding <=> $1) AS v_score  -- cosine similarity
  FROM docs
  ORDER BY embedding <=> $1
  LIMIT 100
),
t_hits AS (
  SELECT id, ts_rank(tsv, plainto_tsquery($2)) AS t_score
  FROM docs
  WHERE tsv @@ plainto_tsquery($2)
  ORDER BY t_score DESC
  LIMIT 100
)
SELECT id,
       COALESCE(v_score, 0) * 0.7 + COALESCE(t_score, 0) * 0.3 AS score
FROM v_hits
FULL OUTER JOIN t_hits USING (id)
ORDER BY score DESC
LIMIT 10;
```

`ts_rank` and cosine similarity are on different scales, so the 0.7 / 0.3 weights need tuning (or the scores need normalizing first).

→ Vector search alone falls short; combine it with keyword search. → [[AI_RAG_Advanced]].

### Quantization

```ts
// Pinecone: handled automatically

// Qdrant: enable int8 scalar quantization on an existing collection
await client.updateCollection('docs', {
  quantization_config: {
    scalar: { type: 'int8', always_ram: true },
  },
});
// ~4x less memory, 95%+ recall.
```

### Sharding (10B+)

```yaml
# Milvus / Weaviate / Vespa: automatic sharding in cluster mode.
# Pinecone: managed (automatic).
# Qdrant cluster: manual shard configuration.
```

### Replication

```
Read replicas:
  - Scale reads
  - Failover

Multi-region:
  - Keep data close to edge users
  - Higher cost
```

### Cost (rough)

```
Pinecone:
  - Starter: $0
  - Standard: $50/month + $0.40/M ops
  - 1M vectors × 1536 dim = $50/month (s1)

Qdrant Cloud:
  - Free: 1GB
  - Paid: $0.05/GB/month
  - 1M × 1536 dim = ~6GB = $0.30/month + compute

Weaviate Cloud: similar

Self-host (Qdrant):
  - Server cost only
  - 1M × 1536 dim = ~6GB RAM (float32: 1M × 1536 × 4 bytes)
```

→ Self-hosting is cheapest. Managed means no ops burden.

### Performance

```
HNSW search (1M docs), rough and setup-dependent:
  - Pinecone: ~30ms p99
  - Qdrant: ~10ms (self-hosted, SSD + RAM)
  - Weaviate: ~20ms
  - Milvus: ~10ms
  - pgvector: ~50ms (HNSW)

→ At million scale they are similar. At billion scale the gap is large.
```

### Filter (metadata)

```ts
// Pinecone: range operators ($gt, $gte, $lt, $lte) only work on numbers,
// so store dates as Unix timestamps.
filter: {
  $and: [
    { lang: 'en' },
    { date: { $gte: 1767225600 } }, // 2026-01-01T00:00:00Z
  ],
}

// Qdrant
filter: {
  must: [
    { key: 'lang', match: { value: 'en' } },
    { key: 'date', range: { gte: '2026-01-01' } },
  ],
}
```

→ Two strategies: pre-filter (applied inside the index during search) vs post-filter (applied to results after search).
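The pre-filter vs post-filter difference is easiest to see with a selective filter. The sketch below uses precomputed similarity scores on an in-memory list; `Item`, `preFilterSearch`, and `postFilterSearch` are hypothetical helpers for illustration, whereas real engines apply pre-filtering while traversing the ANN index.

```typescript
// Hypothetical item: `score` stands in for similarity to the query (higher = closer).
interface Item {
  id: string;
  score: number;
  lang: string;
}

// Post-filter: take the global top-k first, then drop items that fail the filter.
function postFilterSearch(items: Item[], k: number, lang: string): Item[] {
  return [...items]
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .filter((x) => x.lang === lang);
}

// Pre-filter: restrict to matching items first, then take the top-k among them.
function preFilterSearch(items: Item[], k: number, lang: string): Item[] {
  return items
    .filter((x) => x.lang === lang)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const items: Item[] = [
  { id: 'a', score: 0.95, lang: 'ko' },
  { id: 'b', score: 0.9, lang: 'ko' },
  { id: 'c', score: 0.85, lang: 'ko' },
  { id: 'd', score: 0.6, lang: 'en' },
  { id: 'e', score: 0.5, lang: 'en' },
];

// The global top 3 are all 'ko', so post-filtering for 'en' returns nothing.
console.log(postFilterSearch(items, 3, 'en').map((x) => x.id)); // []
console.log(preFilterSearch(items, 3, 'en').map((x) => x.id)); // [ 'd', 'e' ]
```

Post-filter implementations usually compensate by over-fetching (asking the index for many more than k candidates), which is where the extra latency comes from when the filter is selective.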
### Multi-tenant

```ts
// Approach 1: separate index per tenant
//   Pinecone: expensive (cost per index)
//   Qdrant: one collection per tenant is OK

// Approach 2: shared index + tenant filter
filter: { tenant_id: 'tenant-123' }

// Approach 3: namespaces (Pinecone)
await index.namespace('tenant-123').upsert([...]);
await index.namespace('tenant-123').query({ vector, topK: 10 });
```

→ Namespaces give both isolation and scale.

### Multi-vector (image + text)

```ts
// Option 1: one shared embedding space (e.g. CLIP embeds images and text together)
await index.upsert([
  { id: 'item1', values: clipEmbedding },
]);

// Option 2: named vectors (Qdrant)
await client.createCollection('items', {
  vectors: {
    image: { size: 512, distance: 'Cosine' },
    text: { size: 1536, distance: 'Cosine' },
  },
});
```

→ Enables multi-modal search.

### Batch insert (large imports)

```ts
// records: { id, values, metadata }[]
// Pinecone caps an upsert request at 1,000 records and 2 MB; with 1536-dim
// vectors the size cap is hit well before 1,000, so keep batches small.
const BATCH = 100;
for (let i = 0; i < records.length; i += BATCH) {
  const batch = records.slice(i, i + BATCH);
  await index.upsert(batch);
  console.log(`${i + batch.length}/${records.length}`);
}
```

→ Watch rate limits and memory.

### Re-embedding (model change)

```
Changing models (text-embedding-3-small → 3-large):
  - Embeddings change (and dimensions: 1536 → 3072), so every doc must be re-embedded
  - High cost and time

Approach:
  - Re-embed gradually (in the background)
  - New model = new namespace
  - Shift traffic gradually
```

### Backup / restore

```ts
// Pinecone (serverless backups; the call depends on your SDK version)
await pc.createBackup({ indexName: 'my-index', name: 'snapshot-2026' });

// Qdrant
await client.createSnapshot('docs');

// Large datasets = more time + storage.
```

### Search optimization

```
1. Reduce dimensions (Matryoshka): 1536 → 256
   → ~90% accuracy, ~6x faster
2. Binary quantization: 32x smaller, ~70% accuracy
3. Hybrid (vector + keyword): higher recall
4. Reranker: narrow top 50 → top 5 precisely
5. Tune index parameters (ef_search, M)
```

### When pgvector vs dedicated

```
pgvector:
  + Postgres queries / transactions / joins
  + Single database
  + Small to medium (< 10M)
  - Weak at large scale

Dedicated:
  + Large scale (100M+)
  + Specialized indexes
  - Separate system
  - Extra sync work
```

### Cloud comparisons

```
Pinecone:
  + Easiest
  + Best DX
  - Most expensive (at large scale)
  - Vendor lock-in

Qdrant Cloud:
  + OSS + cloud
  + Powerful features
  + Cheap

Weaviate Cloud:
  + Auto-vectorization
  + Strong hybrid search

Platform vector DBs (CF Vectorize, Vercel):
  + Close to the edge
  - Fewer features

Cohere / Voyage:
  + Embedding + search in one place
  - Vendor lock-in
```

### Edge vector search (CF Vectorize)

```toml
# wrangler.toml
[[vectorize]]
binding = "VECTORIZE"
index_name = "my-index"
```

```ts
// Worker (env.VECTORIZE is the binding declared in wrangler.toml)
await env.VECTORIZE.upsert([
  { id: 'doc1', values: embedding, metadata: {} },
]);
const r = await env.VECTORIZE.query(queryEmbedding, { topK: 10 });
```

→ Search runs at the edge, near users.

### Monitoring

```
- Index size
- Query latency (p50, p99)
- QPS
- Recall (sampled tests)
- Cost per query
```

## 🤔 Decision Criteria

| Scale | Recommendation |
|---|---|
| < 1M | pgvector |
| 1M-10M | Qdrant / Pinecone |
| 10M-100M | Pinecone / Weaviate / Qdrant |
| 100M-1B | Milvus / Vespa / Pinecone |
| 1B+ | Vespa / Milvus + sharding |
| Edge | CF Vectorize / Pinecone |
| Hybrid (vector + text) | Vespa / Weaviate / pgvector + tsvector |

## ❌ Anti-patterns

- **Pinecone for everything (even at small scale)**: pgvector is enough.
- **Highly selective filter + post-filtering**: slow. Pre-filter inside the index.
- **Assuming quantization without verifying recall**: accuracy drops.
- **No plan for re-embedding**: a model change means starting over.
- **Single region + global users**: latency.
- **No backups**: data loss.
- **Pure vector search, no hybrid**: misses keyword-driven queries.

## 🤖 LLM Usage Hints

- Start with pgvector.
- As scale grows → Qdrant / Pinecone.
- At large scale → Milvus / Vespa.
- Hybrid + reranker = best quality.

## 🔗 Related Docs

- [[DB_pgvector_Production]]
- [[AI_RAG_Pattern_Basics]]
- [[AI_RAG_Advanced]]