[G1-Sync] Manual knowledge update

2026-05-09 22:47:42 +09:00
parent 93ec7e9056
commit 21ac3ed255
56 changed files with 22043 additions and 43 deletions
@@ -0,0 +1,481 @@
+---
+id: db-vector-db-scaling
+title: Vector DB Scaling — Pinecone / Qdrant / Weaviate / Milvus
+category: Coding
+status: draft
+source_trust_level: B
+verification_status: conceptual
+created_at: 2026-05-09
+updated_at: 2026-05-09
+tags: [database, vector, scaling, vibe-coding]
+tech_stack: { language: "TS / Python", applicable_to: ["Backend"] }
+applied_in: []
+aliases: [Pinecone, Qdrant, Weaviate, Milvus, Vespa, vector index, HNSW, IVF]
+---
+
+# Vector DB Scaling
+
+> Under ~1M vectors, pgvector is usually enough. **1M-100M: Qdrant / Weaviate. 100M-10B: Pinecone / Milvus / Vespa**. The key levers are index type, sharding, replicas, and hybrid search.
+
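+## 📖 Key concepts
+- HNSW: graph-based approximate nearest neighbor (ANN) index; fast queries and high recall, at the cost of memory.
+- IVF: partitions vectors into clusters (inverted lists); smaller memory footprint and faster builds.
+- Quantization: compress vectors to 8-bit (scalar) or 1-bit (binary) codes to cut memory.
+- Filtering: restrict results by metadata (payload) conditions.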
+
+## 💻 Code patterns
+
+### Pinecone (managed, most popular)
+```ts
+import { Pinecone } from '@pinecone-database/pinecone';
+
+const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
+
+const index = pc.index('my-index');
+
+// Upsert
+await index.upsert([
+  { id: 'doc1', values: embedding1, metadata: { lang: 'en', tag: 'intro' } },
+  { id: 'doc2', values: embedding2, metadata: { lang: 'ko', tag: 'main' } },
+]);
+
+// Query
+const r = await index.query({
+  vector: queryEmbedding,
+  topK: 10,
+  includeMetadata: true,
+  filter: { lang: 'en' },
+});
+```
+
+### Qdrant (open-source, feature-rich)
+```ts
+import { QdrantClient } from '@qdrant/js-client-rest';
+
+const client = new QdrantClient({ url: 'http://qdrant:6333' });
+
+await client.createCollection('docs', {
+  vectors: { size: 1536, distance: 'Cosine' },
+  hnsw_config: { m: 16, ef_construct: 100 },
+});
+
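+await client.upsert('docs', {
+  wait: true,
+  points: [
+    {
+      // Qdrant point IDs must be unsigned integers or UUIDs, so keep 'doc1' in the payload.
+      id: 1,
+      vector: embedding1,
+      payload: { doc_id: 'doc1', lang: 'en', tag: 'intro' },
+    },
+  ],
+});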
+
+const r = await client.search('docs', {
+  vector: queryEmbedding,
+  limit: 10,
+  filter: {
+    must: [{ key: 'lang', match: { value: 'en' } }],
+  },
+});
+```
+
+→ Self-host or use Qdrant Cloud. Strong payload filtering.
+
+### Weaviate (semantic + hybrid)
+```ts
+import weaviate from 'weaviate-client';
+
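+const client = await weaviate.connectToCustom({
+  httpHost: 'weaviate',
+  httpPort: 8080,
+  // The v3 client also uses gRPC; 50051 is Weaviate's default gRPC port.
+  grpcHost: 'weaviate',
+  grpcPort: 50051,
+});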
+
+const collection = client.collections.get('Docs');
+
+await collection.data.insertMany([
+  { properties: { content: 'Hello', lang: 'en' }, vector: embedding1 },
+  { properties: { content: '안녕', lang: 'ko' }, vector: embedding2 },
+]);
+
+const r = await collection.query.nearVector(queryEmbedding, {
+  limit: 10,
+  filters: collection.filter.byProperty('lang').equal('en'),
+});
+```
+
+→ Built-in vectorizer modules can embed data automatically (auto-embedding).
+
+### Milvus (large scale)
+```python
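+from pymilvus import connections, Collection
+
+connections.connect(host='milvus', port='19530')
+
+# Assumes an existing collection with fields (id INT64, embedding FLOAT_VECTOR, lang VARCHAR)
+# and an HNSW index on `embedding`.
+collection = Collection('docs')
+
+# Column-based insert: one list per field, in schema order.
+collection.insert([
+    [id1, id2],
+    [embedding1, embedding2],
+    ['en', 'ko'],
+])
+
+# A collection must be loaded into memory before it can be searched.
+collection.load()
+
+results = collection.search(
+    data=[query_embedding],
+    anns_field='embedding',
+    param={'metric_type': 'COSINE', 'params': {'ef': 64}},
+    limit=10,
+    expr='lang == "en"',
+    output_fields=['lang'],
+)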
+```
+
+→ Built for 10B+ scale. Kubernetes-native via the Milvus Operator.
+
+### Vespa (large scale + hybrid)
+```
+schema docs {
+    document docs {
+        field id type string {}
+        field content type string { indexing: index | summary }
+        field lang type string { indexing: attribute }
+        field embedding type tensor<float>(x[1536]) {
+            indexing: attribute | index
+            attribute {
+                distance-metric: angular
+            }
+            index {
+                hnsw {
+                    max-links-per-node: 16
+                    neighbors-to-explore-at-insert: 100
+                }
+            }
+        }
+    }
+    
+    rank-profile default {
+        first-phase {
+            expression: closeness(field, embedding)
+        }
+    }
+}
+```
+
+→ Used for large-scale search (Yahoo, Spotify). Steep learning curve.
+
+### Index type comparison
+```
+HNSW (Hierarchical Navigable Small World):
++ Fastest search
++ High recall
+- High memory use
+- Expensive to build from scratch
+
+IVF (Inverted File):
++ Lower memory use
++ Fast build
+- Slightly slower than HNSW
+
+Flat (brute force):
++ 100% recall
+- O(N) per query; small datasets only
+
+PQ / SQ (Product / Scalar Quantization):
++ Much smaller memory (4-32x)
++ Suited to large datasets
+- Slightly lower recall
+```
+
+→ HNSW is the usual default; add PQ at large scale.
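To make the Flat row concrete, here is a minimal exact (brute-force) cosine search in TypeScript. It is an illustrative sketch, not any engine's implementation; `cosineSim` and `flatSearch` are hypothetical names. Exact search like this is also the usual ground truth when you measure an ANN index's recall.

```typescript
type Item = { id: string; vector: number[] };

// Cosine similarity between two equal-length vectors.
function cosineSim(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero.
  if (na === 0 || nb === 0) return 0;
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exact top-k: score every stored vector (O(N * d) per query), so recall is 100% by construction.
function flatSearch(items: Item[], query: number[], k: number): { id: string; score: number }[] {
  return items
    .map((it) => ({ id: it.id, score: cosineSim(it.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The cost grows linearly with the collection, which is why Flat only fits small datasets or offline evaluation.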
+
+### Hybrid (vector + keyword)
+```ts
+// Weaviate
+const r = await collection.query.hybrid(query, {
+  vector: queryEmbedding,
+  alpha: 0.5,  // 0 = keyword, 1 = vector
+  limit: 10,
+});
+```
+
+```sql
+-- pgvector + tsvector
+WITH v_hits AS (
+  SELECT id, 1 - (embedding <=> $1) AS v_score
+  FROM docs ORDER BY embedding <=> $1 LIMIT 100
+),
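+t_hits AS (
+  SELECT id, ts_rank(tsv, plainto_tsquery($2)) AS t_score
+  FROM docs WHERE tsv @@ plainto_tsquery($2)
+  ORDER BY t_score DESC LIMIT 100
+)
+-- Cosine similarity and ts_rank are on different scales; tune the weights (or fuse by rank).
+SELECT id, COALESCE(v_score, 0) * 0.7 + COALESCE(t_score, 0) * 0.3 AS score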
+FROM v_hits FULL OUTER JOIN t_hits USING (id)
+ORDER BY score DESC LIMIT 10;
+```
+
+→ Vector search alone is not enough; combine it with keyword search.
+
+→ [[AI_RAG_Advanced]].
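The pgvector query above adds raw scores with fixed weights, but cosine similarity and `ts_rank` live on different scales. A common alternative is Reciprocal Rank Fusion (RRF), which combines only rank positions. The TypeScript sketch below assumes each input list is already sorted best-first. `k = 60` is the constant conventionally used with RRF, and `rrfFuse` is a hypothetical name.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), with rank starting at 1.
// Only ranks are used, so vector and keyword scores never need to share a scale.
function rrfFuse(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of rankings) {
    list.forEach((id, idx) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + idx + 1));
    });
  }
  const out: { id: string; score: number }[] = [];
  scores.forEach((score, id) => out.push({ id, score }));
  return out.sort((a, b) => b.score - a.score);
}

// Usage: fuse a vector top list with a keyword top list.
const fused = rrfFuse([["a", "b", "c"], ["c", "a", "d"]]);
console.log(fused.map((h) => h.id).join(",")); // a,c,b,d
```

Documents that appear in both lists ("a", "c") rise to the top even when neither list ranks them first everywhere.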
+
+### Quantization
+```ts
+// Pinecone — automatic
+// Qdrant
+await client.updateCollection('docs', {
+  quantization_config: {
+    scalar: { type: 'int8', always_ram: true },
+  },
+});
+
+// int8: ~4x less memory, typically 95%+ recall (verify on your own data).
+```
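The sketch below shows what 8-bit scalar quantization does. Each float is mapped linearly onto 256 levels inside a chosen `[min, max]` range and stored in one byte instead of four. This simple min/max variant is an illustration only, since engines choose the range differently (Qdrant, for example, exposes a `quantile` option to clip outliers). `quantize8` and `dequantize8` are hypothetical names.

```typescript
type Quantized8 = { codes: Uint8Array; min: number; step: number };

// Map each component onto 0..255 within [min, max]; 1 byte per component.
function quantize8(v: number[], min: number, max: number): Quantized8 {
  // Width of one quantization level; fall back to 1 if the range is empty.
  const step = (max - min) / 255 || 1;
  const codes = new Uint8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    // Clamp out-of-range values (e.g. outliers beyond the chosen bounds).
    const x = Math.min(max, Math.max(min, v[i]));
    codes[i] = Math.round((x - min) / step);
  }
  return { codes, min, step };
}

// Approximate reconstruction; per-component error is at most step / 2 inside [min, max].
function dequantize8(q: Quantized8): number[] {
  const out: number[] = [];
  for (let i = 0; i < q.codes.length; i++) out.push(q.min + q.codes[i] * q.step);
  return out;
}
```

The rounding error is what costs recall, which is why the recall claim above needs checking on real queries.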
+
+### Sharding (10B+)
+```yaml
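+# Milvus / Weaviate / Vespa: automatic sharding in cluster mode.
+
+# Pinecone: managed (automatic).
+# Qdrant distributed mode: shard_number is set per collection; moving shards
+# onto newly added nodes is a manual operation.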
+```
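Whichever engine handles sharding, a sharded query follows the same scatter-gather shape. The query goes to every shard, each shard returns its local top-k, and a coordinator merges the results. The TypeScript sketch below is a minimal version of that flow. The modulo-hash router `shardFor` is a hypothetical simplification (real engines use their own placement schemes), and `mergeTopK` assumes that a higher score is better.

```typescript
type ShardHit = { id: string; score: number };

// Route a point id to a shard with a simple 31-multiplier string hash (illustrative only).
function shardFor(id: string, shardCount: number): number {
  let h = 0;
  for (let i = 0; i < id.length; i++) {
    h = (h * 31 + id.charCodeAt(i)) >>> 0; // keep h an unsigned 32-bit integer
  }
  return h % shardCount;
}

// Merge per-shard top-k lists into a global top-k (higher score = better).
function mergeTopK(perShard: ShardHit[][], k: number): ShardHit[] {
  const all: ShardHit[] = [];
  for (const hits of perShard) {
    for (const hit of hits) all.push(hit);
  }
  return all.sort((a, b) => b.score - a.score).slice(0, k);
}
```

Because `hash % shardCount` remaps most ids when the shard count changes, adding shards means moving data. That is why resharding is an operational event rather than a free setting.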
+
+### Replication
+```
+Read replicas:
+- Scale reads
+- Failover
+
+Multi-region:
+- Serve users from a nearby region
+- Higher cost
+```
+
+### Cost (approximate)
+```
+Pinecone:
+- Starter: $0
+- Standard: $50/month + $0.40/M ops
+- 1M vectors × 1536 dim = $50/month (s1)
+
+Qdrant Cloud:
+- Free: 1GB
+- Paid: $0.05/GB/month
+- 1M × 1536 dim = ~6GB = $0.30/month + compute
+
+Weaviate Cloud: similar
+
+Self-host (Qdrant):
+- Server cost only
+- 1M × 1536 dim = ~6GB RAM for raw float32 vectors, before index overhead
+```
+
+→ Self-hosting is cheapest; managed services remove the operational burden.
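The ~6GB figure above is simply `count × dim × 4 bytes` for float32 vectors. A small calculator makes the effect of quantization explicit. The `overhead` multiplier for the index graph and payload is an assumption you should calibrate per engine, and `vectorBytes` is a hypothetical name.

```typescript
// Bytes per vector component for common storage formats.
const BYTES = { float32: 4, int8: 1, binary: 1 / 8 } as const;

// Raw vector memory, optionally scaled by an assumed index/payload overhead factor.
function vectorBytes(count: number, dim: number, format: keyof typeof BYTES, overhead = 1): number {
  return count * dim * BYTES[format] * overhead;
}

const GB = 1e9;
console.log((vectorBytes(1_000_000, 1536, "float32") / GB).toFixed(2)); // 6.14
console.log((vectorBytes(1_000_000, 1536, "binary") / GB).toFixed(2)); // 0.19
```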
+
+### Performance
+```
+HNSW search, 1M docs (rough; depends on hardware and settings):
+- Pinecone:    ~30ms p99
+- Qdrant:      ~10ms (self-host SSD + RAM)
+- Weaviate:    ~20ms
+- Milvus:      ~10ms
+- pgvector:    ~50ms (HNSW)
+
+→ At million scale the engines are similar.
+   At billion scale the differences are large.
+```
+
+### Filter (metadata)
+```ts
+// Pinecone
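+filter: {
+  $and: [
+    { lang: 'en' },
+    // $gt / $gte / $lt / $lte compare numbers only, so store dates as Unix timestamps.
+    { date_ts: { $gte: 1767225600 } }, // 2026-01-01T00:00:00Z
+  ],
+}
+
+// Qdrant
+filter: {
+  must: [
+    { key: 'lang', match: { value: 'en' } },
+    { key: 'date_ts', range: { gte: 1767225600 } },
+  ],
+}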
+```
+
+→ Two strategies: pre-filtering (applied inside the index during search) vs. post-filtering (applied to the results after search).
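The difference matters most when a filter is selective. The TypeScript sketch below works over precomputed similarity scores, which is a simplification: a real engine computes scores during index traversal. With a selective filter, post-filtering returns fewer than `k` hits, while pre-filtering does not. `Scored`, `postFilterTopK`, and `preFilterTopK` are hypothetical names.

```typescript
type Scored = { id: string; score: number; lang: string };

const byScoreDesc = (a: Scored, b: Scored) => b.score - a.score;

// Post-filter: take the global top-k first, then drop hits that fail the filter.
function postFilterTopK(docs: Scored[], k: number, keep: (d: Scored) => boolean): Scored[] {
  return docs.slice().sort(byScoreDesc).slice(0, k).filter(keep);
}

// Pre-filter: restrict candidates first, then take the top-k among the matches.
function preFilterTopK(docs: Scored[], k: number, keep: (d: Scored) => boolean): Scored[] {
  return docs.filter(keep).sort(byScoreDesc).slice(0, k);
}
```

Pre-filtering always fills up to `k` results. The price is that the engine needs a filter-aware index (e.g. payload indexes) to do it efficiently.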
+
+### Multi-tenant
+```ts
+// Approach 1: Separate index per tenant
+// Pinecone: expensive (cost per index)
+// Qdrant: a collection per tenant works
+
+// Approach 2: Shared index + tenant filter
+filter: { tenant_id: 'tenant-123' }
+
+// Approach 3: Namespace (Pinecone)
+await index.namespace('tenant-123').upsert([...]);
+await index.namespace('tenant-123').query({ vector, topK: 10 });
+```
+
+→ Namespace = isolation + scale.
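With a shared index and a tenant filter (Approach 2), the main risk is a code path that forgets the tenant filter. One way to rule that out is to give application code only a tenant-bound wrapper. The sketch below uses a hypothetical in-memory `SharedIndex` with dot-product scoring as a stand-in for a real client. The same wrapper pattern applies to a Pinecone namespace or a Qdrant payload filter.

```typescript
type TenantRecord = { id: string; tenantId: string; vector: number[] };

// Hypothetical in-memory stand-in for a shared vector index.
class SharedIndex {
  private records: TenantRecord[] = [];

  upsert(recs: TenantRecord[]): void {
    for (const r of recs) {
      // Replace an existing record with the same tenant + id, otherwise append.
      this.records = this.records.filter((x) => !(x.id === r.id && x.tenantId === r.tenantId));
      this.records.push(r);
    }
  }

  // Filter by tenant first, then rank by dot product.
  query(vector: number[], topK: number, tenantId: string): TenantRecord[] {
    const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
    return this.records
      .filter((r) => r.tenantId === tenantId)
      .sort((a, b) => dot(b.vector, vector) - dot(a.vector, vector))
      .slice(0, topK);
  }
}

// Application code only ever sees this wrapper, so the tenant filter cannot be skipped.
class TenantScopedIndex {
  constructor(private readonly index: SharedIndex, private readonly tenantId: string) {}

  upsert(items: { id: string; vector: number[] }[]): void {
    this.index.upsert(items.map((it) => ({ id: it.id, vector: it.vector, tenantId: this.tenantId })));
  }

  query(vector: number[], topK: number): TenantRecord[] {
    return this.index.query(vector, topK, this.tenantId);
  }
}
```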
+
+### Multi-vector (image + text)
+```ts
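+// Option 1: one shared space (e.g. CLIP embeds images and text into the same space)
+await client.upsert('items_clip', {
+  points: [{ id: 1, vector: clipEmbedding }],
+});
+
+// Option 2: named vectors (Qdrant), one vector space per modality
+await client.createCollection('items', {
+  vectors: {
+    image: { size: 512, distance: 'Cosine' },
+    text: { size: 1536, distance: 'Cosine' },
+  },
+});
+
+await client.upsert('items', {
+  points: [{ id: 1, vector: { image: imageEmbedding, text: textEmbedding } }],
+});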
+```
+
+→ Multi-modal search.
+
+### Batch insert (large imports)
+```ts
+const BATCH = 100; // keep each request under the API size limit (Pinecone: 2 MB per upsert)
+
+for (let i = 0; i < embeddings.length; i += BATCH) {
+  const batch = embeddings.slice(i, i + BATCH);
+  await index.upsert(batch);
+  console.log(`${i + batch.length}/${embeddings.length}`);
+}
+```
+
+→ Watch rate limits and memory usage.
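To handle rate limits, a common pattern is to retry each batch with exponential backoff. The TypeScript sketch below assumes the following: `chunk`, `backoffMs`, and `importAll` are hypothetical helpers, `upsert` is whatever client call you use (e.g. `index.upsert`), and `sleep` is injected so the retry logic stays testable. The base and cap delays are illustrative defaults, not provider recommendations.

```typescript
// Split items into consecutive batches of at most `size`.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Exponential backoff for a 0-based retry attempt, capped at maxMs.
function backoffMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Upsert batch by batch; retry a failed batch up to maxRetries times before giving up.
async function importAll<T>(
  items: T[],
  upsert: (batch: T[]) => Promise<unknown>,
  sleep: (ms: number) => Promise<void>,
  batchSize = 100,
  maxRetries = 5,
): Promise<void> {
  for (const batch of chunk(items, batchSize)) {
    for (let attempt = 0; ; attempt++) {
      try {
        await upsert(batch);
        break;
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        await sleep(backoffMs(attempt));
      }
    }
  }
}
```

In production you would usually add random jitter to the delay, so that many workers do not retry in lockstep.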
+
+### Re-embedding (model change)
+```
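+Changing models (text-embedding-3-small → 3-large, 1536 → 3072 dims):
+- Embeddings are not comparable across models, so every doc must be re-embedded
+- High cost and time
+
+Approach:
+- Re-embed incrementally in the background
+- Write the new model's vectors to a new namespace / index
+- Shift traffic gradually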
+```
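The "shift traffic gradually" step can be sketched in TypeScript. The idea is to hash a stable key into one of 100 buckets and send buckets below the rollout percentage to the new index. The index names `docs_v1` / `docs_v2` and the helper names are hypothetical. Each index must be queried with an embedding from its own model, because vectors from different models are not comparable.

```typescript
// Map a stable key (user or tenant id) to a bucket in 0..99.
function bucketOf(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) h = (h * 31 + key.charCodeAt(i)) >>> 0;
  return h % 100;
}

// Buckets below rolloutPercent go to the new-model index.
// The same key always lands in the same bucket, so each user sees consistent results.
function pickIndex(key: string, rolloutPercent: number): "docs_v1" | "docs_v2" {
  return bucketOf(key) < rolloutPercent ? "docs_v2" : "docs_v1";
}
```

Raising `rolloutPercent` only ever moves keys from v1 to v2, never back, so you can ramp up step by step and roll back by lowering it.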
+
+### Backup / restore
+```ts
+// Pinecone (serverless backups are created from the client, not the index handle)
+await pc.createBackup({ indexName: 'my-index', name: 'snapshot-2026' });
+
+// Qdrant
+await client.createSnapshot('docs');
+
+// Large datasets: snapshots take time and storage.
+```
+
+### Search optimization
+```
+1. Reduce dimensions (Matryoshka): 1536 → 256, roughly 90% of the accuracy, ~6x faster
+2. Binary quantization: 32x smaller, roughly 70% accuracy without rescoring
+3. Hybrid (vector + keyword): higher recall
+4. Reranker: rerank the top 50 candidates into a precise top 5
+5. Tune index parameters (ef_search, M)
+```
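Items 1 and 2 can be sketched in TypeScript. Truncation only preserves quality for embedding models trained to support it (Matryoshka-style), and after truncating you should re-normalize so cosine and dot-product scores stay comparable. Binary quantization keeps one sign bit per dimension and compares codes by Hamming distance. It is usually paired with rescoring the top candidates on full-precision vectors. All function names are hypothetical.

```typescript
// Keep the first `dims` components and re-normalize to unit length.
function truncateNormalize(v: number[], dims: number): number[] {
  const head = v.slice(0, dims);
  const norm = Math.sqrt(head.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}

// One bit per component (1 if positive), packed 8 per byte: 32x smaller than float32.
function toBits(v: number[]): Uint8Array {
  const out = new Uint8Array(Math.ceil(v.length / 8));
  for (let i = 0; i < v.length; i++) {
    if (v[i] > 0) out[i >> 3] |= 1 << (i & 7);
  }
  return out;
}

// Hamming distance between packed codes: number of differing bits (lower = more similar).
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) {
      d += x & 1;
      x >>= 1;
    }
  }
  return d;
}
```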
+
+### When pgvector vs dedicated
+```
+pgvector:
++ Postgres queries, transactions, and joins
++ Single database
++ Small to medium scale (< 10M)
+- Weak at large scale
+
+Dedicated:
++ Large scale (100M+)
++ Specialized indexes
+- A separate system to run
+- Extra data sync
+```
+
+### Cloud comparisons
+```
+Pinecone:
++ Easiest
++ Best DX
+- Most expensive at large scale
+- Vendor lock-in
+
+Qdrant Cloud:
++ OSS + cloud
++ Strong features
++ Cheap
+
+Weaviate Cloud:
++ Auto-vectorization
++ Strong hybrid search
+
+Vector DB on cloud platforms (CF Vectorize, Vercel):
++ Close to the edge
+- Fewer features
+
+Cohere / Voyage:
++ Embedding + search integrated
+- Vendor lock-in
+```
+
+### Edge vector search (CF Vectorize)
+```toml
+# wrangler.toml
+[[vectorize]]
+binding = "VECTORIZE"
+index_name = "my-index"
+```
+
+```ts
+// Worker
+await env.VECTORIZE.upsert([
+  { id: 'doc1', values: embedding, metadata: {} },
+]);
+
+const r = await env.VECTORIZE.query(queryEmbedding, { topK: 10 });
+```
+
+→ Runs at the edge, near users.
+
+### Monitoring
+```
+- Index size
+- Query latency (p50, p99)
+- QPS
+- Recall (sample test)
+- Cost per query
+```
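Two of these metrics need a little code. The TypeScript sketch below assumes you record per-query latencies and periodically run a sample of queries through exact (brute-force) search to get ground truth. `percentile` uses the nearest-rank definition, and both function names are hypothetical.

```typescript
// Nearest-rank percentile over recorded latencies (e.g. p = 50 or 99).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// recall@k: fraction of the exact top-k that the ANN index also returned.
function recallAtK(annIds: string[], exactIds: string[], k: number): number {
  const truth = new Set(exactIds.slice(0, k));
  let hit = 0;
  for (const id of annIds.slice(0, k)) if (truth.has(id)) hit++;
  return hit / truth.size;
}
```

Tracking recall@k over time is what catches silent quality drops after quantization or index-parameter changes.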
+
+## 🤔 Decision criteria
+| Scale | Recommended |
+|---|---|
+| < 1M | pgvector |
+| 1M-10M | Qdrant / Pinecone |
+| 10M-100M | Pinecone / Weaviate / Qdrant |
+| 100M-1B | Milvus / Vespa / Pinecone |
+| 1B+ | Vespa / Milvus + sharding |
+| Edge | CF Vectorize / Pinecone |
+| Hybrid (vector + text) | Vespa / Weaviate / pgvector + tsvector |
+
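+## ❌ Anti-patterns
+- **Pinecone for everything (at small scale)**: pgvector is enough.
+- **Selective filter + post-filtering**: slow, and can return too few results. Use a filter-aware index (pre-filtering).
+- **Assuming quantization is free without checking recall**: accuracy drops.
+- **No re-embedding plan**: a model change means starting over.
+- **Single region with global users**: high latency.
+- **No backups**: data loss.
+- **Pure vector search, no hybrid**: misses exact-keyword queries.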
+
+## 🤖 LLM usage hints
+- Start with pgvector.
+- Scaling up → Qdrant / Pinecone.
+- Large scale → Milvus / Vespa.
+- Hybrid + reranker = best quality.
+
+## 🔗 Related docs
+- [[DB_pgvector_Production]]
+- [[AI_RAG_Pattern_Basics]]
+- [[AI_RAG_Advanced]]