---
id: db-vector-db-scaling
title: Vector DB Scaling — Pinecone / Qdrant / Weaviate / Milvus
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [database, vector, scaling, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["Backend"] }
applied_in: []
aliases: [Pinecone, Qdrant, Weaviate, Milvus, Vespa, vector index, HNSW, IVF]
---

# Vector DB Scaling

> Under 1M vectors, pgvector is enough. **1M-100M: Qdrant / Weaviate. 100M-10B: Pinecone / Milvus / Vespa.** Index type, sharding, replicas, and hybrid search are the key levers.

## 📖 Core concepts
- HNSW: fast approximate nearest-neighbor (ANN) search.
- IVF: smaller memory footprint / index.
- Quantization: 8-bit / binary.
- Filtering: metadata-based.

## 💻 Code patterns

### Pinecone (managed, most popular)
```ts
import { Pinecone } from '@pinecone-database/pinecone';

// Assumes the API key is supplied through the PINECONE_API_KEY environment variable.
const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

const index = pc.index('my-index');

// Upsert
await index.upsert([
  { id: 'doc1', values: embedding1, metadata: { lang: 'en', tag: 'intro' } },
  { id: 'doc2', values: embedding2, metadata: { lang: 'ko', tag: 'main' } },
]);

// Query
const r = await index.query({
  vector: queryEmbedding,
  topK: 10,
  includeMetadata: true,
  filter: { lang: 'en' },
});
```

### Qdrant (open-source, powerful)
```ts
import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'http://qdrant:6333' });

await client.createCollection('docs', {
  vectors: { size: 1536, distance: 'Cosine' },
  hnsw_config: { m: 16, ef_construct: 100 },
});

await client.upsert('docs', {
  points: [
    {
      // Qdrant point IDs must be unsigned integers or UUIDs,
      // so the document key is stored in the payload instead.
      id: 1,
      vector: embedding1,
      payload: { doc_id: 'doc1', lang: 'en', tag: 'intro' },
    },
  ],
});

const r = await client.search('docs', {
  vector: queryEmbedding,
  limit: 10,
  filter: {
    must: [{ key: 'lang', match: { value: 'en' } }],
  },
});
```

→ Self-hosted or cloud. Strong filtering.

### Weaviate (semantic + hybrid)
```ts
import weaviate from 'weaviate-client';

const client = await weaviate.connectToCustom({
  httpHost: 'weaviate',
  httpPort: 8080,
});

const collection = client.collections.get('Docs');

// Bring-your-own vectors: the v3 client takes them under the `vectors` key.
await collection.data.insertMany([
  { properties: { content: 'Hello', lang: 'en' }, vectors: embedding1 },
  { properties: { content: '안녕', lang: 'ko' }, vectors: embedding2 },
]);

const r = await collection.query.nearVector(queryEmbedding, {
  limit: 10,
  filters: collection.filter.byProperty('lang').equal('en'),
});
```

→ Built-in vectorizer (auto embed).

### Milvus (large scale)
```python
from pymilvus import connections, Collection

connections.connect(host='milvus', port='19530')

# Assumes an existing collection with fields: id, embedding, lang (VARCHAR).
collection = Collection('docs')

# Column-based insert: one list per field, in schema order.
collection.insert([
    [id1, id2],
    [embedding1, embedding2],
    ['en', 'ko'],
])

# The collection must be loaded into memory before it can be searched.
collection.load()

results = collection.search(
    data=[query_embedding],
    anns_field='embedding',
    param={'metric_type': 'COSINE', 'params': {'ef': 64}},
    limit=10,
    expr='lang == "en"',
)
```

→ 10B+ scale. K8s native (Milvus Operator).

### Vespa (large scale + hybrid)
```
# Vespa schema definition (docs.sd), not YAML
schema docs {
  document docs {
    field id type string {}
    field content type string { indexing: index | summary }
    field lang type string { indexing: attribute }
    field embedding type tensor<float>(x[1536]) {
      indexing: attribute | index
      attribute {
        distance-metric: angular
      }
      index {
        hnsw {
          max-links-per-node: 16
          neighbors-to-explore-at-insert: 100
        }
      }
    }
  }

  rank-profile default {
    first-phase {
      expression: closeness(field, embedding)
    }
  }
}
```

→ Used for large-scale search at Yahoo and Spotify. Steep learning curve.

### Index type comparison
```
HNSW (Hierarchical Navigable Small World):
+ Fastest search
+ Strong recall
- High memory use
- Expensive to rebuild

IVF (Inverted File):
+ Low memory use
+ Fast build
- Slightly slower than HNSW

Flat (brute force):
+ 100% recall
- O(N) per query, so small datasets only

PQ / SQ (Product / Scalar Quantization):
+ Much lower memory use (4-32x)
+ Suits large datasets
- Slightly lower recall
```

→ HNSW is the default. PQ is for large scale.

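To make the "Flat" row concrete, here is a minimal sketch of exact brute-force search in plain Python. The `flat_search` helper and the toy `docs` dictionary are illustrative assumptions, not part of any library. It scans every vector, which is why it gives exact results but does not scale.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def flat_search(query, vectors, k):
    """Exact top-k: score every stored vector, so cost is O(N * dim) per query."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy data: three 2-dimensional vectors.
docs = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}
top2 = flat_search([1.0, 0.0], docs, k=2)
```

A flat scan like this is also a handy ground truth when checking how much recall an HNSW or quantized index gives up.
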
### Hybrid (vector + keyword)
```ts
// Weaviate
const r = await collection.query.hybrid(query, {
  vector: queryEmbedding,
  alpha: 0.5, // 0 = keyword, 1 = vector
  limit: 10,
});
```

```sql
-- pgvector + tsvector
-- $1 = query embedding, $2 = query text
WITH v_hits AS (
  SELECT id, 1 - (embedding <=> $1) AS v_score
  FROM docs
  ORDER BY embedding <=> $1
  LIMIT 100
),
t_hits AS (
  SELECT id, ts_rank(tsv, plainto_tsquery($2)) AS t_score
  FROM docs
  WHERE tsv @@ plainto_tsquery($2)
  ORDER BY t_score DESC  -- without this, LIMIT keeps an arbitrary 100 matches
  LIMIT 100
)
SELECT id, COALESCE(v_score, 0) * 0.7 + COALESCE(t_score, 0) * 0.3 AS score
FROM v_hits FULL OUTER JOIN t_hits USING (id)
ORDER BY score DESC
LIMIT 10;
```

→ Vector search alone is not enough; combine it with keyword search.

→ See [[AI_RAG_Advanced]].

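The weighted sum above mixes two scores that live on different scales (cosine similarity and `ts_rank`). One common alternative, shown here as a sketch and not as the method this note prescribes, is reciprocal rank fusion (RRF), which only uses each document's rank in each result list. The function name `rrf_fuse`, the constant `k=60`, and the sample ID lists are assumptions for illustration.

```python
def rrf_fuse(rankings, k=60, top_n=10):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


# Hypothetical result lists, best hit first.
vector_hits = ["d1", "d2", "d3"]
keyword_hits = ["d3", "d1", "d4"]
fused = rrf_fuse([vector_hits, keyword_hits], top_n=3)
```

Documents that appear in both lists (`d1`, `d3`) rise to the top without any score normalization.
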
### Quantization
```ts
// Pinecone: handled automatically by the service
// Qdrant
await client.updateCollection('docs', {
  quantization_config: {
    scalar: { type: 'int8', always_ram: true },
  },
});

// About 4x less vector memory, typically 95%+ recall (verify on your own data).
```

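As a rough illustration of where the 4x figure comes from, here is a minimal sketch of 8-bit scalar quantization in plain Python. It maps each float onto one of 256 levels between the vector's min and max. This is a simplified stand-in; real engines such as Qdrant use their own calibration, and the helper names are assumptions.

```python
def quantize_8bit(vec):
    """Map each float to an integer code in 0..255 using a per-vector min/max range."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant vectors
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original floats."""
    return [lo + c * scale for c in codes]


vec = [-1.0, 0.0, 0.5, 1.0]
codes, lo, scale = quantize_8bit(vec)
restored = dequantize(codes, lo, scale)

# One byte per value instead of four bytes for float32.
packed = bytes(codes)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
```

The reconstruction error per value is bounded by half a quantization step, which is why recall usually drops only slightly.
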
### Sharding (10B+)
```yaml
# Milvus / Weaviate / Vespa: automatic sharding
# when running in cluster mode.

# Pinecone: managed (automatic).
# Qdrant cluster: manual.
```

### Replication
```
Read replica:
- Read scaling
- Failover

Multi-region:
- Closer to edge users
- Cost ↑
```

### Cost (approximate)
```
Pinecone:
- Starter: $0
- Standard: $50/month + $0.40/M ops
- 1M vectors × 1536 dim = $50/month (s1)

Qdrant Cloud:
- Free: 1GB
- Paid: $0.05/GB/month
- 1M × 1536 dim = ~6GB = $0.30/month + compute

Weaviate Cloud: similar

Self-host (Qdrant):
- Server cost only
- 1M × 1536 dim = 6GB RAM
```

→ Self-hosting is the cheapest. Managed means no operations work. Prices change often, so check current pricing pages.

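The "~6GB" figure is simple arithmetic: vectors × dimensions × 4 bytes for float32. A small sketch, with helper names that are assumptions for illustration; the HNSW link overhead uses the rough rule of thumb from hnswlib-style layouts (about `2 * M * 4` bytes per vector), which real engines will not match exactly.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Storage for the vectors alone (float32 = 4 bytes, int8 = 1 byte)."""
    return num_vectors * dim * bytes_per_value


def hnsw_link_bytes(num_vectors, m=16):
    """Rough graph overhead: about 2 * M neighbor IDs of 4 bytes each per vector."""
    return num_vectors * 2 * m * 4


def to_gb(num_bytes):
    return num_bytes / 1e9


float32_gb = to_gb(raw_vector_bytes(1_000_000, 1536))    # 6.144
int8_gb = to_gb(raw_vector_bytes(1_000_000, 1536, 1))    # 1.536
links_gb = to_gb(hnsw_link_bytes(1_000_000, m=16))       # 0.128
```

At this dimensionality the vectors dominate; the graph links add only a few percent, so quantization is the main lever for RAM.
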
### Performance
```
HNSW search (1M docs), rough figures:
- Pinecone: ~30ms p99
- Qdrant: ~10ms (self-host SSD + RAM)
- Weaviate: ~20ms
- Milvus: ~10ms
- pgvector: ~50ms (HNSW)

→ Million scale: all similar.
  Billion scale: large differences.
```

### Filter (metadata)
```ts
// Pinecone
// Range operators such as $gte work on numbers only,
// so the date is stored as a Unix timestamp (2026-01-01T00:00:00Z).
filter: {
  $and: [
    { lang: 'en' },
    { date: { $gte: 1767225600 } },
  ],
}

// Qdrant
// Datetime ranges take RFC 3339 strings (supported in recent Qdrant versions).
filter: {
  must: [
    { key: 'lang', match: { value: 'en' } },
    { key: 'date', range: { gte: '2026-01-01T00:00:00Z' } },
  ],
}
```

→ Two strategies: pre-filter (applied inside the index, before ranking) vs post-filter (applied after the search).

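The difference between the two strategies is easiest to see on a tiny dataset. The sketch below uses brute-force ranking as a stand-in for the ANN index; the function names and the four sample documents are assumptions for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def post_filter_search(query, docs, k, predicate):
    """Take the global top-k first, then drop hits that fail the filter."""
    top = sorted(docs, key=lambda d: -dot(query, d["vec"]))[:k]
    return [d["id"] for d in top if predicate(d)]


def pre_filter_search(query, docs, k, predicate):
    """Restrict the candidate set first, then take the top-k among matches."""
    candidates = [d for d in docs if predicate(d)]
    top = sorted(candidates, key=lambda d: -dot(query, d["vec"]))[:k]
    return [d["id"] for d in top]


docs = [
    {"id": 1, "lang": "ko", "vec": [1.0, 0.0]},
    {"id": 2, "lang": "ko", "vec": [0.9, 0.1]},
    {"id": 3, "lang": "en", "vec": [0.8, 0.2]},
    {"id": 4, "lang": "en", "vec": [0.1, 0.9]},
]
query = [1.0, 0.0]


def is_english(doc):
    return doc["lang"] == "en"


post = post_filter_search(query, docs, 2, is_english)
pre = pre_filter_search(query, docs, 2, is_english)
```

Here the two nearest documents are both Korean, so post-filtering returns nothing while pre-filtering still returns two English hits. With selective filters, post-filtering needs a much larger `k` to compensate.
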
### Multi-tenant
```ts
// Approach 1: Separate index per tenant
// Pinecone: expensive (cost per index)
// Qdrant: one collection per tenant is OK

// Approach 2: Shared index + tenant filter
filter: { tenant_id: 'tenant-123' }

// Approach 3: Namespace (Pinecone)
await index.namespace('tenant-123').upsert([...]);
await index.namespace('tenant-123').query({ vector, topK: 10 });
```

→ Namespace = isolation + scale.

### Multi-vector (image + text)
```ts
// Option 1: one shared embedding space (e.g. a CLIP-style model for both modalities)
await collection.upsert([
  { id: 'item1', vector: clipEmbedding },
]);

// Option 2: named vectors (Qdrant)
await client.createCollection('items', {
  vectors: {
    image: { size: 512, distance: 'Cosine' },
    text: { size: 1536, distance: 'Cosine' },
  },
});
```

→ Multi-modal search.

### Batch insert (large imports)
```ts
// Tune the batch size to the service's request-size limit.
const BATCH = 1000;

for (let i = 0; i < embeddings.length; i += BATCH) {
  const batch = embeddings.slice(i, i + BATCH);
  await index.upsert(batch);
  console.log(`${i + batch.length}/${embeddings.length}`);
}
```

→ Watch out for rate limits and memory.

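One way to handle rate limits during a large import is to retry each batch with exponential backoff. The sketch below is client-agnostic: `upsert` is any callable you pass in, and `flaky_upsert` is a hypothetical stand-in that fails twice to simulate throttling. In real code, catch the client's specific rate-limit error instead of a bare `Exception`.

```python
import time


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_with_retry(upsert, batch, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call upsert(batch); on failure wait base_delay * 2^(attempt-1) and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return upsert(batch)
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Demo with a fake client that rejects the first two calls.
calls = {"n": 0}
stored = []


def flaky_upsert(batch):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RuntimeError("rate limited")
    stored.extend(batch)


delays = []  # record delays instead of actually sleeping
for batch in batches(list(range(5)), size=2):
    upsert_with_retry(flaky_upsert, batch, sleep=delays.append)
```

Processing one batch at a time also keeps memory bounded, since only the current slice is held alongside the source list.
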
### Re-embed (model change)
```
Model change (text-embedding-3-small → 3-large):
- Embeddings from different models are not comparable, so every doc must be re-embedded
- High cost / time

Mitigation:
- Re-embed incrementally (in the background)
- New model = new namespace
- Shift traffic gradually
```

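Gradual traffic shifting can be done with a deterministic hash of the user ID, so each user consistently hits either the old or the new namespace while the rollout percentage grows. A minimal sketch; the namespace names `docs-v1` / `docs-v2` and the helper names are assumptions.

```python
import hashlib


def bucket(user_id):
    """Stable bucket in 0..99 derived from the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def pick_namespace(user_id, rollout_percent, old="docs-v1", new="docs-v2"):
    """Route roughly rollout_percent of users to the re-embedded namespace."""
    return new if bucket(user_id) < rollout_percent else old


users = [f"user-{i}" for i in range(200)]
at_30 = {u: pick_namespace(u, 30) for u in users}
at_60 = {u: pick_namespace(u, 60) for u in users}
```

Queries sent to the new namespace must be embedded with the new model. Because the bucket is stable, raising the percentage only ever moves users from old to new, never back.
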
### Backup / restore
```ts
// Pinecone
// Backups are created from the client, not the index handle
// (availability depends on index type and SDK version).
await pc.createBackup({ indexName: 'my-index', name: 'snapshot-2026' });

// Qdrant
await client.createSnapshot('docs');

// Large datasets cost time and storage.
```

### Search optimization
```
1. Reduce dimensions (Matryoshka): 1536 → 256, roughly 90% accuracy, about 6x faster
2. Binary quantization: 32x smaller, roughly 70% accuracy
3. Hybrid (vector + keyword): higher recall
4. Reranker: narrow the top 50 to a precise top 5
5. Tune index parameters (ef_search, M)

Figures are approximate; measure on your own data.
```

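Items 1 and 2 can be sketched in a few lines of plain Python. Truncation only preserves quality for models trained with Matryoshka-style objectives, and the sign-based binarization here is a simplified illustration, not any particular engine's implementation. All helper names are assumptions.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` values and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def to_bits(vec):
    """Binary quantization: one bit per dimension (1 if positive, else 0)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; smaller means more similar."""
    return bin(a ^ b).count("1")


short = truncate_and_normalize([3.0, 4.0, 1.0, -2.0], dim=2)

code_a = to_bits([0.5, -0.1, 0.2, -0.3])   # bits 1010
code_b = to_bits([0.4, 0.2, -0.2, -0.1])   # bits 1100
distance = hamming(code_a, code_b)
```

One float32 value (32 bits) becomes a single bit, which is where the 32x figure comes from. A common pattern is to use the binary codes for a cheap first pass and rescore the survivors with full vectors.
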
### When pgvector vs dedicated
```
pgvector:
+ Postgres queries / transactions / joins
+ Single DB
+ Small to medium scale (< 10M)
- Weak at large scale

Dedicated:
+ Large scale (100M+)
+ Specialized indexes
- Separate system to run
- Extra data sync
```

### Cloud comparisons
```
Pinecone:
+ Easiest
+ Best DX
- Most expensive (at large scale)
- Vendor lock-in

Qdrant Cloud:
+ OSS + cloud
+ Powerful features
+ Cheap

Weaviate Cloud:
+ Auto vectorize
+ Strong hybrid search

Platform vector stores (CF Vectorize, Vercel):
+ Close to the edge
- Fewer features

Cohere / Voyage:
+ Embedding + search integrated
- Vendor lock-in
```

### Edge vector search (CF Vectorize)
```toml
# wrangler.toml
[[vectorize]]
binding = "VECTORIZE"
index_name = "my-index"
```

```ts
// Worker
await env.VECTORIZE.upsert([
  { id: 'doc1', values: embedding, metadata: {} },
]);

const r = await env.VECTORIZE.query(queryEmbedding, { topK: 10 });
```

→ Runs at the edge, near the user.

### Monitoring
```
- Index size
- Query latency (p50, p99)
- QPS
- Recall (sample test)
- Cost per query
```

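The "Recall (sample test)" item can be measured by comparing the index's results with exact (brute-force) results for a sample of queries. A minimal sketch; the function names and the sample ID lists are assumptions.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate search also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(pairs, k):
    """Average recall@k over (approximate, exact) result pairs."""
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)


# Hypothetical sample: (results from the ANN index, results from an exact scan).
samples = [
    (["d1", "d2", "d9"], ["d1", "d2", "d3"]),  # 2 of 3 found
    (["d4", "d5", "d6"], ["d4", "d5", "d6"]),  # 3 of 3 found
]
score = mean_recall(samples, k=3)
```

Running this on a fixed query sample after every index or quantization change gives an early warning when accuracy regresses.
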
## 🤔 Decision criteria
| Scale | Recommended |
|---|---|
| < 1M | pgvector |
| 1M-10M | Qdrant / Pinecone |
| 10M-100M | Pinecone / Weaviate / Qdrant |
| 100M-1B | Milvus / Vespa / Pinecone |
| 1B+ | Vespa / Milvus + sharding |
| Edge | CF Vectorize / Pinecone |
| Hybrid (vector + text) | Vespa / Weaviate / pgvector + tsvector |

## ❌ Anti-patterns
- **Pinecone for everything (even at small scale)**: pgvector is enough.
- **Highly selective filters with post-filtering**: slow. Use pre-filtering on indexed fields.
- **Assuming quantization is safe without validating recall**: accuracy drops.
- **No re-embedding plan**: a model change means starting over.
- **Single region with global users**: latency.
- **No backups**: data loss.
- **Pure vector search with no hybrid**: misses keyword-style queries.

## 🤖 LLM usage hints
- Start with pgvector.
- As scale grows, move to Qdrant / Pinecone.
- At large scale, move to Milvus / Vespa.
- Hybrid + reranker = best quality.

## 🔗 Related documents
- [[DB_pgvector_Production]]
- [[AI_RAG_Pattern_Basics]]
- [[AI_RAG_Advanced]]