| id | title | category | status | source_trust_level | verification_status | created_at | updated_at | tags | tech_stack | applied_in | aliases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| db-vector-db-scaling | Vector DB Scaling - Pinecone / Qdrant / Weaviate / Milvus | Coding | draft | B | conceptual | 2026-05-09 | 2026-05-09 | | | | |
Vector DB Scaling
Under 1M vectors, pgvector is sufficient. 1M-100M: Qdrant / Weaviate. 100M-10B: Pinecone / Milvus / Vespa. The key factors are index type, sharding, replicas, and hybrid search.
📖 Key concepts
- HNSW: fast approximate nearest neighbor (ANN) search.
- IVF: smaller memory footprint and index.
- Quantization: 8-bit / binary.
- Filtering: based on metadata.
💻 Code patterns
Pinecone (managed, most popular)
import { Pinecone } from '@pinecone-database/pinecone';
const pc = new Pinecone({ apiKey });
const index = pc.index('my-index');
// Upsert
await index.upsert([
{ id: 'doc1', values: embedding1, metadata: { lang: 'en', tag: 'intro' } },
{ id: 'doc2', values: embedding2, metadata: { lang: 'ko', tag: 'main' } },
]);
// Query
const r = await index.query({
vector: queryEmbedding,
topK: 10,
includeMetadata: true,
filter: { lang: 'en' },
});
Qdrant (open-source, powerful)
import { QdrantClient } from '@qdrant/js-client-rest';
const client = new QdrantClient({ url: 'http://qdrant:6333' });
await client.createCollection('docs', {
vectors: { size: 1536, distance: 'Cosine' },
hnsw_config: { m: 16, ef_construct: 100 },
});
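await client.upsert('docs', {
points: [
{
// Qdrant point IDs must be unsigned integers or UUIDs, not arbitrary strings
id: 1,
vector: embedding1,
// original string id kept in the payload (the field name doc_id is illustrative)
payload: { doc_id: 'doc1', lang: 'en', tag: 'intro' },
},
],
});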
const r = await client.search('docs', {
vector: queryEmbedding,
limit: 10,
filter: {
must: [{ key: 'lang', match: { value: 'en' } }],
},
});
→ Self-hosted or cloud. Powerful filtering.
Weaviate (semantic + hybrid)
import weaviate from 'weaviate-client';
const client = await weaviate.connectToCustom({
httpHost: 'weaviate',
httpPort: 8080,
});
const collection = client.collections.get('Docs');
await collection.data.insertMany([
// the v3 TypeScript client takes the embedding under `vectors`
{ properties: { content: 'Hello', lang: 'en' }, vectors: embedding1 },
{ properties: { content: '안녕', lang: 'ko' }, vectors: embedding2 },
]);
const r = await collection.query.nearVector(queryEmbedding, {
limit: 10,
filters: collection.filter.byProperty('lang').equal('en'),
});
→ Built-in vectorizer (auto embed).
Milvus (large scale)
from pymilvus import connections, Collection
connections.connect(host='milvus', port='19530')
# Assumes an existing collection with fields: id, embedding, lang (VARCHAR)
collection = Collection('docs')
# Column-based insert: one list per field, in schema order
collection.insert([
[id1, id2],
[embedding1, embedding2],
['en', 'ko'],
])
collection.load()  # the collection must be loaded into memory before searching
results = collection.search(
data=[query_embedding],
anns_field='embedding',
param={'metric_type': 'COSINE', 'params': {'ef': 64}},
limit=10,
expr='lang == "en"',
)
→ 10B+ scale. K8s native (Milvus Operator).
Vespa (large scale + hybrid)
schema docs {
document docs {
field id type string {}
field content type string { indexing: index | summary }
field lang type string { indexing: attribute }
field embedding type tensor<float>(x[1536]) {
indexing: attribute | index
attribute {
distance-metric: angular
}
index {
hnsw {
max-links-per-node: 16
neighbors-to-explore-at-insert: 100
}
}
}
}
rank-profile default {
first-phase {
expression: closeness(field, embedding)
}
}
}
→ Used by Yahoo, Spotify, and other large search deployments. Steep learning curve.
Index type comparison
HNSW (Hierarchical Navigable Small World):
+ Fastest search
+ High recall
- High memory usage
- Expensive to rebuild
IVF (Inverted File):
+ Low memory usage
+ Fast build
- Slightly slower than HNSW
Flat (brute force):
+ 100% recall
- O(N) per query; small datasets only
PQ / SQ (Product / Scalar Quantization):
+ Much smaller memory footprint (4-32x)
+ Suits large datasets
- Slightly lower recall
→ HNSW = default. PQ = large scale.
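To make the Flat row concrete, here is a minimal sketch of exact brute-force search. It is an illustration only, not any engine's implementation; `cosine` and `flatSearch` are hypothetical helper names. Every query scores all N vectors, which is why it suits only small datasets, but its output is a useful ground truth when measuring the recall of an HNSW or IVF index.

```typescript
type Hit = { id: string; score: number };

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Exact search: score every document, sort by similarity, keep the best k.
function flatSearch(
  docs: { id: string; vector: number[] }[],
  query: number[],
  k: number,
): Hit[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(d.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```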
Hybrid (vector + keyword)
// Weaviate
const r = await collection.query.hybrid(query, {
vector: queryEmbedding,
alpha: 0.5, // 0 = keyword, 1 = vector
limit: 10,
});
-- pgvector + tsvector
WITH v_hits AS (
SELECT id, 1 - (embedding <=> $1) AS v_score
FROM docs ORDER BY embedding <=> $1 LIMIT 100
),
t_hits AS (
SELECT id, ts_rank(tsv, plainto_tsquery($2)) AS t_score
FROM docs WHERE tsv @@ plainto_tsquery($2)
ORDER BY t_score DESC LIMIT 100 -- without ORDER BY, LIMIT keeps an arbitrary 100 matches
)
SELECT id, COALESCE(v_score, 0) * 0.7 + COALESCE(t_score, 0) * 0.3 AS score
FROM v_hits FULL OUTER JOIN t_hits USING (id)
ORDER BY score DESC LIMIT 10;
→ Vector search alone is not enough; combine it with keyword search.
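The same weighted fusion can be done in application code when the vector and keyword hits come from different systems. The sketch below assumes two already-ranked hit lists; `fuse` and the 0.7 / 0.3 defaults are illustrative and simply mirror the SQL weights.

```typescript
type Scored = { id: string; score: number };

// Weighted-sum fusion. A document missing from one list contributes 0 for
// that list, the same role COALESCE plays after a FULL OUTER JOIN.
function fuse(
  vectorHits: Scored[],
  keywordHits: Scored[],
  wVector = 0.7,
  wKeyword = 0.3,
  k = 10,
): Scored[] {
  const combined = new Map<string, number>();
  for (const h of vectorHits) {
    combined.set(h.id, (combined.get(h.id) ?? 0) + wVector * h.score);
  }
  for (const h of keywordHits) {
    combined.set(h.id, (combined.get(h.id) ?? 0) + wKeyword * h.score);
  }
  return Array.from(combined, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Cosine similarity and keyword scores such as `ts_rank` live on different scales, so normalizing each list before weighting is usually worth considering.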
Quantization
// Pinecone: handled by the managed service
// Qdrant
await client.updateCollection('docs', {
quantization_config: {
scalar: { type: 'int8', always_ram: true },
},
});
// About 4x less memory (float32 → int8), typically 95%+ recall. Verify recall on your own data.
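Where the 4x figure comes from: each 4-byte float32 component becomes a 1-byte integer. The following is a simplified sketch of min/max scalar quantization, not Qdrant's actual implementation; `quantize` and `dequantize` are hypothetical names.

```typescript
type Quantized = { codes: Int8Array; min: number; scale: number };

// Map each component linearly onto 256 levels and store it as one signed byte.
function quantize(v: number[]): Quantized {
  const min = Math.min(...v);
  const max = Math.max(...v);
  const scale = max > min ? (max - min) / 255 : 1;
  const codes = new Int8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    codes[i] = Math.round((v[i] - min) / scale) - 128; // range -128..127
  }
  return { codes, min, scale };
}

// Approximate reconstruction; the error per component is at most scale / 2.
function dequantize(q: Quantized): number[] {
  return Array.from(q.codes, (c) => (c + 128) * q.scale + q.min);
}
```

The rounding error is what costs recall, which is why quantized indexes are often paired with rescoring against the original vectors.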
Sharding (10B+)
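# Milvus / Weaviate / Vespa = automatic sharding.
# Cluster mode.
# Pinecone = managed (automatic).
# Qdrant cluster = manual (shard count is set per collection; moving shards between nodes is a manual operation).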
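Whichever engine does the sharding, a query against a sharded index follows a scatter-gather pattern: every shard answers with its local top-k and a coordinator merges them. A minimal sketch, where `ShardSearch` is a hypothetical stand-in for a per-shard query:

```typescript
type Hit = { id: string; score: number };
type ShardSearch = (k: number) => Hit[]; // stand-in for a per-shard top-k query

// Ask every shard for its own top-k, then keep the global best k.
// Each shard must return a full k because the global top-k may all live on one shard.
function scatterGather(shards: ShardSearch[], k: number): Hit[] {
  const merged: Hit[] = [];
  for (const search of shards) {
    merged.push(...search(k));
  }
  return merged.sort((a, b) => b.score - a.score).slice(0, k);
}
```

Query cost therefore grows with the number of shards touched, which is one reason replicas rather than more shards are used to scale read throughput.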
Replication
Read replica:
- Read scale
- Failover
Multi-region:
- Closer to edge users
- Higher cost
Cost (approximate)
Pinecone:
- Starter: $0
- Standard: $50/month + $0.40/M ops
- 1M vectors × 1536 dim = $50/month (s1)
Qdrant Cloud:
- Free: 1GB
- Paid: $0.05/GB/month
- 1M × 1536 dim = ~6GB = $0.30/month + compute
Weaviate Cloud: similar
Self-host (Qdrant):
- Server cost only
- 1M × 1536 dim = 6GB RAM
→ Self-hosting is cheapest. Managed means no operations work.
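The ~6GB figure is plain arithmetic: vectors × dimensions × 4 bytes per float32 value. A small sketch for redoing the estimate with other sizes follows; it counts raw vector storage only, so index overhead (HNSW links, payloads) comes on top, and the helper names are hypothetical.

```typescript
// Raw vector storage in bytes: count × dimensions × bytes per component.
function rawVectorBytes(count: number, dim: number, bytesPerValue = 4): number {
  return count * dim * bytesPerValue;
}

// Decimal gigabytes, as used in the cost table.
function toGB(bytes: number): number {
  return bytes / 1e9;
}

const float32GB = toGB(rawVectorBytes(1e6, 1536)); // float32 components
const int8GB = toGB(rawVectorBytes(1e6, 1536, 1)); // int8 scalar quantization
const storagePerMonth = float32GB * 0.05; // at the $0.05/GB/month figure quoted above
```

For 1M × 1536 this gives 6.144 GB in float32 and 1.536 GB in int8, so the quoted storage price works out to about $0.31 per month before compute.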
Performance
HNSW search (1M docs):
- Pinecone: ~30ms p99
- Qdrant: ~10ms (self-host SSD + RAM)
- Weaviate: ~20ms
- Milvus: ~10ms
- pgvector: ~50ms (HNSW)
→ At million scale, they are similar.
At billion scale, the differences are large.
Filter (metadata)
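// Pinecone
filter: {
$and: [
{ lang: 'en' },
// Range operators ($gt, $gte, $lt, $lte) only accept numbers, so store dates as Unix timestamps
{ date: { $gte: 1767225600 } }, // 2026-01-01T00:00:00Z
],
}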
// Qdrant
filter: {
must: [
{ key: 'lang', match: { value: 'en' } },
{ key: 'date', range: { gte: '2026-01-01' } },
],
}
→ Two strategies: pre-filter (applied inside the index during search) vs. post-filter (applied after the search).
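The difference between the two strategies shows up clearly on a toy dataset. In this sketch similarity scores are precomputed for brevity, and real engines apply pre-filtering during index traversal rather than with a full scan; `preFilter` and `postFilter` are hypothetical names.

```typescript
type Doc = { id: string; lang: string; score: number };

const byScoreDesc = (a: Doc, b: Doc) => b.score - a.score;

// Post-filter: take the top-k first, then drop non-matching hits.
// With a selective filter this can return fewer than k results.
function postFilter(docs: Doc[], k: number, keep: (d: Doc) => boolean): Doc[] {
  return docs.slice().sort(byScoreDesc).slice(0, k).filter(keep);
}

// Pre-filter: restrict the candidates first, then take the top-k among them.
function preFilter(docs: Doc[], k: number, keep: (d: Doc) => boolean): Doc[] {
  return docs.filter(keep).sort(byScoreDesc).slice(0, k);
}
```

Compensating for lost hits by over-fetching (a much larger k) is what makes post-filtering slow when the filter is selective.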
Multi-tenant
// Approach 1: Separate index per tenant
// Pinecone: expensive (cost per index)
// Qdrant: one collection per tenant is OK
// Approach 2: Shared index + tenant filter
filter: { tenant_id: 'tenant-123' }
// Approach 3: Namespace (Pinecone)
await index.namespace('tenant-123').upsert([...]);
await index.namespace('tenant-123').query({ vector, topK: 10 });
→ Namespace = isolation + scale.
Multi-vector (image + text)
// Same space
await collection.upsert([
{ id: 'item1', vector: clipEmbedding },
]);
// Or named vectors (Qdrant)
await client.createCollection('items', {
vectors: {
image: { size: 512, distance: 'Cosine' },
text: { size: 1536, distance: 'Cosine' },
},
});
→ Multi-modal search.
Batch insert (large import)
const BATCH = 1000;
for (let i = 0; i < embeddings.length; i += BATCH) {
const batch = embeddings.slice(i, i + BATCH);
await index.upsert(batch);
console.log(`${i + batch.length}/${embeddings.length}`);
}
→ Watch out for rate limits and memory usage.
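A fixed record count is not always enough, because many APIs also cap the request size in bytes. The sketch below splits records by both limits; `makeBatches` is a hypothetical helper and the limits passed in are assumptions to be replaced with the provider's documented values.

```typescript
// Split records into batches bounded by a record count and a byte budget.
// A single record larger than maxBytes still gets a batch of its own.
function makeBatches<T>(
  records: T[],
  maxCount: number,
  maxBytes: number,
  sizeOf: (r: T) => number,
): T[][] {
  const batches: T[][] = [];
  let current: T[] = [];
  let bytes = 0;
  for (const r of records) {
    const size = sizeOf(r);
    const full = current.length >= maxCount || bytes + size > maxBytes;
    if (current.length > 0 && full) {
      batches.push(current);
      current = [];
      bytes = 0;
    }
    current.push(r);
    bytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

For float32 embeddings a rough `sizeOf` is `values.length * 4` plus the serialized metadata size.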
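Re-embed (model change)
Changing the model (text-embedding-3-small → 3-large):
- Embeddings change, so every document must be re-embedded
- High cost and time
Mitigation:
- Incremental re-embedding (in the background)
- New model = new namespace
- Shift traffic gradually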
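One way to shift traffic gradually is a deterministic percentage rollout: each user hashes to a stable bucket from 0 to 99, and users below the rollout percentage query the namespace built with the new model. A sketch, with hypothetical namespace names and helper functions; the query must be embedded with the same model as the namespace it is sent to.

```typescript
// Stable bucket in 0..99 from a string key (FNV-1a, 32-bit).
function bucketOf(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) % 100;
}

// Users whose bucket is below rolloutPercent go to the new namespace.
// Raising the percentage only ever moves users from old to new, never back.
function pickNamespace(
  userId: string,
  rolloutPercent: number,
  oldNs = 'emb-small', // hypothetical namespace for the old model
  newNs = 'emb-large', // hypothetical namespace for the new model
): string {
  return bucketOf(userId) < rolloutPercent ? newNs : oldNs;
}
```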
Backup / restore
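// Pinecone: backups are created from the client, not the index handle (check your SDK version)
await pc.createBackup({ indexName: 'my-index', name: 'snapshot-2026' });
// Qdrant
await client.createSnapshot('docs');
// Large datasets take time and storage.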
Search optimization
1. Reduce dimensions (Matryoshka): 1536 → 256 keeps ~90% accuracy and is ~6x faster
2. Binary quantization: 32x smaller, ~70% accuracy
3. Hybrid (vector + keyword): higher recall
4. Reranker: rerank the top 50 down to a precise top 5
5. Index parameter tune (ef_search, M)
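Items 1 and 2 are simple enough to sketch directly. Truncation only makes sense for models trained so that leading dimensions carry the most information (Matryoshka-style), and the truncated vector is re-normalized so cosine scores stay comparable. Binary quantization keeps one sign bit per dimension and compares with Hamming distance. All function names here are hypothetical.

```typescript
// Keep the first `dim` components and rescale to unit length.
function truncateAndNormalize(v: number[], dim: number): number[] {
  const head = v.slice(0, dim);
  const norm = Math.sqrt(head.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}

// One bit per dimension (1 if the component is positive), packed 8 per byte.
function binarize(v: number[]): Uint8Array {
  const bits = new Uint8Array(Math.ceil(v.length / 8));
  for (let i = 0; i < v.length; i++) {
    if (v[i] > 0) bits[i >> 3] |= 1 << (i & 7);
  }
  return bits;
}

// Number of differing bits between two packed vectors of equal length.
function hamming(a: Uint8Array, b: Uint8Array): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) {
      d += x & 1;
      x >>= 1;
    }
  }
  return d;
}
```

A 1536-dimensional float32 vector is 6144 bytes; its binary form is 192 bytes, which is the 32x in item 2.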
When pgvector vs dedicated
pgvector:
+ Postgres queries, transactions, and joins
+ Single DB
+ Small to medium scale (< 10M)
- Weak at large scale
Dedicated:
+ Large scale (100M+)
+ Specialized index
- Separate system
- Extra sync
Cloud comparisons
Pinecone:
+ Easiest
+ Best DX
- Most expensive (at large scale)
- Vendor lock-in
Qdrant Cloud:
+ OSS + cloud
+ Powerful features
+ Cheap
Weaviate Cloud:
+ Auto vectorize
+ Strong hybrid search
Vector DB on cloud (CF Vectorize, Vercel):
+ Close to the edge
- Fewer features
Cohere / Voyage:
+ Integrated embedding + search
- Vendor lock-in
Edge vector search (CF Vectorize)
# wrangler.toml
[[vectorize]]
binding = "VECTORIZE"
index_name = "my-index"
// Worker
await env.VECTORIZE.upsert([
{ id: 'doc1', values: embedding, metadata: {} },
]);
const r = await env.VECTORIZE.query(queryEmbedding, { topK: 10 });
→ Edge near-user.
Monitoring
- Index size
- Query latency (p50, p99)
- QPS
- Recall (sample test)
- Cost per query
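Two of these metrics are easy to compute from logged data. The sketch below uses a nearest-rank percentile for latency and recall@k for a sampled query, comparing the ANN result ids against an exact (brute-force) result for the same query; both helper names are hypothetical.

```typescript
// Nearest-rank percentile over recorded latencies (ms); p is in 1..100.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Fraction of the exact top-k that the approximate search also returned.
function recallAtK(approxIds: string[], exactIds: string[]): number {
  if (exactIds.length === 0) return 1;
  const exact = new Set(exactIds);
  return approxIds.filter((id) => exact.has(id)).length / exactIds.length;
}
```

Running the recall check on a small random sample of production queries is usually enough to catch regressions after an index parameter or quantization change.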
🤔 Decision criteria
| Scale | Recommendation |
|---|---|
| < 1M | pgvector |
| 1M-10M | Qdrant / Pinecone |
| 10M-100M | Pinecone / Weaviate / Qdrant |
| 100M-1B | Milvus / Vespa / Pinecone |
| 1B+ | Vespa / Milvus + sharding |
| Edge | CF Vectorize / Pinecone |
| Hybrid (vector + text) | Vespa / Weaviate / pgvector + tsvector |
❌ Anti-patterns
- Using Pinecone for everything (small scale): pgvector is sufficient.
- Highly selective filters with post-filtering: slow. Pre-filter inside the index instead.
- Assuming quantization is safe without verifying recall: accuracy drops.
- No re-embed plan: a model change means starting over.
- Single region with global users: latency.
- No backups: data loss.
- Pure vector search without hybrid: misses keyword cases.
🤖 Hints for LLM use
- Start with pgvector.
- Scale → Qdrant / Pinecone.
- Large scale → Milvus / Vespa.
- Hybrid + reranker = best quality.