id: db-vector-db-scaling
title: Vector DB Scaling: Pinecone / Qdrant / Weaviate / Milvus
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: database, vector, scaling, vibe-coding
tech_stack: language = TS / Python; applicable_to = Backend
applied_in:
aliases: Pinecone, Qdrant, Weaviate, Milvus, Vespa, vector index, HNSW, IVF

Vector DB Scaling

Under 1M vectors, pgvector is enough. 1M-100M: Qdrant / Weaviate. 100M-10B: Pinecone / Milvus / Vespa. The key levers are index type, sharding, replicas, and hybrid search.

📖 Key concepts

  • HNSW: fast ANN (approximate nearest neighbor) search.
  • IVF: smaller memory footprint / index.
  • Quantization: 8-bit / binary.
  • Filtering: metadata-based.

💻 Code patterns

Pinecone (managed, most popular)

import { Pinecone } from '@pinecone-database/pinecone';

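// Read the API key from the environment instead of an undefined local variable.
const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });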

const index = pc.index('my-index');

// Upsert
await index.upsert([
  { id: 'doc1', values: embedding1, metadata: { lang: 'en', tag: 'intro' } },
  { id: 'doc2', values: embedding2, metadata: { lang: 'ko', tag: 'main' } },
]);

// Query
const r = await index.query({
  vector: queryEmbedding,
  topK: 10,
  includeMetadata: true,
  filter: { lang: 'en' },
});

Qdrant (open-source, strong)

import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'http://qdrant:6333' });

await client.createCollection('docs', {
  vectors: { size: 1536, distance: 'Cosine' },
  hnsw_config: { m: 16, ef_construct: 100 },
});

await client.upsert('docs', {
  points: [
    {
      id: 1, // Qdrant point IDs must be unsigned integers or UUIDs, not arbitrary strings
      vector: embedding1,
      payload: { lang: 'en', tag: 'intro' },
    },
  ],
});

const r = await client.search('docs', {
  vector: queryEmbedding,
  limit: 10,
  filter: {
    must: [{ key: 'lang', match: { value: 'en' } }],
  },
});

→ Self-host or cloud. Powerful filtering.

Weaviate (semantic + hybrid)

import weaviate from 'weaviate-client';

const client = await weaviate.connectToCustom({
  httpHost: 'weaviate',
  httpPort: 8080,
});

const collection = client.collections.get('Docs');

await collection.data.insertMany([
  // weaviate-client v3 takes a custom vector under the `vectors` key
  { properties: { content: 'Hello', lang: 'en' }, vectors: embedding1 },
  { properties: { content: '안녕', lang: 'ko' }, vectors: embedding2 },
]);

const r = await collection.query.nearVector(queryEmbedding, {
  limit: 10,
  filters: collection.filter.byProperty('lang').equal('en'),
});

→ Built-in vectorizer (auto embed).

Milvus (large scale)

from pymilvus import connections, Collection

connections.connect(host='milvus', port='19530')

collection = Collection('docs')
collection.insert([
    [id1, id2],
    [embedding1, embedding2],
    [{'lang': 'en'}, {'lang': 'ko'}],
])

# A collection must be loaded into memory before it can be searched.
collection.load()

results = collection.search(
    data=[query_embedding],
    anns_field='embedding',
    param={'metric_type': 'COSINE', 'params': {'ef': 64}},
    limit=10,
    expr='lang == "en"',
)

→ 10B+ scale. K8s native (Milvus Operator).

Vespa (large scale + hybrid)

schema docs {
    document docs {
        field id type string {}
        field content type string { indexing: index | summary }
        field lang type string { indexing: attribute }
        field embedding type tensor<float>(x[1536]) {
            indexing: attribute | index
            attribute {
                distance-metric: angular
            }
            index {
                hnsw {
                    max-links-per-node: 16
                    neighbors-to-explore-at-insert: 100
                }
            }
        }
    }
    
    rank-profile default {
        first-phase {
            expression: closeness(field, embedding)
        }
    }
}

→ Used by Yahoo / Spotify for large-scale search. Steep learning curve.

Index type comparison

HNSW (Hierarchical Navigable Small World):
+ Fastest search
+ Strong recall
- High memory use
- Expensive to rebuild

IVF (Inverted File):
+ Low memory use
+ Fast build
- Slightly slower than HNSW

Flat (brute force):
+ 100% recall
- O(N), so small datasets only

PQ / SQ (Product / Scalar Quantization):
+ Much lower memory use (4-32x)
+ Suits large datasets
- Slightly lower recall

→ HNSW = default. PQ = large scale.
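
To make the Flat baseline concrete, here is a minimal brute-force sketch. The `VectorDoc` type, `flatSearch`, and the toy 2-dimensional vectors are illustrative assumptions, not any vendor's API. Every query scans all N vectors, which is why this approach only suits small datasets, and also why it is a convenient ground truth for checking the recall of an ANN index.

```typescript
// Brute-force (Flat) search: exact results, O(N) work per query.
type VectorDoc = { id: string; values: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function flatSearch(docs: VectorDoc[], query: number[], topK: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(d.values, query) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, topK);
}

// Toy data (hypothetical): 'a' points almost exactly along the query direction.
const toyDocs: VectorDoc[] = [
  { id: 'a', values: [1, 0] },
  { id: 'b', values: [0, 1] },
  { id: 'c', values: [1, 1] },
];
const flatHits = flatSearch(toyDocs, [1, 0.1], 2);
```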

Hybrid (vector + keyword)

// Weaviate
const r = await collection.query.hybrid(query, {
  vector: queryEmbedding,
  alpha: 0.5,  // 0 = keyword, 1 = vector
  limit: 10,
});

-- pgvector + tsvector
WITH v_hits AS (
  SELECT id, 1 - (embedding <=> $1) AS v_score
  FROM docs ORDER BY embedding <=> $1 LIMIT 100
),
t_hits AS (
  SELECT id, ts_rank(tsv, plainto_tsquery($2)) AS t_score
  FROM docs WHERE tsv @@ plainto_tsquery($2)
  ORDER BY t_score DESC LIMIT 100  -- without ORDER BY, LIMIT keeps an arbitrary 100 rows
)
SELECT id, COALESCE(v_score, 0) * 0.7 + COALESCE(t_score, 0) * 0.3 AS score
FROM v_hits FULL OUTER JOIN t_hits USING (id)
ORDER BY score DESC LIMIT 10;

→ Vector search alone is not enough; combine it with keyword search.

See AI_RAG_Advanced.
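
The weighted merge in the SQL above can also be done in application code when the vector and keyword hits come from two different systems. The sketch below mirrors the 0.7 / 0.3 weights; `Hit` and `fuseScores` are hypothetical names. It assumes both score lists are already on comparable scales, which raw `ts_rank` and cosine scores usually are not, so some normalization is probably needed in practice.

```typescript
type Hit = { id: string; score: number };

// Weighted score fusion: a doc missing from one list contributes 0 for that side,
// like COALESCE(..., 0) over the FULL OUTER JOIN.
function fuseScores(
  vectorHits: Hit[],
  textHits: Hit[],
  vectorWeight = 0.7,
  textWeight = 0.3,
  limit = 10,
): Hit[] {
  const fused: Record<string, number> = {};
  for (const h of vectorHits) fused[h.id] = (fused[h.id] ?? 0) + h.score * vectorWeight;
  for (const h of textHits) fused[h.id] = (fused[h.id] ?? 0) + h.score * textWeight;
  return Object.keys(fused)
    .map((id) => ({ id, score: fused[id] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Toy scores (made up): doc2 appears in both lists and ends up on top.
const fusedHits = fuseScores(
  [{ id: 'doc1', score: 0.9 }, { id: 'doc2', score: 0.8 }],
  [{ id: 'doc2', score: 0.5 }, { id: 'doc3', score: 1.0 }],
);
```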

Quantization

// Pinecone: handled automatically by the service
// Qdrant
await client.updateCollection('docs', {
  quantization_config: {
    scalar: { type: 'int8', always_ram: true },
  },
});

// About 4x less memory, 95%+ recall.
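
To show where the 4x figure comes from, here is a simplified symmetric int8 quantizer. It is an illustration only, not Qdrant's actual algorithm, and `quantizeInt8` / `dequantizeInt8` are hypothetical names. Each float32 value (4 bytes) becomes one signed byte plus a shared per-vector scale.

```typescript
type QuantizedVector = { codes: Int8Array; scale: number };

// Map each value to an integer in [-127, 127] using a single per-vector scale.
function quantizeInt8(values: number[]): QuantizedVector {
  let maxAbs = 0;
  for (const v of values) maxAbs = Math.max(maxAbs, Math.abs(v));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const codes = new Int8Array(values.length);
  for (let i = 0; i < values.length; i++) {
    codes[i] = Math.round(values[i] / scale);
  }
  return { codes, scale };
}

// Approximate reconstruction; the error per value is at most scale / 2.
function dequantizeInt8(q: QuantizedVector): number[] {
  const out: number[] = [];
  for (let i = 0; i < q.codes.length; i++) out.push(q.codes[i] * q.scale);
  return out;
}

const originalVec = [0.12, -0.98, 0.4];
const quantizedVec = quantizeInt8(originalVec);
const restoredVec = dequantizeInt8(quantizedVec);
// Storage ratio: float32 bytes divided by int8 bytes.
const compressionRatio = new Float32Array(originalVec).byteLength / quantizedVec.codes.byteLength;
```

The rounding error is what costs recall, so the recall figure should be measured on real queries rather than assumed.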

Sharding (10B+)

# Milvus / Weaviate / Vespa = automatic sharding.
# Cluster mode.

# Pinecone = managed (automatic).
# Qdrant cluster = manual.

Replication

Read replica:
- Read scale
- Failover

Multi-region:
- Closer to edge users
- Cost ↑

Cost (approximate)

Pinecone:
- Starter: $0
- Standard: $50/month + $0.40/M ops
- 1M vectors × 1536 dim = $50/month (s1)

Qdrant Cloud:
- Free: 1GB
- Paid: $0.05/GB/month
- 1M × 1536 dim = ~6GB = $0.30/month + compute

Weaviate Cloud: similar

Self-host (Qdrant):
- Server cost only
- 1M × 1536 dim = 6GB RAM

→ Self-host = cheapest. Managed = no operations work.
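
The "~6GB" figure above is just count × dimensions × bytes per value for float32. A small sketch of that arithmetic follows; `rawVectorBytes` is a hypothetical helper, and the result covers raw vectors only, excluding HNSW graph links, payloads, and replicas.

```typescript
// Raw vector storage: count × dim × bytes per value (float32 = 4, int8 = 1).
function rawVectorBytes(count: number, dim: number, bytesPerValue = 4): number {
  return count * dim * bytesPerValue;
}

const float32Bytes = rawVectorBytes(1e6, 1536);   // 6,144,000,000 bytes
const float32GB = float32Bytes / 1e9;             // about 6.1 GB, the "~6GB" above
const int8GB = rawVectorBytes(1e6, 1536, 1) / 1e9; // about 1.5 GB after int8 quantization
```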

Performance

HNSW search (1M docs):
- Pinecone:    ~30ms p99
- Qdrant:      ~10ms (self-host SSD + RAM)
- Weaviate:    ~20ms
- Milvus:      ~10ms
- pgvector:     ~50ms (HNSW)

→ Million scale = similar.
   Billion scale = large differences.

Filter (metadata)

// Pinecone
filter: { 
  $and: [
    { lang: 'en' },
    { date: { $gte: 1767225600 } }, // Unix seconds for 2026-01-01 UTC; Pinecone range operators accept numbers only
  ],
}

// Qdrant
filter: {
  must: [
    { key: 'lang', match: { value: 'en' } },
    { key: 'date', range: { gte: '2026-01-01' } },
  ],
}

→ Two strategies: pre-filter (inside the index) vs post-filter (after the search).
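
The difference between the two strategies shows up when the filter is selective. In the toy sketch below (hypothetical `ScoredItem` data with precomputed similarity scores), post-filtering takes the top K first and can end up with nothing, while pre-filtering still fills K results. Real engines apply the pre-filter inside the index traversal; this only illustrates the ordering of the two steps.

```typescript
type ScoredItem = { id: string; lang: string; score: number };

function topByScore(items: ScoredItem[], k: number): ScoredItem[] {
  // Copy before sorting so the caller's array is not mutated.
  return items.slice().sort((a, b) => b.score - a.score).slice(0, k);
}

// Post-filter: search first, then drop non-matching hits (may return fewer than k).
function postFilter(items: ScoredItem[], lang: string, k: number): ScoredItem[] {
  return topByScore(items, k).filter((i) => i.lang === lang);
}

// Pre-filter: restrict the candidate set first, then search within it.
function preFilter(items: ScoredItem[], lang: string, k: number): ScoredItem[] {
  return topByScore(items.filter((i) => i.lang === lang), k);
}

const scoredItems: ScoredItem[] = [
  { id: 'k1', lang: 'ko', score: 0.95 },
  { id: 'k2', lang: 'ko', score: 0.9 },
  { id: 'e1', lang: 'en', score: 0.8 },
  { id: 'e2', lang: 'en', score: 0.7 },
];
const postHits = postFilter(scoredItems, 'en', 2); // top 2 are both 'ko', so nothing survives
const preHits = preFilter(scoredItems, 'en', 2);   // e1 and e2
```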

Multi-tenant

// Approach 1: Separate index per tenant
// Pinecone: expensive (cost per index)
// Qdrant: a collection per tenant is OK

// Approach 2: Shared index + tenant filter
filter: { tenant_id: 'tenant-123' }

// Approach 3: Namespace (Pinecone)
await index.namespace('tenant-123').upsert([...]);
await index.namespace('tenant-123').query({ vector, topK: 10 });

→ Namespace = isolation + scale.

Multi-vector (image + text)

// Same space
await collection.upsert([
  { id: 'item1', vector: clipEmbedding },
]);

// Or named vectors (Qdrant)
await client.createCollection('items', {
  vectors: {
    image: { size: 512, distance: 'Cosine' },
    text: { size: 1536, distance: 'Cosine' },
  },
});

→ Multi-modal search.

Batch insert (large import)

const BATCH = 1000;

for (let i = 0; i < embeddings.length; i += BATCH) {
  const batch = embeddings.slice(i, i + BATCH);
  await index.upsert(batch);
  console.log(`${i + batch.length}/${embeddings.length}`);
}

→ Watch rate limits / memory.
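
One way to handle rate limits during a large import is to retry each batch with exponential backoff. The sketch below is a generic pattern, not a vendor API: `upsertFn` is a hypothetical callback (for example `(batch) => index.upsert(batch)`), and the batch size, retry count, and delays are assumptions to tune.

```typescript
// Split records into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Upsert batch by batch; on failure wait 0.5s, 1s, 2s, ... and retry, then give up.
async function upsertInBatches<T>(
  records: T[],
  upsertFn: (batch: T[]) => Promise<unknown>, // hypothetical callback wrapping the real client
  batchSize = 1000,
  maxRetries = 3,
): Promise<void> {
  for (const batch of chunk(records, batchSize)) {
    for (let attempt = 0; ; attempt++) {
      try {
        await upsertFn(batch);
        break; // batch succeeded, move on to the next one
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        await sleep(500 * 2 ** attempt);
      }
    }
  }
}
```

Processing batches sequentially keeps memory bounded to one batch in flight; parallel batches would be faster but are more likely to hit rate limits.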

Re-embed (model change)

Model change (text-embedding-3-small → 3-large):
- Embeddings change, so every doc must be re-embedded
- High cost / time

Mitigation:
- Migrate gradually (in the background)
- New model = new namespace
- Shift traffic gradually
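
A gradual traffic shift can be sketched as a deterministic percentage rollout: each user hashes to a stable bucket, and users below the rollout percentage query the new namespace. The namespace names `embed-v1` / `embed-v2` and the helper names are hypothetical. The query must be embedded with the model that matches the chosen namespace, since vectors from different models are not comparable.

```typescript
// Stable bucket in [0, 100) derived from a simple string hash.
function bucketOf(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep within unsigned 32-bit range
  }
  return h % 100;
}

// rolloutPercent = 0 sends everyone to the old namespace, 100 sends everyone to the new one.
function pickNamespace(userId: string, rolloutPercent: number): string {
  return bucketOf(userId) < rolloutPercent ? 'embed-v2' : 'embed-v1';
}

const namespaceForUser = pickNamespace('user-42', 10);
```

Because the bucket depends only on the user ID, a given user stays on the same side as the percentage is raised, which keeps results consistent between requests.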

Backup / restore

// Pinecone (backups are created through the client, not the index handle)
await pc.createBackup({ indexName: 'my-index', name: 'snapshot-2026' });

// Qdrant
await client.createSnapshot('docs');

// Large dataset = more time + storage.

Search optimization

1. Reduce dim (Matryoshka): 1536 → 256 dims keeps about 90% accuracy, about 6x faster
2. Binary quantization: 32x smaller, about 70% accuracy
3. Hybrid (vector + keyword): higher recall
4. Reranker: top 50 → top 5, higher precision
5. Index parameter tune (ef_search, M)
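
Item 1 can be sketched as keeping the first `dim` components and renormalizing to unit length. This is only meaningful for models trained with Matryoshka-style representations; for other models truncation will likely hurt accuracy much more. `truncateAndNormalize` is a hypothetical helper name.

```typescript
// Keep the first `dim` components, then rescale to unit length so cosine / dot scores stay comparable.
function truncateAndNormalize(values: number[], dim: number): number[] {
  const head = values.slice(0, dim);
  const norm = Math.sqrt(head.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? head : head.map((v) => v / norm);
}

// Toy 3-dimensional vector cut down to 2 dimensions.
const shortVec = truncateAndNormalize([3, 4, 12], 2); // [0.6, 0.8]
```

The collection or index must be created with the reduced dimension, and the documents and the query must be truncated the same way.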

When pgvector vs dedicated

pgvector:
+ Postgres queries / transactions / joins
+ Single DB
+ Small / medium scale (< 10M)
- Weak at large scale

Dedicated:
+ Large scale (100M+)
+ Specialized index
- Separate system
- Extra sync

Cloud comparisons

Pinecone:
+ Easiest
+ Best DX
- Most expensive (at large scale)
- Vendor lock-in

Qdrant Cloud:
+ OSS + cloud
+ Powerful features
+ Cheap

Weaviate Cloud:
+ Auto vectorize
+ Strong hybrid search

Vector DB on cloud (CF Vectorize, Vercel):
+ Close to the edge
- Fewer features

Cohere / Voyage:
+ Embedding + search integrated
- Vendor lock-in

Edge vector search (CF Vectorize)

# wrangler.toml
[[vectorize]]
binding = "VECTORIZE"
index_name = "my-index"

// Worker
await env.VECTORIZE.upsert([
  { id: 'doc1', values: embedding, metadata: {} },
]);

const r = await env.VECTORIZE.query(queryEmbedding, { topK: 10 });

→ Edge near-user.

Monitoring

- Index size
- Query latency (p50, p99)
- QPS
- Recall (sample test)
- Cost per query
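
The "Recall (sample test)" metric can be computed by comparing ANN results against exact (brute-force) results for a sample of queries. A minimal sketch, where `recallAtK` is a hypothetical helper and the ID lists are made-up examples:

```typescript
// Fraction of the exact top-k IDs that also appear in the approximate top-k.
function recallAtK(approxIds: string[], exactIds: string[], k: number): number {
  const truth = exactIds.slice(0, k);
  if (truth.length === 0) return 0;
  const approx = approxIds.slice(0, k);
  let found = 0;
  for (const id of truth) {
    if (approx.indexOf(id) !== -1) found++;
  }
  return found / truth.length;
}

// The ANN index missed 'c' and returned 'x' instead: 3 of 4 correct.
const sampleRecall = recallAtK(['a', 'b', 'x', 'd'], ['a', 'b', 'c', 'd'], 4); // 0.75
```

Averaging this over a few hundred sampled queries, before and after changing quantization or index parameters, gives a concrete recall number to track.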

🤔 Decision criteria

Scale                    Recommendation
< 1M                     pgvector
1M-10M                   Qdrant / Pinecone
10M-100M                 Pinecone / Weaviate / Qdrant
100M-1B                  Milvus / Vespa / Pinecone
1B+                      Vespa / Milvus + sharding
Edge                     CF Vectorize / Pinecone
Hybrid (vector + text)   Vespa / Weaviate / pgvector + tsvector

Anti-patterns

  • Pinecone for everything (at small scale): pgvector is enough.
  • Highly selective filter + post-filter: slow. Use a pre-filter index.
  • Assuming quantization is safe without verifying recall: accuracy drops.
  • No re-embed plan: a model change means starting over.
  • Single region + global users: latency.
  • No backups: data loss.
  • Pure vector search without hybrid: misses keyword cases.

🤖 LLM usage hints

  • Start = pgvector.
  • Scale → Qdrant / Pinecone.
  • Large scale → Milvus / Vespa.
  • Hybrid + reranker = best quality.

🔗 Related documents