| id | title | category | status | source_trust_level | verification_status | created_at | updated_at | tags | tech_stack | applied_in | aliases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| db-full-text-search | Full-text Search — Postgres / Elasticsearch / Meilisearch | Coding | draft | B | conceptual | 2026-05-09 | 2026-05-09 |
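Full-text Search
A search for "맥북" (Korean for "MacBook") should also match "macbook". Full-text search adds per-language stemming and ranking. Postgres FTS = lightweight start, Meilisearch/Typesense = fast typo tolerance, Elasticsearch/OpenSearch = large scale.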
📖 Key concepts
- Tokenize: "맥북 m1" → ["맥북", "m1"].
- Stemming: "running" → "run".
- Ranking: BM25 / TF-IDF.
- Faceting: category / price-range filters combined with search.
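
To make the tokenize and ranking steps concrete, here is a minimal in-memory BM25 sketch. The names `tokenize` and `bm25Search` and the parameter values `k1 = 1.2`, `b = 0.75` are assumptions chosen for illustration (common defaults, not something this note prescribes). It does whitespace tokenization only and no stemming.

```typescript
type Doc = { id: string; text: string };
type Hit = { id: string; score: number };

// Lowercase and split on whitespace: "맥북 m1" -> ["맥북", "m1"].
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// Score every document against the query with BM25 and return hits, best first.
function bm25Search(docs: Doc[], query: string, k1 = 1.2, b = 0.75): Hit[] {
  const indexed = docs.map((d) => ({ id: d.id, terms: tokenize(d.text) }));
  const total = indexed.length;
  const avgLen = indexed.reduce((sum, d) => sum + d.terms.length, 0) / Math.max(total, 1);

  // Document frequency: how many documents contain each term.
  const df = new Map<string, number>();
  for (const d of indexed) {
    new Set(d.terms).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  }

  const queryTerms = tokenize(query);
  const hits: Hit[] = [];
  for (const d of indexed) {
    let score = 0;
    for (const q of queryTerms) {
      const tf = d.terms.filter((t) => t === q).length;
      if (tf === 0) continue;
      const n = df.get(q) ?? 0;
      // Rare terms get a higher inverse document frequency.
      const idf = Math.log(1 + (total - n + 0.5) / (n + 0.5));
      // Longer-than-average documents are penalized through b.
      const lengthNorm = 1 - b + (b * d.terms.length) / avgLen;
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    if (score > 0) hits.push({ id: d.id, score });
  }
  return hits.sort((x, y) => y.score - x.score);
}
```

Real engines add stemming, stop words, and an inverted index on top of this; the sketch only shows how term frequency, rarity, and document length feed the score.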
💻 Code patterns
Postgres FTS: getting started
-- The 'simple' config lowercases and splits only; it does no stemming. Use 'english' for English text.
ALTER TABLE products ADD COLUMN tsv tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('simple', coalesce(name, '')), 'A') ||
setweight(to_tsvector('simple', coalesce(description, '')), 'B')
) STORED;
CREATE INDEX products_tsv ON products USING GIN(tsv);
-- Search
SELECT id, name,
ts_rank(tsv, query) AS rank
FROM products, plainto_tsquery('simple', 'macbook m1') query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 20;
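Korean (requires the pg_search extension)
-- Or use trigrams (similar to n-grams)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX products_name_trgm ON products USING GIN (name gin_trgm_ops);
-- % is true when similarity(name, '맥북') is greater than pg_trgm.similarity_threshold (default 0.3)
SELECT * FROM products WHERE name % '맥북' ORDER BY similarity(name, '맥북') DESC;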
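
To show what the trigram match above measures, here is a rough sketch of trigram similarity. It approximates the pg_trgm approach (lowercase, pad each word with two leading spaces and one trailing space, compare the sets of 3-character slices); the function names are hypothetical, and the word splitting is simplified to whitespace, so scores may differ from Postgres for punctuation-heavy text.

```typescript
// Collect the set of 3-character slices for each padded word.
function trigrams(text: string): Set<string> {
  const grams = new Set<string>();
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  for (const word of words) {
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) {
      grams.add(padded.slice(i, i + 3));
    }
  }
  return grams;
}

// Shared trigrams divided by the size of the union (0 when both are empty).
function trigramSimilarity(a: string, b: string): number {
  const ga = trigrams(a);
  const gb = trigrams(b);
  let shared = 0;
  ga.forEach((g) => {
    if (gb.has(g)) shared += 1;
  });
  const union = ga.size + gb.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

In this sketch `trigramSimilarity('맥북', '맥북 프로')` comes out at 0.5, and the score keeps dropping as the product name gets longer. Short queries can therefore fall under the similarity threshold even when the name contains the query verbatim.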
Meilisearch (typo tolerance and ranking built in)
import { MeiliSearch } from 'meilisearch';
const ms = new MeiliSearch({ host: 'http://meilisearch:7700', apiKey: '...' });
const idx = ms.index('products');
// Settings (apply before indexing so existing documents are not re-indexed)
await idx.updateSettings({
searchableAttributes: ['name', 'description'],
filterableAttributes: ['category', 'price'],
rankingRules: ['words', 'typo', 'proximity', 'attribute', 'sort', 'exactness'],
});
// Indexing (the call only enqueues a task; documents become searchable once it completes)
await idx.addDocuments([{ id: '1', name: 'MacBook M1', price: 1000, category: 'laptop' }]);
// Search
const r = await idx.search('macboo', { // typos are tolerated
filter: 'category = "laptop" AND price < 2000',
limit: 20,
attributesToHighlight: ['name'],
});
Elasticsearch
import { Client } from '@elastic/elasticsearch';
const es = new Client({ node: 'http://elasticsearch:9200' });
// Mapping (v8 client style: request fields go at the top level, with no `body` wrapper)
await es.indices.create({
index: 'products',
mappings: {
properties: {
name: { type: 'text', analyzer: 'standard' },
description: { type: 'text' },
price: { type: 'float' },
category: { type: 'keyword' },
},
},
});
// Search
const r = await es.search({
index: 'products',
query: {
bool: {
must: [{ multi_match: { query: 'macbook', fields: ['name^3', 'description'] } }],
filter: [{ term: { category: 'laptop' } }, { range: { price: { lt: 2000 } } }],
},
},
highlight: { fields: { name: {} } },
size: 20,
});
Hybrid (vector + keyword)
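-- Combine pgvector with FTS. $1 = query embedding, $2 = query text.
WITH v_hits AS (
SELECT id, 1 - (embedding <=> $1::vector) AS v_score
FROM products ORDER BY embedding <=> $1::vector LIMIT 100
),
t_hits AS (
-- Use the same 'simple' config the tsv column was built with, and keep the best 100 by rank
SELECT id, ts_rank(tsv, plainto_tsquery('simple', $2)) AS t_score
FROM products WHERE tsv @@ plainto_tsquery('simple', $2)
ORDER BY t_score DESC LIMIT 100
)
-- The two scores are on different scales, so treat the 0.6 / 0.4 weights as a starting point to tune
SELECT id, COALESCE(v_score, 0) * 0.6 + COALESCE(t_score, 0) * 0.4 AS score
FROM v_hits FULL OUTER JOIN t_hits USING (id)
ORDER BY score DESC LIMIT 20;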
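
The weighted sum above mixes a cosine similarity with a `ts_rank` value, which are not on the same scale. One alternative, which is not what the query above does, is reciprocal rank fusion (RRF): it combines the two result lists by rank position only. The sketch below assumes each list is an array of ids ordered best first; the function name and the constant `k = 60` are illustrative choices.

```typescript
type Fused = { id: string; score: number };

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per id, rank starting at 1.
function reciprocalRankFusion(rankings: string[][], k = 60): Fused[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  const fused: Fused[] = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  // Ids that appear high in several lists end up first.
  return fused.sort((a, b) => b.score - a.score);
}

// Usage: ids from the vector query and from the FTS query, each ordered best first.
const vectorIds = ['a', 'b', 'c'];
const keywordIds = ['b', 'd'];
const fusedHits = reciprocalRankFusion([vectorIds, keywordIds]);
```

Because only ranks are used, no score normalization is needed, at the cost of discarding how far apart two neighbouring hits actually scored.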
Faceting
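// Meilisearch (facet attributes must be listed in filterableAttributes;
// price_range is assumed to be a precomputed bucket field on each document)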
const r = await idx.search('macbook', {
facets: ['category', 'price_range'],
});
// r.facetDistribution: { category: { laptop: 50, desktop: 5 } }
Suggest / autocomplete
// Meilisearch: prefix matching is built in
await idx.search('mac', { limit: 5 });
// Elasticsearch: use the completion suggester or edge n-grams
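
To illustrate what edge n-grams buy for autocomplete, here is a small in-memory sketch: every word is indexed under each of its leading prefixes, so a lookup for a typed prefix is a single map access. The helper names are hypothetical, and `minGram` / `maxGram` only loosely mirror the `min_gram` / `max_gram` settings of an edge n-gram filter.

```typescript
// All leading prefixes of a term between minGram and maxGram characters.
function edgeNgrams(term: string, minGram = 1, maxGram = 10): string[] {
  const chars = Array.from(term.toLowerCase());
  const grams: string[] = [];
  const longest = Math.min(maxGram, chars.length);
  for (let n = minGram; n <= longest; n++) {
    grams.push(chars.slice(0, n).join(''));
  }
  return grams;
}

// Map each prefix to the names containing a word that starts with it.
function buildPrefixIndex(names: string[], minGram = 2, maxGram = 10): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const name of names) {
    const words = name.split(/\s+/).filter((w) => w.length > 0);
    for (const word of words) {
      for (const gram of edgeNgrams(word, minGram, maxGram)) {
        const bucket = index.get(gram) ?? [];
        if (!bucket.includes(name)) bucket.push(name);
        index.set(gram, bucket);
      }
    }
  }
  return index;
}

// Look up suggestions for whatever the user has typed so far.
function suggest(index: Map<string, string[]>, input: string, limit = 5): string[] {
  return (index.get(input.toLowerCase()) ?? []).slice(0, limit);
}
```

The trade-off is index size: each word is stored once per prefix length, which is why `maxGram` is usually capped.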
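Sync (DB → search engine)
// Propagate changes via CDC or an outbox, then update the search index.
// `on` stands in for whatever event-bus subscription the app uses.
on('product.changed', async (p) => {
await idx.addDocuments([p]); // add or replace (upsert)
});
on('product.deleted', async (id) => {
await idx.deleteDocument(id);
});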
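
When changes arrive in batches from an outbox table rather than one event at a time, it can help to collapse the batch so that only the latest event per product reaches the index. The sketch below is one way to do that. The `OutboxEvent` shape, the `seq` column, and the `SearchIndex` interface are assumptions for illustration; the interface only mirrors the two calls used above.

```typescript
type Product = { id: string; name: string };

// Hypothetical outbox row: seq is a monotonically increasing sequence number.
type OutboxEvent =
  | { seq: number; type: 'product.changed'; doc: Product }
  | { seq: number; type: 'product.deleted'; id: string };

// Minimal shape of the index client used here (assumed, not a full client type).
interface SearchIndex {
  addDocuments(docs: Product[]): Promise<unknown>;
  deleteDocument(id: string): Promise<unknown>;
}

function eventProductId(e: OutboxEvent): string {
  return e.type === 'product.changed' ? e.doc.id : e.id;
}

// Keep only the newest event per product, in sequence order.
function collapseEvents(events: OutboxEvent[]): OutboxEvent[] {
  const latest = new Map<string, OutboxEvent>();
  const sorted = events.slice().sort((a, b) => a.seq - b.seq);
  for (const e of sorted) latest.set(eventProductId(e), e);
  const collapsed: OutboxEvent[] = [];
  latest.forEach((e) => collapsed.push(e));
  return collapsed.sort((a, b) => a.seq - b.seq);
}

// Apply one batch and return the highest seq seen, to store as the new checkpoint.
async function applyBatch(index: SearchIndex, events: OutboxEvent[]): Promise<number> {
  const collapsed = collapseEvents(events);
  const upserts: Product[] = [];
  for (const e of collapsed) {
    if (e.type === 'product.changed') upserts.push(e.doc);
  }
  if (upserts.length > 0) await index.addDocuments(upserts);
  for (const e of collapsed) {
    if (e.type === 'product.deleted') await index.deleteDocument(e.id);
  }
  return events.reduce((max, e) => Math.max(max, e.seq), 0);
}
```

After collapsing, upserts and deletes touch disjoint ids, so the upserts can go out as one batch without reordering problems.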
🤔 Decision criteria
| Scale / need | Recommendation |
|---|---|
| <1M docs, simple | Postgres FTS |
| Strong typo tolerance | Meilisearch / Typesense |
| Large scale + analytics + complex queries | Elasticsearch / OpenSearch |
| Hybrid (semantic + keyword) | pgvector + FTS / Vespa |
| Code search | Sourcegraph / Algolia |
| Per-user permissions + search | per-user filter |
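
For the per-user filter row, a minimal sketch of the idea with a Meilisearch-style filter string: the server always appends an access clause, so a caller cannot search documents they are not allowed to see. The attribute name `allowed_user_ids` is hypothetical (it would need to be listed in `filterableAttributes`), and the id whitelist is a deliberately strict assumption that avoids any quoting issues.

```typescript
// Only accept ids that cannot break out of the quoted filter value.
const SAFE_ID = /^[A-Za-z0-9_-]+$/;

// Combine an optional server-built filter with a mandatory per-user access clause.
function buildUserFilter(userId: string, baseFilter?: string): string {
  if (!SAFE_ID.test(userId)) {
    throw new Error('unsafe user id');
  }
  const acl = `allowed_user_ids = "${userId}"`;
  return baseFilter ? `(${baseFilter}) AND ${acl}` : acl;
}

// Usage: pass the result as the `filter` option of a search call.
const laptopFilterForUser = buildUserFilter('u42', 'category = "laptop"');
```

`baseFilter` is assumed to be assembled by server code, not taken verbatim from user input; otherwise it would need the same kind of validation as the id.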
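❌ Anti-patterns
- LIKE '%query%': cannot use a regular B-tree index, so it is slow.
- tsvector without a GIN index: same results, but slow.
- Regex search in production: precompute instead.
- Indexing every column: bloats the index. List the searchable fields explicitly.
- English without stemming: a search for "runs" will not find "running".
- Prefix-only matching: ignores typos.
- App-level dedup: the search engine's ranking handles it better.
- Ignoring async sync lag: search results go stale.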
🤖 LLM usage hints
- Small = pg FTS / pg_trgm.
- Strong typo tolerance = Meilisearch.
- Large = Elasticsearch.
- Hybrid = pgvector + FTS rerank.