2nd/10_Wiki/Topics/Coding/DB_Full_Text_Search.md
2026-05-09 21:08:02 +09:00

id: db-full-text-search
title: Full-text Search - Postgres / Elasticsearch / Meilisearch
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: database, search, postgres, elasticsearch, vibe-coding
tech_stack: language: SQL / TS; applicable_to: Backend
applied_in: (empty)
aliases: full-text search, FTS, tsvector, GIN, Meilisearch, Typesense, OpenSearch

Full-text Search

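A search for "맥북" should also match "macbook" (a cross-script match like this needs a synonym or transliteration mapping; stemming alone does not provide it). Full-text search adds language-aware stemming and relevance ranking. Postgres FTS is a lightweight starting point, Meilisearch/Typesense offer fast typo-tolerant search, and Elasticsearch/OpenSearch suit large scale.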

📖 Core Concepts

  • Tokenize: "맥북 m1" → ["맥북", "m1"].
  • Stemming: "running" → "run".
  • Ranking: BM25 / TF-IDF.
  • Faceting: category / price-range filters combined with search.
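
To make the first three concepts concrete, here is a minimal TypeScript sketch of a tokenize → stem → BM25 pipeline. Everything in it is an assumption for illustration: `tokenize`, `stem`, and `bm25Scores` are hypothetical helpers, the stemmer is a deliberately naive suffix stripper (real engines use Snowball or similar), and `k1 = 1.2`, `b = 0.75` are just commonly used BM25 defaults.

```typescript
// Toy pipeline: tokenize -> stem -> BM25. Illustration only, not a real analyzer.

// Lowercase and split on whitespace and common punctuation (Hangul passes through untouched).
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[\s.,;:!?()"']+/)
    .filter((token) => token.length > 0);
}

// Naive suffix stripping (assumption): "running" -> "run", "runs" -> "run".
function stem(token: string): string {
  for (const suffix of ["ning", "ing", "s"]) {
    if (token.length > suffix.length + 2 && token.endsWith(suffix)) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

function analyze(text: string): string[] {
  return tokenize(text).map(stem);
}

// BM25 score of every document against the query; higher means more relevant.
function bm25Scores(docs: string[], query: string, k1 = 1.2, b = 0.75): number[] {
  const analyzed = docs.map(analyze);
  const avgLen =
    analyzed.reduce((sum, doc) => sum + doc.length, 0) / Math.max(analyzed.length, 1);
  const terms = Array.from(new Set(analyze(query)));

  return analyzed.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((t) => t === term).length; // term frequency in this doc
      if (tf === 0) continue;
      const df = analyzed.filter((d) => d.some((t) => t === term)).length; // docs containing term
      const idf = Math.log(1 + (analyzed.length - df + 0.5) / (df + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}
```

Because both the documents and the query go through the same `analyze` step, a query for "runs" can score a document that only contains "running", which is the point of stemming.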

💻 Code Patterns

Postgres FTS: Getting Started

ALTER TABLE products ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('simple', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('simple', coalesce(description, '')), 'B')
  ) STORED;

CREATE INDEX products_tsv ON products USING GIN(tsv);
-- Search
SELECT id, name,
       ts_rank(tsv, query) AS rank
FROM products, plainto_tsquery('simple', 'macbook m1') query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 20;

Korean (built-in FTS has no Korean analyzer; use an extension such as pg_search, or pg_trgm)

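-- Alternatively, trigram matching (an n-gram approach)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX products_name_trgm ON products USING GIN (name gin_trgm_ops);

-- <% (word similarity) finds a short query inside a longer name;
-- plain % compares whole strings and often falls below the default 0.3 threshold
SELECT * FROM products
WHERE '맥북' <% name
ORDER BY word_similarity('맥북', name) DESC
LIMIT 20;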
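
To show roughly how trigram similarity behaves, the sketch below approximates it in TypeScript: pad each word with two leading spaces and one trailing space, collect the 3-character windows, and divide shared trigrams by the union. It is a simplification and not pg_trgm's actual implementation (pg_trgm splits on non-alphanumeric characters and depends on the database locale); `trigrams` and `similarity` here are hypothetical helpers.

```typescript
// Simplified approximation of trigram similarity. Not pg_trgm's real code.

// Collect the set of 3-character windows for every whitespace-separated word.
function trigrams(text: string): Set<string> {
  const out = new Set<string>();
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  for (const word of words) {
    const chars = Array.from("  " + word + " "); // two leading spaces, one trailing
    for (let i = 0; i + 3 <= chars.length; i++) {
      out.add(chars.slice(i, i + 3).join(""));
    }
  }
  return out;
}

// Shared trigrams divided by the size of the union (0 = nothing in common, 1 = identical sets).
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  const shared = Array.from(ta).filter((t) => tb.has(t)).length;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

With this approximation, `similarity("맥북", "애플 맥북 프로 14")` is 3 shared trigrams out of 12, so a short query against a long product name scores low even when the word matches exactly. That is why whole-string similarity tends to suit short fields, and word-level similarity suits longer ones.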

Meilisearch (automatic typo tolerance + ranking)

import { MeiliSearch } from 'meilisearch';

const ms = new MeiliSearch({ host: 'http://meilisearch:7700', apiKey: '...' });
const idx = ms.index('products');

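// Settings first: filters only work on attributes listed in filterableAttributes
await idx.updateSettings({
  searchableAttributes: ['name', 'description'],
  filterableAttributes: ['category', 'price'],
  rankingRules: ['words', 'typo', 'proximity', 'attribute', 'sort', 'exactness'],
});

// Indexing (updateSettings and addDocuments only enqueue tasks; wait for them to finish before searching)
await idx.addDocuments([{ id: '1', name: 'MacBook M1', price: 1000, category: 'laptop' }]);

// Search
const r = await idx.search('macboo', { // typos are tolerated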
  filter: 'category = "laptop" AND price < 2000',
  limit: 20,
  attributesToHighlight: ['name'],
});

Elasticsearch

import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: 'http://elasticsearch:9200' });

// Mapping (v8 client style: top-level `mappings`, consistent with the top-level `query` below)
await es.indices.create({
  index: 'products',
  mappings: {
    properties: {
      name:        { type: 'text', analyzer: 'standard' },
      description: { type: 'text' },
      price:       { type: 'float' },
      category:    { type: 'keyword' },
    },
  },
});

// Search
const r = await es.search({
  index: 'products',
  query: {
    bool: {
      must: [{ multi_match: { query: 'macbook', fields: ['name^3', 'description'] } }],
      filter: [{ term: { category: 'laptop' } }, { range: { price: { lt: 2000 } } }],
    },
  },
  highlight: { fields: { name: {} } },
  size: 20,
});
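
In practice the filters are usually optional, so the `bool` query tends to be assembled from user input. The following is a minimal sketch of that, assuming a hypothetical `buildProductQuery` helper and a hypothetical `ProductSearchInput` shape; the field names (`name`, `description`, `category`, `price`) come from the mapping above.

```typescript
// Hypothetical helper: builds the bool query shown above from optional inputs.
interface ProductSearchInput {
  text: string;
  category?: string;
  maxPrice?: number;
}

function buildProductQuery(input: ProductSearchInput) {
  // Filters do not affect scoring and can be cached by the engine.
  const filter: Array<Record<string, unknown>> = [];
  if (input.category !== undefined) {
    filter.push({ term: { category: input.category } });
  }
  if (input.maxPrice !== undefined) {
    filter.push({ range: { price: { lt: input.maxPrice } } });
  }
  return {
    bool: {
      // name is boosted 3x over description, as in the example above
      must: [{ multi_match: { query: input.text, fields: ["name^3", "description"] } }],
      filter,
    },
  };
}
```

The result can be passed as the `query` field of `es.search`, for example `es.search({ index: 'products', query: buildProductQuery({ text: 'macbook', category: 'laptop' }), size: 20 })`.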

Hybrid (vector + keyword)

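-- Combine pgvector + FTS ($1 = query embedding, $2 = query text)
WITH v_hits AS (
  SELECT id, 1 - (embedding <=> $1::vector) AS v_score
  FROM products ORDER BY embedding <=> $1::vector LIMIT 100
),
t_hits AS (
  -- Use the same 'simple' config the tsv column was built with
  SELECT id, ts_rank(tsv, plainto_tsquery('simple', $2)) AS t_score
  FROM products WHERE tsv @@ plainto_tsquery('simple', $2)
  ORDER BY t_score DESC LIMIT 100
)
-- v_score (cosine similarity) and t_score (ts_rank) are on different scales; tune the weights
SELECT id, COALESCE(v_score, 0) * 0.6 + COALESCE(t_score, 0) * 0.4 AS score
FROM v_hits FULL OUTER JOIN t_hits USING (id)
ORDER BY score DESC LIMIT 20;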
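
The weighted sum above adds a cosine similarity to a `ts_rank` value, and the two are not on the same scale. One common alternative, which the original note does not use, is Reciprocal Rank Fusion (RRF): it ignores raw scores and combines only rank positions. A minimal sketch, assuming a hypothetical `reciprocalRankFusion` helper and the conventional constant `k = 60`:

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank), rank starting at 1.
// Swapped-in technique for illustration; not the weighted sum from the SQL above.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const previous = scores.get(id) ?? 0;
      scores.set(id, previous + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Usage would be to run the vector query and the text query separately, pass the two ordered id lists in, and take the top 20 of the fused result. Ids that appear in both lists rise to the top without any score normalization.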

Faceting

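// Meilisearch (each facet must be listed in filterableAttributes; price_range is assumed to be a precomputed field)
const r = await idx.search('macbook', {
  facets: ['category', 'price_range'],
});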
// r.facetDistribution: { category: { laptop: 50, desktop: 5 } }
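
To clarify what a facet distribution contains, here is a minimal in-memory sketch that counts facet values over a set of hits. It is only an illustration of the idea under stated assumptions: `facetDistribution` is a hypothetical helper, and a search engine computes these counts inside the index rather than in application code.

```typescript
// Count how many hits carry each value of each requested facet attribute.
type Hit = Record<string, string | number>;

function facetDistribution(
  hits: Hit[],
  facets: string[],
): Record<string, Record<string, number>> {
  const out: Record<string, Record<string, number>> = {};
  for (const facet of facets) {
    const counts: Record<string, number> = {};
    for (const hit of hits) {
      const value = hit[facet];
      if (value === undefined) continue; // hit has no value for this facet
      const key = String(value);
      counts[key] = (counts[key] ?? 0) + 1;
    }
    out[facet] = counts;
  }
  return out;
}
```

The counts are what a UI would render next to each filter checkbox, such as "laptop (50)".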

Suggest / autocomplete

// Meilisearch: prefix search is built in
await idx.search('mac', { limit: 5 });

// Elasticsearch: completion suggester or edge n-grams
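
The edge n-gram idea can be sketched in a few lines: index every prefix of every word at write time, so autocomplete becomes an exact lookup at query time. The helpers below (`edgeNgrams`, `buildPrefixIndex`, `suggest`) are hypothetical and in-memory; the `minGram` / `maxGram` parameters loosely mirror the `min_gram` / `max_gram` settings of Elasticsearch's edge n-gram analysis.

```typescript
// All prefixes of a token between minGram and maxGram characters long.
function edgeNgrams(token: string, minGram = 1, maxGram = 10): string[] {
  const chars = Array.from(token.toLowerCase());
  const grams: string[] = [];
  const longest = Math.min(maxGram, chars.length);
  for (let n = minGram; n <= longest; n++) {
    grams.push(chars.slice(0, n).join(""));
  }
  return grams;
}

// Map every prefix to the titles that contain a word starting with it.
function buildPrefixIndex(titles: string[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const title of titles) {
    const words = title.split(/\s+/).filter((w) => w.length > 0);
    for (const word of words) {
      for (const gram of edgeNgrams(word)) {
        const bucket = index.get(gram) ?? new Set<string>();
        bucket.add(title);
        index.set(gram, bucket);
      }
    }
  }
  return index;
}

// Query time: a single exact lookup, no scanning.
function suggest(index: Map<string, Set<string>>, prefix: string, limit = 5): string[] {
  const bucket = index.get(prefix.toLowerCase());
  return bucket ? Array.from(bucket).slice(0, limit) : [];
}
```

The trade-off is a larger index in exchange for cheap lookups, and like any pure prefix scheme it has no typo tolerance on its own.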

Sync (DB → search engine)

// Propagate changes via CDC or an outbox → update the search index
on('product.changed', async (p) => {
  await idx.addDocuments([p]); // upsert
});
on('product.deleted', async (id) => {
  await idx.deleteDocument(id);
});
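
With an outbox, events usually arrive in batches and the same row may appear more than once. A minimal sketch of collapsing a batch before touching the index, assuming a hypothetical `OutboxEvent` shape with a monotonically increasing `seq` column and a hypothetical `planSync` helper:

```typescript
// Hypothetical outbox row; `seq` is assumed to increase with every change.
interface OutboxEvent {
  seq: number;
  type: "product.changed" | "product.deleted";
  id: string;
  doc?: Record<string, unknown>;
}

interface SyncPlan {
  upserts: Record<string, unknown>[];
  deletes: string[];
  lastSeq: number; // checkpoint to persist after the plan is applied
}

// Keep only the latest event per id so each document is written or deleted once.
function planSync(events: OutboxEvent[]): SyncPlan {
  const ordered = events.slice().sort((a, b) => a.seq - b.seq);
  const latest = new Map<string, OutboxEvent>();
  for (const event of ordered) {
    latest.set(event.id, event); // later events overwrite earlier ones
  }

  const upserts: Record<string, unknown>[] = [];
  const deletes: string[] = [];
  latest.forEach((event) => {
    if (event.type === "product.deleted") {
      deletes.push(event.id);
    } else if (event.doc) {
      upserts.push(event.doc);
    }
  });

  const lastSeq = ordered.length > 0 ? ordered[ordered.length - 1].seq : 0;
  return { upserts, deletes, lastSeq };
}
```

Applying the plan would then be one `idx.addDocuments(plan.upserts)` call plus `idx.deleteDocument(id)` for each entry in `plan.deletes`, followed by saving `plan.lastSeq`. Since upserts and deletes are idempotent, replaying a batch after a crash is safe.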

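🤔 Decision Criteria

| Need / scale | Recommendation |
| --- | --- |
| <1M docs, simple | Postgres FTS |
| Strong typo tolerance | Meilisearch / Typesense |
| Large scale + analytics + complex queries | Elasticsearch / OpenSearch |
| Hybrid (semantic + keyword) | pgvector + FTS / Vespa |
| Code search | Sourcegraph / Algolia |
| Per-user permissions + search | per-user filter |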

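Anti-patterns

  • LIKE '%query%': a leading wildcard cannot use a B-tree index, so every query scans the table.
  • tsvector without a GIN index: same results, but slow.
  • Regex search in production: pre-compute (index) instead.
  • Indexing every column: bloats the index. List the searchable fields explicitly.
  • English without stemming: a search for "runs" will not find "running" (the 'simple' config does not stem; 'english' does).
  • Prefix matching only: no typo tolerance.
  • App-level dedup: the search engine's ranking does this better.
  • Ignoring async sync lag: search results go stale.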

🤖 LLM Usage Hints

  • Small = pg FTS / pg_trgm.
  • Strong typo tolerance = Meilisearch.
  • Large = Elasticsearch.
  • Hybrid = pgvector + FTS rerank.

🔗 Related Documents