---
id: db-full-text-search
title: Full-text Search - Postgres / Elasticsearch / Meilisearch
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [database, search, postgres, elasticsearch, vibe-coding]
tech_stack: { language: "SQL / TS", applicable_to: ["Backend"] }
applied_in: []
aliases: [full-text search, FTS, tsvector, GIN, Meilisearch, Typesense, OpenSearch]
---

# Full-text Search

> A search for "맥북" should also match "macbook". The core is **per-language stemming + ranking**. Postgres FTS is a lightweight start, Meilisearch/Typesense give fast typo-tolerant search, and Elasticsearch/OpenSearch suit large scale.

## 📖 Core concepts

- Tokenize: "맥북 m1" → ["맥북", "m1"].
- Stemming: "running" → "run".
- Ranking: BM25 / TF-IDF.
- Faceting: category / price-range filters combined with search.

## 💻 Code patterns

### Postgres FTS (getting started)

```sql
-- The 'simple' config only lowercases and splits tokens (no stemming);
-- use 'english' if English stemming is needed.
ALTER TABLE products ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('simple', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('simple', coalesce(description, '')), 'B')
  ) STORED;

CREATE INDEX products_tsv ON products USING GIN (tsv);
```

```sql
-- Search
SELECT id, name, ts_rank(tsv, query) AS rank
FROM products, plainto_tsquery('simple', 'macbook m1') query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 20;
```

### Korean (pg_search extension required)

```sql
-- Alternative: trigrams (similar to n-grams)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX products_name_trgm ON products USING GIN (name gin_trgm_ops);

SELECT * FROM products
WHERE name % '맥북'
ORDER BY similarity(name, '맥북') DESC;
```

### Meilisearch (typo tolerance + ranking built in)

```ts
import { MeiliSearch } from 'meilisearch';

const ms = new MeiliSearch({
  host: 'http://meilisearch:7700',
  apiKey: '...',
});
const idx = ms.index('products');

// Indexing: both calls enqueue asynchronous tasks on the Meilisearch side
await idx.addDocuments([{ id: '1', name: 'MacBook M1', price: 1000, category: 'laptop' }]);
await idx.updateSettings({
  searchableAttributes: ['name', 'description'],
  filterableAttributes: ['category', 'price'],
  rankingRules: ['words', 'typo', 'proximity', 'attribute', 'sort', 'exactness'],
});

// Search
const r = await idx.search('macboo', { // the typo is tolerated
  filter: 'category = "laptop" AND price < 2000',
  limit: 20,
  attributesToHighlight: ['name'],
});
```

### Elasticsearch

```ts
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: 'http://elasticsearch:9200' });

// Mapping (v8 client style: request keys at the top level, no `body` wrapper)
await es.indices.create({
  index: 'products',
  mappings: {
    properties: {
      name: { type: 'text', analyzer: 'standard' },
      description: { type: 'text' },
      price: { type: 'float' },
      category: { type: 'keyword' },
    },
  },
});

// Search: boost name 3x, then filter by category and price
const r = await es.search({
  index: 'products',
  query: {
    bool: {
      must: [{ multi_match: { query: 'macbook', fields: ['name^3', 'description'] } }],
      filter: [{ term: { category: 'laptop' } }, { range: { price: { lt: 2000 } } }],
    },
  },
  highlight: { fields: { name: {} } },
  size: 20,
});
```

### Hybrid (vector + keyword)

```sql
-- Combine pgvector and FTS.
-- $1 = query embedding, $2 = query text.
-- ts_rank and cosine similarity are on different scales,
-- so treat the 0.6 / 0.4 weights as a starting point to tune.
WITH v_hits AS (
  SELECT id, 1 - (embedding <=> $1::vector) AS v_score
  FROM products
  ORDER BY embedding <=> $1::vector
  LIMIT 100
),
t_hits AS (
  -- Use the same 'simple' config the tsv column was built with
  SELECT id, ts_rank(tsv, plainto_tsquery('simple', $2)) AS t_score
  FROM products
  WHERE tsv @@ plainto_tsquery('simple', $2)
  ORDER BY t_score DESC
  LIMIT 100
)
SELECT id,
       COALESCE(v_score, 0) * 0.6 + COALESCE(t_score, 0) * 0.4 AS score
FROM v_hits
FULL OUTER JOIN t_hits USING (id)
ORDER BY score DESC
LIMIT 20;
```

### Faceting

```ts
// Meilisearch: faceted attributes must be listed in filterableAttributes
// (price_range is not in the settings shown earlier and would need adding)
const r = await idx.search('macbook', {
  facets: ['category', 'price_range'],
});
// r.facetDistribution: { category: { laptop: 50, desktop: 5 } }
```

### Suggest / autocomplete

```ts
// Meilisearch: prefix matching is built in
await idx.search('mac', { limit: 5 });

// Elasticsearch: completion suggester or edge n-grams
```

### Sync (DB → search engine)

```ts
// Push changes captured via CDC or an outbox to the search index.
// `on` stands for the application's own event subscription.
on('product.changed', async (p) => {
  await idx.addDocuments([p]); // upsert (add or replace by id)
});
on('product.deleted', async (id) => {
  await idx.deleteDocument(id);
});
```

## 🤔 Decision criteria

| Scale / need | Recommendation |
|---|---|
| <1M docs, simple | Postgres FTS |
| Strong typo tolerance | Meilisearch / Typesense |
| Large scale + analytics + complex queries | Elasticsearch / OpenSearch |
| Hybrid (semantic + keyword) | pgvector + FTS / Vespa |
| Code search | Sourcegraph / Algolia |
| Per-user permissions + search | per-user filter |

## ❌ Anti-patterns

- **`LIKE '%query%'`**: the leading wildcard keeps a B-tree index from being used. Slow.
- **tsvector without a GIN index**: same results, but slow.
- **Regex search in prod**: pre-computing is the answer.
- **Indexing every column**: bloats the index. Declare the searchable fields explicitly.
- **English without stemming**: a search for "runs" will not find "running".
- **Prefix-only matching**: ignores typos.
- **App-level dedup**: the search engine's ranking does this better.
- **Ignoring asynchronous sync lag**: search results go stale.

## 🤖 LLM usage hints

- Small = pg FTS / pg_trgm.
- Strong typo tolerance = Meilisearch.
- Large = Elasticsearch.
- Hybrid = pgvector + FTS rerank.

## 🔗 Related documents

- [[AI_RAG_Pattern_Basics]]
- [[DB_JSONB_Postgres_Patterns]]
- [[DB_Change_Data_Capture]]
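
The weighted score from the Hybrid (vector + keyword) pattern can also be computed in application code, for example when the vector hits and keyword hits come from two different systems. The following is a minimal sketch under assumptions: `Hit` and `fuseHits` are hypothetical names, not part of any library, and the 0.6 / 0.4 weights are simply carried over from the SQL. A map keyed by id plays the role of the `FULL OUTER JOIN`, so a document missing from one list contributes 0 for that side.

```ts
// Hypothetical helper for illustration; not a library API.
type Hit = { id: string; score: number };

// Merge two hit lists into one ranked list using a weighted sum.
// Assumes each id appears at most once per input list.
function fuseHits(
  vectorHits: Hit[],
  textHits: Hit[],
  wVector = 0.6,
  wText = 0.4,
  limit = 20,
): Hit[] {
  const combined = new Map<string, number>();

  // A missing side contributes 0, like COALESCE(score, 0) in the SQL version.
  for (const h of vectorHits) {
    combined.set(h.id, (combined.get(h.id) ?? 0) + h.score * wVector);
  }
  for (const h of textHits) {
    combined.set(h.id, (combined.get(h.id) ?? 0) + h.score * wText);
  }

  return [...combined.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score) // highest combined score first
    .slice(0, limit);
}

// Example input: 'b' appears in both lists, 'a' and 'c' in one each.
const fused = fuseHits(
  [{ id: 'a', score: 0.9 }, { id: 'b', score: 0.5 }],
  [{ id: 'b', score: 1.0 }, { id: 'c', score: 0.2 }],
);
```

In this example `b` ranks first because it scores on both sides. As in the SQL version, the two input scores are not on the same scale, so normalizing them before weighting is a reasonable next step.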