id: db-query-optimization
title: Query Optimization: Index / Rewrite / Separation
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: database, query, optimization, vibe-coding
tech_stack: language = SQL / Postgres; applicable_to = Backend
applied_in:
aliases: query optimization, SARGable, covering index, CTE, materialized view, denormalization

Query Optimization

An index is the usual answer, but query rewrites, denormalization, materialized views, and partitioning are also tools. This page covers SARGable predicates, covering indexes, CTEs, and related patterns.

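📖 Key concepts

  • SARGable: a predicate that can use an index (the column is compared directly, not wrapped in a function or cast).
  • Covering index: an index that contains every column the query needs.
  • Denormalization: duplicating some data to make reads cheaper.
  • Materialized view: a query result computed and stored ahead of time.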

💻 Code patterns

SARGable rewrite

-- ❌ Non-SARGable: the column is wrapped in a function or cast
WHERE EXTRACT(YEAR FROM created_at) = 2026
WHERE LOWER(email) = 'a@b.com'
WHERE id::text = '42'

-- ✅ SARGable: compare the bare column
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01'
WHERE id = 42  -- compare in the column's own type
-- email = 'a@b.com'  (if the column or index is case-insensitive)
-- or create an expression (functional) index
CREATE INDEX users_email_lower ON users ((LOWER(email)));
WHERE LOWER(email) = 'a@b.com'  -- now SARGable
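
The range rewrite can also be produced in application code, so callers never wrap the column in a function. A minimal sketch, assuming a hypothetical helper `yearRange` and an `orders` table with a `created_at` column (both names are illustrative, not from a real schema):

```javascript
// Hypothetical helper: turn a calendar year into a half-open [from, to) range
// so the predicate compares the bare column and stays SARGable.
function yearRange(year) {
  if (!Number.isInteger(year)) throw new TypeError('year must be an integer');
  return { from: `${year}-01-01`, to: `${year + 1}-01-01` };
}

// Builds a parameterized query; the table name `orders` is an assumption.
function ordersInYear(year) {
  const { from, to } = yearRange(year);
  return {
    text: 'SELECT id FROM orders WHERE created_at >= $1 AND created_at < $2',
    values: [from, to],
  };
}
```

The half-open upper bound avoids both missing and double-counting rows at the year boundary.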

Covering index

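-- Frequent query
SELECT id, status FROM orders WHERE user_id = $1;

-- ✅ Covering index: no heap access (Index Only Scan)
CREATE INDEX orders_user_covering ON orders (user_id) INCLUDE (id, status);

→ INCLUDE is available in Postgres 11+. An Index Only Scan also depends on an up-to-date visibility map, so keep the table vacuumed.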

Composite index — leftmost

CREATE INDEX o_idx ON orders (user_id, status, created_at);

-- ✅ Uses the index
WHERE user_id = $1
WHERE user_id = $1 AND status = 'paid'
WHERE user_id = $1 AND status = 'paid' AND created_at > $2

-- ❌ Leading column missing: the index cannot be used efficiently
WHERE status = 'paid'  -- needs its own index
WHERE created_at > $2
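
The leftmost-prefix rule can be expressed as a small check. This is a simplified model, not the planner's actual logic: `usablePrefix` and the `predicates` shape are hypothetical, and the real planner weighs costs and may still scan the whole index.

```javascript
// Simplified model of the leftmost-prefix rule for a B-tree index.
// `predicates` maps column name -> 'eq' | 'range' (hypothetical shape).
// Returns the leading index columns that can narrow the scanned portion.
function usablePrefix(indexColumns, predicates) {
  const used = [];
  for (const col of indexColumns) {
    const kind = predicates[col];
    if (!kind) break;            // gap: later columns cannot narrow the scan
    used.push(col);
    if (kind === 'range') break; // columns after a range only filter
  }
  return used;
}

const oIdx = ['user_id', 'status', 'created_at'];
console.log(usablePrefix(oIdx, { user_id: 'eq', status: 'eq' })); // user_id, status
console.log(usablePrefix(oIdx, { status: 'eq' }));                // empty: no leading column
```

An empty result corresponds to the ❌ cases: without a predicate on `user_id`, the index gives the scan nothing to narrow on.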

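Selectivity (cardinality) first

-- email (high cardinality, 1M unique) > status (3 unique)
CREATE INDEX users_email_status ON users (email, status);  -- email first

→ Rule of thumb: put the most selective column first, and keep equality-filtered columns ahead of range-filtered ones.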
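
A rough way to compare candidate columns is the ratio of distinct values to rows. A minimal sketch, assuming the counts are already known (for example from table statistics); `selectivity` and `orderBySelectivity` are hypothetical helpers, and this ignores how each column is actually filtered:

```javascript
// Fraction of distinct values: closer to 1 means each value matches fewer rows.
function selectivity(distinctCount, rowCount) {
  if (rowCount <= 0) throw new RangeError('rowCount must be positive');
  return distinctCount / rowCount;
}

// Orders candidate columns from most to least selective.
// `columns` is an array of { name, distinct } (hypothetical shape).
function orderBySelectivity(columns, rowCount) {
  return [...columns]
    .sort((a, b) => selectivity(b.distinct, rowCount) - selectivity(a.distinct, rowCount))
    .map((c) => c.name);
}
```

With 1M rows, a column with 1M distinct values sorts ahead of one with 3, matching the `(email, status)` ordering.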

Partial index (conditional)

-- Queries mostly target active users
CREATE INDEX users_active ON users (email) WHERE deleted_at IS NULL;

SELECT * FROM users WHERE email = $1 AND deleted_at IS NULL;
-- → smaller index, faster lookups; the query must repeat the index's WHERE condition

Expression index

CREATE INDEX events_lower_event ON events (LOWER(event_type));

Materialized view (queried often, refreshed occasionally)

CREATE MATERIALIZED VIEW user_stats AS
SELECT user_id, count(*) AS orders, sum(total) AS spent
FROM orders GROUP BY user_id;

CREATE UNIQUE INDEX user_stats_pk ON user_stats (user_id);

-- Refresh (CONCURRENTLY requires the unique index above)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;

→ Schedule the refresh with cron every few minutes or hours.
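
If no cron is available, the refresh can run from the application. A minimal sketch, assuming `db` is any client exposing `query(text)` that returns a Promise (node-postgres has this shape); the function name and the 5-minute default are assumptions:

```javascript
// In-process refresh loop for the user_stats materialized view.
// Skips a tick if the previous refresh is still running.
function startUserStatsRefresh(db, intervalMs = 5 * 60 * 1000) {
  let running = false;
  const tick = async () => {
    if (running) return false; // previous refresh still in flight
    running = true;
    try {
      await db.query('REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats');
      return true;
    } finally {
      running = false;
    }
  };
  const timer = setInterval(() => tick().catch(console.error), intervalMs);
  return { tick, stop: () => clearInterval(timer) };
}
```

With several app instances, each one would run its own loop, so an external scheduler is usually the simpler choice.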

Denormalization

-- ❌ Every read needs a join
SELECT o.*, u.email FROM orders o JOIN users u ON o.user_id = u.id;

-- ✅ Copy email into orders (acceptable if the email is immutable or staleness is tolerated)
ALTER TABLE orders ADD COLUMN user_email TEXT;
-- Populate it in the same INSERT

→ Higher write cost, but large savings on reads.

CTE (WITH)

WITH recent_orders AS (
  SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT user_id, count(*) FROM recent_orders GROUP BY user_id;

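⚠️ Postgres 12+ inlines a CTE that is non-recursive, side-effect free, and referenced only once. Older versions always materialize it, so the CTE acts as an optimization barrier.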

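LATERAL join (a separate subquery per row)

SELECT u.*, last_order.total
FROM users u
LEFT JOIN LATERAL (
  SELECT * FROM orders o
  WHERE o.user_id = u.id ORDER BY o.created_at DESC LIMIT 1
) last_order ON true;

→ The latest order for each user. Often cheaper than repeating a correlated subquery, especially with an index on orders (user_id, created_at).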

EXISTS vs IN

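-- ✅ EXISTS: can stop at the first match
SELECT * FROM users WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = users.id);

-- ⚠️ IN: a large subquery result is typically hashed
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders);

→ Usually the same plan. The NULL trap is in the negated form: NOT IN returns no rows if the subquery yields a NULL, while NOT EXISTS is NULL-safe.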
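
The NULL behavior is easier to see when SQL's three-valued logic is modeled directly. A minimal sketch in plain JavaScript, where `null` stands for SQL NULL / UNKNOWN; `sqlNotIn` and `sqlNotExists` are hypothetical helpers that only imitate the semantics, not a database API:

```javascript
// Models `x NOT IN (list)`. A WHERE clause keeps a row only when the result is true.
function sqlNotIn(x, list) {
  if (list.length === 0) return true;        // empty list: always true
  if (x === null) return null;               // NULL compared to anything is UNKNOWN
  if (list.includes(x)) return false;        // a definite match
  return list.includes(null) ? null : true;  // a NULL in the list makes it UNKNOWN
}

// Models `NOT EXISTS (SELECT 1 ... WHERE v = x)`: NULLs never match,
// so the answer is always true or false.
function sqlNotExists(x, list) {
  return !list.some((v) => v !== null && x !== null && v === x);
}
```

For a user id of 1 and subquery values `[2, null]`, `sqlNotIn` yields UNKNOWN (the row is dropped) while `sqlNotExists` yields true (the row is kept).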

Pagination — keyset > offset

-- ❌ Large offset: the skipped rows are still read
SELECT * FROM orders ORDER BY id DESC OFFSET 100000 LIMIT 20;

-- ✅ Keyset
SELECT * FROM orders WHERE id < $cursor ORDER BY id DESC LIMIT 20;
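
On the application side, keyset pagination comes down to carrying the last seen id between requests. A minimal sketch that only builds the parameterized query; `keysetPage` and `nextCursor` are hypothetical helpers, and the page size of 20 mirrors the example above:

```javascript
// Builds a keyset page query for `orders` ordered by id DESC.
// `cursor` is the last id of the previous page; omit it for the first page.
function keysetPage(cursor, limit = 20) {
  if (cursor === null || cursor === undefined) {
    return { text: 'SELECT * FROM orders ORDER BY id DESC LIMIT $1', values: [limit] };
  }
  return {
    text: 'SELECT * FROM orders WHERE id < $1 ORDER BY id DESC LIMIT $2',
    values: [cursor, limit],
  };
}

// Returns the cursor for the following page, or null when the page was
// short, which means there is nothing left and the caller should stop.
function nextCursor(rows, limit = 20) {
  return rows.length < limit ? null : rows[rows.length - 1].id;
}
```

This works because `id` is unique and matches the sort order; sorting by a non-unique column would need a tie-breaker in both the ORDER BY and the cursor.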

Batch (many rows in one query)

// ❌ N+1: one round trip per id
for (const id of ids) await db.query('SELECT * FROM users WHERE id = $1', [id]);

// ✅ Batch: one round trip, ids passed as a single array parameter
// SQL: SELECT * FROM users WHERE id = ANY($1::uuid[]);
const users = await db.query('SELECT * FROM users WHERE id = ANY($1::uuid[])', [ids]);
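
For very long id lists it can help to cap the array size per query. A minimal sketch, assuming `db.query` resolves to an object with a `rows` array (as node-postgres does); `chunk`, `findUsersByIds`, and the batch size of 1000 are illustrative choices:

```javascript
// Splits items into fixed-size chunks so each ANY($1) array stays bounded.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Runs one query per chunk instead of one query per id.
async function findUsersByIds(db, ids, batchSize = 1000) {
  const rows = [];
  for (const part of chunk(ids, batchSize)) {
    const res = await db.query('SELECT * FROM users WHERE id = ANY($1::uuid[])', [part]);
    rows.push(...res.rows);
  }
  return rows;
}
```

The number of round trips becomes `ceil(ids.length / batchSize)` rather than `ids.length`.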

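Reading EXPLAIN

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
-- "actual time": is it consistently fast across runs?
-- "Buffers: shared read=": is it large? (blocks not found in shared buffers, so likely disk I/O)
-- "Rows Removed by Filter": is it large? (an index may be missing)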
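
The same checks can be automated against the JSON plan format. A minimal sketch that walks the tree produced by `EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)`; the key names follow Postgres's JSON output, while `findHotNodes` and its thresholds are arbitrary assumptions for illustration:

```javascript
// Collects plan nodes that discard many rows or read many blocks.
// Buffer counts on a parent node include its children, so a hot leaf
// can also flag its ancestors.
function findHotNodes(plan, { minRemoved = 10000, minRead = 1000 } = {}) {
  const hits = [];
  const visit = (node) => {
    const removed = node['Rows Removed by Filter'] ?? 0;
    const read = node['Shared Read Blocks'] ?? 0;
    if (removed >= minRemoved || read >= minRead) {
      hits.push({ node: node['Node Type'], relation: node['Relation Name'], removed, read });
    }
    (node.Plans ?? []).forEach(visit);
  };
  visit(plan);
  return hits;
}
```

The JSON output is an array with one element; pass its `Plan` property, for example `findHotNodes(explainJson[0].Plan)`.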

Statistics + ANALYZE

ANALYZE orders;  -- update planner statistics
-- autovacuum usually runs ANALYZE automatically; run it explicitly after bulk changes

Extended statistics

-- The two columns are correlated
CREATE STATISTICS s_user_status ON user_id, status FROM orders;
ANALYZE orders;

→ More accurate row estimates.

Index hint (Postgres pg_hint_plan extension)

/*+ IndexScan(orders orders_user_idx) */
SELECT * FROM orders WHERE user_id = $1;

→ Last resort. Running ANALYZE or adding a better index usually fixes the plan.

N+1 in app

// ❌
for (const user of users) {
  user.orders = await db.orders.findByUser(user.id);
}

// ✅ DataLoader / Prisma include / SQL JOIN
const orders = await db.orders.findMany({ where: { userId: { in: userIds } } });
const byUser = groupBy(orders, 'userId');
users.forEach(u => u.orders = byUser[u.id] ?? []);
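
The batch version relies on a `groupBy` helper that is not defined here (lodash offers an equivalent). A minimal stand-in, with `attachOrders` as a hypothetical wrapper around the same steps:

```javascript
// Groups rows into a prototype-less object keyed by row[key].
function groupBy(rows, key) {
  const out = Object.create(null);
  for (const row of rows) {
    const k = row[key];
    if (!out[k]) out[k] = [];
    out[k].push(row);
  }
  return out;
}

// Attaches orders to users with a single pass over each list,
// returning new user objects instead of mutating the inputs.
function attachOrders(users, orders) {
  const byUser = groupBy(orders, 'userId');
  return users.map((u) => ({ ...u, orders: byUser[u.id] ?? [] }));
}
```

Two queries (users, then orders for all user ids) plus this in-memory join replace one query per user.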

🤔 Decision criteria

Pattern | Use
Same query read frequently | Index
Many reads, few writes | Materialized view
Reads far outnumber writes | Denormalize / CDC
A subset of rows queried often | Partial index
Large GROUP BY | Aggregating MV
Top N per group | Window function / LATERAL

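Anti-patterns

  • Non-SARGable predicate: the index cannot be used.
  • SELECT * on wide rows: heavy I/O.
  • N+1 queries: a loop in the app. Use a JOIN or a batch instead.
  • Indexing every column: higher write cost.
  • Never refreshing a materialized view: stale data.
  • Assuming CTE inlining on old PG (< 12): the CTE is an optimization barrier.
  • Large OFFSET pages: every skipped row is still read.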

🤖 LLM usage hints

  • Run EXPLAIN ANALYZE first, then act on what it shows.
  • Indexes: consider composite, covering, and partial.
  • Expensive read queries: use an MV or denormalization.

🔗 Related documents