---
id: cs-btree-lsm-storage
title: B-Tree vs LSM-Tree - Storage Engines
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [cs, storage, btree, lsm, vibe-coding]
tech_stack: { language: "Concept", applicable_to: ["Database"] }
applied_in: []
aliases: [B-Tree, LSM-Tree, RocksDB, Postgres, MyISAM, write amplification, read amplification]
---

# B-Tree vs LSM-Tree

> The two main families of database storage engine. **B-Tree (Postgres / MySQL InnoDB) = fast reads, in-place updates**. **LSM-Tree (RocksDB / Cassandra / ScyllaDB) = fast writes, append-only**. The trade-off is between read amp, write amp, and space amp.

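## 📖 Key Concepts
- B-Tree: balanced tree of fixed-size pages, updated in place.
- LSM: write → memtable → SSTable (immutable) → compaction.
- Read amplification: one logical read has to check N files or pages.
- Write amplification: one logical write causes N physical disk writes.
- Space amplification: bytes on disk relative to live data (stale copies and fragmentation raise it, compression lowers it).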

## 💻 Code Patterns

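### How a B-Tree works
```
Read:    Root → branch → leaf. O(log N) page seeks.
Write:   Modify the page in place (in practice: append to the WAL first, flush the dirty page later).
Delete:  Mark the entry dead inside the page; vacuum reclaims the space (Postgres).

Pros: O(log N) reads, fast range scans, mature.
Cons: Page splits are expensive; a small random write rewrites a whole page.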
```
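
The read path above (root → branch → leaf) can be sketched as follows. This is a minimal in-memory illustration, not how any particular engine lays out pages: the `BTreeNode` shape, the `search` helper, and the sample tree are all hypothetical names introduced here.

```typescript
// Hypothetical in-memory node; real engines store each node as a fixed-size disk page.
interface BTreeNode {
  keys: number[];        // separator keys (branch) or stored keys (leaf)
  children: BTreeNode[]; // empty for leaf nodes
  values: string[];      // leaf only, parallel to keys
}

function makeLeaf(keys: number[], values: string[]): BTreeNode {
  return { keys, children: [], values };
}

// Walk from the root down to a leaf, counting how many pages are touched.
function search(
  root: BTreeNode,
  key: number
): { value: string | undefined; pagesVisited: number } {
  let node = root;
  let pagesVisited = 1;
  while (node.children.length > 0) {
    // Child index = number of separator keys that are <= the search key.
    let i = 0;
    while (i < node.keys.length && key >= node.keys[i]) i++;
    node = node.children[i];
    pagesVisited++;
  }
  const idx = node.keys.indexOf(key);
  return { value: idx >= 0 ? node.values[idx] : undefined, pagesVisited };
}

// Two-level sample tree: one root page, three leaf pages.
const btreeRoot: BTreeNode = {
  keys: [10, 20],
  values: [],
  children: [
    makeLeaf([1, 5], ["a", "b"]),
    makeLeaf([10, 15], ["c", "d"]),
    makeLeaf([20, 25], ["e", "f"]),
  ],
};

console.log(search(btreeRoot, 15)); // value "d", found after visiting 2 pages
```

The number of pages visited equals the tree height, which is why point reads stay at O(log N) regardless of how the data was written.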

### How an LSM-Tree works
```
Write:
1. Append to the memtable (in RAM, sorted)
2. Memtable full → flush to an SSTable (sorted, immutable)
3. Compaction: merge multiple SSTables

Read:
1. Check the memtable
2. Check the SSTables at each level, newest first (Bloom filters let most be skipped)
3. Return the newest version found

Delete: write a tombstone. Compaction removes it later.
```
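
The write, read, and delete paths above can be sketched in a few lines. This is a toy model under stated assumptions: everything lives in memory, there is no WAL, no compaction, and no Bloom filter, and `MiniLsm`, `LsmEntry`, and `findInSstable` are hypothetical names, not any library's API.

```typescript
type Stored = string | null; // null marks a tombstone
interface LsmEntry { key: string; value: Stored }

// Binary search inside one sorted, immutable SSTable.
function findInSstable(table: LsmEntry[], key: string): LsmEntry | undefined {
  let lo = 0;
  let hi = table.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (table[mid].key === key) return table[mid];
    if (table[mid].key < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

class MiniLsm {
  private memtable = new Map<string, Stored>();
  private readonly memtableLimit: number;
  readonly sstables: LsmEntry[][] = []; // newest first

  constructor(memtableLimit: number) {
    this.memtableLimit = memtableLimit;
  }

  put(key: string, value: string): void { this.write(key, value); }
  remove(key: string): void { this.write(key, null); } // delete = write a tombstone

  get(key: string): string | undefined {
    // 1. memtable first, 2. then SSTables from newest to oldest
    if (this.memtable.has(key)) return this.memtable.get(key) ?? undefined;
    for (const table of this.sstables) {
      const hit = findInSstable(table, key);
      if (hit) return hit.value ?? undefined; // newest version wins; tombstone = not found
    }
    return undefined;
  }

  private write(key: string, value: Stored): void {
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  // Memtable full: sort its entries and freeze them as a new SSTable.
  private flush(): void {
    const entries: LsmEntry[] = [];
    this.memtable.forEach((value, key) => entries.push({ key, value }));
    entries.sort((a, b) => (a.key < b.key ? -1 : 1));
    this.sstables.unshift(entries);
    this.memtable.clear();
  }
}

const lsm = new MiniLsm(2);
lsm.put("a", "1");
lsm.put("b", "2"); // memtable full → flushed as the first SSTable
lsm.put("a", "3"); // newer version of "a"
lsm.remove("b");   // tombstone; memtable full → flushed as the second SSTable
console.log(lsm.get("a"), lsm.get("b"), lsm.sstables.length);
```

Note that the old versions of `a` and `b` still occupy the older SSTable. Nothing is reclaimed until compaction runs, which is where space and write amplification come from.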

### Compaction strategy
```
Leveled (RocksDB):
- Each level is ~10x the size of the one above it (L(N+1) ≈ 10 × L(N))
- Low read amp, high write amp

Tiered (Cassandra):
- Merge SSTables of similar size into larger ones
- Low write amp, high read amp

Hybrid: ScyllaDB.
```
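
To make the ~10x rule for leveled compaction concrete, here is a small calculation of per-level target sizes. The function name `levelTargetsMiB` is hypothetical, and the 256 MiB base and 10x multiplier are illustrative inputs, not values taken from this note.

```typescript
// Target size of each level when every level is `multiplier` times the previous one.
function levelTargetsMiB(baseMiB: number, multiplier: number, levels: number): number[] {
  const targets: number[] = [];
  let size = baseMiB;
  for (let level = 1; level <= levels; level++) {
    targets.push(size);
    size *= multiplier;
  }
  return targets;
}

// Assumed inputs: L1 target of 256 MiB, 10x growth per level, 4 levels.
const levelTargets = levelTargetsMiB(256, 10, 4);
console.log(levelTargets); // [256, 2560, 25600, 256000]
```

With a 10x multiplier most of the data sits in the last level, which keeps read amp low, but a key may be rewritten once per level on its way down, which is the source of the higher write amp.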

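### B-Tree page structure
```
[ Page header | Key1 → Pointer1 | Key2 → Pointer2 | ... ]

Page size: 8KB by default in Postgres, 16KB in MySQL InnoDB.
Fillfactor: e.g. 80% leaves free space in each page for UPDATEs (enables HOT updates in Postgres). Postgres defaults: 100 for tables, 90 for B-Tree indexes.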
```
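
Page size and fillfactor together determine fanout, and fanout determines tree height. The following back-of-the-envelope sketch assumes a flat 16 bytes per key/pointer entry and ignores page headers; both simplifications, and the helper names `pageFanout` and `btreeHeight`, are assumptions made for illustration.

```typescript
// Entries that fit in one page at a given fill ratio (page header ignored).
function pageFanout(pageBytes: number, entryBytes: number, fill: number): number {
  return Math.floor((pageBytes * fill) / entryBytes);
}

// Smallest height h such that fanout^h >= rows.
function btreeHeight(rows: number, pageBytes: number, entryBytes: number, fill: number): number {
  const fanout = pageFanout(pageBytes, entryBytes, fill);
  if (fanout < 2) throw new Error("fanout must be at least 2");
  let height = 1;
  let capacity = fanout;
  while (capacity < rows) {
    capacity *= fanout;
    height++;
  }
  return height;
}

// Assumed: 8 KiB pages, 16-byte entries, 90% fill.
console.log(pageFanout(8192, 16, 0.9));             // 460
console.log(btreeHeight(1000000, 8192, 16, 0.9));   // 3
console.log(btreeHeight(100000000, 8192, 16, 0.9)); // 4
```

Going from one million to one hundred million rows adds only a single level, which is why B-Tree read latency stays nearly flat as the table grows.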

### LSM SSTable layout
```
[ Header | Index | Bloom filter | Sorted key-value pairs | Footer ]

Index = sparse (every Nth key).
Bloom filter = fast check for whether a key is definitely absent from this SSTable (false positives possible, false negatives not).
```
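
A Bloom filter like the one in the layout above can be sketched as a bit array plus a few hash functions. This is a minimal illustration, not the filter format of any real engine: the `BloomFilter` class is hypothetical, and the choice of seeded FNV-1a with double hashing is an assumption made to keep the example short.

```typescript
class BloomFilter {
  private readonly bits: Uint8Array;
  private readonly numBits: number;
  private readonly numHashes: number;

  constructor(numBits: number, numHashes: number) {
    this.numBits = numBits;
    this.numHashes = numHashes;
    this.bits = new Uint8Array(Math.ceil(numBits / 8));
  }

  // 32-bit FNV-1a with a seed mixed into the offset basis.
  private hash(key: string, seed: number): number {
    let h = (2166136261 ^ seed) >>> 0;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 16777619) >>> 0;
    }
    return h;
  }

  // Double hashing: position_i = (h1 + i * h2) mod numBits.
  private positions(key: string): number[] {
    const h1 = this.hash(key, 0);
    const h2 = (this.hash(key, 0x9e3779b9) | 1) >>> 0; // force odd and unsigned
    const out: number[] = [];
    for (let i = 0; i < this.numHashes; i++) {
      out.push((h1 + i * h2) % this.numBits);
    }
    return out;
  }

  add(key: string): void {
    for (const pos of this.positions(key)) {
      this.bits[pos >> 3] |= 1 << (pos & 7);
    }
  }

  // false → key is definitely absent; true → key may be present.
  mightContain(key: string): boolean {
    for (const pos of this.positions(key)) {
      if ((this.bits[pos >> 3] & (1 << (pos & 7))) === 0) return false;
    }
    return true;
  }
}

const sstableFilter = new BloomFilter(1024, 7);
sstableFilter.add("user:1");
sstableFilter.add("user:2");
console.log(sstableFilter.mightContain("user:1")); // true: added keys are never missed
```

A read consults the filter before touching the SSTable's index or data blocks, so a `false` answer saves the disk access entirely.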

### Write amplification: rough figures
```
Insert 1 byte → N bytes written to disk.
Ballpark values; actual numbers depend on workload and configuration.

B-Tree: typically 2-10x (page write + WAL).
LSM (leveled): 10-30x (compaction).
LSM (tiered): 5-15x.
```
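
Write amplification is simply physical bytes written divided by logical bytes written. The sketch below computes it from I/O counters; the `IoCounters` shape and the sample numbers are made up for illustration and are not measurements.

```typescript
// Hypothetical counters, all in the same unit (e.g. MiB).
interface IoCounters {
  userBytes: number;       // bytes the application asked to write
  walBytes: number;        // bytes appended to the write-ahead log
  flushBytes: number;      // bytes written when memtables are flushed
  compactionBytes: number; // bytes rewritten by compaction
}

function writeAmplification(c: IoCounters): number {
  return (c.walBytes + c.flushBytes + c.compactionBytes) / c.userBytes;
}

// Made-up example: each user byte is logged once, flushed once, and rewritten 12 times.
const sampleCounters: IoCounters = {
  userBytes: 100,
  walBytes: 100,
  flushBytes: 100,
  compactionBytes: 1200,
};
console.log(writeAmplification(sampleCounters)); // 14
```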

### Read amplification
```
Get key X →

B-Tree: O(log N) pages (upper levels are usually served from cache).
LSM:    memtable + several levels. Bloom filters help skip SSTables.
```
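
How much a Bloom filter helps can be estimated with the standard false-positive approximation, p ≈ (1 - e^(-k/b))^k for b bits per key and k hash functions. The helper names below are hypothetical, and the 10 bits/key, 7 hashes, and 20 SSTables are assumed inputs.

```typescript
// Approximate false-positive rate of a Bloom filter.
function bloomFalsePositiveRate(bitsPerKey: number, numHashes: number): number {
  return Math.pow(1 - Math.exp(-numHashes / bitsPerKey), numHashes);
}

// Expected number of SSTables read in vain when looking up a key that does not exist.
function expectedWastedReads(sstablesProbed: number, fpr: number): number {
  return sstablesProbed * fpr;
}

const fpr = bloomFalsePositiveRate(10, 7); // a bit under 1%
console.log(fpr.toFixed(4));
console.log(expectedWastedReads(20, fpr).toFixed(2)); // well below one wasted read across 20 SSTables
```

Without filters, a lookup for a missing key would have to check all 20 SSTables; with them, nearly all of those checks are skipped.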

### Space amplification
```
1GB of data →

B-Tree: 1GB + indexes + partially filled pages. Roughly 1.5x.
LSM:    1GB (compressed) + tombstones + old versions. Roughly 1.1-2x, depending on how far compaction has caught up.
```

### Suitable use cases
```
B-Tree:
- OLTP (random read + update + delete)
- Consistent read latency
- Frequent range queries
- Postgres / MySQL / SQLite

LSM:
- Write-heavy (time series, logs)
- Fast ingestion
- Range scans are fine too
- Cassandra / RocksDB / LevelDB / ScyllaDB
```

### Hybrid
```
Postgres heap + WAL: B-Tree indexes, but with log-structured aspects (append-only WAL, updates write new row versions).
ZFS / Btrfs: copy-on-write file systems with LSM-like aspects.
```

### Tuning: Postgres B-Tree
```sql
-- Table fillfactor (UPDATE-heavy tables)
ALTER TABLE x SET (fillfactor = 80);

-- Index fillfactor
CREATE INDEX ON x (col) WITH (fillfactor = 90);

-- Vacuum more often (prevents bloat)
ALTER TABLE x SET (autovacuum_vacuum_scale_factor = 0.05);
```

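### Tuning: RocksDB LSM
```
write_buffer_size:                   memtable size
max_write_buffer_number:             max memtables held in memory (active + immutable)
level0_file_num_compaction_trigger:  number of L0 files that triggers compaction into L1
target_file_size_base:               target SSTable size
compression_per_level:               compression type per level
filter_policy (table options):       Bloom filter, e.g. 10 bits per key; speeds up point reads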
```

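### Library usage: Node
```ts
// LevelDB via the `level` package (classic-level under Node.js); this is not RocksDB.
// Top-level await requires an ES module.
import { Level } from 'level';
const db = new Level('./db', { valueEncoding: 'json' });
await db.put('key', { value: 42 });
const v = await db.get('key');

// Range scan over keys between 'a' and 'z' (inclusive)
for await (const [key, value] of db.iterator({ gte: 'a', lte: 'z' })) {
  console.log(key, value);
}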
```

### Sorted vs unsorted
```
B-Tree:  sorted by key by construction.
LSM:     sorted by key, so range scans work.
Hash:    unsorted (point lookups only, no range scans). Examples: Memcached, hash indexes.
```
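
The difference shows up as soon as a range query is needed. In the sketch below a sorted array stands in for a B-Tree leaf chain or an SSTable, and a `Set` stands in for a hash index; the helper names `lowerBound` and `rangeScan` and the sample keys are hypothetical.

```typescript
// Index of the first key >= target in a sorted array (binary search).
function lowerBound(sortedKeys: string[], target: string): number {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range scan: seek once, then read neighbours sequentially until past the upper bound.
function rangeScan(sortedKeys: string[], gte: string, lte: string): string[] {
  const out: string[] = [];
  for (let i = lowerBound(sortedKeys, gte); i < sortedKeys.length && sortedKeys[i] <= lte; i++) {
    out.push(sortedKeys[i]);
  }
  return out;
}

const sortedKeys = ["apple", "banana", "cherry", "date", "fig"];
console.log(rangeScan(sortedKeys, "b", "e")); // ["banana", "cherry", "date"]

// A hash index answers point lookups only; a range would require visiting every key.
const hashIndex = new Set(sortedKeys);
console.log(hashIndex.has("cherry")); // true
```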

### Cache hierarchy
```
RAM (page cache / memtable) → SSD (data) → older SSD / HDD (cold).

Postgres shared_buffers: ~25% of RAM is the usual starting point.
RocksDB block_cache: size according to workload.
```

### Algorithm walkthrough
```
B-Tree insertion:
1. Find leaf
2. If full → split, push median up
3. Recursive up

LSM compaction:
1. L0 file count > threshold → merge into L1
2. L1 size > target → pick file(s) from L1 and merge them with the overlapping files in L2
...
```
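
The merge step at the heart of LSM compaction can be sketched as a two-way merge of sorted runs in which the newer run wins on duplicate keys. This is a simplified illustration: real compactions merge many files and track sequence numbers, and `CompactionRow` and `compactRuns` are hypothetical names. Dropping tombstones is only safe when no older run could still hold the key, which the `dropTombstones` flag stands in for.

```typescript
interface CompactionRow { key: string; value: string | null } // null = tombstone

// Merge two runs sorted by key. On equal keys the newer row shadows the older one.
function compactRuns(
  newer: CompactionRow[],
  older: CompactionRow[],
  dropTombstones: boolean
): CompactionRow[] {
  const out: CompactionRow[] = [];
  let i = 0;
  let j = 0;
  while (i < newer.length || j < older.length) {
    let pick: CompactionRow;
    if (j >= older.length || (i < newer.length && newer[i].key <= older[j].key)) {
      pick = newer[i];
      if (j < older.length && older[j].key === pick.key) j++; // discard the stale version
      i++;
    } else {
      pick = older[j];
      j++;
    }
    if (pick.value !== null || !dropTombstones) out.push(pick);
  }
  return out;
}

const newerRun: CompactionRow[] = [
  { key: "a", value: "3" },
  { key: "b", value: null }, // delete marker for "b"
  { key: "d", value: "9" },
];
const olderRun: CompactionRow[] = [
  { key: "a", value: "1" },
  { key: "b", value: "2" },
  { key: "c", value: "5" },
];

// Merging into the bottom level, so tombstones can be dropped.
console.log(compactRuns(newerRun, olderRun, true).map((r) => r.key + "=" + r.value));
// ["a=3", "c=5", "d=9"]
```

Six input rows shrink to three output rows: this is how compaction reclaims the space held by old versions and deleted keys.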

### Modern variants
```
Fractal Tree: B-Tree + log buffer (TokuDB).
Bw-Tree:      lock-free B-Tree variant (Hekaton, Microsoft).
Adaptive Radix Tree (ART): in-memory databases.
LSM with Bloom filters per level.
```

## 🤔 Decision Criteria
| Workload | Engine |
|---|---|
| OLTP (banking, orders) | B-Tree (Postgres / InnoDB) |
| Time-series / logs | LSM (Cassandra) |
| Write-heavy + range | LSM (RocksDB) |
| Mostly read | B-Tree |
| Embedded | LevelDB (LSM) / SQLite (B-Tree) |
| Distributed write | LSM (Cassandra / ScyllaDB) |

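## ❌ Anti-patterns
- **Heavy random-key inserts into a B-Tree**: page splits explode. Use time-ordered keys such as UUID v7.
- **Frequent overwrites of small values on an LSM**: high write amp. Consider a different storage engine.
- **Disabling compaction on an LSM**: read amp explodes.
- **Disabling vacuum on a B-Tree (Postgres)**: bloat.
- **Disabling Bloom filters on an LSM**: every point read has to check every candidate SSTable.
- **Ignoring cache size**: frequent trips to disk.
- **Running an LSM database under B-Tree assumptions**: you miss the trade-offs.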

## 🤖 LLM Usage Hints
- Postgres / MySQL = B-Tree (in most cases).
- Cassandra / RocksDB = LSM (write-heavy).
- Knowing which engine sits underneath makes tuning accurate.

## 🔗 Related Documents
- [[DB_Index_Strategy]]
- [[DB_Vacuum_Autovacuum]]
- [[DB_Time_Series_Patterns]]