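---
id: wiki-2026-0508-b-tree
title: B-Tree
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [B+Tree, BTree, Balanced-Tree-Index]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [data-structure, tree, index, database]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: stdlib
---

# B-Tree

## One-liner

> **"A disk-friendly self-balancing search tree: each node stores many keys, which minimizes the tree height."** Designed by Bayer & McCreight at Boeing Scientific Research Labs around 1970. As of 2026 it is the default index structure in PostgreSQL, MySQL InnoDB, and SQLite, and it still dominates on NVMe SSDs because page-sized nodes favor sequential I/O and cache-line-aligned access.

## Core

### Invariants

For minimum degree $t \ge 2$:

- Every node holds $[t-1, 2t-1]$ keys (the root is the exception: it may hold as few as 1).
- Every internal node has $[t, 2t]$ children; in general, a node with $n$ keys has $n+1$ children (the root: $[2, 2t]$).
- All leaves are at the same depth.
- Keys are sorted within a node, and every key in child $i$ lies between the node's keys $i-1$ and $i$.

### B+ Tree variant (DB standard)

- Internal nodes hold only keys (separators); the data lives only in the leaves.
- Leaves are chained in a linked list, so a range scan costs $O(\log n + k)$: one root-to-leaf descent, then $k$ results read sequentially.
- PostgreSQL and MySQL (InnoDB) use this variant.

### Applications

1. RDBMS indexes (PostgreSQL `btree`).
2. Filesystems (ext4 HTree directory index, NTFS).
3. KV stores such as LMDB (a B+ Tree); LevelDB and RocksDB, by contrast, are LSM trees.
4. Metadata indexes in vector DBs.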
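
The invariants listed above can be checked mechanically. Below is a minimal sketch, assuming distinct integer keys and a hypothetical `Node` dataclass (a node counts as a leaf when it has no children); it is not taken from any library. `check_btree` walks the tree once, passing down the open key interval each child must respect and recording the depth of every leaf.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical minimal node type for this sketch; assumes distinct, comparable keys.
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


def check_btree(root: Node, t: int) -> bool:
    """Return True if the tree rooted at `root` satisfies the B-Tree invariants for minimum degree t."""
    leaf_depths = set()

    def walk(node: Node, depth: int, lo: Optional[int], hi: Optional[int], is_root: bool) -> bool:
        n = len(node.keys)
        # Key count: [t-1, 2t-1]; the root needs only 1 key (0 if the whole tree is empty).
        min_keys = (0 if node.leaf else 1) if is_root else t - 1
        if not min_keys <= n <= 2 * t - 1:
            return False
        # Keys strictly increasing inside the node...
        if any(a >= b for a, b in zip(node.keys, node.keys[1:])):
            return False
        # ...and inside the open interval (lo, hi) inherited from the parent's separators.
        too_low = lo is not None and node.keys and node.keys[0] <= lo
        too_high = hi is not None and node.keys and node.keys[-1] >= hi
        if too_low or too_high:
            return False
        if node.leaf:
            leaf_depths.add(depth)
            return True
        # An internal node with n keys has exactly n + 1 children ([t, 2t] for non-root nodes).
        if len(node.children) != n + 1:
            return False
        bounds = [lo] + node.keys + [hi]
        return all(
            walk(child, depth + 1, bounds[i], bounds[i + 1], False)
            for i, child in enumerate(node.children)
        )

    # Every leaf must sit at the same depth.
    return walk(root, 0, None, None, True) and len(leaf_depths) <= 1


def mk_leaf(*keys: int) -> Node:
    return Node(keys=list(keys))


good = Node(keys=[10, 20], children=[mk_leaf(1, 5), mk_leaf(12, 15), mk_leaf(25, 30)])
bad = Node(keys=[10], children=[mk_leaf(1, 5), Node(keys=[15], children=[mk_leaf(12), mk_leaf(17)])])
print(check_btree(good, t=2))  # True
print(check_btree(bad, t=2))   # False: leaves at depths 1 and 2
```

The bound-passing design checks the "keys of child $i$ lie between separators" property in the same pass as the per-node checks, so the whole validation is $O(n)$ in the number of keys.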
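## 💻 Patterns

### B-Tree node (Python)

```python
class BTreeNode:
    def __init__(self, t, leaf=False):
        self.t = t            # minimum degree (t >= 2)
        self.keys = []        # sorted keys: t-1 to 2t-1 of them (the root may have fewer)
        self.children = []    # len(keys) + 1 children for an internal node
        self.leaf = leaf
        self.next = None      # right-sibling link, used only by B+ Tree leaves (range_scan)

    def search(self, k):
        # Find the first key >= k in this node.
        i = 0
        while i < len(self.keys) and k > self.keys[i]:
            i += 1
        if i < len(self.keys) and self.keys[i] == k:
            return (self, i)                # found: (node, index of key)
        if self.leaf:
            return None                     # reached a leaf without finding k
        return self.children[i].search(k)   # descend into the child covering k
```

### Insert with split

```python
import bisect

def split_child(parent, i):
    # Split the full child parent.children[i] (2t-1 keys) around its median key,
    # which moves up into parent; parent itself must not be full.
    t = parent.t
    full = parent.children[i]
    new = BTreeNode(t, full.leaf)
    new.keys = full.keys[t:]               # upper t-1 keys go to the new right sibling
    if not full.leaf:
        new.children = full.children[t:]
        full.children = full.children[:t]
    parent.keys.insert(i, full.keys[t-1])  # median moves up
    full.keys = full.keys[:t-1]            # lower t-1 keys stay
    parent.children.insert(i+1, new)

def insert_nonfull(node, k):
    # Insert k into the subtree rooted at node, which is guaranteed not to be full.
    if node.leaf:
        bisect.insort(node.keys, k)        # keep keys sorted within the leaf
        return
    i = bisect.bisect_right(node.keys, k)  # child whose key range covers k
    if len(node.children[i].keys) == 2*node.t - 1:
        split_child(node, i)               # split proactively so the descent never meets a full node
        if k > node.keys[i]:
            i += 1
    insert_nonfull(node.children[i], k)

def insert(root, k):
    # The tree only grows at the root: a full root is split under a new root.
    if len(root.keys) == 2*root.t - 1:
        new_root = BTreeNode(root.t)
        new_root.children.append(root)
        split_child(new_root, 0)
        root = new_root
    insert_nonfull(root, k)
    return root                            # caller must keep the (possibly new) root
```

### B+ Tree range scan

```python
def range_scan(leaf, lo, hi):
    # `leaf` is the leaf where lo would live (found by one root-to-leaf descent);
    # from there the scan follows the leaf chain: O(log n + k) overall.
    out = []
    node = leaf
    while node is not None:
        for k in node.keys:
            if lo <= k <= hi:
                out.append(k)
            elif k > hi:
                return out        # keys are sorted, so nothing further can match
        node = node.next          # leaf-linked list
    return out
```

### PostgreSQL B-Tree usage

```sql
CREATE INDEX idx_users_email ON users USING btree (email);
-- A btree index serves equality, range, and ORDER BY queries
EXPLAIN SELECT * FROM users WHERE email > 'a' AND email < 'm';
-- Possible plan: Index Scan using idx_users_email
-- (the planner may choose a Bitmap Heap Scan or Seq Scan if the range matches many rows)
```

### SQLite WITHOUT ROWID (B-Tree direct)

```sql
CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB) WITHOUT ROWID;
-- Rows live inside the PRIMARY KEY B-Tree itself, so a lookup by k needs no second rowid lookup
```

### Bulk loading (from sorted input)

```python
def bulk_load(sorted_keys, t=64):
    # Bottom-up build, leaf level: one O(n) pass over pre-sorted keys (vs O(n log n) for n inserts).
    cap = 2*t - 1                                    # pack each leaf full
    leaves = [BTreeNode(t, leaf=True) for _ in range(0, len(sorted_keys), cap)]
    for j, leaf in enumerate(leaves):
        leaf.keys = list(sorted_keys[j*cap:(j+1)*cap])
        leaf.next = leaves[j+1] if j + 1 < len(leaves) else None  # chain for range_scan
    # Internal levels repeat the same packing over each subtree's smallest key until one root remains.
    return leaves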
```

## Decision criteria

| Situation | Approach |
|---|---|
| OLTP, point + range queries | B+ Tree (default) |
| Append-heavy time series | LSM tree (RocksDB); B-Tree write amplification is higher |
| In-memory only, no range queries | Hash index |
| Vector similarity | HNSW (not a B-Tree) |
| Spatial | R-Tree / GiST |

**Default**: RDBMS indexes are B+ Trees. Even on SSD/NVMe, nodes are page-aligned (8 KB in PostgreSQL, 16 KB in InnoDB by default).

## 🔗 Graph

- Parent: [[Linked-Lists-and-Trees]] · [[Theoretical-Computer-Science]]
- Alternative: [[Hash-Functions-and-Maps]]
- Applications: Database-Index · [[Bloom-Filters in Search]]
- Adjacent: [[Algorithm-Complexity-Big-O]] · LSM-Tree

## 🤖 LLM usage

**When**: DB schema design, debugging "why is this query slow?", reviewing index choices.
**When not**: write-heavy workloads → consider LSM first; vector search → HNSW.

## ❌ Anti-patterns

- **Random UUIDv4 primary key**: random keys land on arbitrary leaf pages, so inserts cause page splits throughout the index (a "page split storm"), leave pages partly empty, and defeat buffer-cache locality. Use time-ordered UUIDv7 instead.
- **Over-indexing**: an index on every column multiplies write amplification and storage, because every write must update every index.
- **Index on a low-cardinality column**: an index on a boolean column rarely helps; when a value matches a large share of rows, a full sequential scan is usually faster.

## 🧪 Verification / Duplicates

- Verified (Bayer 1972 original paper, PostgreSQL docs Ch.62).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: B-Tree/B+Tree, split logic, DB index |
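
The **Random UUIDv4 primary key** anti-pattern comes down to where new keys land in key order. A minimal sketch, using a sorted Python list as a stand-in for the index's leaf level (it does not simulate pages or splits): `uuid7_like` is a hypothetical helper that packs a Unix millisecond timestamp into the UUIDv7 bit layout from RFC 9562, written out by hand because older Python versions lack `uuid.uuid7()`, and `append_ratio` (also hypothetical) measures how often an insert lands at the right edge.

```python
import bisect
import os
import uuid


def uuid7_like(unix_ms: int) -> uuid.UUID:
    """Hypothetical helper: a UUID in the RFC 9562 version-7 layout for a given Unix ms timestamp."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 of them used
    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: 48-bit timestamp (ms)
    value |= 0x7 << 76                               # bits 79..76: version 7
    value |= ((rand >> 62) & 0xFFF) << 64            # bits 75..64: 12 random bits (rand_a)
    value |= 0b10 << 62                              # bits 63..62: RFC variant
    value |= rand & ((1 << 62) - 1)                  # bits 61..0: 62 random bits (rand_b)
    # Keys from the same millisecond are NOT ordered here; real generators may add a counter.
    return uuid.UUID(int=value)


def append_ratio(keys) -> float:
    """Fraction of inserts that land at the right edge of a sorted list (stand-in for the rightmost leaf)."""
    index, appends = [], 0
    for k in keys:
        pos = bisect.bisect_left(index, k)
        if pos == len(index):
            appends += 1
        index.insert(pos, k)
    return appends / len(keys)


n = 1_000
base_ms = 1_700_000_000_000  # arbitrary start time
v7 = [uuid7_like(base_ms + i) for i in range(n)]  # one key per millisecond
v4 = [uuid.uuid4() for _ in range(n)]
print(append_ratio(v7))  # 1.0: each key sorts after every earlier key
print(append_ratio(v4))  # typically close to 0: random keys land all over the key space
```

Because the timestamp occupies the most significant bits, time-ordered keys always append at the right edge, so a B-Tree keeps writing into one hot rightmost leaf. Random v4 keys land at arbitrary positions, so inserts touch, and eventually split, leaves across the whole index.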