| Field | Value |
|---|---|
| id | wiki-2026-0508-hash-functions-and-maps |
| title | Hash Functions and Maps |
| category | 10_Wiki/Topics |
| status | verified |
| canonical_id | self |
| aliases | |
| duplicate_of | none |
| source_trust_level | A |
| confidence_score | 0.9 |
| verification_status | applied |
| tags | |
| raw_sources | |
| last_reinforced | 2026-05-10 |
| github_commit | pending |
| tech_stack | |
Hash Functions and Maps
One-liner
"A data structure that maps each key to a bucket index, giving average O(1) lookup and insert." The idea is traced to Hans Peter Luhn at IBM in 1953. Modern designs all descend from it:
- Rust's HashMap (hashbrown, a SwissTable port, hashed with SipHash by default).
- Google SwissTable / Abseil flat_hash_map.
- Python's dict (open addressing with perturbation).

Cryptographic hashes (SHA-256, SHA-3, BLAKE3) are a separate class from non-cryptographic ones (xxHash, aHash, FxHash).
Core
Hash function properties
- Determinism: the same input always yields the same output.
- Uniformity: outputs are spread uniformly over the output range.
- Avalanche: flipping a single input bit flips about 50% of the output bits.
- Speed (non-crypto): aHash and xxHash reach GB/s throughput.
- Collision resistance (crypto): finding x ≠ y with h(x) = h(y) is computationally infeasible.
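The determinism and avalanche properties can be checked empirically. The sketch below makes one assumption: it uses SHA-256 from Python's `hashlib` as a stand-in for a well-mixed hash, and any good hash should behave similarly. It flips each input bit of a sample message in turn and measures what fraction of the 256 output bits change.

```python
import hashlib


def sha256_int(data: bytes) -> int:
    """SHA-256 digest of `data` as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def flipped_fraction(data: bytes, bit: int) -> float:
    """Fraction of output bits that change when input bit `bit` is flipped."""
    mutated = bytearray(data)
    mutated[bit // 8] ^= 1 << (bit % 8)
    diff = sha256_int(data) ^ sha256_int(bytes(mutated))
    return bin(diff).count("1") / 256


msg = b"hello world"
assert sha256_int(msg) == sha256_int(msg)  # determinism: same input, same output
fractions = [flipped_fraction(msg, b) for b in range(len(msg) * 8)]
mean = sum(fractions) / len(fractions)
print(f"mean fraction of output bits flipped: {mean:.3f}")  # expected to be near 0.5
```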
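Hash table strategies
- Separate chaining: each bucket holds a linked list (or tree) of colliding entries. Java's HashMap uses chaining. Since Java 8, a bucket switches from a list to a red-black tree once it reaches 8 entries (and the table has at least 64 buckets).
- Open addressing: on a collision, probe for another slot in the same array.
  - Linear probing: +1, +2, ... (cache-friendly, but prone to clustering).
  - Quadratic probing: +1², +2², ...
  - Double hashing: +h_2(k), +2h_2(k), ...
- Robin Hood hashing: a linear-probing variant. An incoming key may take the slot of a resident key that sits closer to its own home slot, which evens out probe lengths. Rust's std HashMap used it before switching to hashbrown (a SwissTable port) in Rust 1.36.
- SwissTable (Google, 2017+): open addressing plus a SIMD-scanned metadata byte per slot. It is among the fastest general-purpose designs today.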
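To compare the three probe sequences side by side, the sketch below prints the slots each one visits. It assumes a hypothetical table of m = 16 slots, a key with home slot h(k) = 5, and a secondary hash h_2(k) = 3.

```python
def linear_probe(home: int, m: int, i: int) -> int:
    """i-th slot tried by linear probing: home + i."""
    return (home + i) % m


def quadratic_probe(home: int, m: int, i: int) -> int:
    """i-th slot tried by quadratic probing: home + i^2."""
    return (home + i * i) % m


def double_hash_probe(home: int, step: int, m: int, i: int) -> int:
    """i-th slot tried by double hashing: home + i * h_2(k), where step = h_2(k)."""
    return (home + i * step) % m


m, home, step = 16, 5, 3  # hypothetical: table size, h(k), h_2(k)
for name, probe in [
    ("linear", lambda i: linear_probe(home, m, i)),
    ("quadratic", lambda i: quadratic_probe(home, m, i)),
    ("double", lambda i: double_hash_probe(home, step, m, i)),
]:
    seq = [probe(i) for i in range(m)]
    print(f"{name:9} first 5 = {seq[:5]}, distinct slots = {len(set(seq))}")
```

Some sequences cover the table and some do not:
- Linear probing visits all 16 slots.
- Quadratic probing reaches only 4 slots here, because i² mod 16 only takes the values {0, 1, 4, 9}. Power-of-two tables therefore often use triangular offsets i(i+1)/2 instead.
- Double hashing covers every slot as long as h_2(k) is coprime with m (any odd step when m is a power of two).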
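Load factor & resizing
- Load factor α = n/m (n stored keys, m buckets). Open addressing is usually kept below α ≈ 0.75. Chaining tolerates α up to about 1.
- Resizing: double the table (m → 2m) and rehash every key.
- Because capacity grows geometrically, insert is amortized O(1).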
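One way to see why doubling gives amortized O(1) insert is to count how many keys get rehashed in total. The sketch below only simulates the bookkeeping; there is no real table. The initial capacity of 8 and the 0.75 resize threshold are assumptions.

```python
def rehash_moves(n_inserts: int, initial_capacity: int = 8, max_load: float = 0.75) -> int:
    """Total keys moved by doubling resizes while inserting n_inserts distinct keys."""
    capacity, size, moves = initial_capacity, 0, 0
    for _ in range(n_inserts):
        if size + 1 > capacity * max_load:  # this insert would exceed the load limit
            moves += size                   # every existing key is rehashed
            capacity *= 2
        size += 1
    return moves


for n in (10, 1_000, 100_000):
    moves = rehash_moves(n)
    print(f"n={n:>7}  rehash moves={moves:>7}  per insert={moves / n:.2f}")
```

The resize costs form a geometric series (6 + 12 + 24 + ...), so the total stays below 2n. Each insert therefore pays a bounded number of extra moves on average, even though an individual resize is O(n).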
Applications
- Symbol table (compiler).
- Cache (LRU, LFU).
- Set membership (HashSet).
- Counting (frequency).
- Dedup.
- Databases (hash indexes, hash joins, hash partitioning).
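A hash join, one of the database uses above, is a direct application of a hash map. The sketch below is a minimal in-memory inner equi-join over lists of dicts. The function name, row layout and sample data are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join of two lists of dicts: hash the build side, then probe it."""
    # Build phase: group the (ideally smaller) input by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: each probe row costs one average-O(1) hash lookup.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "cup"},  # no matching user: dropped (inner join)
]
for row in hash_join(users, orders, "id", "user_id"):
    print(row)
```

Building on the smaller input keeps the hash table small. The probe side is then streamed once, so the join runs in O(|build| + |probe|) expected time instead of the O(|build| · |probe|) of a nested-loop join.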
💻 Patterns
Rust: std HashMap
```rust
use std::collections::HashMap;

fn main() {
    // Default hasher: SipHash-1-3 with random keys (HashDoS-resistant, but slower).
    let mut map: HashMap<String, i32> = HashMap::new();
    map.insert("alice".to_string(), 30);
    map.insert("bob".to_string(), 25);

    // Entry API: insert-or-update with a single lookup.
    *map.entry("alice".to_string()).or_insert(0) += 1;

    if let Some(age) = map.get("alice") {
        println!("Alice: {}", age); // Alice: 31
    }

    // Iteration order is unspecified and can differ between runs.
    for (k, v) in &map {
        println!("{} = {}", k, v);
    }
}
```
Rust: aHash (fast non-crypto hash, DoS-resistant)
```rust
// Cargo.toml: ahash = "0.8"
use ahash::AHashMap;

fn main() {
    let mut map: AHashMap<&str, i32> = AHashMap::new();
    map.insert("hello", 1);
    // Often several times faster than SipHash (the ~5x figure is workload dependent),
    // and faster still when AES-NI is available.
    // A reasonable production default, but benchmark your own workload.
}
```
Rust: FxHash (very fast for trusted keys)
```rust
// Cargo.toml: rustc-hash = "1.1"
use rustc_hash::FxHashMap;

fn main() {
    let mut map: FxHashMap<u64, &str> = FxHashMap::default();
    map.insert(42, "answer");
    // Used inside rustc. NOT HashDoS-resistant: use SipHash or aHash for untrusted input.
}
```
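Custom Hash (Rust trait)
```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(PartialEq, Eq)]
struct Point { x: i32, y: i32 }

impl Hash for Point {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed exactly the fields Eq compares, so a == b implies hash(a) == hash(b).
        // #[derive(Hash)] generates the same code; hand-write it only when Eq is custom too.
        self.x.hash(state);
        self.y.hash(state);
    }
}

fn main() {
    let mut visits: HashMap<Point, u32> = HashMap::new();
    *visits.entry(Point { x: 1, y: 2 }).or_insert(0) += 1;
    *visits.entry(Point { x: 1, y: 2 }).or_insert(0) += 1;
    println!("{}", visits[&Point { x: 1, y: 2 }]); // 2
}
```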
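The same Eq/Hash contract applies in Python. For an object to work as a dict key, `__hash__` must agree with `__eq__`. The class below is an illustrative Python counterpart of the Rust `Point`, not part of any library.

```python
class Point:
    """Hashable point: __hash__ uses exactly the fields that __eq__ compares."""

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Delegate to tuple hashing, which combines the per-field hashes.
        return hash((self.x, self.y))


visits = {}
for p in [Point(1, 2), Point(1, 2), Point(3, 4)]:
    visits[p] = visits.get(p, 0) + 1
print(visits[Point(1, 2)])  # 2
```

`@dataclass(frozen=True)` generates equivalent `__eq__` and `__hash__` methods. Freezing also matters because mutating `x` or `y` after insertion would leave the entry in the wrong bucket.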
C++: std::unordered_map vs absl::flat_hash_map
```cpp
#include <iostream>
#include <string>

#include "absl/container/flat_hash_map.h"

int main() {
    // std::unordered_map: node-based chaining; lookups chase pointers (cache-unfriendly).
    // absl::flat_hash_map: SwissTable; usually faster (the ~2-3x figure is workload dependent).
    absl::flat_hash_map<std::string, int> map;
    map["alice"] = 30;
    map["bob"] = 25;
    if (auto it = map.find("alice"); it != map.end()) {  // C++17 if-with-initializer
        std::cout << it->second << "\n";
    }
}
```
Open addressing (simple linear probing)
```python
class LinearProbingHashMap:
    """Open-addressing map with linear probing.

    None marks an empty slot, so None cannot be a key. There is no deletion
    (that would need tombstones).
    """

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.size = 0
        self.keys = [None] * capacity
        self.values = [None] * capacity

    def _probe(self, key):
        # Walk forward from the home slot until we hit the key or an empty slot.
        # Terminates because put() resizes before the table is 75% full.
        idx = hash(key) % self.capacity
        while self.keys[idx] is not None and self.keys[idx] != key:
            idx = (idx + 1) % self.capacity
        return idx

    def put(self, key, value):
        if self.size >= self.capacity * 0.75:
            self._resize()
        idx = self._probe(key)
        if self.keys[idx] is None:  # new key, not an update
            self.size += 1
        self.keys[idx] = key
        self.values[idx] = value

    def get(self, key):
        idx = self._probe(key)
        return self.values[idx] if self.keys[idx] is not None else None

    def _resize(self):
        # Double the capacity and reinsert every live key at its new home slot.
        old_keys, old_values = self.keys, self.values
        self.capacity *= 2
        self.keys = [None] * self.capacity
        self.values = [None] * self.capacity
        self.size = 0
        for k, v in zip(old_keys, old_values):
            if k is not None:
                self.put(k, v)
```
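Robin Hood hashing, listed among the strategies, changes only how linear probing places keys. The minimal sketch below assumes a fixed capacity with no resize or delete, and each slot stores its distance from its home slot. On insert, a key that has probed farther takes the slot from a resident that is closer to home. On lookup, that same invariant allows an early exit.

```python
class RobinHoodMap:
    """Sketch of Robin Hood linear probing: fixed capacity, no resize or delete."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.size = 0
        # Each slot is None or [key, value, dist], dist = distance from the home slot.
        self.slots = [None] * capacity

    def _home(self, key):
        return hash(key) % self.capacity

    def _find(self, key):
        """Return the index of the slot holding key, or None."""
        idx, dist = self._home(key), 0
        while True:
            slot = self.slots[idx]
            # Empty slot, or a resident closer to its home than we are to ours:
            # under the Robin Hood invariant the key cannot be further along.
            if slot is None or slot[2] < dist:
                return None
            if slot[0] == key:
                return idx
            idx, dist = (idx + 1) % self.capacity, dist + 1

    def get(self, key):
        idx = self._find(key)
        return None if idx is None else self.slots[idx][1]

    def put(self, key, value):
        idx = self._find(key)
        if idx is not None:  # existing key: update in place
            self.slots[idx][1] = value
            return
        if self.size == self.capacity:
            raise RuntimeError("table full (this sketch never resizes)")
        entry, idx = [key, value, 0], self._home(key)
        while True:
            slot = self.slots[idx]
            if slot is None:
                self.slots[idx] = entry
                self.size += 1
                return
            if slot[2] < entry[2]:
                # "Take from the rich": the resident yields its slot and keeps probing.
                self.slots[idx], entry = entry, slot
            idx = (idx + 1) % self.capacity
            entry[2] += 1

    def max_probe_length(self):
        return max((s[2] for s in self.slots if s is not None), default=0)


t = RobinHoodMap(16)
for k in [1, 0, 16, 32]:  # 0, 16, 32 share home slot 0 (hash(int) == int in CPython)
    t.put(k, str(k))
print(t.max_probe_length())  # 2; plain linear probing would leave key 32 three slots from home
```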
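Cryptographic hash (SHA-256, BLAKE3)
```rust
// Cargo.toml: sha2 = "0.10", blake3 = "1"
use sha2::{Digest, Sha256};

fn main() {
    // SHA-256: universally supported; fast on CPUs with SHA extensions, otherwise
    // comparatively slow in software (the ~600 MB/s figure is CPU dependent).
    let mut hasher = Sha256::new();
    hasher.update(b"hello world");
    let result = hasher.finalize();
    println!("{:x}", result);

    // BLAKE3: modern crypto hash built for SIMD and parallelism; usually much faster
    // than SHA-256 in software (the ~6 GB/s figure assumes SIMD support).
    let hash = blake3::hash(b"hello world");
    println!("{}", hash);
}
```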
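The SHA-256 digest printed by the Rust example can be cross-checked with Python's standard `hashlib`. BLAKE3 is not in the standard library (it needs a third-party package), so this sketch substitutes BLAKE2b, its stdlib predecessor, truncated to 32 bytes, alongside SHA3-256.

```python
import hashlib

msg = b"hello world"
sha256_hex = hashlib.sha256(msg).hexdigest()
sha3_hex = hashlib.sha3_256(msg).hexdigest()
blake2_hex = hashlib.blake2b(msg, digest_size=32).hexdigest()  # stand-in for BLAKE3

print(sha256_hex)  # b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
print(sha3_hex)
print(blake2_hex)
```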
Bloom filter (hash-based set; false positives allowed)
```python
# pip install mmh3 bitarray
import mmh3  # MurmurHash3
from bitarray import bitarray


class BloomFilter:
    def __init__(self, size, num_hashes):
        self.size = size              # m: number of bits
        self.num_hashes = num_hashes  # k: number of hash functions
        self.bits = bitarray(size)
        self.bits.setall(0)

    def add(self, item):
        # Seeds 0..k-1 turn MurmurHash3 into k different hash functions.
        # mmh3.hash is signed, but Python's % always returns a non-negative index.
        for i in range(self.num_hashes):
            idx = mmh3.hash(item, i) % self.size
            self.bits[idx] = 1

    def contains(self, item):
        return all(self.bits[mmh3.hash(item, i) % self.size] for i in range(self.num_hashes))


bf = BloomFilter(size=10000, num_hashes=7)
bf.add("alice")
print(bf.contains("alice"))  # True: added items are never reported missing
print(bf.contains("bob"))    # False means definitely absent; True would be a false positive
```
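Sizing a Bloom filter follows from two standard approximations:
- m = -n ln p / (ln 2)² bits.
- k = (m/n) ln 2 hash functions.

These give n items at a target false-positive rate p. The sketch below computes both, plus the expected false-positive rate for given m, k and n. The item count n = 1,000 is a hypothetical workload; under it, the `size=10000, num_hashes=7` setting above lands just under 1%.

```python
import math


def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Bits m and hash count k for n items at target false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate false-positive rate (1 - e^(-k n / m))^k after n insertions."""
    return (1 - math.exp(-k * n / m)) ** k


print(bloom_parameters(1000, 0.01))  # (9586, 7)
print(f"{false_positive_rate(10000, 7, 1000):.4f}")
```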
Decision criteria
| Situation | Hash function / map |
|---|---|
| Rust trusted input, max speed | FxHash |
| Rust untrusted input | std HashMap (SipHash) or aHash |
| C++ general | absl::flat_hash_map (SwissTable) |
| Python | dict (built-in, optimized) |
| Distributed cache key | xxHash3 / FNV-1a |
| Cryptographic | BLAKE3 (speed) / SHA-256 or SHA-3 (NIST standards) |
| Bloom filter | MurmurHash3 |
| String interning | weak hash + linear probe |
| Ordered iteration | BTreeMap (not hash) |
Defaults: Rust std::collections::HashMap, C++ absl::flat_hash_map, Python dict. For performance-critical code, swap in aHash or FxHash as the hasher.
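FNV-1a, listed above for distributed cache keys, is small enough to write out in full. The sketch below is the 32-bit variant using the published offset basis and prime. `shard_for` is a hypothetical helper that shows the typical use: picking a cache shard for a key.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: for each byte, XOR it in, then multiply by the FNV prime mod 2^32."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def shard_for(key: str, n_shards: int) -> int:
    """Hypothetical helper: map a cache key to one of n_shards."""
    return fnv1a_32(key.encode("utf-8")) % n_shards


print(hex(fnv1a_32(b"a")))  # 0xe40c292c
print(shard_for("user:42", 8))
```

Taking the hash modulo the shard count remaps most keys whenever the count changes. That is the problem Consistent-Hashing addresses.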
🔗 Graph
- Variants: Bloom-Filter · HyperLogLog · Consistent-Hashing
- Adjacent: SHA-256 · xxHash
🤖 Using LLMs
When: advice on choosing a hash function, reviewing a hash table implementation, root-causing collisions. When not: implementing a cryptographic hash yourself (use an audited library) or hand-writing a production hash function.
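❌ Anti-patterns
- Hashing strings by length alone: catastrophic collisions, since every string of the same length lands in the same bucket.
- FxHash on untrusted input: opens the door to HashDoS attacks; use SipHash or aHash.
- MD5 or SHA-1 in new code: both are broken (practical collisions); use BLAKE3 or SHA-256.
- Careless reduction of the hash to a bucket index: `h % m` costs a division and is slightly biased unless m divides the hash range. Use a power-of-two size with a bitmask (only if the hash's low bits are well mixed) or Lemire's fastrange.
- Open addressing at a high load factor: probe lengths explode as α approaches 1, and α > 0.9 is catastrophic; resize earlier.
- Default hashing of complex keys: the distribution can be poor (for example, a weak hasher over correlated fields); consider a custom impl or a stronger hasher.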
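The bucket-index pitfall is easy to reproduce. The sketch below makes three assumptions:
- Keys are hashed with the identity function.
- `fmix32` is MurmurHash3's 32-bit finalizer, used as the mixer.
- `fastrange32` is Lemire's multiply-shift reduction.

A bitmask keeps only the low bits and fastrange keeps only the high bits, so both need a well-mixed hash.

```python
MASK32 = 0xFFFFFFFF


def fmix32(h: int) -> int:
    """MurmurHash3 32-bit finalizer: spreads every input bit across the output."""
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def mask_index(h: int, m: int) -> int:
    """Power-of-two table: keep only the low log2(m) bits of the hash."""
    assert m > 0 and m & (m - 1) == 0, "m must be a power of two"
    return h & (m - 1)


def fastrange32(h: int, m: int) -> int:
    """Map a 32-bit hash into [0, m) with a multiply and a shift instead of a division."""
    return ((h & MASK32) * m) >> 32


keys = range(0, 1600, 16)  # 100 keys, all multiples of 16
weak = {mask_index(k, 16) for k in keys}  # identity hash: low 4 bits are always 0
mixed = {mask_index(fmix32(k), 16) for k in keys}
print(len(weak), len(mixed))  # weak collapses into a single bucket
```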
🧪 Verification / duplicates
- Verified (Knuth TAOCP Vol 3, "Designing a fast, efficient, cache-friendly hash table", Abseil docs).
- Confidence: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — Rust/C++/Python implementations, SwissTable, ahash, BLAKE3 |