id: wiki-2026-0508-hash-functions-and-maps
title: Hash Functions and Maps
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Hash Tables, Hash Maps, Dictionaries, HashMap
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: data-structures, algorithms, hashing, hash-tables, performance
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack (language, framework): Rust, std::collections + ahash + FxHash

Hash Functions and Maps

In one line

"A data structure that maps each key to a bucket index, giving average O(1) lookup and insert." The idea goes back to Hans Peter Luhn at IBM in 1953. Modern designs such as Rust's HashMap (SipHash by default), Google's SwissTable / Abseil flat_hash_map, and Python's dict (open addressing with perturbation) all build on it. Hash functions fall into two groups: cryptographic (SHA-256, SHA-3, BLAKE3) and non-cryptographic (xxHash, ahash, FxHash).

Core concepts

Hash function properties

  • Determinism: the same input always produces the same output.
  • Uniformity: outputs are spread evenly across the output range.
  • Avalanche: flipping a single input bit flips about 50% of the output bits.
  • Speed (non-crypto): ahash and xxHash reach throughput on the order of GB/s.
  • Collision resistance (crypto): finding x ≠ y with h(x) = h(y) is computationally infeasible.
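
The avalanche property can be checked empirically. The following is a minimal sketch, assuming SHA-256 from Python's standard hashlib as the hash under test; the helper names bit_diff and avalanche_fraction are made up for this illustration. It flips each input bit in turn and measures the average fraction of output bits that change, which should land close to 0.5 for a well-mixed hash.

```python
import hashlib


def bit_diff(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche_fraction(data: bytes) -> float:
    """Average fraction of SHA-256 output bits that flip per single-bit input change."""
    base = hashlib.sha256(data).digest()
    n_input_bits = len(data) * 8
    total_flipped = 0
    for bit in range(n_input_bits):
        mutated = bytearray(data)
        mutated[bit // 8] ^= 1 << (bit % 8)  # flip exactly one input bit
        total_flipped += bit_diff(base, hashlib.sha256(bytes(mutated)).digest())
    # Normalize by (number of trials) * (output bits per trial).
    return total_flipped / (n_input_bits * len(base) * 8)


fraction = avalanche_fraction(b"hashmaps")
print(f"average fraction of output bits flipped: {fraction:.3f}")
```

The same harness can be pointed at a non-cryptographic hash by swapping the hashlib call, which is a quick way to spot weak mixing.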

Hash table strategies

  • Separate chaining: each bucket holds a linked list or tree (the default in Java's HashMap; since Java 8 a bucket's list is converted to a tree once it grows to 8 entries).
  • Open addressing: on a collision, probe alternative slots.
    • Linear probing: +1, +2, ... (cache-friendly, but prone to clustering).
    • Quadratic probing: +1², +2², ...
    • Double hashing: +h_2(k), +2h_2(k), ...
  • Robin Hood hashing: an open-addressing variant that evens out probe distances by letting an entry far from its home slot displace one that is closer (used by Rust's std HashMap before it adopted hashbrown).
  • SwissTable (2017+, Google): SIMD-scanned metadata bytes plus open addressing; among the fastest general-purpose designs today.
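
The three open-addressing probe sequences differ only in the offset added on the i-th attempt. A minimal sketch, with hypothetical helper names and a caller-supplied home slot (and, for double hashing, a caller-supplied step standing in for h_2(k)):

```python
def linear_probe(home: int, m: int, n: int) -> list[int]:
    """First n slots visited by linear probing: home, home+1, home+2, ..."""
    return [(home + i) % m for i in range(n)]


def quadratic_probe(home: int, m: int, n: int) -> list[int]:
    """First n slots visited by quadratic probing: home, home+1^2, home+2^2, ..."""
    return [(home + i * i) % m for i in range(n)]


def double_hash_probe(home: int, step: int, m: int, n: int) -> list[int]:
    """First n slots visited by double hashing: home, home+step, home+2*step, ...

    step plays the role of h_2(k); it must be nonzero, and it should be
    coprime with m so that the sequence can reach every slot.
    """
    return [(home + i * step) % m for i in range(n)]


print(linear_probe(5, 16, 4))          # [5, 6, 7, 8]
print(quadratic_probe(5, 16, 4))       # [5, 6, 9, 14]
print(double_hash_probe(5, 3, 16, 4))  # [5, 8, 11, 14]
```

Linear probing touches adjacent slots, which is why it is cache-friendly and also why runs of occupied slots tend to merge into clusters. The other two spread probes out at the cost of locality.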

Load factor & resizing

  • Load factor α = n/m (n entries, m slots). Rule of thumb: keep α < 0.75 for open addressing and α < 1 for chaining.
  • Resize: double the table (m → 2m) and rehash every key.
  • Doubling keeps insert at amortized O(1).
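
To see why doubling gives amortized O(1) inserts, one can count how many entries get moved by rehashing over a long run of inserts. This is a small simulation rather than a real table; the function name and the defaults (initial capacity 8, maximum load factor 0.75) are assumptions chosen to match the linear-probing example later in this note.

```python
def count_rehash_moves(n: int, initial_capacity: int = 8, max_load: float = 0.75):
    """Simulate n inserts into a doubling table; return (entries moved, final capacity)."""
    capacity = initial_capacity
    size = 0
    moves = 0
    for _ in range(n):
        if size >= capacity * max_load:
            moves += size      # a resize rehashes every existing entry
            capacity *= 2      # m -> 2m
        size += 1
    return moves, capacity


n = 10_000
moves, capacity = count_rehash_moves(n)
print(moves, capacity, moves / n)
```

The resize sizes form a geometric series (6, 12, 24, ...), so the total number of moves stays below 2n, which means each insert pays for fewer than two extra moves on average.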

Applications

  1. Symbol table (compiler).
  2. Cache (LRU, LFU).
  3. Set membership (HashSet).
  4. Counting (frequency).
  5. Dedup.
  6. Database index (hash join, hash partition).
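
Several of these applications are one-liners on top of a hash map or hash set. A minimal sketch using Python's built-in dict and set (via collections.Counter), with a made-up word list as input:

```python
from collections import Counter

words = ["map", "hash", "map", "set", "hash", "map"]

# Counting: Counter is a dict subclass that maps each key to its frequency.
counts = Counter(words)

# Dedup that preserves first-seen order: the set gives O(1) average membership tests.
seen = set()
unique = []
for w in words:
    if w not in seen:
        seen.add(w)
        unique.append(w)

print(counts.most_common(1))  # [('map', 3)]
print(unique)                 # ['map', 'hash', 'set']
print("set" in seen)          # True
```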

💻 Patterns

Rust: the standard HashMap

use std::collections::HashMap;

fn main() {
    // Default: SipHash-1-3 (DoS-resistant but slower).
    let mut map: HashMap<String, i32> = HashMap::new();
    map.insert("alice".to_string(), 30);
    map.insert("bob".to_string(), 25);
    
    // Entry API: insert 0 if the key is missing, then increment in place.
    *map.entry("alice".to_string()).or_insert(0) += 1;
    
    if let Some(age) = map.get("alice") {
        println!("Alice: {}", age);
    }
    
    // Iterate over the entries (iteration order is unspecified).
    for (k, v) in &map {
        println!("{} = {}", k, v);
    }
}

Rust: ahash (fast non-cryptographic hash, designed to resist HashDoS)

// Cargo.toml: ahash = "0.8"
use ahash::AHashMap;

fn main() {
    let mut map: AHashMap<&str, i32> = AHashMap::new();
    map.insert("hello", 1);
    // Typically several times faster than SipHash, and faster still where AES-NI is available.
    // A reasonable default for production use, depending on the workload.
}

Rust: FxHash (very fast, for trusted keys only)

// Cargo.toml: rustc-hash = "1.1"
use rustc_hash::FxHashMap;

fn main() {
    let mut map: FxHashMap<u64, &str> = FxHashMap::default();
    map.insert(42, "answer");
    // Used inside rustc. NOT DoS-resistant: for untrusted input use SipHash or aHash instead.
}

Custom Hash (Rust trait)

use std::hash::{Hash, Hasher};
use std::collections::HashMap;

#[derive(PartialEq, Eq)]
struct Point { x: i32, y: i32 }

impl Hash for Point {
    fn hash<H: Hasher>(&self, state: &mut H) {
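        // Feed every field that Eq compares into the hasher, so equal values hash equally.
        // #[derive(Hash)] does the same; hand-write it only when more care is needed.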
        self.x.hash(state);
        self.y.hash(state);
    }
}

C++: std::unordered_map vs absl::flat_hash_map

#include <absl/container/flat_hash_map.h>
#include <iostream>
#include <string>

int main() {
    // std::unordered_map: separate chaining; pointer chasing makes it slow.
    // absl::flat_hash_map: SwissTable; often around 2-3x faster, depending on the workload.
    absl::flat_hash_map<std::string, int> map;
    map["alice"] = 30;
    map["bob"] = 25;
    
    if (auto it = map.find("alice"); it != map.end()) {
        std::cout << it->second << "\n";
    }
}

Open Addressing (simple linear probing, Python)

class LinearProbingHashMap:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.size = 0
        self.keys = [None] * capacity
        self.values = [None] * capacity
    
    def _probe(self, key):
        idx = hash(key) % self.capacity
        while self.keys[idx] is not None and self.keys[idx] != key:
            idx = (idx + 1) % self.capacity
        return idx
    
    def put(self, key, value):
        if self.size >= self.capacity * 0.75:
            self._resize()
        idx = self._probe(key)
        if self.keys[idx] is None:
            self.size += 1
        self.keys[idx] = key
        self.values[idx] = value
    
    def get(self, key):
        idx = self._probe(key)
        return self.values[idx] if self.keys[idx] is not None else None
    
    def _resize(self):
        old_keys, old_values = self.keys, self.values
        self.capacity *= 2
        self.keys = [None] * self.capacity
        self.values = [None] * self.capacity
        self.size = 0
        for k, v in zip(old_keys, old_values):
            if k is not None:
                self.put(k, v)

Cryptographic hash (SHA-256, BLAKE3)

use sha2::{Sha256, Digest};
use blake3;

fn main() {
    // SHA-256: widely supported but comparatively slow (around 600 MB/s; hardware-dependent).
    let mut hasher = Sha256::new();
    hasher.update(b"hello world");
    let result = hasher.finalize();
    println!("{:x}", result);
    
    // BLAKE3: one of the fastest modern cryptographic hashes (around 6 GB/s with SIMD; hardware-dependent).
    let hash = blake3::hash(b"hello world");
    println!("{}", hash);
}

Bloom Filter (hash-based set that tolerates false positives)

import mmh3  # MurmurHash3
from bitarray import bitarray

class BloomFilter:
    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bitarray(size)
        self.bits.setall(0)
    
    def add(self, item):
        for i in range(self.num_hashes):
            idx = mmh3.hash(item, i) % self.size
            self.bits[idx] = 1
    
    def contains(self, item):
        return all(self.bits[mmh3.hash(item, i) % self.size] for i in range(self.num_hashes))

bf = BloomFilter(size=10000, num_hashes=7)
bf.add("alice")
print(bf.contains("alice"))   # True (definitely)
print(bf.contains("bob"))     # False (definitely) or True (false-positive)
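
The size and num_hashes arguments are usually derived from the expected number of items n and the target false-positive rate p, using the standard formulas m = -n ln p / (ln 2)² and k = (m/n) ln 2. A minimal sketch of that calculation; the helper name bloom_parameters is hypothetical and not part of mmh3 or bitarray:

```python
import math


def bloom_parameters(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Return (bit-array size m, number of hash functions k) for a Bloom filter."""
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits.
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest integer and at least 1.
    k = max(1, round((m / n_items) * math.log(2)))
    return m, k


m, k = bloom_parameters(1000, 0.01)
print(m, k)  # 9586 7
```

By this estimate, the size=10000, num_hashes=7 configuration above is roughly what 1,000 items at a 1% false-positive rate call for.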

Decision criteria

| Situation | Hash function / Map |
| --- | --- |
| Rust, trusted input, max speed | FxHash |
| Rust, untrusted input | std HashMap (SipHash) or aHash |
| C++, general | absl::flat_hash_map (SwissTable) |
| Python | dict (built-in, optimized) |
| Distributed cache key | xxHash3 / FNV-1a |
| Cryptographic | BLAKE3 (speed) / SHA-3 (NIST) |
| Bloom filter | MurmurHash3 |
| String interning | weak hash + linear probe |
| Ordered iteration | BTreeMap (not hash) |

Defaults: std::collections::HashMap in Rust, absl::flat_hash_map in C++, dict in Python. For performance-critical code, switch to ahash, or to FxHash when the input is trusted.

🤖 LLM usage

When to use: advice on choosing a hash function, review of a hash table implementation, root-cause analysis of collisions. When not to use: implementing a cryptographic hash yourself (use an audited library instead), or writing a production hash function from scratch.

Anti-patterns

  • Hashing a string by its length alone instead of its contents: catastrophic collisions.
  • FxHash on untrusted input: opens the door to HashDoS attacks; use SipHash or aHash.
  • New uses of MD5 or SHA-1: both are broken; use BLAKE3 or SHA-256.
  • Non-uniform modular reduction of the hash: use a power-of-two table size with a bitmask, or fastrange.
  • Open addressing at a high load factor: performance collapses for α > 0.9; resize.
  • Relying on the default hash for a complex key: the distribution may be poor; write a custom impl.
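
The two reductions named in the list above can be sketched as follows. This is an illustration under stated assumptions: the function names are made up here, and fastrange32 follows the multiply-and-shift idea for a 32-bit hash (Python integers are unbounded, so the hash is masked to 32 bits explicitly).

```python
def mask_index(h: int, table_size: int) -> int:
    """Reduce a hash with a bitmask; table_size must be a power of two."""
    assert table_size > 0 and table_size & (table_size - 1) == 0
    return h & (table_size - 1)  # keeps the low bits of the hash


def fastrange32(h: int, n: int) -> int:
    """Map a 32-bit hash into [0, n) with a multiply and shift instead of a modulo."""
    return ((h & 0xFFFFFFFF) * n) >> 32  # driven by the high bits of the hash


print(mask_index(0x12345678, 16))   # 8
print(fastrange32(0x80000000, 10))  # 5
print(fastrange32(0xFFFFFFFF, 10))  # 9
```

The bitmask looks only at the low bits and fastrange mostly at the high bits, so either one relies on the hash function mixing well in that part of its output.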

🧪 Verification / Duplicates

  • Verified against Knuth, TAOCP Vol. 3; "Designing a fast, efficient, cache-friendly hash table"; and the Abseil docs.
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Rust/C++/Python implementations, SwissTable, ahash, BLAKE3 |