---
id: cs-hashing-strategies
title: Hashing Strategies — MD5 / SHA / xxHash / Argon2
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [cs, hashing, vibe-coding]
tech_stack: { language: "TS", applicable_to: ["Backend"] }
applied_in: []
aliases: [hash, MD5, SHA-256, xxHash, Argon2, password hash, content addressing]
---
# Hashing Strategies
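> Different use cases call for different hashes: **Cryptographic (SHA-256, BLAKE3) vs Fast (xxHash, MurmurHash) vs Password (Argon2, bcrypt)**. Picking the wrong one breaks either security or performance.
## 📖 Core Concepts
- Cryptographic: collision- and preimage-resistant; slower than non-cryptographic hashes.
- Fast non-crypto: speed-optimized, with no resistance to deliberate collisions.
- Password: deliberately slow, to blunt brute-force attacks.
- Content-addressed: the hash of the data is its ID (Git, IPFS).
## 💻 Code Patterns
### Recommendations by use case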
```
Password hash: Argon2id, bcrypt, scrypt
Content address: SHA-256, BLAKE3
Tamper detection: SHA-256, HMAC
Cache key / sharding: xxHash, MurmurHash
File integrity: SHA-256, BLAKE3
HMAC (signing): HMAC-SHA-256
ID generation: UUID, Snowflake
```
### Cryptographic hash (secure, slower than non-crypto)
```ts
import { createHash } from 'node:crypto';
import { createReadStream } from 'node:fs';

const hash = createHash('sha256').update('hello').digest('hex');
// Other algorithms: 'sha512' / 'sha3-256' / 'blake2b512'

// File hash: stream the file so it never has to fit in memory
async function hashFile(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash('sha256');
    const stream = createReadStream(path);
    stream.on('data', (chunk) => hash.update(chunk));
    stream.on('end', () => resolve(hash.digest('hex')));
    stream.on('error', reject);
  });
}
```
### BLAKE3 (modern, faster than SHA-256)
```bash
yarn add blake3
```
```ts
import { hash } from 'blake3';
const result = hash('hello').toString('hex');
```
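→ Often several times faster than SHA-256 in software (the exact factor depends on CPU, input size, and build), with a comparable security level.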
### xxHash (very fast, non-crypto)
```bash
yarn add xxhash-wasm
```
```ts
import xxhash from 'xxhash-wasm';
const { h64ToString, h32 } = await xxhash();
const hash = h64ToString('hello'); // 16-char hex string
// Or as a number
const num = h32('hello');
```
→ Multi-GB/s throughput. Good for cache keys, sharding, and non-security checksums.
### MurmurHash (fast, popular)
```ts
import murmurhash from 'murmurhash';
const hash = murmurhash.v3('hello'); // 32-bit number
```
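→ Used by Cassandra (Murmur3 partitioner) and Kafka (murmur2 in the default partitioner). Java's `HashMap` does not use it; it relies on each key's `hashCode()`.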
### Password hashing (Argon2)
```bash
yarn add argon2
```
```ts
import argon2 from 'argon2';
const hash = await argon2.hash('password', {
  type: argon2.argon2id,
  memoryCost: 65536, // in KiB, i.e. 64 MiB
  timeCost: 3,
  parallelism: 4,
});
// '$argon2id$v=19$m=65536,t=3,p=4$...'
const valid = await argon2.verify(hash, 'password');
```
→ Memory-hard, which blunts GPU brute force.
### bcrypt (legacy but OK)
```ts
import bcrypt from 'bcrypt';
const hash = await bcrypt.hash('password', 12); // cost 12
const valid = await bcrypt.compare('password', hash);
```
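→ Around since 1999 and stable. It is not memory-hard and ignores input beyond 72 bytes, so it is weaker than Argon2; prefer Argon2id for new systems.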
### Cost of a password hash
```
Argon2id (parameters above):
- 64 MiB memory
- 3 iterations
- roughly 100 ms per verify (hardware-dependent)
→ About 100 ms on every login is acceptable.
Brute force becomes very slow.
```
### HMAC (signed message)
```ts
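import { createHmac, timingSafeEqual } from 'node:crypto';

// `secret` and `message` are supplied by the caller
const sig = createHmac('sha256', secret).update(message).digest('hex');

// Verify
function verify(msg: string, sig: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(msg).digest('hex');
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on a length mismatch, so check the length first
  return a.length === b.length && timingSafeEqual(a, b);
}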
```
→ Webhook signature, JWT, API auth.
→ [[Backend_Webhook_Patterns]].
### Content-addressed (Git, IPFS)
```ts
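import { createHash } from 'node:crypto';
// Git: SHA-1 over a "blob <byte length>\0" header plus the content (SHA-256 repositories are opt-in)
const blobHash = createHash('sha1').update('blob 11\0hello world').digest('hex');
// IPFS: multihash supports several algorithms (default = SHA-256)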
import { CID } from 'multiformats/cid';
import { sha256 } from 'multiformats/hashes/sha2';
const hash = await sha256.digest(new TextEncoder().encode('hello'));
const cid = CID.create(1, 0x55, hash); // 0x55 = raw codec
```
→ Same content = same hash. Dedup.
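
To make the Git line above reusable for arbitrary content, here is a minimal sketch of the blob ID computation for a SHA-1 repository. The helper name `gitBlobId` is hypothetical; the important detail is that the header carries the byte length, not the character count.

```ts
import { createHash } from 'node:crypto';

// Compute the object ID Git assigns to a blob in a SHA-1 repository.
// `gitBlobId` is an illustrative helper name, not part of any library.
function gitBlobId(content: string | Buffer): string {
  const body = typeof content === 'string' ? Buffer.from(content, 'utf8') : content;
  // The header uses the byte length of the content, followed by a NUL byte.
  const header = Buffer.from(`blob ${body.length}\0`, 'utf8');
  return createHash('sha1').update(header).update(body).digest('hex');
}

console.log(gitBlobId('')); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
console.log(gitBlobId('hello world') === gitBlobId(Buffer.from('hello world'))); // true
```

Identical bytes always map to the same ID, which is what makes deduplication automatic.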
### Hash for cache key
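```ts
import xxhash from 'xxhash-wasm';

const { h64ToString } = await xxhash(); // initialise the WASM module once

// Long string / object → short cache key
// (`req` has an already-parsed body; `redis` and `response` come from the surrounding code)
function cacheKey(req: { url: string; body: unknown }): string {
  const key = JSON.stringify({ url: req.url, body: req.body });
  return h64ToString(key); // 16 hex chars
}
await redis.set(`cache:${cacheKey(req)}`, response);
```
→ xxHash is fast; SHA would be overkill here.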
### Hash for sharding (consistent)
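```ts
import xxhash from 'xxhash-wasm';

const { h32 } = await xxhash(); // initialise the WASM module once

function shardKey(userId: string, numShards: number): number {
  return h32(userId) % numShards;
}
```
→ [[CS_Consistent_Hashing]] is the better option: re-sharding moves only a small fraction of keys.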
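
To see why plain modulo sharding is costly to resize, the sketch below counts how many keys change shard when the shard count goes from 4 to 5. It uses the first 4 bytes of SHA-256 as a stand-in for xxHash so that it runs with the standard library only; that substitution, the helper names, and the `user-N` keys are assumptions made for illustration.

```ts
import { createHash } from 'node:crypto';

// Stand-in for xxHash (assumption): first 4 bytes of SHA-256 as an unsigned 32-bit integer.
function hash32(key: string): number {
  return createHash('sha256').update(key).digest().readUInt32BE(0);
}

function shardOf(key: string, numShards: number): number {
  return hash32(key) % numShards;
}

// Fraction of keys whose shard changes when the shard count changes.
function movedFraction(keys: string[], before: number, after: number): number {
  const moved = keys.filter((k) => shardOf(k, before) !== shardOf(k, after)).length;
  return moved / keys.length;
}

const sampleKeys = Array.from({ length: 10000 }, (_, i) => `user-${i}`);
console.log(movedFraction(sampleKeys, 4, 5).toFixed(2)); // close to 0.80 for a uniform hash
```

With a uniform hash, a key keeps its shard only when `h % 4 === h % 5`, which holds for about one key in five. Consistent hashing aims to move roughly `1 / numShards` of the keys instead.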
### Hash table (HashMap)
```
JS Map / Object are hash maps under the hood.
The default hash function is internal to V8.
→ No need to implement one yourself.
```
### MD5 (deprecated for security)
```
MD5: collision found (2004).
SHA-1: collision found (2017).
Use:
- Non-security checksum: MD5 / SHA-1 OK
- Security: SHA-256 / SHA-3 / BLAKE3
```
### SHA-1 vs SHA-256 vs SHA-3
```
SHA-1: deprecated (security)
SHA-256: the standard default
SHA-512: 64-bit words (can be faster than SHA-256 on 64-bit CPUs)
SHA-3: Keccak (different family)
BLAKE3: typically the fastest of these in software
```
### Salt (password)
```ts
// ❌ Same password → same hash
hash('password')
// ✅ Salt
hash(salt + password)
// The salt is unique per user and stored next to the hash.
// Argon2 / bcrypt generate and embed a salt automatically.
```
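
The pseudocode above can be made concrete with Node's built-in scrypt, one of the password hashes listed earlier. This is a minimal sketch under assumptions: the `salt:hash` storage format, the 16-byte salt, the 32-byte key length, and the helper names are illustrative choices, and scrypt's default cost parameters are used as-is.

```ts
import { randomBytes, scryptSync, timingSafeEqual } from 'node:crypto';

const KEY_LEN = 32; // derived key length in bytes (illustrative choice)

// Hash a password with a fresh random salt; the salt is stored next to the hash.
function hashPassword(password: string): string {
  const salt = randomBytes(16);
  const derived = scryptSync(password, salt, KEY_LEN);
  return `${salt.toString('hex')}:${derived.toString('hex')}`;
}

function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(':');
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, 'hex');
  if (expected.length !== KEY_LEN) return false; // reject malformed records
  const actual = scryptSync(password, Buffer.from(saltHex, 'hex'), KEY_LEN);
  return timingSafeEqual(actual, expected);
}

const first = hashPassword('password');
const second = hashPassword('password');
console.log(first === second);                   // false: different salts
console.log(verifyPassword('password', first));  // true
console.log(verifyPassword('wrong', first));     // false
```

Because each record carries its own salt, two users with the same password get different hashes, and a precomputed rainbow table is of no use.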
### Pepper
```ts
const pepper = process.env.PEPPER!; // server-side secret
const hash = await argon2.hash(password + pepper, { type: argon2.argon2id });
```
→ The salt lives in the DB; the pepper lives in an env var. This adds protection if only the DB leaks.
### Timing attack
```ts
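// ❌ String comparison returns early, so timing leaks how much matched
if (sig === expected) ...
// ✅ Constant-time comparison (throws if lengths differ, so check the length first)
import { timingSafeEqual } from 'node:crypto';
const a = Buffer.from(sig), b = Buffer.from(expected);
if (a.length === b.length && timingSafeEqual(a, b)) ...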
```
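
`timingSafeEqual` requires both buffers to have the same length. One way to sidestep that, sketched here as an assumption rather than a prescribed pattern, is to HMAC both inputs with a per-process random key and compare the fixed-length digests. The names `compareKey` and `safeEqual` are hypothetical.

```ts
import { createHmac, randomBytes, timingSafeEqual } from 'node:crypto';

// Per-process random key; it only has to be unpredictable to an attacker.
const compareKey = randomBytes(32);

// Compare two strings of any length without an early exit.
function safeEqual(a: string, b: string): boolean {
  const digestA = createHmac('sha256', compareKey).update(a).digest();
  const digestB = createHmac('sha256', compareKey).update(b).digest();
  return timingSafeEqual(digestA, digestB); // both digests are always 32 bytes
}

console.log(safeEqual('abc', 'abc'));  // true
console.log(safeEqual('abc', 'abcd')); // false, and no exception on length mismatch
```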
### Password upgrade (rehash)
```ts
async function login(email: string, password: string) {
  const user = await db.users.findByEmail(email);
  // Same error for an unknown user and a wrong password
  if (!user || !(await argon2.verify(user.passwordHash, password))) {
    throw new Error('Invalid');
  }
  // Upgrade the hash if it was created with outdated cost parameters
  if (argon2.needsRehash(user.passwordHash, { ...currentParams })) {
    const newHash = await argon2.hash(password, currentParams);
    await db.users.update(user.id, { passwordHash: newHash });
  }
  return createSession(user);
}
```
→ Raise the cost over time as hardware gets faster.
### Hash chain (blockchain)
```ts
// Block hash:
hash(prev_block_hash + transaction_data)
// Tamper with one block → every later block becomes invalid.
// Bitcoin / Ethereum.
```
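
The chaining rule can be sketched as a small runnable example. Everything here is illustrative: the `Block` shape, the all-zero genesis value, and the helper names are assumptions, and real blockchains hash structured headers rather than plain strings.

```ts
import { createHash } from 'node:crypto';

interface Block {
  prevHash: string;
  data: string;
  hash: string;
}

const GENESIS_PREV = '0'.repeat(64); // placeholder "previous hash" for the first block

function blockHash(prevHash: string, data: string): string {
  // prevHash has a fixed length, so plain concatenation is unambiguous here.
  return createHash('sha256').update(prevHash).update(data).digest('hex');
}

function append(chain: Block[], data: string): Block[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : GENESIS_PREV;
  return [...chain, { prevHash, data, hash: blockHash(prevHash, data) }];
}

// A chain is valid if every block links to its predecessor and its own hash matches.
function isValid(chain: Block[]): boolean {
  return chain.every((block, i) => {
    const expectedPrev = i === 0 ? GENESIS_PREV : chain[i - 1].hash;
    return block.prevHash === expectedPrev && block.hash === blockHash(block.prevHash, block.data);
  });
}

let ledger: Block[] = [];
for (const tx of ['a pays b 5', 'b pays c 2', 'c pays a 1']) ledger = append(ledger, tx);
console.log(isValid(ledger)); // true

// Change the first block's data without recomputing anything.
const tampered = ledger.map((b, i) => (i === 0 ? { ...b, data: 'a pays b 500' } : b));
console.log(isValid(tampered)); // false
```

Even if an attacker recomputes the hash of the altered block, the next block's `prevHash` no longer matches, so every later block would have to be rebuilt as well.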
### Merkle tree
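```
            [hash root]
           /           \
     [hash A]         [hash B]
     /      \         /      \
  [h1]     [h2]    [h3]     [h4]
    |        |       |        |
  [d1]     [d2]    [d3]     [d4]
```
→ To verify d2: hash d2 into h2, combine it with the sibling h1 to get hash A, then combine that with hash B to get the root. The proof needs only O(log N) hashes.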
→ Git, IPFS, blockchain.
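
The proof path for d2 in the diagram can be sketched as follows. This is a simplified illustration, not how any particular system encodes its tree: it assumes a power-of-two leaf count and omits the leaf/inner-node domain separation that production designs usually add. All names are hypothetical.

```ts
import { createHash } from 'node:crypto';

const sha256 = (data: Buffer): Buffer => createHash('sha256').update(data).digest();
const hashPair = (left: Buffer, right: Buffer): Buffer => sha256(Buffer.concat([left, right]));

// Build every level bottom-up; assumes the leaf count is a power of two.
function buildLevels(leaves: string[]): Buffer[][] {
  const levels: Buffer[][] = [leaves.map((d) => sha256(Buffer.from(d)))];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next: Buffer[] = [];
    for (let i = 0; i < prev.length; i += 2) next.push(hashPair(prev[i], prev[i + 1]));
    levels.push(next);
  }
  return levels;
}

interface ProofStep {
  sibling: Buffer;
  siblingOnLeft: boolean;
}

// Collect one sibling hash per level on the way from the leaf to the root.
function prove(levels: Buffer[][], leafIndex: number): ProofStep[] {
  const proof: ProofStep[] = [];
  let index = leafIndex;
  for (let lvl = 0; lvl < levels.length - 1; lvl++) {
    const siblingIndex = index ^ 1;
    proof.push({ sibling: levels[lvl][siblingIndex], siblingOnLeft: siblingIndex < index });
    index >>= 1;
  }
  return proof;
}

function verifyProof(leaf: string, proof: ProofStep[], root: Buffer): boolean {
  let acc = sha256(Buffer.from(leaf));
  for (const { sibling, siblingOnLeft } of proof) {
    acc = siblingOnLeft ? hashPair(sibling, acc) : hashPair(acc, sibling);
  }
  return acc.equals(root);
}

const levels = buildLevels(['d1', 'd2', 'd3', 'd4']);
const root = levels[levels.length - 1][0];
const proofForD2 = prove(levels, 1); // siblings: h1, then hash B
console.log(proofForD2.length);                   // 2
console.log(verifyProof('d2', proofForD2, root)); // true
console.log(verifyProof('dX', proofForD2, root)); // false
```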
### Bloom filter (probabilistic)
```ts
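import xxhash from 'xxhash-wasm';
const xh = await xxhash();
const M = 1024; // filter size in bytes, i.e. M * 8 bits (example value)
const K = 4; // number of hash probes per key (example value)
const bf = new Uint8Array(M);
function add(key: string) {
  for (let i = 0; i < K; i++) {
    const idx = xh.h32(key + i) % (M * 8); // suffix i simulates K different hashes
    bf[idx >> 3] |= (1 << (idx & 7));
  }
}
function maybe(key: string): boolean {
  for (let i = 0; i < K; i++) {
    const idx = xh.h32(key + i) % (M * 8);
    if (!(bf[idx >> 3] & (1 << (idx & 7)))) return false; // definitely absent
  }
  return true; // probably present (false positives are possible)
}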
```
→ [[CS_Bloom_Filter]].
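
The snippet above leaves `M` and `K` open. A common way to choose them is the textbook sizing formula, m = -n·ln(p) / (ln 2)² bits and k = (m / n)·ln 2 hash probes, sketched below. The function name and the example inputs are assumptions for illustration.

```ts
// Textbook Bloom filter sizing for n expected items and target false-positive rate p.
function bloomParams(expectedItems: number, falsePositiveRate: number): { bits: number; hashes: number } {
  const bits = Math.ceil((-expectedItems * Math.log(falsePositiveRate)) / (Math.LN2 * Math.LN2));
  const hashes = Math.max(1, Math.round((bits / expectedItems) * Math.LN2));
  return { bits, hashes };
}

const { bits, hashes } = bloomParams(1000, 0.01);
console.log(bits, hashes);        // 9586 7
console.log(Math.ceil(bits / 8)); // 1199 bytes for the Uint8Array
```

For 1,000 items at a 1% false-positive rate this comes to roughly 1.2 KiB and 7 probes per key.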
### Hash collision
```
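Cryptographic (SHA-256): about 2^128 attempts to find a collision (birthday bound). Does not happen in practice.
Non-crypto (xxHash 64): ~50% collision chance after about 2^32 (roughly 5 billion) items (birthday paradox).
- 100 K items: effectively never.
- 1 B items: plausible (a few percent).
→ Critical data = SHA-256. Small datasets = xxHash is fine.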
```
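
The figures above follow from the birthday approximation p ≈ 1 − exp(−n² / (2·2^b)) for n items and a b-bit hash with uniform output. A small sketch, with a hypothetical function name:

```ts
// Birthday approximation for the chance of at least one collision.
function collisionProbability(items: number, hashBits: number): number {
  const exponent = (items * items) / (2 * Math.pow(2, hashBits));
  return -Math.expm1(-exponent); // 1 - e^(-x), numerically accurate for tiny x
}

console.log(collisionProbability(1e5, 64)); // on the order of 1e-10
console.log(collisionProbability(1e9, 64)); // roughly 0.027
console.log(collisionProbability(1e5, 32)); // roughly 0.69
```

The last line shows why a 32-bit hash is a poor unique key even for modest datasets.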
### Comparison table
```
Algorithm     Speed      Security  Use case
MD5           Fast       Broken    Legacy checksum
SHA-1         Fast       Broken    Git (legacy)
SHA-256       Medium     Strong    Default crypto
SHA-3         Medium     Strong    New crypto
BLAKE3        Fast       Strong    Modern crypto
xxHash        Very fast  None      Cache, shard
MurmurHash    Very fast  None      Cache, shard
FNV           Very fast  None      Cache (small)
HMAC-SHA256   Medium     Strong    Sign / verify
Argon2id      Slow       Strong    Password
bcrypt        Slow       Strong    Password
scrypt        Slow       Strong    Password (memory-hard)
```
### Performance (rough, hardware-dependent)
```
SHA-256: 500 MB/s (1 thread)
SHA-3: 400 MB/s
BLAKE3: 3 GB/s (multi-thread)
xxHash: 5-10 GB/s
MurmurHash: 5 GB/s
Argon2id: ~100 ms / verify (intentionally slow)
bcrypt cost 12: ~250 ms
```
### Hash + ID
```ts
// Content-addressable storage
const id = createHash('sha256').update(content).digest('hex');
await s3.put(`/objects/${id}`, content);
// Same content = same id (dedup).
```
### Snowflake / UUID + hash (composite)
```
Snowflake: time + machine + seq.
UUID v7: time + random.
The ID itself is not a hash.
But:
hash(snowflake_id) → consistent shard key.
```
### Hash-based deduplication
```ts
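import { createHash } from 'node:crypto';
// File dedup (`db.files` is a hypothetical blob store keyed by hash)
async function dedupe(file: Buffer) {
  const hash = createHash('sha256').update(file).digest('hex');
  if (await db.files.exists(hash)) return hash; // already stored
  await db.files.put(hash, file);
  return hash;
}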
```
→ Same file = 1 copy.
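
For experimenting without S3 or a database, the same idea can be sketched with an in-memory `Map`. The `ContentStore` class is hypothetical and stands in for whatever storage backend is actually used.

```ts
import { createHash } from 'node:crypto';

// Hypothetical in-memory content-addressed store; a Map stands in for S3 or a DB.
class ContentStore {
  private readonly objects = new Map<string, Buffer>();

  // Store content under its SHA-256 digest and return that digest as the ID.
  put(content: Buffer): string {
    const id = createHash('sha256').update(content).digest('hex');
    if (!this.objects.has(id)) this.objects.set(id, content); // dedup: keep one copy
    return id;
  }

  get(id: string): Buffer | undefined {
    return this.objects.get(id);
  }

  get size(): number {
    return this.objects.size;
  }
}

const store = new ContentStore();
const id1 = store.put(Buffer.from('same bytes'));
const id2 = store.put(Buffer.from('same bytes'));
console.log(id1 === id2, store.size); // true 1
```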
### Ethereum-style hash
```
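Keccak-256: the original Keccak design, which Ethereum adopted before SHA-3 was finalized.
The final standard changed the padding, so Keccak-256 and SHA3-256 give different digests for the same input.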
```
### Common mistakes
```
- MD5 for passwords: broken.
- SHA-256 for passwords: too fast, so brute force is cheap.
- Storing plaintext passwords: never.
- No salt: vulnerable to rainbow tables.
- One hash function for every use case: wrong tool for the job.
- Comparing signatures without timingSafeEqual: timing attack.
```
### When to use what
```
DB password column: Argon2id hash.
Session ID: cryptographically random (not hash).
File integrity: SHA-256.
Git-like CAS: BLAKE3 (modern) / SHA-256.
Cache key: xxHash.
Webhook signature: HMAC-SHA256.
JWT signing: HMAC-SHA256 (HS256) or RS256.
URL-safe ID: base64url(random) or NanoID.
```
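
For the session ID and URL-safe ID rows, the point is that the value should come from a cryptographically secure random source rather than from hashing something guessable. A minimal sketch with Node's built-ins; the helper name and the 16-byte default are illustrative choices:

```ts
import { randomBytes, randomUUID } from 'node:crypto';

// 16 random bytes = 128 bits of entropy, encoded without '+', '/', or '=' padding.
function newSessionId(bytes = 16): string {
  return randomBytes(bytes).toString('base64url');
}

console.log(newSessionId().length); // 22
console.log(randomUUID());          // random UUID v4
```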
### Library
```ts
// Node built-in
import { createHash, createHmac, randomBytes } from 'node:crypto';
// Modern
import { hash as blake3 } from 'blake3';
import argon2 from 'argon2';
import xxhash from 'xxhash-wasm';
// Web Crypto (browser + edge)
const buffer = await crypto.subtle.digest('SHA-256', new TextEncoder().encode('hello'));
const hex = Array.from(new Uint8Array(buffer)).map(b => b.toString(16).padStart(2, '0')).join('');
```
### Web Crypto (edge / browser)
```ts
async function sha256(text: string): Promise<string> {
  const buf = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(text));
  return Array.from(new Uint8Array(buf))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}
```
→ Works on Cloudflare Workers / Deno / Bun.
## 🤔 Decision Criteria
| Use | Recommendation |
|---|---|
| Password | Argon2id |
| File integrity | SHA-256 / BLAKE3 |
| Cache key | xxHash |
| Webhook sig | HMAC-SHA256 |
| Random ID | randomBytes (not hash) |
| Sharding | xxHash + consistent hashing |
| Git-like | SHA-256 |
| Tamper-evident | Merkle + SHA-256 |
## ❌ Anti-patterns
- **SHA-256 for passwords**: fast enough to brute force.
- **MD5 in production**: broken.
- **No salt**: rainbow tables.
- **Signature comparison without timingSafeEqual**: timing attack.
- **Hash as the only ID**: collision risk (xxHash at large scale).
- **Expensive hash for non-security work**: needless latency.
- **Not knowing that edge runtimes rely on Web Crypto**: runtime errors.
## 🤖 LLM Usage Hints
- Pick the hash that matches the use case.
- Argon2id = password.
- SHA-256 = secure default.
- xxHash = speed.
- timingSafeEqual = compare.
## 🔗 Related Documents
- [[CS_Bloom_Filter]]
- [[CS_Consistent_Hashing]]
- [[Security_OWASP_Top_10_Practical]]