2nd/10_Wiki/Topics/Coding/CS_Compression_Algorithms.md
2026-05-09 21:08:02 +09:00

id: cs-compression-algorithms
title: Compression - gzip / brotli / zstd / lz4
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: cs, compression, vibe-coding
tech_stack: language: TS / Various; applicable_to: Backend, Frontend
applied_in:
aliases: gzip, brotli, zstd, lz4, snappy, deflate, compression ratio

Compression Algorithms

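Compression for network and disk. gzip (legacy), brotli (web), zstd (modern), lz4 (speed). The trade-off is compression ratio versus CPU cost, so choose per use case.

📖 Key concepts

  • Ratio: the smaller the output, the better (a ratio of 5x means the output is one fifth of the input).
  • Speed: measured separately for compression and decompression.
  • Memory: a small footprint is preferable.
  • Streaming: data is compressed incrementally as it arrives.

💻 Code patterns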

Comparison (approximate)

Algorithm  Ratio    Compress    Decompress   Use case
gzip       3-5x     medium      fast         legacy web, log
brotli     5-7x     slow-ish    fast         web (HTTP)
zstd       4-6x     fast        very fast    modern default
lz4        2-3x     very fast   very fast    memory cache, snap
snappy     2-3x     very fast   very fast    big data (Cassandra)
xz         5-10x    slow        slow         backup
zlib       3-5x     medium      fast         legacy

Usage in Node

import zlib from 'node:zlib';
import { promisify } from 'node:util';

// gzip
const gzip = promisify(zlib.gzip);
const gunzip = promisify(zlib.gunzip);

const buf = Buffer.from('hello'.repeat(1000));
const gzipped = await gzip(buf);
const gunzipped = await gunzip(gzipped);

// Brotli (distinct variable names: the original snippet redeclared `compressed`)
const brotliCompressed = await promisify(zlib.brotliCompress)(buf);
const brotliDecompressed = await promisify(zlib.brotliDecompress)(brotliCompressed);

Streaming

import { createGzip } from 'node:zlib';
import { pipeline } from 'node:stream/promises';
import fs from 'node:fs';

await pipeline(
  fs.createReadStream('input.txt'),
  createGzip({ level: 6 }),
  fs.createWriteStream('output.txt.gz'),
);
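
The same pipeline shape also works for decompression and for in-memory sources. The following is a minimal sketch, assuming a recent Node version; `gzipRoundTrip` is a hypothetical helper name, not something from the original note.

```typescript
import { createGzip, createGunzip } from 'node:zlib';
import { pipeline } from 'node:stream/promises';
import { Readable, Writable } from 'node:stream';

// Hypothetical helper: streams chunks through gzip and straight back through
// gunzip, collecting the decompressed bytes. Data moves chunk by chunk; the
// collected output is kept in memory only for demonstration.
async function gzipRoundTrip(chunks: string[]): Promise<string> {
  const collected: Buffer[] = [];
  const sink = new Writable({
    write(chunk: Buffer, _encoding, callback) {
      collected.push(chunk);
      callback();
    },
  });

  await pipeline(
    Readable.from(chunks.map((c) => Buffer.from(c))), // source: one chunk at a time
    createGzip({ level: 6 }),                         // compress incrementally
    createGunzip(),                                   // decompress incrementally
    sink,
  );
  return Buffer.concat(collected).toString('utf8');
}
```

In a real pipeline the two halves usually run on different machines; chaining them here only shows that neither side needs the whole payload at once.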

zstd (modern, recommended)

yarn add @mongodb-js/zstd  # or node-zstandard

import zstd from '@mongodb-js/zstd';

const compressed = await zstd.compress(buffer, 3);  // level 1-22
const decompressed = await zstd.decompress(compressed);

HTTP: gzip / brotli (automatic)

// Express
import compression from 'compression';
app.use(compression({
  level: 6,
  threshold: 1024,        // only compress responses larger than 1 KB
  filter: (req, res) => {
    const t = res.getHeader('Content-Type');
    return /text|json|javascript|css|svg/.test(String(t));
  },
}));
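// Hono (built-in middleware supports gzip / deflate, not brotli)
import { compress } from 'hono/compress';
app.use(compress());  // or compress({ encoding: 'gzip' }) to force one scheme

→ The middleware checks Accept-Encoding and picks a suitable algorithm automatically.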
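
What such middleware does automatically can be sketched by hand. This is a simplified illustration, not the logic of any particular library; `pickEncoding` is a hypothetical helper, and the `*` wildcard and `identity;q=0` cases are deliberately left out.

```typescript
type Encoding = 'br' | 'gzip' | 'identity';

// Hypothetical helper: choose a response encoding from an Accept-Encoding
// header value. Server preference is br first, then gzip.
function pickEncoding(acceptEncoding: string | undefined): Encoding {
  if (!acceptEncoding) return 'identity';

  // Parse "gzip;q=0.8, br" into a map of coding -> q-value (default 1).
  const accepted = new Map<string, number>();
  for (const part of acceptEncoding.split(',')) {
    const [name, ...params] = part.trim().toLowerCase().split(';');
    const qParam = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
    const q = qParam ? Number(qParam.slice(2)) : 1;
    if (name) accepted.set(name.trim(), Number.isNaN(q) ? 0 : q);
  }

  // A coding with q=0 is explicitly refused by the client.
  for (const candidate of ['br', 'gzip'] as const) {
    if ((accepted.get(candidate) ?? 0) > 0) return candidate;
  }
  return 'identity';
}
```

Whatever encoding is chosen, the response should carry a matching `Content-Encoding` header and `Vary: Accept-Encoding` so that caches keep the variants apart.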

nginx

gzip on;
gzip_types text/css application/javascript application/json;
gzip_min_length 1024;
gzip_comp_level 6;

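# brotli directives require the ngx_brotli module (not part of stock nginx)
brotli on;
brotli_types text/css application/javascript application/json;
brotli_comp_level 6;

→ Brotli is the web standard (typically 5-15% smaller than gzip).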

Pre-compression (static)

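# Compress at build time so no CPU is spent at runtime
brotli -k -q 11 dist/*.js dist/*.css   # maximum compression
gzip -k -9 dist/*.js dist/*.css

# Or use a Vite plugin
// vite.config.ts
import { defineConfig } from 'vite';
import compression from 'vite-plugin-compression';

export default defineConfig({
  plugins: [
    compression({ algorithm: 'gzip', ext: '.gz' }),
    compression({ algorithm: 'brotliCompress', ext: '.br' }),
  ],
});

# nginx: serve the pre-compressed files
gzip_static on;     # needs ngx_http_gzip_static_module
brotli_static on;   # needs ngx_brotli

→ Compress once at build time; nginx then serves the files as-is.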

Compressible vs incompressible data

Compresses well:
  Text (JSON, XML, HTML, CSS, JS, log, code)

Compresses poorly:
  Image (JPEG, PNG, WebP: already compressed)
  Video (MP4, WebM)
  Audio (MP3, AAC)
  Binary (PDF, archive)
  Random / encrypted

→ Compressing images or video only burns CPU; the output does not get smaller.
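
The difference is easy to observe with `node:zlib` alone. In this small sketch, repetitive text stands in for JSON or logs, and random bytes stand in for encrypted or already-compressed data; the sample log line is made up.

```typescript
import { gzipSync } from 'node:zlib';
import { randomBytes } from 'node:crypto';

// Repetitive text: stands in for JSON, logs, or source code.
const text = Buffer.from('{"level":"info","msg":"request handled"}\n'.repeat(2000));
// Random bytes: stand in for encrypted or already-compressed data.
const random = randomBytes(text.length);

const textGz = gzipSync(text);
const randomGz = gzipSync(random);

// The text shrinks dramatically; the random input does not shrink at all
// (gzip framing even adds a few bytes).
console.log('text  :', text.length, '->', textGz.length);
console.log('random:', random.length, '->', randomGz.length);
```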

Database column (Postgres TOAST)

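TEXT / BYTEA values are compressed automatically with PGLZ once a row exceeds the TOAST threshold (about 2 KB by default).
LZ4 is also an option (Postgres 14+, if the server was built with LZ4 support).

ALTER TABLE x ALTER COLUMN data SET COMPRESSION lz4;

→ Saves disk space with little effect on query speed. The new setting applies only to values written afterwards.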

Compression in storage

Parquet:    Snappy (default) / gzip / zstd / brotli
ORC:        Snappy / zlib / lzo
ClickHouse: lz4 / zstd
Cassandra:  Snappy / lz4 / zstd
RocksDB:    Snappy / lz4 / zstd

→ zstd is the best modern all-rounder (ratio plus speed).

Network — sockets

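// WebSocket compression (Node `ws` package; the browser WebSocket constructor takes no such option)
import WebSocket from 'ws';
const ws = new WebSocket(url, { perMessageDeflate: true });

→ Worth enabling when large messages are sent frequently.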

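Brotli vs gzip (web specific)

Brotli ships with a built-in static dictionary of strings that are common in HTML, JS, and CSS.
For the same input file, brotli output is roughly 5-15% smaller than gzip.

→ On the modern web, serve brotli with a gzip fallback.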
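
To check the gap on your own assets, both codecs are available in `node:zlib`. This is a rough sketch with a made-up HTML sample; real numbers depend heavily on the input, so substitute an actual build artifact before drawing conclusions.

```typescript
import { gzipSync, brotliCompressSync, brotliDecompressSync, constants } from 'node:zlib';

// Made-up HTML sample; replace with a real asset for meaningful numbers.
const rows = Array.from(
  { length: 200 },
  (_, i) => `<div class="item"><a href="/page/${i}">Item ${i}</a></div>`,
).join('');
const html = Buffer.from(
  `<!DOCTYPE html><html><head><title>Demo</title></head><body>${rows}</body></html>`,
);

const gz = gzipSync(html, { level: 9 });
const br = brotliCompressSync(html, {
  params: {
    [constants.BROTLI_PARAM_QUALITY]: 11,                      // maximum quality
    [constants.BROTLI_PARAM_MODE]: constants.BROTLI_MODE_TEXT, // hint: input is text
    [constants.BROTLI_PARAM_SIZE_HINT]: html.length,
  },
});

// Sanity check that the brotli output decodes back to the input.
const restoredHtml = brotliDecompressSync(br);

console.log({ original: html.length, gzip: gz.length, brotli: br.length });
```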

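Compression bomb (security)

A tiny payload can expand enormously: about 1 MB of gzip can decompress to about 1 GB (deflate tops out near 1000:1, and nested archives get far worse).
A server that decompresses without checking runs out of memory (OOM).

→ Enforce a maximum decompressed size.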
import { gunzipSync } from 'node:zlib';

const MAX_SIZE = 100 * 1024 * 1024;  // 100MB
const decompressed = gunzipSync(buf, { maxOutputLength: MAX_SIZE });
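
The limit can be exercised with a miniature bomb built locally. In this sketch `safeGunzip` is a hypothetical wrapper; it returns null for any failure, which is a simplification (production code would likely distinguish an oversized payload from a corrupt one).

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// Hypothetical helper: decompress untrusted gzip input, refusing to produce
// more than `maxBytes` of output. Returns null when the limit is exceeded
// or the input is not valid gzip.
function safeGunzip(input: Buffer, maxBytes: number): Buffer | null {
  try {
    return gunzipSync(input, { maxOutputLength: maxBytes });
  } catch {
    return null;
  }
}

// 10 MB of zeros compresses to a few kilobytes: a miniature "bomb".
const bomb = gzipSync(Buffer.alloc(10 * 1024 * 1024));

console.log('compressed bytes:', bomb.length);
console.log('rejected at 1 MB limit:', safeGunzip(bomb, 1024 * 1024) === null);
```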

LZ4 (memory cache)

import LZ4 from 'lz4js';

const compressed = LZ4.compress(buf);
const decompressed = LZ4.decompress(compressed);

→ Very fast; can be used for values stored in a cache such as Redis.

Snappy (big data, Hadoop / Cassandra)

  • Very fast compression and decompression.
  • Weak ratio (2-3x).
  • Big data scenarios.

Choosing a compression level

gzip 1-9 / brotli 0-11 / zstd 1-22: low = fast, high = slow but smaller

Real-time stream: level 1-3
HTTP responses: 6 (default)
Static assets: 11 (brotli max, pre-compressed at build time)
Backup: max
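
The guidance above could be mapped onto `node:zlib` options roughly as follows. `compressFor` and the `UseCase` names are hypothetical, and using gzip for the first two cases and brotli for the third is an assumption made for illustration.

```typescript
import { gzipSync, brotliCompressSync, constants } from 'node:zlib';

type UseCase = 'realtime' | 'http' | 'static';

// Hypothetical helper: pick a codec and level per use case.
function compressFor(useCase: UseCase, data: Buffer): Buffer {
  switch (useCase) {
    case 'realtime':
      return gzipSync(data, { level: 1 }); // lowest latency
    case 'http':
      return gzipSync(data, { level: 6 }); // zlib default trade-off
    case 'static':
      return brotliCompressSync(data, {
        // Maximum quality is affordable because this runs once at build time.
        params: { [constants.BROTLI_PARAM_QUALITY]: constants.BROTLI_MAX_QUALITY },
      });
  }
}

const levelSample = Buffer.from('level demo '.repeat(500));
for (const useCase of ['realtime', 'http', 'static'] as const) {
  console.log(useCase, compressFor(useCase, levelSample).length);
}
```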

Measuring

const original = data.length;
const t0 = Date.now();
const compressed = await zstd.compress(data, 3);
const t1 = Date.now();
const decompressed = await zstd.decompress(compressed);
const t2 = Date.now();

console.log({
  original,
  compressed: compressed.length,
  ratio: (original / compressed.length).toFixed(2),
  compressMs: t1 - t0,
  decompressMs: t2 - t1,
});

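Dictionary compression (large savings)

When JSON with the same schema is sent over and over, the same keys repeat in every message.
A pre-built dictionary lets each message compress further.

zstd supports a dictionary mode:
zstd --train *.json -o dict
zstd -D dict input.json

→ Output can be 50-80% smaller, mainly for small payloads.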
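
The idea can be tried without installing zstd. Note the substitution: the sketch below uses the preset `dictionary` option of deflate in `node:zlib`, not zstd's trained dictionaries. The mechanism is simpler, but the principle (both sides share bytes in advance) is the same. The message shape is made up.

```typescript
import { deflateSync, inflateSync } from 'node:zlib';

// Preset dictionary: a representative message containing the shared keys.
// Sender and receiver must hold exactly the same bytes.
const dictionary = Buffer.from(
  '{"userId":0,"eventType":"click","timestamp":"2026-01-01T00:00:00Z","sessionId":""}',
);

const message = Buffer.from(
  '{"userId":42,"eventType":"click","timestamp":"2026-05-09T12:34:56Z","sessionId":"abc123"}',
);

const plain = deflateSync(message);                     // no shared context
const withDict = deflateSync(message, { dictionary });  // keys come from the dictionary
const restored = inflateSync(withDict, { dictionary }); // same dictionary required to decode

console.log({ original: message.length, plain: plain.length, withDict: withDict.length });
```

A single short message has almost no internal repetition, so plain deflate barely helps; the dictionary supplies the repetition that is otherwise missing.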

🤔 Decision criteria

Use case                   Recommendation
HTTP response (real-time)  brotli (level 4-6) + gzip fallback
Static asset (build)       brotli max + gzip max, pre-compressed
Database column            zstd / lz4
Memory cache               lz4 / snappy
Backup                     zstd / xz
Streaming pipe             zstd / lz4
Big data analytics         snappy / zstd
Real-time game             lz4

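Anti-patterns

  • Recompressing already-compressed files: wasted CPU. Check the format first.
  • Compressing small data (< 1 KB): header overhead can outweigh the savings.
  • Decompressing without a size limit: opens the door to OOM attacks (decompression bombs).
  • Compressing static assets on every request: pre-compress them instead.
  • Brotli only, with no gzip fallback: older clients break.
  • Level 22 on a real-time path: high latency.
  • Compressing every Content-Type: images and similar formats do not shrink.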
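
Several of these rules can be folded into one guard that runs before compressing. This is a minimal sketch; `shouldCompress`, the 1 KB threshold, and the list of text-like types are illustrative assumptions rather than settings from any library.

```typescript
// Hypothetical guard: skip tiny payloads and formats that are already
// compressed; compress only text-like content types.
const MIN_BYTES = 1024;
const COMPRESSIBLE = /^(text\/|application\/(json|javascript|xml)|image\/svg\+xml)/i;

function shouldCompress(contentType: string, sizeBytes: number): boolean {
  if (sizeBytes < MIN_BYTES) return false; // header overhead outweighs savings
  return COMPRESSIBLE.test(contentType);   // JPEG, MP4, ZIP, etc. yield false
}

console.log(shouldCompress('application/json; charset=utf-8', 5000));
console.log(shouldCompress('image/jpeg', 500000));
```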

🤖 Hints for LLM use

  • Web: brotli + gzip fallback (use a library that negotiates automatically).
  • Storage: zstd (modern).
  • Speed-critical: lz4 / snappy.
  • Pre-compress static assets.

🔗 Related documents