---
id: wiki-2026-0508-scalability
title: Scalability
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Scalability, 확장성, scale-out, scale-up]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [architecture, distributed-systems, performance]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: yaml
  framework: kubernetes
---

# Scalability

## In one line

> **"The ability to grow capacity gracefully as load increases."** Scalability is not a single dimension (traffic, data, compute) but a multi-axis property. As of 2026, K8s HPA + KEDA, serverless auto-scaling, and LLM token-throughput scaling are everyday practice.

## Core concepts

### Two axes

- **Vertical (scale-up)**: a bigger machine; it hits a hard limit quickly.
- **Horizontal (scale-out)**: more machines; it requires stateless services.

### Dimensions

- **Load**: requests per second.
- **Data**: GB → PB.
- **Geographic**: regions.
- **User**: concurrent users.
- **Functional**: adding features does not break the system.

### Applications

1. Web tier: auto-scaling group.
2. DB: sharding / read replicas.
3. LLM serving: vLLM tensor parallelism + distributed KV cache.
4. Event pipeline: Kafka partition scaling.

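
To make the horizontal axis and the load dimension above concrete, here is a minimal back-of-the-envelope sketch. It is not from the original note: `replicasNeeded`, `CapacityInput`, and all of the numbers are hypothetical, and it assumes identical stateless instances whose sustainable throughput has already been measured.

```typescript
// Hypothetical helper (an illustration, not part of the original note):
// estimate how many identical stateless instances a peak load requires.
interface CapacityInput {
  peakRps: number;           // expected peak load, in requests per second
  perInstanceRps: number;    // measured sustainable throughput of one instance
  targetUtilization: number; // fraction of capacity to plan for, in (0, 1]
  minReplicas: number;       // floor kept for availability
}

function replicasNeeded(input: CapacityInput): number {
  const { peakRps, perInstanceRps, targetUtilization, minReplicas } = input;
  if (perInstanceRps <= 0 || targetUtilization <= 0 || targetUtilization > 1) {
    throw new RangeError(
      "perInstanceRps must be > 0 and targetUtilization must be in (0, 1]",
    );
  }
  // Plan against a fraction of each instance's capacity to leave headroom.
  const usablePerInstance = perInstanceRps * targetUtilization;
  // Round up, and never go below the availability floor.
  return Math.max(minReplicas, Math.ceil(peakRps / usablePerInstance));
}

// Assumed example numbers: 4000 req/s peak, 300 req/s per instance, 70% target.
const plan = replicasNeeded({
  peakRps: 4000,
  perInstanceRps: 300,
  targetUtilization: 0.7,
  minReplicas: 3,
});
console.log(plan); // 20
```

The same arithmetic shows why scale-up runs out first: once `perInstanceRps` can no longer be raised by buying a bigger machine, only the replica count is left to grow.
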
## 💻 Patterns

### K8s HPA (CPU-based)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: api-hpa }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: api }
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
```

### KEDA (event-driven scaling)

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata: { name: kafka-consumer }
spec:
  scaleTargetRef: { name: consumer }
  minReplicaCount: 0
  maxReplicaCount: 100
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: orders
        topic: order-events
        lagThreshold: "100"
```

### Stateless service (safe to scale out)

```typescript
// Externalize session state to Redis so no instance holds it in memory.
import express from "express";
import session from "express-session";
// Default import as in connect-redis v7; newer majors may export RedisStore by name.
import RedisStore from "connect-redis";
import { createClient } from "redis";

const redis = createClient({ url: "redis://redis:6379" });
await redis.connect(); // top-level await requires an ES module

const app = express();
app.use(session({
  store: new RedisStore({ client: redis }),
  secret: process.env.SESSION_SECRET!,
  resave: false,
  saveUninitialized: false,
}));
// Every instance now sees the same session.
```

### DB sharding (hash-based)

```typescript
import { crc32 } from "node:zlib"; // assumes a Node.js release that ships zlib.crc32
import type { Pool } from "pg";

// Assumed: one pg Pool per shard name, created at startup.
declare const pool: Record<string, Pool>;

function shardFor(userId: string): string {
  // >>> 0 keeps the hash unsigned if a signed CRC32 implementation is swapped in.
  const hash = crc32(userId) >>> 0;
  return `db-shard-${hash % 8}`;
}

async function getUser(id: string) {
  const shard = shardFor(id);
  return pool[shard].query("SELECT * FROM users WHERE id=$1", [id]);
}
```

### Read replica

```typescript
import postgres from "postgres";

const writeDb = postgres({ host: "primary" });
const readDb = postgres({ host: "replica.read" });

// Writes always go to the primary.
async function placeOrder(o: Order) {
  return writeDb`INSERT INTO orders ...`;
}

// Reads go to the replica and may lag slightly behind the primary.
async function listOrders(uid: string) {
  return readDb`SELECT * FROM orders WHERE uid=${uid}`;
}
```

### LLM tensor parallelism (vLLM 0.7+)

```bash
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.92 \
  --max-num-seqs 256
```

### Cache layer (scaling reads)

```typescript
// `redis` and `db` are assumed to be initialized elsewhere.
async function getProduct(id: string) {
  const cached = await redis.get(`p:${id}`);
  if (cached) return JSON.parse(cached);

  // Cache miss: read from the database, then cache the result for 60 seconds.
  const p = await db.query("SELECT * FROM products WHERE id=$1", [id]);
  // `setex` is the ioredis spelling; node-redis v4+ names this method `setEx`.
  await redis.setex(`p:${id}`, 60, JSON.stringify(p));
  return p;
}
```

## Decision criteria

| Situation | Approach |
|---|---|
| Traffic spike (predictable) | HPA + capacity planning. |
| Burst (unpredictable) | Serverless / KEDA scale-to-zero. |
| Data > single node | Sharding. |
| Read >> write | Replica. |
| Global users | Multi-region + edge cache. |
| LLM serving | vLLM TP + KV-cache routing. |

**Default**: stateless service + HPA + Redis cache + read replica.

## 🔗 Graph

- Parent: [[Distributed Systems]]
- Applications: [[Microservices]] · [[Serverless_Architecture]] · [[Service Mesh]]
- Adjacent: [[CAP Theorem]] · [[Sharding]] · [[Load Balancer]]

## 🤖 LLM usage

**When**: capacity design, bottleneck diagnosis, cost-performance trade-offs.

**When not**: an internal tool with a single user (over-engineering).

## ❌ Anti-patterns

- **Premature sharding**: splitting the database when a single PostgreSQL instance is enough.
- **Scaling out stateful pods**: the session lives on only some of the instances.
- **Ignoring cache stampedes**: many entries expire at the same moment.
- **Escaping an N+1 query by scaling out**: fix the query first.
- **Only scaling up a monolith**: vertical scaling has a ceiling.

## 🧪 Verification / Duplicates

- Verified (Designing Data-Intensive Applications, K8s docs, vLLM docs).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: HPA/KEDA/sharding/vLLM patterns |