---
id: wiki-2026-0508-scalability
title: Scalability
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Scalability, 확장성, scale-out, scale-up]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [architecture, distributed-systems, performance]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: yaml
  framework: kubernetes
---

# Scalability

## In one line

> **"The ability to grow capacity gracefully as load increases."** Scalability is not a single dimension (traffic, data, compute) but a multi-axis property. In 2026, K8s HPA + KEDA, serverless auto-scaling, and LLM token-throughput scaling are routine.

## Core

### Two axes

- **Vertical (scale-up)**: a bigger machine; it reaches its ceiling quickly.
- **Horizontal (scale-out)**: more machines; it requires stateless services.

### Dimensions

- **Load**: requests per second.
- **Data**: GB → PB.
- **Geographic**: regions served.
- **User**: concurrent users.
- **Functional**: adding a feature does not break the system.

### Applications

1. Web tier: auto-scaling group.
2. DB: sharding / read replicas.
3. LLM serving: vLLM tensor parallelism + distributed KV cache.
4. Event pipeline: scaling Kafka partitions.

## 💻 Patterns

### K8s HPA (CPU-based)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: api-hpa }
spec:
  # Deployment whose replica count this HPA manages
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: api }
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        # Scale so that average CPU utilization across pods stays near 70%
        target: { type: Utilization, averageUtilization: 70 }
```
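
To make the 70% target concrete, the sketch below applies the replica formula given in the Kubernetes HPA documentation, `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, and clamps the result to `minReplicas`/`maxReplicas`. The function name, the `HpaSpec` type, and the 10% tolerance default are illustrative assumptions; the real controller also handles stabilization windows, missing metrics, and pods that are not yet ready, none of which is modeled here.

```typescript
// Simplified model of the HPA scaling decision (not the controller's actual code).
interface HpaSpec {
  minReplicas: number;
  maxReplicas: number;
  targetUtilization: number; // percent, e.g. 70
}

function desiredReplicas(
  spec: HpaSpec,
  currentReplicas: number,
  currentUtilization: number, // average CPU utilization across pods, in percent
  tolerance = 0.1, // assumed tolerance band around the target
): number {
  const ratio = currentUtilization / spec.targetUtilization;
  // Inside the tolerance band the replica count is left unchanged.
  if (Math.abs(ratio - 1) <= tolerance) return currentReplicas;
  const raw = Math.ceil(currentReplicas * ratio);
  return Math.min(spec.maxReplicas, Math.max(spec.minReplicas, raw));
}

const apiHpa: HpaSpec = { minReplicas: 3, maxReplicas: 50, targetUtilization: 70 };
console.log(desiredReplicas(apiHpa, 10, 140)); // load doubled: 10 -> 20 replicas
console.log(desiredReplicas(apiHpa, 10, 20)); // mostly idle: shrinks, but not below 3
```

With the manifest above, a sustained doubling of CPU load roughly doubles the pod count until `maxReplicas: 50` caps it.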

### KEDA (event-driven scaling)

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata: { name: kafka-consumer }
spec:
  scaleTargetRef: { name: consumer }
  minReplicaCount: 0 # scale to zero when there is no backlog
  maxReplicaCount: 100
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: orders
        topic: order-events
        lagThreshold: "100" # target consumer lag per replica
```
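
As a rough mental model for `lagThreshold`, the sketch below estimates how many consumers a given backlog asks for: total lag divided by the threshold, rounded up, then capped. It assumes the scaler's default behaviour of not scaling past the topic's partition count, since an extra consumer in the same group would sit idle. `partitionCount` and the function name are hypothetical, and the real scaler feeds a metric to an HPA rather than computing replicas directly.

```typescript
interface KafkaScaleInput {
  totalLag: number; // sum of consumer-group lag across partitions
  lagThreshold: number; // target lag per replica, e.g. 100
  partitionCount: number; // hypothetical: number of partitions in the topic
  minReplicaCount: number;
  maxReplicaCount: number;
}

// Estimate the consumer count requested by the current backlog.
function estimateConsumers(i: KafkaScaleInput): number {
  const wanted = Math.ceil(i.totalLag / i.lagThreshold);
  const capped = Math.min(wanted, i.partitionCount, i.maxReplicaCount);
  return Math.max(i.minReplicaCount, capped);
}

// Values from the ScaledObject above, plus an assumed 12-partition topic.
const base = { lagThreshold: 100, partitionCount: 12, minReplicaCount: 0, maxReplicaCount: 100 };
console.log(estimateConsumers({ ...base, totalLag: 0 })); // no backlog: 0 consumers
console.log(estimateConsumers({ ...base, totalLag: 450 })); // 450 / 100 rounds up to 5
console.log(estimateConsumers({ ...base, totalLag: 5000 })); // capped at 12 partitions
```

The partition cap is why scaling an event pipeline often means adding Kafka partitions first: `maxReplicaCount: 100` has no effect if the topic has fewer partitions than that.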

### Stateless service (can scale out)

```typescript
// Keep session state outside the process (Redis) so any instance can serve any request.
import express from "express";
import session from "express-session";
import RedisStore from "connect-redis"; // default export as in connect-redis v7; other major versions may differ
import { createClient } from "redis";

const redis = createClient({ url: "redis://redis:6379" });
await redis.connect();

const app = express();
app.use(session({
  store: new RedisStore({ client: redis }),
  secret: process.env.SESSION_SECRET!,
  resave: false,
  saveUninitialized: false,
}));
// Every instance now reads and writes the same session data.
```
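
### DB sharding (hash-based)

```typescript
import { crc32 } from "node:zlib"; // Node.js 20.15+ / 22.2+
import type { Pool } from "pg";

// Assumed to be built at startup: one pg Pool per shard, keyed by shard name.
declare const pool: Record<string, Pool>;

function shardFor(userId: string): string {
  const hash = crc32(userId); // unsigned 32-bit value, so the modulo is never negative
  return `db-shard-${hash % 8}`;
}

async function getUser(id: string) {
  const shard = shardFor(id);
  return pool[shard].query("SELECT * FROM users WHERE id=$1", [id]);
}
```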
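
One consequence of `hash % 8` is worth seeing in numbers: changing the shard count remaps most keys, which is part of why the shard count should not be picked casually. The sketch below counts how many hash values land on a different shard after the count changes. It works on raw hash values so it needs no hash library, and `movedFraction` is an illustrative name.

```typescript
// Fraction of keys whose shard changes when the shard count goes from `before` to `after`.
function movedFraction(hashes: number[], before: number, after: number): number {
  if (hashes.length === 0) return 0;
  const moved = hashes.filter((h) => h % before !== h % after).length;
  return moved / hashes.length;
}

// 144 consecutive hash values cover the residues mod 8, 9 and 16 evenly.
const hashes = Array.from({ length: 144 }, (_, i) => i);
console.log(movedFraction(hashes, 8, 9)); // 8 -> 9 shards: 128 of 144 keys move (about 89%)
console.log(movedFraction(hashes, 8, 16)); // 8 -> 16 shards: half of the keys move
```

Doubling the shard count keeps half the keys in place, while adding a single shard moves almost all of them. Consistent hashing or a lookup directory of virtual shards are common ways to reduce that movement.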

### Read replica

```typescript
import postgres from "postgres"; // postgres.js client

// Writes go to the primary; reads that tolerate slight staleness go to the replica.
const writeDb = postgres({ host: "primary" });
const readDb = postgres({ host: "replica.read" });

// `Order` is the application's own order type (not shown here).
async function placeOrder(o: Order) { return writeDb`INSERT INTO orders ...`; }
async function listOrders(uid: string) { return readDb`SELECT * FROM orders WHERE uid=${uid}`; }
```

### LLM tensor parallelism (vLLM 0.7+)

```bash
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.92 \
  --max-num-seqs 256
```
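
A back-of-the-envelope check on why this command uses `--tensor-parallel-size 4`: tensor parallelism splits the weights roughly evenly across GPUs, so per-GPU weight memory is about parameters × bytes per parameter ÷ TP size. The inputs below are assumptions for illustration (about 70 billion parameters, 2 bytes each for 16-bit weights, 80 GB GPUs). Activation memory and framework overhead are ignored, so treat the output as an estimate rather than what vLLM will report.

```typescript
interface TpEstimate {
  weightsPerGpuGB: number; // model weights resident on each GPU
  budgetPerGpuGB: number; // memory the server may use, per --gpu-memory-utilization
  leftForKvCacheGB: number; // rough remainder for KV cache and activations
}

function estimateTensorParallel(
  params: number,
  bytesPerParam: number,
  tpSize: number,
  gpuMemoryGB: number,
  gpuMemoryUtilization: number,
): TpEstimate {
  const weightsPerGpuGB = (params * bytesPerParam) / tpSize / 1e9;
  const budgetPerGpuGB = gpuMemoryGB * gpuMemoryUtilization;
  return { weightsPerGpuGB, budgetPerGpuGB, leftForKvCacheGB: budgetPerGpuGB - weightsPerGpuGB };
}

// Assumed inputs: 70e9 params, 16-bit weights, TP=4, 80 GB GPUs, utilization 0.92.
const est = estimateTensorParallel(70e9, 2, 4, 80, 0.92);
console.log(est.weightsPerGpuGB.toFixed(1)); // "35.0"
console.log(est.leftForKvCacheGB.toFixed(1)); // "38.6"
```

Under the same assumptions, TP=1 would need about 140 GB for weights alone, which does not fit on a single 80 GB GPU.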

### Cache layer (scaling reads)

```typescript
// Cache-aside read. Assumes `redis` is a node-redis v4 client and `db` is a pg Pool.
async function getProduct(id: string) {
  const cached = await redis.get(`p:${id}`);
  if (cached) return JSON.parse(cached);
  const { rows } = await db.query("SELECT * FROM products WHERE id=$1", [id]);
  const p = rows[0];
  if (!p) return null; // do not cache a missing product
  await redis.setEx(`p:${id}`, 60, JSON.stringify(p)); // 60-second TTL (ioredis spells this `setex`)
  return p;
}
```
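
The cache-aside read above has a known weak spot: when a hot key expires, every concurrent request misses and hits the database at once (a cache stampede). Here is a minimal sketch of two common mitigations, assuming a single process: coalescing concurrent loads for the same key, and adding jitter to the TTL so keys written together do not expire together. `singleFlight` and `jitteredTtl` are illustrative helper names, not part of any Redis client.

```typescript
// In-process map of loads that are currently running, keyed by cache key.
const inFlight = new Map<string, Promise<unknown>>();

// Run `load` at most once per key at a time; concurrent callers share one promise.
function singleFlight<T>(key: string, load: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  const p = load().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

// Base TTL plus a random extra of 0 to spreadSeconds - 1 seconds.
function jitteredTtl(
  baseSeconds: number,
  spreadSeconds: number,
  rand: () => number = Math.random,
): number {
  return baseSeconds + Math.floor(rand() * spreadSeconds);
}

// Usage: three concurrent requests for one product trigger a single load.
let dbCalls = 0;
const loadProduct = async () => {
  dbCalls++;
  return { id: "42", name: "demo" };
};
const requests = [1, 2, 3].map(() => singleFlight("p:42", loadProduct));
console.log(dbCalls); // 1
```

In `getProduct`, the database query would go inside the `load` callback and `jitteredTtl(60, 10)` would replace the fixed 60-second TTL. Across many instances each process still loads once, so a distributed lock or serving stale data while refreshing may be needed on top.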

## Decision criteria

| Situation | Approach |
|---|---|
| Traffic spike (predictable) | HPA + capacity planning. |
| Burst (unpredictable) | Serverless / KEDA scale-to-zero. |
| Data exceeds a single node | Sharding. |
| Reads far outnumber writes | Read replicas. |
| Global users | Multi-region + edge cache. |
| LLM serving | vLLM TP + KV-cache routing. |

**Default**: stateless service + HPA + Redis cache + read replica.

## 🔗 Graph

- Parent: [[Distributed Systems]]
- Applications: [[Microservices]] · [[Serverless_Architecture]] · [[Service Mesh]]
- Adjacent: [[CAP Theorem & PACELC]] · [[Sharding]] · [[Load Balancer]]

## 🤖 LLM usage

**When**: capacity design, bottleneck diagnosis, cost/performance trade-offs.
**When not**: an internal tool with a single user (over-engineering).
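
## ❌ Anti-patterns

- **Premature sharding**: splitting the data while a single PostgreSQL instance is still enough.
- **Scaling out stateful pods**: sessions live on only some instances, so requests routed elsewhere lose them.
- **Ignoring cache stampedes**: many keys expire at the same moment and every request falls through to the database.
- **Escaping an N+1 query by scaling out**: fix the query first.
- **Only scaling up a monolith**: vertical scaling has a ceiling.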

## 🧪 Verification / duplicates

- Verified (Designing Data-Intensive Applications, K8s docs, vLLM docs).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: HPA/KEDA/sharding/vLLM patterns |