Antigravity Agent 504fd5fb42 [G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00


Scalability

In one line

"The ability to grow capacity gracefully as load increases." Scalability is not a single dimension (traffic, data, compute) but a multi-axis property. As of 2026, K8s HPA + KEDA, serverless auto-scaling, and LLM token-throughput scaling are routine.

Core concepts

Two axes

Vertical (scale-up): a bigger machine; hits its limit quickly.
Horizontal (scale-out): more machines; requires stateless services.

Dimensions

Load: req/sec.
Data: GB → PB.
Geographic: regions.
User: concurrent users.
Functional: adding a feature does not break the system.

Applications

Web tier: auto-scaling group.
DB: sharding / read replicas.
LLM serving: vLLM tensor parallelism + distributed KV cache.
Event pipeline: scaling Kafka partitions.

💻 Patterns

K8s HPA (CPU-based)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: api-hpa }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: api }
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
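
For intuition about how this manifest behaves, the Kubernetes documentation describes the core HPA rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below applies that rule with the 70% target and the 3 to 50 bounds from the manifest. The function name and the `HpaSpec` type are illustrative, the 10% tolerance band is assumed to be the controller default, and the real controller also applies stabilization windows and pod-readiness handling that are omitted here.

```typescript
// Simplified sketch of the HPA scaling rule; not the controller's actual code.
interface HpaSpec {
  minReplicas: number;
  maxReplicas: number;
  targetUtilization: number; // percent, e.g. 70
}

function desiredReplicas(
  spec: HpaSpec,
  currentReplicas: number,
  currentUtilization: number, // average CPU utilization across pods, percent
  tolerance = 0.1, // assumed default tolerance band
): number {
  const ratio = currentUtilization / spec.targetUtilization;
  // Inside the tolerance band the replica count is left unchanged.
  if (Math.abs(ratio - 1) <= tolerance) return currentReplicas;
  const raw = Math.ceil(currentReplicas * ratio);
  // Clamp to the configured bounds.
  return Math.min(spec.maxReplicas, Math.max(spec.minReplicas, raw));
}

const apiHpa: HpaSpec = { minReplicas: 3, maxReplicas: 50, targetUtilization: 70 };
console.log(desiredReplicas(apiHpa, 10, 140)); // utilization at twice the target
console.log(desiredReplicas(apiHpa, 10, 72)); // inside the tolerance band
```

With 10 replicas at 140% utilization the rule asks for 20 replicas; at 72% the ratio sits inside the tolerance band, so the count stays at 10.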

KEDA (event-driven scaling)

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata: { name: kafka-consumer }
spec:
  scaleTargetRef: { name: consumer }
  minReplicaCount: 0
  maxReplicaCount: 100
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: orders
        topic: order-events
        lagThreshold: "100"
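
KEDA feeds consumer lag to the HPA as an average-value metric, so the replica count follows roughly ceil(total lag / lagThreshold). The sketch below estimates that number for the ScaledObject above. It assumes the Kafka scaler's default behaviour of not scaling past the topic's partition count; the 12-partition topic and the function name are hypothetical, and activation thresholds and cooldown periods are ignored.

```typescript
// Rough estimate of how many consumers a lag-based scaler would request.
// Not KEDA's implementation; a sketch under the assumptions stated above.
function consumersForLag(
  totalLag: number, // sum of consumer-group lag across all partitions
  partitions: number, // hypothetical partition count of the topic
  lagThreshold = 100,
  minReplicas = 0,
  maxReplicas = 100,
): number {
  // No lag: fall back to the minimum (scale-to-zero when minReplicas is 0).
  if (totalLag <= 0) return minReplicas;
  const wanted = Math.ceil(totalLag / lagThreshold);
  // Consumers beyond the partition count would sit idle, so cap there.
  const capped = Math.min(wanted, partitions, maxReplicas);
  return Math.max(minReplicas, capped);
}

console.log(consumersForLag(0, 12)); // idle topic
console.log(consumersForLag(950, 12)); // moderate backlog
console.log(consumersForLag(5000, 12)); // backlog larger than the partitions can absorb
```

The third call shows why partition count is the real ceiling for a Kafka consumer group: raising `maxReplicaCount` alone does not add throughput once every partition already has a consumer.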

Stateless service (ready to scale out)

// Externalize session state to Redis
import express from "express";
import session from "express-session";
import RedisStore from "connect-redis";
import { createClient } from "redis";

const redis = createClient({ url: "redis://redis:6379" });
await redis.connect();

const app = express();
app.use(session({
  store: new RedisStore({ client: redis }),
  secret: process.env.SESSION_SECRET!,
  resave: false, saveUninitialized: false,
}));
// Every instance sees the same session.

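DB sharding (hash-based)

import { crc32 } from "node:zlib"; // available in Node.js 20.15+ / 22.2+

const SHARD_COUNT = 8;

// Map a user ID to a shard name with a stable hash.
function shardFor(userId: string): string {
  const hash = crc32(userId); // unsigned 32-bit integer
  return `db-shard-${hash % SHARD_COUNT}`;
}

// `pool` is assumed to map shard names to connection pools.
async function getUser(id: string) {
  const shard = shardFor(id);
  return pool[shard].query("SELECT * FROM users WHERE id=$1", [id]);
}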
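
One consequence of modulo-based placement is worth measuring before committing to it: changing the shard count remaps most keys. The sketch below counts how many synthetic user IDs land on a different shard when the count changes. It swaps in a small FNV-1a hash instead of crc32 so it has no dependencies; the key set and function names are illustrative.

```typescript
// FNV-1a 32-bit hash: a dependency-free stand-in for crc32 in this sketch.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned 32-bit
}

function shardIndex(userId: string, shardCount: number): number {
  return fnv1a(userId) % shardCount;
}

// Fraction of keys whose shard changes when the shard count goes from `from` to `to`.
function movedFraction(keys: string[], from: number, to: number): number {
  let moved = 0;
  for (const k of keys) {
    if (shardIndex(k, from) !== shardIndex(k, to)) moved++;
  }
  return moved / keys.length;
}

const sampleKeys = Array.from({ length: 10_000 }, (_, i) => `user-${i}`);
console.log("8 -> 9 shards:", movedFraction(sampleKeys, 8, 9));
console.log("8 -> 16 shards:", movedFraction(sampleKeys, 8, 16));
```

For a well-mixed hash, going from 8 to 9 shards moves close to 8 in 9 keys, while doubling to 16 moves about half. That migration cost is one reason to pick a shard count with headroom, or to look at consistent hashing, before the first split.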

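Read replica

import postgres from "postgres";

const writeDb = postgres({ host: "primary" });
const readDb = postgres({ host: "replica.read" });

// Writes go to the primary.
async function placeOrder(o: Order) { return writeDb`INSERT INTO orders ...`; }
// Reads go to the replica, which can lag behind the primary under asynchronous replication.
async function listOrders(uid: string) { return readDb`SELECT * FROM orders WHERE uid=${uid}`; }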
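
Because a replica can trail the primary, a user who just placed an order may not see it in a list served from the replica. One common mitigation is to pin a user's reads to the primary for a short window after their own write. The sketch below shows that routing decision only; `ReadRouter`, the window length, and the injectable clock are hypothetical names introduced for illustration.

```typescript
type Target = "primary" | "replica";

// Routes a user's reads to the primary for `stickyMs` after that user's last write.
class ReadRouter {
  private lastWrite = new Map<string, number>();
  private stickyMs: number;
  private now: () => number;

  constructor(stickyMs: number, now: () => number = () => Date.now()) {
    this.stickyMs = stickyMs;
    this.now = now; // injectable clock, mainly for tests
  }

  // Call after every successful write made on behalf of `uid`.
  recordWrite(uid: string): void {
    this.lastWrite.set(uid, this.now());
  }

  // Decide where the next read for `uid` should go.
  targetForRead(uid: string): Target {
    const t = this.lastWrite.get(uid);
    if (t !== undefined && this.now() - t < this.stickyMs) return "primary";
    return "replica";
  }
}

const router = new ReadRouter(2_000);
router.recordWrite("u1");
console.log(router.targetForRead("u1")); // recent writer
console.log(router.targetForRead("u2")); // no recent write
```

The map here lives in one process, so with several stateless instances the last-write timestamp would need to sit in a shared store or travel with the request, for example in a cookie.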

LLM tensor parallelism (vLLM 0.7+)

vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.92 \
  --max-num-seqs 256
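
A quick back-of-the-envelope check shows why a tensor-parallel size of 4 is plausible for a 70B model. The sketch below assumes BF16 weights (2 bytes per parameter), 80 GB GPUs, and an even split of weights across GPUs; none of these figures come from the command itself, and the function names are illustrative.

```typescript
// Approximate weight memory per GPU in GB (1 GB = 1e9 bytes).
// paramsBillions * 1e9 params * bytesPerParam / 1e9 bytes-per-GB, split across tpSize GPUs.
function weightGbPerGpu(paramsBillions: number, bytesPerParam: number, tpSize: number): number {
  return (paramsBillions * bytesPerParam) / tpSize;
}

// Memory left per GPU after weights, within the fraction vLLM is allowed to use.
// That remainder is shared by the KV cache, activations, and runtime overhead.
function headroomGb(gpuGb: number, utilization: number, weightsGb: number): number {
  return gpuGb * utilization - weightsGb;
}

const weights = weightGbPerGpu(70, 2, 4); // 70B params, BF16, TP=4
const headroom = headroomGb(80, 0.92, weights); // assumed 80 GB GPU, 0.92 from the command
console.log(`weights per GPU: ${weights} GB`);
console.log(`headroom per GPU: ${headroom.toFixed(1)} GB`);
```

Under these assumptions each GPU holds about 35 GB of weights and keeps a little under 39 GB for the KV cache and everything else, which is what ultimately bounds how many of the 256 sequences can be resident at once.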

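Cache layer (scaling reads)

// Cache-aside: read from Redis first, fall back to the database on a miss.
async function getProduct(id: string) {
  const cached = await redis.get(`p:${id}`);
  if (cached) return JSON.parse(cached);
  const p = await db.query("SELECT * FROM products WHERE id=$1", [id]);
  // 60-second TTL; the node-redis client created with createClient spells this method setEx.
  await redis.setEx(`p:${id}`, 60, JSON.stringify(p));
  return p;
}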
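
A plain cache-aside read like this lets every concurrent miss hit the database, and a fixed TTL makes keys written together expire together. Two small helpers that soften both effects are sketched below: an in-process single-flight wrapper and a jittered TTL. Both names are hypothetical, and the single-flight map only deduplicates within one process, not across instances.

```typescript
// Concurrent misses for the same key share one in-flight load.
const inFlight = new Map<string, Promise<unknown>>();

async function singleFlight<T>(key: string, load: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  // Remove the entry once the load settles, whether it succeeded or failed.
  const p = load().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

// Spread expiries by +/- `spread` so keys cached together do not all expire together.
function jitteredTtl(
  baseSeconds: number,
  spread = 0.2,
  rand: () => number = Math.random,
): number {
  const factor = 1 + (rand() * 2 - 1) * spread; // in [1 - spread, 1 + spread)
  return Math.max(1, Math.round(baseSeconds * factor));
}

// Example: two simultaneous requests for one product trigger a single load.
let loads = 0;
const fakeLoad = async () => { loads++; return { id: "42" }; };
const both = Promise.all([singleFlight("p:42", fakeLoad), singleFlight("p:42", fakeLoad)]);
both.then(() => console.log("loads:", loads, "ttl:", jitteredTtl(60)));
```

In `getProduct`, the database query would go inside the `load` callback and the fixed 60 would become `jitteredTtl(60)`.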

Decision criteria

Situation	Approach
Traffic spike (predictable)	HPA + capacity planning.
Burst (unpredictable)	Serverless / KEDA scale-to-zero.
Data > single node	Sharding.
Read >> write	Replica.
Global users	Multi-region + edge cache.
LLM serving	vLLM TP + KV-cache routing.

Default: stateless service + HPA + Redis cache + read replica.

🔗 Graph

Parent: Distributed Systems · Cloud Architecture
Variants: Vertical Scaling · Horizontal Scaling · Elasticity
Applications: Microservices · Serverless_Architecture · Service Mesh
Adjacent: CAP Theorem · Sharding · Caching · Load Balancer

🤖 LLM usage

When: capacity design, bottleneck diagnosis, cost-performance trade-offs. When not: a single-user internal tool (over-engineering).

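❌ Anti-patterns

Premature sharding: splitting the data when a single PostgreSQL instance would suffice.
Scaling out stateful pods: sessions exist on only some of the instances.
Ignoring cache stampedes: many keys expire at the same moment.
Using scale-out to escape an N+1 query: fix the query first.
Only scaling up a monolith: vertical scaling hits a ceiling.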
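
To make the N+1 item concrete, the sketch below contrasts one query per user with a single batched query and counts the round trips against an in-memory stub. The `Db` interface, the stub, and the SQL strings are hypothetical; the `uid` column follows the orders example earlier in this note.

```typescript
interface OrderRow { id: string; uid: string }
interface Db { query(sql: string, params: unknown[]): Promise<OrderRow[]> }

// N+1 shape: one round trip per user. Adding replicas only spreads this cost around.
async function ordersNPlusOne(db: Db, uids: string[]): Promise<OrderRow[]> {
  const out: OrderRow[] = [];
  for (const uid of uids) {
    out.push(...(await db.query("SELECT * FROM orders WHERE uid = $1", [uid])));
  }
  return out;
}

// Batched shape: one round trip for all users.
async function ordersBatched(db: Db, uids: string[]): Promise<OrderRow[]> {
  if (uids.length === 0) return [];
  return db.query("SELECT * FROM orders WHERE uid = ANY($1)", [uids]);
}

// In-memory stub that counts queries instead of talking to a database.
function makeStubDb(rows: OrderRow[]): Db & { count: number } {
  const stub = {
    count: 0,
    async query(_sql: string, params: unknown[]): Promise<OrderRow[]> {
      stub.count++;
      const p = params[0];
      const wanted = Array.isArray(p) ? (p as string[]) : [p as string];
      return rows.filter((r) => wanted.includes(r.uid));
    },
  };
  return stub;
}

const demoRows: OrderRow[] = [
  { id: "o1", uid: "a" }, { id: "o2", uid: "b" }, { id: "o3", uid: "a" },
];
const slowDb = makeStubDb(demoRows);
const fastDb = makeStubDb(demoRows);
Promise.all([
  ordersNPlusOne(slowDb, ["a", "b", "c"]),
  ordersBatched(fastDb, ["a", "b", "c"]),
]).then(() => console.log("queries:", slowDb.count, "vs", fastDb.count));
```

Both functions return the same rows for the same input; only the number of round trips differs, which is the part that scale-out cannot fix.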

🧪 Verification / duplicates

Verified (Designing Data-Intensive Applications, K8s docs, vLLM docs).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — HPA/KEDA/sharding/vLLM patterns
