
Antigravity Agent 504fd5fb42 [G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00


Load Testing

Verify that the system withstands production load. Tools: k6 (modern), Locust (Python), Artillery, JMeter, Gatling. Test types: smoke / load / stress / soak / spike.

📖 Core Concepts

Load test ≠ stress test ≠ soak test.
VU (virtual user) = one simulated concurrent user.
RPS = requests per second.
p95/p99 latency matters more than the mean.
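
A minimal sketch of the two relationships above, in plain JavaScript (no k6 APIs). It assumes a closed model in which each VU sends one request and then sleeps, so throughput follows Little's Law, and it uses the nearest-rank definition of a percentile. The function names and the latency samples are hypothetical, chosen only for illustration.

```javascript
// Closed-model estimate (Little's Law): each VU completes one iteration
// every (response time + think time) seconds.
function estimateRps(vus, avgResponseSec, thinkTimeSec) {
  return vus / (avgResponseSec + thinkTimeSec);
}

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Hypothetical latencies in ms: 90 fast requests and 10 slow ones.
const latencies = [...Array(90).fill(100), ...Array(10).fill(3000)];

console.log('estimated RPS:', estimateRps(100, 0.25, 1));
console.log('mean ms:', mean(latencies));
console.log('p50 ms:', percentile(latencies, 50));
console.log('p95 ms:', percentile(latencies, 95));
```

With these samples the mean sits far below what the slowest tenth of requests experience, which is why the tail percentiles are the numbers to put thresholds on.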

💻 Code Patterns

k6 (modern; written in Go, scripted in JavaScript)

// load.js
import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 100 },   // ramp up
    { duration: '5m',  target: 100 },   // steady
    { duration: '30s', target: 0 },     // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const r = http.get('https://api.example.com/users');
  check(r, { 'status 200': (r) => r.status === 200 });
  sleep(1);
}

k6 run load.js

Output

✓ status 200

http_req_duration..: p95=412ms p99=890ms
http_req_failed....: 0.05%
http_reqs..........: 30000  100/s
vus................: 100

Smoke test (1-5 VU, 1 min)

export const options = { vus: 1, duration: '1m' };

→ Sanity check.

Load test (typical)

stages: [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 0 },
]

Stress test (capacity limit)

stages: [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 200 },
  { duration: '5m', target: 500 },
  { duration: '5m', target: 1000 },
  { duration: '5m', target: 2000 },
]

→ Find where the system breaks.
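
One way to turn a stress run into a concrete answer is to summarise each step and look for the first one that violates a limit. The sketch below assumes you have already collected per-step p99 latency and error rate; the `steps` numbers, the limits, and the function name are all made up for illustration and are not k6 output.

```javascript
// Each entry summarises one stress-test step (hypothetical numbers).
const steps = [
  { vus: 100,  p99Ms: 180,  errorRate: 0.000 },
  { vus: 200,  p99Ms: 210,  errorRate: 0.001 },
  { vus: 500,  p99Ms: 450,  errorRate: 0.004 },
  { vus: 1000, p99Ms: 2300, errorRate: 0.030 },
  { vus: 2000, p99Ms: 9000, errorRate: 0.410 },
];

// Return the first step that violates either limit, or null if none does.
function findBreakingPoint(results, limits) {
  for (const step of results) {
    if (step.p99Ms > limits.maxP99Ms || step.errorRate > limits.maxErrorRate) {
      return step;
    }
  }
  return null;
}

const broken = findBreakingPoint(steps, { maxP99Ms: 1000, maxErrorRate: 0.01 });
console.log(broken ? `breaks at ${broken.vus} VUs` : 'no breaking point found');
```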

Soak test (long duration)

export const options = {
  vus: 100,
  duration: '4h',
};

→ Finds memory leaks and resource exhaustion.
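
A leak usually shows up as memory that keeps growing under constant load. As a rough sketch, a least-squares slope over memory samples taken during the soak gives a growth rate; the sample values below are hypothetical, and a real analysis would also need to allow for warm-up and garbage-collection noise.

```javascript
// Least-squares slope of y over x: how fast y grows per unit of x.
function slope(points) {
  const n = points.length;
  const meanX = points.reduce((s, p) => s + p.x, 0) / n;
  const meanY = points.reduce((s, p) => s + p.y, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of points) {
    num += (p.x - meanX) * (p.y - meanY);
    den += (p.x - meanX) ** 2;
  }
  return num / den;
}

// Hypothetical samples: x = minutes into the soak test, y = resident memory in MiB.
const memory = [
  { x: 0, y: 512 }, { x: 60, y: 572 }, { x: 120, y: 632 },
  { x: 180, y: 692 }, { x: 240, y: 752 },
];

const mibPerMinute = slope(memory);
console.log(`memory growth: ${mibPerMinute} MiB/min under constant load`);
```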

Spike test

stages: [
  { duration: '10s', target: 100 },
  { duration: '1m', target: 100 },
  { duration: '10s', target: 5000 },  // spike
  { duration: '3m', target: 5000 },
  { duration: '10s', target: 100 },
  { duration: '3m', target: 100 },
]

→ Can the system handle a sudden traffic burst?
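
To reason about a stage profile before running it, the sketch below computes the target VU count at a given second, assuming linear ramps between stage targets. The helper names are hypothetical, `startVUs` is an assumed parameter (the actual starting count depends on your k6 configuration), and only `s`, `m`, and `h` durations are parsed.

```javascript
// Parse '10s', '5m', or '4h' into seconds (other k6 duration forms are not handled).
function parseDuration(d) {
  const m = /^(\d+)(s|m|h)$/.exec(d);
  if (!m) throw new Error(`unsupported duration: ${d}`);
  return Number(m[1]) * { s: 1, m: 60, h: 3600 }[m[2]];
}

// Target VUs at tSec, ramping linearly from the previous target to the next.
function vusAt(stages, tSec, startVUs = 0) {
  let from = startVUs;
  let elapsed = 0;
  for (const stage of stages) {
    const len = parseDuration(stage.duration);
    if (tSec <= elapsed + len) {
      const progress = len === 0 ? 1 : (tSec - elapsed) / len;
      return Math.round(from + (stage.target - from) * progress);
    }
    from = stage.target;
    elapsed += len;
  }
  return from; // past the end: hold the last target
}

// The spike profile from the section above.
const spike = [
  { duration: '10s', target: 100 },
  { duration: '1m', target: 100 },
  { duration: '10s', target: 5000 },
  { duration: '3m', target: 5000 },
  { duration: '10s', target: 100 },
  { duration: '3m', target: 100 },
];

for (const t of [5, 75, 200]) {
  console.log(`t=${t}s -> ${vusAt(spike, t)} VUs`);
}
```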

Scenario with auth

import http from 'k6/http';
import { check } from 'k6';

// setup() runs once before the test; its return value is passed to every VU.
export function setup() {
  const login = http.post(
    'https://api.example.com/auth',
    JSON.stringify({ email: 'test@example.com', password: 'pw' }),
    { headers: { 'Content-Type': 'application/json' } },  // the body is JSON
  );
  return { token: login.json('token') };
}

export default function (data) {
  const r = http.get('https://api.example.com/users', {
    headers: { Authorization: `Bearer ${data.token}` },
  });
  check(r, { '200': (r) => r.status === 200 });
}

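Multiple endpoints

import http from 'k6/http';
import { group } from 'k6';

// k6 needs absolute URLs; the host matches the earlier examples.
const BASE_URL = 'https://api.example.com';
const JSON_HEADERS = { headers: { 'Content-Type': 'application/json' } };

export default function () {
  // group() tags the requests so results are reported per group.
  group('list users', () => {
    http.get(`${BASE_URL}/users`);
  });

  group('create order', () => {
    http.post(`${BASE_URL}/orders`, JSON.stringify({ /* ... */ }), JSON_HEADERS);
  });
}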

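Test data

import http from 'k6/http';
import { SharedArray } from 'k6/data';

// SharedArray loads the file once and shares it read-only across VUs.
const users = new SharedArray('users', () => JSON.parse(open('./users.json')));

export default function () {
  // Pick a random user per iteration.
  const user = users[Math.floor(Math.random() * users.length)];
  http.post('https://api.example.com/login', JSON.stringify(user), {
    headers: { 'Content-Type': 'application/json' },
  });
}

→ 100k distinct users logging in once each resembles real traffic far more than one account logging in repeatedly.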
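
Random selection can pick the same user twice. If each user should log in at most once, one option is to derive the index from the VU number and iteration counter. The sketch below is plain JavaScript with a hypothetical `userIndex` helper; in a k6 script the inputs would come from the `__VU` (1-based) and `__ITER` (0-based) globals, and the total VU count is assumed to be known and constant.

```javascript
// Map (vu, iteration) to a distinct index until the data set is exhausted.
// vu is 1-based and iteration is 0-based, mirroring k6's __VU and __ITER.
function userIndex(vu, iteration, totalVUs, userCount) {
  return (iteration * totalVUs + (vu - 1)) % userCount;
}

// Simulate 3 VUs running 4 iterations each against 100 users.
const totalVUs = 3;
const seen = new Set();
for (let iteration = 0; iteration < 4; iteration++) {
  for (let vu = 1; vu <= totalVUs; vu++) {
    seen.add(userIndex(vu, iteration, totalVUs, 100));
  }
}
console.log(`${seen.size} distinct users across 12 iterations`);
```

Indexes only repeat once `iteration * totalVUs` wraps past the size of the data set, so the file needs at least as many users as total iterations.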

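Distributed (k6 Cloud / OSS)

k6 cloud load.js
# or self-host: k6 has no `--distributed` flag; split the test across machines
# with execution segments (or run k6-operator on Kubernetes)
k6 run --execution-segment "0:1/2" --execution-segment-sequence "0,1/2,1" load.js   # machine 1
k6 run --execution-segment "1/2:1" --execution-segment-sequence "0,1/2,1" load.js   # machine 2

→ One machine handles roughly 10k VUs. Beyond that, go distributed.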
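
For self-hosted runs, k6's `--execution-segment` and `--execution-segment-sequence` options let each machine run one slice of the same script. The sketch below generates those arguments for `n` equal slices and estimates how many machines are needed. The helper names are hypothetical, and the 10k-VUs-per-machine default simply reuses the rough figure above rather than a measured limit.

```javascript
// How many load generators are needed for a target VU count.
function generatorsNeeded(targetVUs, vusPerMachine = 10000) {
  return Math.ceil(targetVUs / vusPerMachine);
}

// Build the segment and sequence strings for n equal slices of the test.
function executionSegments(n) {
  const point = (i) => (i === 0 ? '0' : i === n ? '1' : `${i}/${n}`);
  const sequence = Array.from({ length: n + 1 }, (_, i) => point(i)).join(',');
  return Array.from({ length: n }, (_, i) => ({
    segment: `${point(i)}:${point(i + 1)}`,
    sequence,
  }));
}

const machines = generatorsNeeded(25000);
for (const { segment, sequence } of executionSegments(machines)) {
  console.log(
    `k6 run --execution-segment "${segment}" ` +
    `--execution-segment-sequence "${sequence}" load.js`
  );
}
```

Each printed command is meant to run on a different machine; results then have to be aggregated in a shared output backend, since each k6 process only sees its own slice.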

Locust (Python)

# locustfile.py
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)
    
    @task(3)
    def list_users(self):
        self.client.get('/users')
    
    @task(1)
    def view_profile(self):
        self.client.get('/profile')
    
    def on_start(self):
        self.client.post('/login', json={...})

locust -f locustfile.py --host=https://api.example.com
# Web UI: http://localhost:8089

→ Python script + UI.

Artillery (YAML)

# load.yml
config:
  target: https://api.example.com
  phases:
    - duration: 60
      arrivalRate: 10
    - duration: 300
      arrivalRate: 100
scenarios:
  - flow:
      - get: { url: '/users' }
      - think: 1
      - post: { url: '/orders', json: { ... } }

artillery run load.yml

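Gatling (Scala)

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class LoadSim extends Simulation {
  val httpProtocol = http.baseUrl("https://api.example.com")

  // One scenario: each virtual user lists users once
  val scn = scenario("Users").exec(
    http("list users").get("/users")
  )

  // Inject 100 users, ramped linearly over 30 seconds
  setUp(scn.inject(rampUsers(100).during(30.seconds)).protocols(httpProtocol))
}

→ JVM-based. Used by large enterprises.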

JMeter (legacy GUI)

- XML config
- Thread group
- Plugin ecosystem
- Large enterprises

→ Legacy; Gatling and k6 are the modern options.

CI integration

# .github/workflows/load.yml
- run: k6 run load.js

→ Run a smoke test on every PR. Run the full load test against staging daily.

Threshold (auto fail)

thresholds: {
  http_req_duration: ['p(95)<500'],  // 95% of requests under 500 ms
  http_req_failed: ['rate<0.01'],     // fewer than 1% of requests fail
}

→ When a threshold fails, k6 exits with a non-zero exit code.
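
To make the pass/fail logic concrete, here is a simplified evaluator for the two expression shapes used above. It is an illustration only, not k6's implementation: it handles just `p(N)<X` and `rate<X`, uses a nearest-rank percentile, and runs on hypothetical sample data. The returned `1` stands in for "any non-zero exit code".

```javascript
// Nearest-rank percentile over raw samples.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Evaluate one expression of the form 'p(N)<X' or 'rate<X'.
function passes(expr, values) {
  let m = /^p\((\d+(?:\.\d+)?)\)<(\d+(?:\.\d+)?)$/.exec(expr);
  if (m) return percentile(values, Number(m[1])) < Number(m[2]);
  m = /^rate<(\d+(?:\.\d+)?)$/.exec(expr);
  if (m) {
    // For a rate metric, values are 1 (failed) or 0 (succeeded).
    const rate = values.filter(Boolean).length / values.length;
    return rate < Number(m[1]);
  }
  throw new Error(`unsupported threshold: ${expr}`);
}

// 0 when every threshold passes, 1 otherwise.
function exitCode(thresholds, metrics) {
  const ok = Object.entries(thresholds).every(([name, exprs]) =>
    exprs.every((expr) => passes(expr, metrics[name])));
  return ok ? 0 : 1;
}

const thresholds = {
  http_req_duration: ['p(95)<500'],
  http_req_failed: ['rate<0.01'],
};

// Hypothetical run: latency is fine, but 2 of 100 requests failed.
const metrics = {
  http_req_duration: [...Array(96).fill(200), ...Array(4).fill(800)],
  http_req_failed: [...Array(98).fill(0), 1, 1],
};

console.log('exit code:', exitCode(thresholds, metrics));
```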

Metrics export

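// ext.loadimpact configures the k6 Cloud project; other backends
// (InfluxDB / Prometheus / Datadog) are selected with --out on the command line
export const options = {
  ext: {
    loadimpact: { projectID: 123 },
  },
};

k6 run --out influxdb=http://... load.js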

Monitor the backend at the same time

During the load test, watch:
- App metrics (CPU, memory, p99)
- DB metrics (query count, locks)
- Network (latency, dropped packets)
- Queue (depth)

→ Identify the bottleneck.

Production-like environment

Test env = prod env / 10 (size).

Production:
- 100 instance × 4 CPU = 400 CPU
- 1000 RPS

Test:
- 10 instance × 4 CPU = 40 CPU
- 100 RPS

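→ Assume roughly linear scaling and extrapolate the results. This is an approximation: shared components such as the database may not scale linearly.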
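
The extrapolation is simple arithmetic once linear scaling is assumed. The sketch below uses the instance counts from the example above; the function and field names are hypothetical, and the result is only as good as the linearity assumption.

```javascript
// Extrapolate sustained RPS from a test environment to production,
// assuming throughput scales linearly with total CPU count.
function extrapolateRps(test, prod) {
  const testCpus = test.instances * test.cpusPerInstance;
  const rpsPerCpu = test.sustainedRps / testCpus;
  return rpsPerCpu * prod.instances * prod.cpusPerInstance;
}

const testEnv = { instances: 10, cpusPerInstance: 4, sustainedRps: 100 };
const prodEnv = { instances: 100, cpusPerInstance: 4 };

console.log('projected production RPS:', extrapolateRps(testEnv, prodEnv));
```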

Realistic load

Pareto: 80% read / 20% write.
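Extract each top endpoint's share of traffic from real production logs.

Read-heavy does not mean writes can be ignored: reads may hit the cache every time, so writes can dominate backend load.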
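
A sketch of deriving the mix from logs and replaying it: count requests per endpoint, then pick endpoints in proportion to those counts. The log lines, their `METHOD PATH` format, and the function names are assumptions for illustration; real access logs need their own parsing.

```javascript
// Hypothetical access-log lines in "<METHOD> <PATH>" form.
const logLines = [
  'GET /users', 'GET /users', 'GET /profile', 'POST /orders', 'GET /users',
  'GET /users', 'GET /profile', 'GET /users', 'POST /orders', 'GET /users',
];

// Count requests per endpoint, keeping first-seen order.
function endpointCounts(lines) {
  const counts = new Map();
  for (const line of lines) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  return counts;
}

// Pick an endpoint in proportion to its count; r is a number in [0, 1).
function pickEndpoint(counts, r) {
  const total = [...counts.values()].reduce((sum, c) => sum + c, 0);
  const target = r * total;
  let cumulative = 0;
  let last = null;
  for (const [endpoint, count] of counts) {
    cumulative += count;
    last = endpoint;
    if (target < cumulative) return endpoint;
  }
  return last; // only reached if r >= 1
}

const counts = endpointCounts(logLines);
for (const [endpoint, count] of counts) {
  console.log(`${endpoint}: ${(100 * count) / logLines.length}%`);
}
console.log('next request:', pickEndpoint(counts, Math.random()));
```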

Test data lifecycle

- Setup: create 100k users (once)
- Test: draw random users on every run
- Teardown: none; load test data is kept permanently (the next run reuses it)

Or:
- Every test gets a fresh DB schema (isolation)

Identifying the bottleneck

Increase RPS → p99 latency rises → where is the bottleneck?

1. CPU bound: app instances at 100% CPU
2. DB bound: long-running queries, connections exhausted
3. Network: bandwidth limit
4. Memory: GC thrashing
5. External: Stripe / S3 throttling

→ Run a profiler / APM (Datadog, New Relic) at the same time.

Service mesh / sidecar overhead

Istio + Envoy adds 1-5 ms per hop.
- Service A → mesh → Service B
- Each RPC takes 5-20 ms longer.

→ Verify that load test results are close to production (the test path should include the same mesh).

Cost

1 hour at 1000 RPS ≈ 3.6M requests.
Cloud egress + storage cost money.

→ Budget for load tests too.
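
The request count above is straightforward arithmetic, and the same calculation gives a first estimate of egress volume. In the sketch below the 16 KiB average response size is an assumed placeholder, not a measurement; substitute your own figure and your provider's pricing.

```javascript
// Estimate total requests and response egress for a load test run.
function loadTestVolume(rps, hours, avgResponseKiB) {
  const requests = rps * hours * 3600;
  const egressGiB = (requests * avgResponseKiB) / (1024 * 1024);
  return { requests, egressGiB };
}

// 1 hour at 1000 RPS, with an assumed 16 KiB average response.
const volume = loadTestVolume(1000, 1, 16);
console.log(`${volume.requests} requests, ~${volume.egressGiB.toFixed(1)} GiB egress`);
```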

🤔 Decision Criteria

Situation	Recommendation
Modern default	k6
Python team	Locust
YAML-friendly	Artillery
JVM enterprise	Gatling
GUI needed	JMeter
Smoke test	k6 (1 VU)
Capacity	Stress (k6)
Memory leak	Soak (4h+)
Black Friday prep	Spike

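❌ Anti-patterns

No load test: production is where you first learn the truth.
Testing only from a local machine: you hit the limits of a single machine.
No thresholds: no way to tell pass from fail.
No spike test: the system breaks on a traffic burst.
No soak test: OOM after a week.
No DB / cache reset: the cache hit rate is fake.
Unrealistic mix: all reads is not real traffic.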

🤖 Hints for LLM Use

k6 is the modern default.
Smoke / load / stress / soak / spike = the 5 test types.
p95/p99 are the key metrics.
Bottleneck identification = run a profiler at the same time.
