| id | title | category | status | source_trust_level | verification_status | created_at | updated_at | tags | tech_stack | applied_in | aliases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| testing-load-k6-locust | Load Testing — k6 / Locust / Artillery | Coding | draft | B | conceptual | 2026-05-09 | 2026-05-09 | | | | |
Load Testing
Verify that the system can withstand production load. Tools: k6 (modern), Locust (Python), Artillery, JMeter, Gatling. Test types: smoke / load / stress / soak / spike.
📖 Core concepts
- Load test ≠ stress test ≠ soak test.
- VU (virtual user) = one simulated concurrent user.
- RPS = requests per second.
- p95/p99 latency matters more than the mean.
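Why the tail matters more than the mean can be seen in a small sketch. The latency samples below are hypothetical (90 fast requests, 10 slow ones), and `percentile` is a simple nearest-rank implementation written for illustration, not the method any particular tool uses.

```javascript
// Nearest-rank percentile: the smallest sample that at least p% of samples do not exceed.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Hypothetical latencies in ms: 90% of requests are fast, 10% are very slow.
const latencies = [...Array(90).fill(100), ...Array(10).fill(2000)];

console.log('mean:', mean(latencies));           // 290
console.log('p50 :', percentile(latencies, 50)); // 100
console.log('p95 :', percentile(latencies, 95)); // 2000
console.log('p99 :', percentile(latencies, 99)); // 2000
```

The mean of 290 ms looks healthy, yet one user in ten waits 2 seconds; only p95/p99 expose that.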
💻 Code patterns
k6 (modern; written in Go, scripts in JavaScript)
// load.js
import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 100 }, // ramp up
    { duration: '5m', target: 100 },  // steady
    { duration: '30s', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const r = http.get('https://api.example.com/users');
  check(r, { 'status 200': (r) => r.status === 200 });
  sleep(1);
}
k6 run load.js
Output
✓ status 200
http_req_duration..: p95=412ms p99=890ms
http_req_failed....: 0.05%
http_reqs..........: 30000 100/s
vus................: 100
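How VUs relate to RPS in a looping script such as load.js can be estimated with a rough sketch. It assumes each VU runs one request per iteration and then sleeps, so one iteration takes about (response time + sleep). The function names are hypothetical and ramp-up stages are ignored.

```javascript
// Steady-state throughput of a closed-loop test:
// each VU finishes one iteration every (avgResponseSec + sleepSec) seconds.
function estimateRps(vus, avgResponseSec, sleepSec) {
  return vus / (avgResponseSec + sleepSec);
}

// Inverse: how many VUs are needed to reach a target RPS.
function vusForTargetRps(targetRps, avgResponseSec, sleepSec) {
  return Math.ceil(targetRps * (avgResponseSec + sleepSec));
}

// With sleep(1), 100 VUs give at most about 100 req/s (when responses are near-instant).
console.log(estimateRps(100, 0, 1));        // 100
// At 250 ms average response time the same 100 VUs produce less load.
console.log(estimateRps(100, 0.25, 1));     // 80
// To still reach 100 req/s, more VUs are needed.
console.log(vusForTargetRps(100, 0.25, 1)); // 125
```

This is why a VU count alone does not define the load: the same `target` produces fewer requests as the system slows down.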
Smoke test (1-5 VU, 1 min)
export const options = { vus: 1, duration: '1m' };
→ Sanity check.
Load test (typical)
stages: [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 0 },
]
Stress test (capacity limit)
stages: [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 200 },
  { duration: '5m', target: 500 },
  { duration: '5m', target: 1000 },
  { duration: '5m', target: 2000 },
]
→ Find where the system breaks.
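A stepped stress profile like this one can also be generated instead of typed by hand. The helper below is a hypothetical sketch (`stressStages` is not a k6 function); it only builds the plain array that would be assigned to `options.stages`.

```javascript
// Build a stepped ramp: one warm-up stage, then one stage per target VU level.
function stressStages(warmup, stepTargets, stepDuration) {
  const steps = stepTargets.map((target) => ({ duration: stepDuration, target }));
  return [warmup, ...steps];
}

// Reproduces the stress profile listed above.
const stages = stressStages(
  { duration: '2m', target: 100 },
  [200, 500, 1000, 2000],
  '5m'
);

console.log(JSON.stringify(stages, null, 2));
```

Generating the stages makes it easy to add a step or change the step length without editing five lines by hand.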
Soak test (long duration)
export const options = {
  vus: 100,
  duration: '4h',
};
→ Detects memory leaks / resource exhaustion.
Spike test
stages: [
  { duration: '10s', target: 100 },
  { duration: '1m', target: 100 },
  { duration: '10s', target: 5000 }, // spike
  { duration: '3m', target: 5000 },
  { duration: '10s', target: 100 },
  { duration: '3m', target: 100 },
]
→ Can the system handle a sudden traffic burst?
Scenario with auth
import http from 'k6/http';
import { check } from 'k6';

// setup() runs once before the test; its return value is passed to every VU.
export function setup() {
  const login = http.post(
    'https://api.example.com/auth',
    JSON.stringify({ email: 'test@example.com', password: 'pw' }),
    { headers: { 'Content-Type': 'application/json' } } // send the body as JSON
  );
  return { token: login.json('token') };
}

export default function (data) {
  const r = http.get('https://api.example.com/users', {
    headers: { Authorization: `Bearer ${data.token}` },
  });
  check(r, { '200': (r) => r.status === 200 });
}
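Multiple endpoints
import http from 'k6/http';
import { group } from 'k6';

// k6 needs absolute URLs, so prefix each path with the base URL.
const BASE_URL = 'https://api.example.com';

export default function () {
  group('list users', () => {
    http.get(`${BASE_URL}/users`);
  });
  group('create order', () => {
    http.post(`${BASE_URL}/orders`, JSON.stringify({ ... }));
  });
}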
Test data
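import http from 'k6/http';
import { SharedArray } from 'k6/data';

// SharedArray loads the file once and shares it read-only across all VUs.
const users = new SharedArray('users', () => JSON.parse(open('./users.json')));

export default function () {
  const user = users[Math.floor(Math.random() * users.length)];
  http.post('https://api.example.com/login', JSON.stringify(user));
}
→ 100k different users each logging in once resembles real traffic.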
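The users.json file has to be produced before the test. A minimal Node sketch of a generator follows; the `email` / `password` field names mirror the auth example but are an assumption about the login payload, and `makeUsers` is a hypothetical helper.

```javascript
// Generate `count` distinct synthetic users for load-test fixtures.
// Field names are assumptions; adjust them to the real login payload.
function makeUsers(count) {
  return Array.from({ length: count }, (_, i) => ({
    email: `loadtest+${i}@example.com`,
    password: `pw-${i}`,
  }));
}

const users = makeUsers(3);
const json = JSON.stringify(users, null, 2);
console.log(json);

// To write the fixture for a real run (Node only):
//   require('fs').writeFileSync('users.json', JSON.stringify(makeUsers(100000)));
```

Distinct accounts matter because a single account repeated 100k times tends to hit the same cache entries and rows, which understates the real load.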
Distributed (k6 Cloud / OSS)
k6 cloud load.js
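# or self-host: `k6 run` has no --distributed flag; use the k6-operator on Kubernetes,
# or split one test across machines with execution segments (one segment per machine)
k6 run --execution-segment "0:1/2" --execution-segment-sequence "0,1/2,1" load.js
→ One machine handles roughly 10k VUs (depends on script and hardware). Beyond that, distribute.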
Locust (Python)
# locustfile.py
from locust import HttpUser, task, between


class WebsiteUser(HttpUser):
    wait_time = between(1, 5)  # each user waits 1-5 s between tasks

    @task(3)  # weight 3: picked three times as often as view_profile
    def list_users(self):
        self.client.get('/users')

    @task(1)
    def view_profile(self):
        self.client.get('/profile')

    def on_start(self):
        # runs once per simulated user, before any task
        self.client.post('/login', json={...})
locust -f locustfile.py --host=https://api.example.com
# Web UI: http://localhost:8089
→ Python script + UI.
Artillery (YAML)
# load.yml
config:
  target: https://api.example.com
  phases:
    - duration: 60
      arrivalRate: 10
    - duration: 300
      arrivalRate: 100
scenarios:
  - flow:
      - get: { url: '/users' }
      - think: 1
      - post: { url: '/orders', json: { ... } }
artillery run load.yml
Gatling (Scala)
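import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class LoadSim extends Simulation {
  val httpProtocol = http.baseUrl("https://api.example.com")
  val scn = scenario("Users").exec(
    http("list users").get("/users")
  )
  // ramp from 0 to 100 users over 30 seconds
  setUp(scn.inject(rampUsers(100).during(30.seconds)).protocols(httpProtocol))
}
→ JVM-based. Used by large enterprises.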
JMeter (legacy GUI)
- XML config
- Thread group
- Plugin ecosystem
- Common in large enterprises
→ Legacy; Gatling and k6 are the modern options.
CI integration
# .github/workflows/load.yml
- run: k6 run load.js
→ Run a smoke-sized load test on every PR; run the full load test against staging daily.
Threshold (auto fail)
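thresholds: {
  http_req_duration: ['p(95)<500'], // 95% of requests finish in under 500 ms
  http_req_failed: ['rate<0.01'],   // fewer than 1% of requests fail
}
→ k6 exits with a non-zero exit code when a threshold fails, which fails the CI job.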
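What a threshold check amounts to can be sketched in plain JavaScript. This is not k6's implementation: `checkThreshold` and `failedThresholds` are hypothetical helpers that only understand the two expression shapes used above (`p(N)<limit` and `rate<limit`), and the sample data is made up.

```javascript
// Nearest-rank percentile over raw duration samples (ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Evaluate one expression of the form 'p(95)<500' or 'rate<0.01'.
function checkThreshold(expr, stats) {
  const match = /^(p\((\d+)\)|rate)\s*<\s*([\d.]+)$/.exec(expr.trim());
  if (!match) throw new Error(`unsupported threshold: ${expr}`);
  const limit = Number(match[3]);
  const actual = match[1] === 'rate'
    ? stats.failed / stats.total                      // failure rate
    : percentile(stats.durations, Number(match[2]));  // latency percentile
  return actual < limit;
}

// Return the expressions that did not hold; an empty list means "pass".
function failedThresholds(exprs, stats) {
  return exprs.filter((expr) => !checkThreshold(expr, stats));
}

// Hypothetical run: 96 fast requests, 4 slow ones, 2% failures.
const stats = {
  durations: [...Array(96).fill(100), ...Array(4).fill(900)],
  failed: 20,
  total: 1000,
};

console.log(failedThresholds(['p(95)<500', 'rate<0.01'], stats)); // [ 'rate<0.01' ]
```

In CI the equivalent decision is made by k6 itself; the sketch only shows why a run can have acceptable latency and still fail on error rate.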
Metrics export
// k6 → InfluxDB / Prometheus / Datadog via --out (ext.loadimpact is the k6 Cloud project setting)
export const options = {
  ext: {
    loadimpact: { projectID: 123 },
  },
};
k6 run --out influxdb=http://...
Monitor the backend at the same time
During the load test, watch:
- App metrics (CPU, memory, p99)
- DB metrics (query count, locks)
- Network (latency, dropped packets)
- Queue (depth)
→ Identifies the bottleneck.
Production-like environment
Test env = prod env / 10 (size).
Production:
- 100 instance × 4 CPU = 400 CPU
- 1000 RPS
Test:
- 10 instance × 4 CPU = 40 CPU
- 100 RPS
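→ Assume linear scaling and extrapolate the results. Treat this as an approximation: shared components such as the DB rarely scale linearly.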
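The extrapolation is simple arithmetic, sketched below with the numbers from this section. `extrapolateCapacity` is a hypothetical helper, and the linear-scaling assumption is exactly that: an assumption.

```javascript
// Linear extrapolation: throughput per CPU measured in test, multiplied by prod CPUs.
function extrapolateCapacity(test, prod) {
  const testCpus = test.instances * test.cpuPerInstance;
  const prodCpus = prod.instances * prod.cpuPerInstance;
  const rpsPerCpu = test.rps / testCpus;
  return { testCpus, prodCpus, rpsPerCpu, expectedProdRps: rpsPerCpu * prodCpus };
}

const result = extrapolateCapacity(
  { instances: 10, cpuPerInstance: 4, rps: 100 }, // test environment
  { instances: 100, cpuPerInstance: 4 }           // production
);

console.log(result);
// { testCpus: 40, prodCpus: 400, rpsPerCpu: 2.5, expectedProdRps: 1000 }
```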
Realistic load
Pareto: 80% read / 20% write.
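Extract the top endpoints and their share of traffic from real prod logs.
Read-heavy does not mean writes are negligible: reads may hit the cache every time, so writes can still dominate backend load.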
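Turning log counts into a request mix can be sketched as follows. The counts are hypothetical, and `trafficMix` / `pickEndpoint` are illustrative helpers; in a real script the random number would come from `Math.random()`.

```javascript
// Convert per-endpoint request counts (e.g. from prod logs) into traffic shares.
function trafficMix(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  return Object.entries(counts)
    .map(([endpoint, n]) => ({ endpoint, share: n / total }))
    .sort((a, b) => b.share - a.share);
}

// Weighted pick: r is a random number in [0, 1).
function pickEndpoint(mix, r) {
  let cumulative = 0;
  for (const { endpoint, share } of mix) {
    cumulative += share;
    if (r < cumulative) return endpoint;
  }
  return mix[mix.length - 1].endpoint; // guard against rounding at the top end
}

// Hypothetical counts matching an 80/20 read/write split.
const mix = trafficMix({ 'GET /users': 800, 'POST /orders': 200 });

console.log(mix);                    // shares 0.8 and 0.2
console.log(pickEndpoint(mix, 0.5)); // GET /users
console.log(pickEndpoint(mix, 0.9)); // POST /orders
```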
Test data lifecycle
- Setup: create 100k users (once)
- Test: pick random users on each run
- Teardown: none; load test data is kept permanently (the next run reuses it)
Or:
- Each test gets a fresh DB schema (isolation)
Identifying the bottleneck
Increase RPS → p99 latency rises → where?
1. CPU bound: app instances at 100% CPU
2. DB bound: slow queries, connection pool exhausted
3. Network: bandwidth limit
4. Memory: GC thrashing
5. External: Stripe / S3 throttles requests
→ Run a profiler / APM (Datadog, New Relic) at the same time.
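Before asking "where", it helps to pin down "at what load". The sketch below scans stepped test results for the first RPS level whose p99 exceeds a latency target. The data points and the 500 ms target are hypothetical, and `findSaturation` is an illustrative helper.

```javascript
// Find the highest load level that met the p99 target and the first one that broke it.
function findSaturation(results, sloMs) {
  const sorted = [...results].sort((a, b) => a.rps - b.rps);
  let lastOk = null;
  for (const step of sorted) {
    if (step.p99 > sloMs) return { lastOk, firstBreach: step.rps };
    lastOk = step.rps;
  }
  return { lastOk, firstBreach: null }; // target never breached
}

// Hypothetical results from a stepped stress test.
const results = [
  { rps: 100, p99: 120 },
  { rps: 200, p99: 130 },
  { rps: 500, p99: 180 },
  { rps: 1000, p99: 950 },
  { rps: 2000, p99: 4000 },
];

console.log(findSaturation(results, 500)); // { lastOk: 500, firstBreach: 1000 }
```

The interval between `lastOk` and `firstBreach` is where to correlate profiler and APM data with the five causes listed above.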
Service mesh / sidecar overhead
Istio + Envoy adds 1-5 ms per hop.
- Service A → mesh → Service B
- Each RPC takes 5-20 ms longer.
→ Verify that load test results are close to prod by testing through the same mesh.
Cost
One hour at 1000 RPS ≈ 3.6M requests.
Cloud egress + storage cost money.
→ Budget for load tests too.
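A back-of-the-envelope estimate can be sketched like this. The average response size and the egress price are placeholders, not real quotes from any provider, and `estimateLoadTestCost` is a hypothetical helper.

```javascript
// Rough egress estimate for one load test run.
// avgResponseBytes and egressUsdPerGb are assumptions; plug in real values.
function estimateLoadTestCost({ rps, durationSec, avgResponseBytes, egressUsdPerGb }) {
  const requests = rps * durationSec;
  const egressGb = (requests * avgResponseBytes) / 1e9;
  return { requests, egressGb, egressUsd: egressGb * egressUsdPerGb };
}

const estimate = estimateLoadTestCost({
  rps: 1000,
  durationSec: 3600,        // 1 hour
  avgResponseBytes: 10000,  // assumed 10 KB per response
  egressUsdPerGb: 0.09,     // placeholder price
});

console.log(estimate.requests); // 3600000
console.log(estimate.egressGb); // 36
console.log(estimate.egressUsd.toFixed(2));
```

Metrics storage and the load generators themselves add to this; the sketch covers egress only.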
🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| Modern default | k6 |
| Python team | Locust |
| YAML-friendly | Artillery |
| JVM enterprise | Gatling |
| GUI needed | JMeter |
| Smoke test | k6 (1 VU) |
| Capacity | Stress (k6) |
| Memory leak | Soak (4h+) |
| Preparing for Black Friday | Spike |
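❌ Anti-patterns
- No load test: production is the first time you learn the truth.
- Testing only from a local machine: limited by what one machine can generate.
- No thresholds: no way to tell pass from fail.
- No spike test: the system breaks on traffic bursts.
- No soak test: OOM a week later.
- No DB / cache reset: the cache hit rate is artificial.
- Unrealistic mix: all reads is not real traffic.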
🤖 LLM usage hints
- k6 is the modern default.
- Smoke / load / stress / soak / spike = the 5 test types.
- p95/p99 are the key metrics.
- Bottleneck identification = run a profiler at the same time.