"Green CI ≠ correct system." The anti-pattern: passing tests inflate confidence while being disconnected from actual coverage, assertion strength, and production behavior. Prevalent in 2026: LLM-generated tests, mock-heavy suites, and flaky-test retries that hide real bugs.
Core
Symptoms
Builds are 100% green, yet production incidents remain frequent.
Tests assert truisms (expect(1).toBe(1)).
Mocks return canned data, so integration paths go untested.
Retries hide flakiness, so race conditions are ignored.
Coverage % is high, mutation score is low.
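The difference between a truism and a real assertion can be illustrated with a minimal sketch. The function `apply_discount`, its hand-written mutant, and both test helpers are hypothetical names made up for this example; they are not from any particular codebase.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent (0-100)."""
    return price * (1 - percent / 100)


def apply_discount_mutant(price: float, percent: float) -> float:
    """Hand-written mutant: '-' flipped to '+'."""
    return price * (1 + percent / 100)


def weak_test(fn) -> bool:
    """Executes every line of fn but asserts nothing about the value."""
    result = fn(100.0, 20.0)
    return result is not None  # truism: always true for a float


def strong_test(fn) -> bool:
    """Pins the expected value, so a wrong formula fails."""
    return abs(fn(100.0, 20.0) - 80.0) < 1e-9


if __name__ == "__main__":
    for name, test in (("weak", weak_test), ("strong", strong_test)):
        print(name, "original:", test(apply_discount),
              "mutant:", test(apply_discount_mutant))
```

Both tests give full line coverage of the function, but only the strong one distinguishes the correct formula from the broken one.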
Root causes
Goodhart's law: the green check becomes a metric, the metric becomes a target, and the target gets gamed.
Mock theater: unit isolation is over-mocked, so real failure modes are missed.
AI-generated tests: the LLM mirrors the implementation, so the same bug is present in the test as well.
Flaky-retry culture: "retry until green" becomes normalized.
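The "mirror" failure mode can be sketched as follows. This is an illustrative example only: `is_leap_year` is deliberately buggy, and both test functions are hypothetical. The mirror test derives its expected value from the same logic as the implementation, so it cannot disagree with it.

```python
def is_leap_year(year: int) -> bool:
    """Buggy on purpose: forgets the 100/400 rule."""
    return year % 4 == 0


def mirror_test() -> bool:
    """Expected value is recomputed with the implementation's own logic."""
    years = (1900, 2000, 2023, 2024)
    return all(is_leap_year(y) == (y % 4 == 0) for y in years)


def independent_test() -> bool:
    """Expected values come from the calendar, not from the code."""
    known = {1900: False, 2000: True, 2023: False, 2024: True}
    return all(is_leap_year(y) == expected for y, expected in known.items())


if __name__ == "__main__":
    print("mirror test passes:", mirror_test())
    print("independent test passes:", independent_test())
```

The mirror test stays green because test and implementation share the bug; the independent test fails on 1900.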
Detection
Mutation testing: measures assertion strength.
Property-based testing: covers the input space.
Production observability: shows errors in prod that the tests missed.
Test-impact analysis: surfaces code paths that no test touches.
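How a mutation score measures assertion strength can be shown with a toy runner. This is a minimal sketch built on the standard-library `ast` module, not the mechanism of any specific mutation-testing tool. The function `is_adult`, the two suites, and the choice of mutations are assumptions made for illustration.

```python
import ast

SOURCE = "def is_adult(age):\n    return age >= 18\n"


def make_mutants(source: str):
    """Yield one callable per mutant; each swaps `>=` for another operator."""
    for replacement in (ast.Gt, ast.Lt):
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, ast.Compare) and isinstance(node.ops[0], ast.GtE):
                node.ops[0] = replacement()
        namespace = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        yield namespace["is_adult"]


def weak_suite(fn) -> bool:
    """Checks only values far from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary value, which the `>` mutant gets wrong."""
    return weak_suite(fn) and fn(18) is True


def mutation_score(suite, source: str = SOURCE) -> float:
    """Fraction of mutants for which the suite fails (i.e. mutants killed)."""
    mutants = list(make_mutants(source))
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)


if __name__ == "__main__":
    print("weak suite score:", mutation_score(weak_suite))
    print("strong suite score:", mutation_score(strong_suite))
```

Both suites pass against the original function and cover every line of it; only the score reveals that the weak suite lets the `>` mutant survive.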
Applications
CI quality dashboard: mutation score plus flake rate.
Test review checklist: which specific failure mode does each test catch?
Chaos engineering: inject production-like failures.
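For the dashboard's flake rate, one possible definition is sketched below: a test counts as flaky on a commit when it both passed and failed on that same commit. The record shape `(test_name, commit_sha, passed)` and the function name `flake_rates` are assumptions; a real CI system would need its own export step to produce such records.

```python
from collections import defaultdict


def flake_rates(runs):
    """Map each test name to the fraction of commits on which it was flaky.

    runs: iterable of (test_name, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(set)
    for name, sha, passed in runs:
        outcomes[(name, sha)].add(passed)

    commits = defaultdict(int)
    flaky = defaultdict(int)
    for (name, _sha), seen in outcomes.items():
        commits[name] += 1
        if seen == {True, False}:  # same code, different results
            flaky[name] += 1
    return {name: flaky[name] / commits[name] for name in commits}


if __name__ == "__main__":
    history = [
        ("t_login", "a1", True),
        ("t_login", "a1", False),
        ("t_login", "b2", True),
        ("t_cart", "a1", True),
        ("t_cart", "b2", True),
    ]
    print(flake_rates(history))
```

Keying on the commit separates flakiness (same code, different outcome) from genuine regressions (different code, different outcome).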
# .github/workflows/test.yml
- name: Test
  run: npm test  # NO retry: flakes surface immediately
- name: Flake report
  if: failure()
  run: |
    echo "::warning::Test failed: investigate, do not retry blindly"
# Prod traces -> test fixtures
# fetch_trace, extract_inputs, and run_system are project-specific helpers
# that are not defined here.
def replay_prod_trace(trace_id: str):
    trace = fetch_trace(trace_id)  # from the observability backend (e.g. OpenTelemetry)
    inputs = extract_inputs(trace)
    result = run_system(inputs)
    expected = trace.outputs
    assert result == expected, f"Drift from prod: {trace_id}"
Real integration (no mocks)
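// testcontainers: real DB
import { GenericContainer, StartedTestContainer } from 'testcontainers';

let pg: StartedTestContainer;

beforeAll(async () => {
  pg = await new GenericContainer('postgres:17')
    .withExposedPorts(5432)
    .withEnvironment({ POSTGRES_PASSWORD: 'test' })
    .start();
});

afterAll(async () => {
  await pg.stop(); // release the container after the suite
});

test('real query', async () => {
  // connect is a project-specific helper that is not defined here
  const client = connect(pg.getMappedPort(5432));
  await client.query('CREATE TABLE u (id int)');
  // real SQL behavior is tested, not a mock
});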
Decision criteria
| Situation | Approach |
| --- | --- |
| New test suite | Property-based + integration |
| Existing green-but-fragile suite | Mutation testing audit |
| Flaky test | Investigate root cause, never blind-retry |
| Mock-heavy suite | Add testcontainers for real I/O |
| Coverage-driven culture | Switch metric to mutation score |
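For the "new test suite" case, the property-based idea can be sketched with the standard library alone: generate many random inputs and check invariants rather than hand-picked examples. The function `dedupe_keep_order`, the chosen properties, and the helper `check_properties` are assumptions made for illustration; a dedicated library such as Hypothesis would add input shrinking and richer generators.

```python
import random


def dedupe_keep_order(items):
    """Remove duplicates while keeping first occurrences in order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def check_properties(fn, trials: int = 200, seed: int = 0):
    """Return the first random input that violates a property, else None."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        data = [rng.randint(0, 5) for _ in range(rng.randint(0, 10))]
        result = fn(data)
        no_duplicates = len(result) == len(set(result))
        same_elements = set(result) == set(data)
        idempotent = fn(result) == result
        order_kept = result == sorted(set(data), key=data.index)
        if not (no_duplicates and same_elements and idempotent and order_kept):
            return data
    return None


if __name__ == "__main__":
    print("correct impl counterexample:", check_properties(dedupe_keep_order))
    print("sorting impl counterexample:",
          check_properties(lambda xs: sorted(set(xs))))
```

An implementation that sorts instead of preserving order passes a single example such as `[1, 1, 2]` but is caught by the `order_kept` property on random inputs.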
Default: do not trust green CI on its own; use the combined signal of mutation score, prod observability, and chaos drills.
When to use: test review (assertion strength critique), mutation report triage, prod-trace replay generation.
When not to use: test generation by an LLM alone, which reproduces the same blind spots.
❌ Anti-patterns
Coverage as quality: 100% line coverage with a 0% mutation kill rate.
Auto-retry on fail: hides the race condition until it becomes a prod incident.
Mock everything: the unit "passes" while the integration is broken.
LLM-only test suite: mirrors the implementation, so tests and code share the same bugs.
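Why auto-retry hides intermittent failures can be shown with a deterministic simulation. This is a sketch only: `make_flaky_check` stands in for a test with a race condition, and both runner functions are hypothetical, not part of any CI tool.

```python
import itertools


def make_flaky_check():
    """Simulated racy check: fails on every other call, deterministically."""
    calls = itertools.count()

    def check() -> bool:
        return next(calls) % 2 == 1  # False, True, False, True, ...

    return check


def run_with_retry(check, attempts: int = 3) -> bool:
    """'Retry until green': reports a pass if any attempt passes."""
    return any(check() for _ in range(attempts))


def run_strict(check, repeats: int = 3) -> bool:
    """Stress mode: run repeatedly and fail if any single run fails."""
    results = [check() for _ in range(repeats)]
    return all(results)


if __name__ == "__main__":
    print("retry runner reports pass:", run_with_retry(make_flaky_check()))
    print("strict runner reports pass:", run_strict(make_flaky_check()))
```

The retry runner turns a 50% failure rate into a green build, while repeating the test and requiring every run to pass exposes it.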
🧪 Verification / Duplicates
Verified (Hillel Wayne "Test theater", Google Testing Blog "Just Say No to More End-to-End Tests").
Confidence: A.
🕓 Changelog
| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: CI theater anti-pattern, mutation testing, real integration |