---
id: wiki-2026-0508-green-check-mark-syndrome
title: Green Check Mark Syndrome
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Green Check Syndrome, CI Theater, Test Theater]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [anti-pattern, ci-cd, testing, observability]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: TypeScript/Python
  framework: GitHub-Actions/pytest
---

# Green Check Mark Syndrome

## One-line summary

> **"Green CI ≠ correct system."** An anti-pattern in which passing tests inflate confidence while actual coverage, assertion strength, and production behavior drift apart. Prevalent in 2026: LLM-generated tests, mock-heavy suites, and flaky-test retries all hide real bugs.

## Core

### Symptoms

- Builds are 100% green, yet production incidents remain frequent.
- Tests assert truisms (`expect(1).toBe(1)`).
- Mocks return canned data, so integration paths go untested.
- Retries hide flakiness, so race conditions are ignored.
- Coverage percentage is high while the mutation score is low.

### Root causes

- **Goodhart's law**: the green check becomes a metric, the metric becomes a target, and the target gets gamed.
- **Mock theater**: units are over-mocked for isolation, so real failure modes are missed.
- **AI-generated tests**: the LLM mirrors the implementation, so the same bug is present in the test as well.
- **Flaky-retry culture**: "retry until green" becomes normalized.

### Detection

- Mutation testing: measures assertion strength.
- Property-based testing: covers the input space.
- Production observability: reveals errors in prod that the tests missed.
- Test-impact analysis: surfaces code paths that no test touches.

### Applications

1. CI quality dashboard: track mutation score and flake rate.
2. Test review checklist: which specific failure mode does each test catch?
3. Chaos engineering: inject production-like failures.
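The gap between a green suite and a strong suite, listed under Symptoms as "coverage high, mutation score low", can be illustrated with a hand-rolled mutation check. This is only a minimal sketch: `is_adult`, the mutant table, and both test suites are hypothetical examples, and the mutants are written by hand, whereas real tools such as Stryker generate them automatically.

```python
from typing import Callable, Dict

Impl = Callable[[int], bool]
Suite = Callable[[Impl], None]


def is_adult(age: int) -> bool:
    """Original implementation under test (hypothetical example)."""
    return age >= 18


# Hand-written mutants: each one is a small, plausible bug in is_adult.
MUTANTS: Dict[str, Impl] = {
    "'>=' replaced by '>'": lambda age: age > 18,
    "18 replaced by 19": lambda age: age >= 19,
    "'>=' replaced by '<='": lambda age: age <= 18,
    "body replaced by True": lambda age: True,
}


def weak_suite(impl: Impl) -> None:
    # Executes every line of is_adult, but checks one easy case only.
    assert impl(30) is True


def boundary_suite(impl: Impl) -> None:
    # Checks both sides of the boundary at 18.
    assert impl(30) is True
    assert impl(18) is True
    assert impl(17) is False


def passes(suite: Suite, impl: Impl) -> bool:
    """Return True if the suite is green for the given implementation."""
    try:
        suite(impl)
        return True
    except AssertionError:
        return False


def mutation_score(suite: Suite) -> float:
    """Fraction of mutants the suite kills, i.e. turns red."""
    assert passes(suite, is_adult), "suite must be green on the original"
    killed = sum(not passes(suite, mutant) for mutant in MUTANTS.values())
    return killed / len(MUTANTS)


if __name__ == "__main__":
    print("weak suite:    ", mutation_score(weak_suite))      # 0.25
    print("boundary suite:", mutation_score(boundary_suite))  # 1.0
```

Both suites are green on the original code and both reach full line coverage of `is_adult`, but the weak suite kills only one of the four mutants. That difference is the signal a green check mark alone does not show.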
## 💻 Patterns

### Mutation testing (Stryker)

Configuration in `stryker.conf.json`:

```json
{
  "mutate": ["src/**/*.ts"],
  "testRunner": "vitest",
  "thresholds": { "high": 80, "low": 60, "break": 50 }
}
```

```bash
# Requires @stryker-mutator/core and @stryker-mutator/vitest-runner
npx stryker run
# Surviving mutants indicate weak tests
```

### Property-based test

```typescript
import fc from 'fast-check';
import { expect, test } from 'vitest';

// Function under test: returns a reversed copy without mutating the input
const reverse = <T>(xs: T[]): T[] => [...xs].reverse();

test('reverse twice = identity', () => {
  fc.assert(
    fc.property(fc.array(fc.integer()), (arr) => {
      expect(reverse(reverse(arr))).toEqual(arr);
    })
  );
});
```

### Flake-detector (no silent retry)

```yaml
# .github/workflows/test.yml
- name: Test
  run: npm test  # no retry: flakes surface immediately
- name: Flake report
  if: failure()
  run: |
    echo "::warning::Test failed: investigate, do not retry blindly"
```

### Assertion-strength linter

```python
# Detect assertions that can never fail
import ast
from pathlib import Path

def is_weak(node: ast.AST) -> bool:
    if isinstance(node, ast.Assert):  # e.g. `assert True`
        return isinstance(node.test, ast.Constant) and bool(node.test.value)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        consts = [a.value for a in node.args[:2] if isinstance(a, ast.Constant)]
        if node.func.attr == 'assertTrue':  # e.g. self.assertTrue(True)
            return consts[:1] == [True]
        if node.func.attr == 'assertEqual':  # e.g. self.assertEqual(1, 1)
            return len(consts) == 2 and consts[0] == consts[1]
    return False

for file in Path('tests').rglob('*.py'):
    for node in ast.walk(ast.parse(file.read_text())):
        if is_weak(node):
            print(f"WEAK: {file}:{node.lineno}")
```

### Production-trace replay

```python
# Turn production traces into regression fixtures.
# fetch_trace, extract_inputs, and run_system are project-specific placeholders,
# not library functions; wire them to your observability backend and system entry point.

def replay_prod_trace(trace_id: str) -> None:
    trace = fetch_trace(trace_id)    # load the recorded trace from the observability backend
    inputs = extract_inputs(trace)   # recover the inputs captured in the trace
    result = run_system(inputs)      # re-run the current build on those inputs
    expected = trace.outputs         # output recorded in production
    assert result == expected, f"Drift from prod: {trace_id}"
```

### Real integration (no mocks)

```typescript
// testcontainers: run the test against a real PostgreSQL instance
import { GenericContainer, StartedTestContainer } from 'testcontainers';
import { Client } from 'pg';
import { afterAll, beforeAll, expect, test } from 'vitest';

let pg: StartedTestContainer;

beforeAll(async () => {
  pg = await new GenericContainer('postgres:17')
    .withExposedPorts(5432)
    .withEnvironment({ POSTGRES_PASSWORD: 'test' })
    .start();
});

afterAll(async () => {
  await pg.stop(); // remove the container once the suite finishes
});

test('real query', async () => {
  const client = new Client({
    host: pg.getHost(), port: pg.getMappedPort(5432),
    user: 'postgres', password: 'test', database: 'postgres',
  });
  await client.connect();
  await client.query('CREATE TABLE u (id int)');
  await client.query('INSERT INTO u VALUES (1)');
  const { rows } = await client.query('SELECT id FROM u');
  expect(rows).toEqual([{ id: 1 }]); // real SQL behavior, not a mocked result
  await client.end();
});
```

## Decision criteria

| Situation | Approach |
|---|---|
| New test suite | Property-based + integration |
| Existing green-but-fragile suite | Mutation testing audit |
| Flaky test | Investigate root cause, never blind-retry |
| Mock-heavy suite | Add testcontainers for real I/O |
| Coverage-driven culture | Switch metric to mutation score |

**Default**: do not trust green CI on its own; rely on the combined signal of mutation score, production observability, and chaos drills.

## 🔗 Graph

- Parent: [[CI CD]]
- Variant: [[Test-Theater]]
- Applications: [[Mutation-Testing]] · [[Property-Based-Testing]] · [[Chaos-Engineering]]
- Adjacent: [[Goodharts-Law]] · [[Observability]]

## 🤖 LLM usage

**When**: test review (assertion strength critique), mutation report triage, prod-trace replay generation.

**When not**: relying on an LLM alone for test generation, because it reproduces the same blind spots.

## ❌ Anti-patterns

- **Coverage as quality**: 100% line coverage with a 0% mutation kill rate.
- **Auto-retry on fail**: hides race conditions that later become prod incidents.
- **Mock everything**: the unit "passes" while the integration is broken.
- **LLM-only test suite**: tests mirror the implementation, so they share its bugs.

## 🧪 Verification / duplicates

- Verified (Hillel Wayne "Test theater", Google Testing Blog "Just Say No to More End-to-End Tests").
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: CI theater anti-pattern, mutation testing, real integration |