id: wiki-2026-0508-green-check-mark-syndrome
title: Green Check Mark Syndrome
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Green Check Syndrome, CI Theater, Test Theater
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: anti-pattern, ci-cd, testing, observability
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: TypeScript/Python; framework: GitHub-Actions/pytest

Green Check Mark Syndrome

In one line

"Green CI ≠ correct system." An anti-pattern in which passing tests inflate confidence while being disconnected from actual coverage, assertion strength, and production behavior. Prevalent in 2026: LLM-generated tests, mock-heavy suites, and flaky-test retries hide real bugs.

Core

Symptoms

  • Builds are 100% green, yet production incidents remain frequent.
  • Tests assert truisms (expect(1).toBe(1)).
  • Mocks return canned data, so integration paths go untested.
  • Retries hide flakiness, so race conditions are ignored.
  • Coverage percentage is high while the mutation score is low.
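
The gap between "covered" and "checked" can be made concrete with a small sketch. The function `apply_discount` and both tests are hypothetical and exist only for illustration; the bug is deliberate. The weak test executes the buggy line, so a coverage tool would count it, but only the strong test asserts on the result.

```python
def apply_discount(price: float, percent: float) -> float:
    """Deliberately buggy: adds the discount instead of subtracting it."""
    return price + price * percent / 100


def weak_test() -> bool:
    # The buggy line runs, so line coverage reports it as covered...
    apply_discount(100.0, 10.0)
    # ...but the assertion is a truism and cannot fail.
    return 1 == 1


def strong_test() -> bool:
    # Asserts on the actual result, so the bug is caught.
    return apply_discount(100.0, 10.0) == 90.0


if __name__ == "__main__":
    print("weak test passes:", weak_test())
    print("strong test passes:", strong_test())
```

Both tests give identical line coverage; only the second one says anything about correctness.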

Root causes

  • Goodhart's law: the green check is a metric; once the metric becomes the target, it gets gamed.
  • Mock theater: units are over-mocked for isolation, so real failure modes are missed.
  • AI-generated tests: the LLM mirrors the implementation, so the same bug is present in the test as well.
  • Flaky-retry culture: "retry until green" becomes normalized.
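
As a minimal sketch of mock theater, assume a hypothetical `greeting` function and an in-memory store standing in for a real database. The mock returns canned data for any ID, so the mocked check passes even for a user that does not exist, while the real store raises.

```python
from unittest.mock import Mock


class InMemoryUserStore:
    """Hypothetical stand-in for a real data store."""

    def __init__(self) -> None:
        self._rows = {1: {"name": "Ada"}}

    def get(self, user_id: int) -> dict:
        # Raises KeyError for unknown users, like a missing row would.
        return self._rows[user_id]


def greeting(store, user_id: int) -> str:
    return f"Hello, {store.get(user_id)['name']}"


# Mocked path: canned data hides the missing-user failure mode.
mock_store = Mock()
mock_store.get.return_value = {"name": "Ada"}
mocked_result = greeting(mock_store, 999)

# Real path: the same call fails.
try:
    greeting(InMemoryUserStore(), 999)
    real_outcome = "ok"
except KeyError:
    real_outcome = "KeyError"

if __name__ == "__main__":
    print("mocked:", mocked_result)
    print("real:", real_outcome)
```

The mock is not wrong in itself; the problem is that nothing else in the suite exercises the path the mock replaced.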

Detection

  • Mutation testing: measures assertion strength.
  • Property-based testing: covers the input space.
  • Production observability: surfaces errors in prod that tests missed.
  • Test-impact analysis: surfaces untouched code paths.
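
To show how mutation testing measures assertion strength, here is a hand-rolled sketch rather than a real mutation tool. The function `is_adult`, the three mutants, and both suites are hypothetical. A mutant is "killed" when the suite fails against it; the score is the fraction of mutants killed.

```python
from typing import Callable, Dict

Impl = Callable[[int], bool]


def is_adult(age: int) -> bool:
    return age >= 18


# Hand-written mutants of is_adult (a real tool would generate these).
MUTANTS: Dict[str, Impl] = {
    "boundary (>= to >)": lambda age: age > 18,
    "negated (>= to <)": lambda age: age < 18,
    "always true": lambda age: True,
}


def weak_suite(impl: Impl) -> bool:
    # Only checks a value far from the boundary.
    return impl(30) is True


def strong_suite(impl: Impl) -> bool:
    # Checks both sides of the boundary as well.
    return impl(30) is True and impl(18) is True and impl(17) is False


def mutation_score(suite: Callable[[Impl], bool], mutants: Dict[str, Impl]) -> float:
    """Fraction of mutants for which the suite fails (i.e. mutants killed)."""
    killed = sum(1 for mutant in mutants.values() if not suite(mutant))
    return killed / len(mutants)


if __name__ == "__main__":
    print("weak suite score:", mutation_score(weak_suite, MUTANTS))
    print("strong suite score:", mutation_score(strong_suite, MUTANTS))
```

Both suites are green against the original function, which is exactly why the green check alone cannot tell them apart.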

Applications

  1. CI quality dashboard: track mutation score and flake rate.
  2. Test review checklist: for each test, which specific failure mode does it catch?
  3. Chaos engineering: inject production-like failures.
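
For the dashboard item, one possible way to compute a flake rate is sketched below. The `Run` record shape and the definition used here (a test is flaky if it both passed and failed on the same commit) are assumptions for illustration, not a standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Run:
    """Hypothetical CI record: one execution of one test on one commit."""
    test: str
    commit: str
    passed: bool


def flake_rate(runs: Iterable[Run]) -> float:
    """Share of tests that both passed and failed on an identical commit."""
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test, run.commit)].add(run.passed)
    tests = {test for test, _ in outcomes}
    flaky = {test for (test, _), seen in outcomes.items() if len(seen) == 2}
    return len(flaky) / len(tests) if tests else 0.0


SAMPLE_RUNS = [
    Run("test_checkout", "c1", True),
    Run("test_checkout", "c1", False),  # same commit, different result: flaky
    Run("test_login", "c1", True),
    Run("test_login", "c2", False),     # different commits: a real regression, not a flake
]

if __name__ == "__main__":
    print("flake rate:", flake_rate(SAMPLE_RUNS))
```

Keying on the commit separates flakes from genuine regressions, which a plain pass/fail ratio would blur together.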

💻 Patterns

Mutation testing (Stryker)

// stryker.conf.json
{
  "mutate": ["src/**/*.ts"],
  "testRunner": "vitest",
  "thresholds": { "high": 80, "low": 60, "break": 50 }
}
npx stryker run
# Surviving mutants indicate weak tests
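
The three threshold values in the config above can be read as bands. The sketch below reflects my understanding of Stryker's documented semantics (at or above `high` is reported as good, between `low` and `high` as a warning, below `low` as danger, and below `break` the run exits non-zero); the function name and return shape are hypothetical, so check the Stryker docs before relying on the exact boundaries.

```python
from typing import Optional, Tuple


def classify_mutation_score(
    score: float,
    high: float = 80,
    low: float = 60,
    break_at: Optional[float] = 50,
) -> Tuple[str, bool]:
    """Return (band, build_fails) for a mutation score in percent.

    Defaults mirror the thresholds in the config above.
    """
    if score >= high:
        band = "high"
    elif score >= low:
        band = "warning"
    else:
        band = "danger"
    # A break threshold of None means the build never fails on score alone.
    build_fails = break_at is not None and score < break_at
    return band, build_fails


if __name__ == "__main__":
    for score in (85, 70, 55, 45):
        print(score, classify_mutation_score(score))
```

Note that the danger band and the build-breaking threshold are separate: with these values a score of 55 is reported as danger but does not fail the build.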

Property-based test

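import fc from 'fast-check';
import { expect, test } from 'vitest';

// Stand-in for the function under test: returns a reversed copy
const reverse = <T>(xs: T[]): T[] => [...xs].reverse();

test('reverse twice = identity', () => {
  fc.assert(
    fc.property(fc.array(fc.integer()), (arr) => {
      expect(reverse(reverse(arr))).toEqual(arr);
    })
  );
});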

Flake-detector (no silent retry)

# .github/workflows/test.yml
- name: Test
  run: npm test
  # NO retry: a flake surfaces immediately
- name: Flake report
  if: failure()
  run: |
    echo "::warning::Test failed — investigate, do not retry blindly"

Assertion-strength linter

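# Detect trivially true assertions (flags only the patterns listed below)
import ast
from pathlib import Path

# Weak call forms, compared after dropping the receiver (e.g. "self.")
WEAK_CALLS = {'assertTrue(True)', 'assertEqual(1, 1)'}

def is_weak(node: ast.AST) -> bool:
    # A bare `assert True` is an ast.Assert statement, not a call
    if isinstance(node, ast.Assert):
        return isinstance(node.test, ast.Constant) and node.test.value is True
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        # Rebuild "method(args)" so self.assertTrue(True) matches the set above
        args = ', '.join(ast.unparse(arg) for arg in node.args)
        return f"{node.func.attr}({args})" in WEAK_CALLS
    return False

for file in Path('tests').rglob('*.py'):
    tree = ast.parse(file.read_text())
    for node in ast.walk(tree):
        if is_weak(node):
            print(f"WEAK: {file}:{node.lineno}")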

Production-trace replay

# Prod traces → test fixtures
# fetch_trace, extract_inputs, and run_system are project-specific helpers (not defined here)

def replay_prod_trace(trace_id: str):
    trace = fetch_trace(trace_id)  # from observability backend
    inputs = extract_inputs(trace)
    result = run_system(inputs)
    expected = trace.outputs
    # Fail when current behavior differs from what production recorded
    assert result == expected, f"Drift from prod: {trace_id}"

Real integration (no mocks)

// testcontainers: real DB
import { GenericContainer, type StartedTestContainer } from 'testcontainers';

let pg: StartedTestContainer;
beforeAll(async () => {
  pg = await new GenericContainer('postgres:17')
    .withExposedPorts(5432)
    .withEnvironment({ POSTGRES_PASSWORD: 'test' })
    .start();
});

// Stop the container so it does not outlive the test run
afterAll(async () => {
  await pg.stop();
});

test('real query', async () => {
  // connect() is a placeholder for the project's own DB client factory
  const client = connect(pg.getMappedPort(5432));
  await client.query('CREATE TABLE u (id int)');
  // Real SQL behavior is exercised here
});

Decision criteria

| Situation | Approach |
|---|---|
| New test suite | Property-based + integration |
| Existing green-but-fragile suite | Mutation testing audit |
| Flaky test | Investigate root cause, never blind-retry |
| Mock-heavy suite | Add testcontainers for real I/O |
| Coverage-driven culture | Switch metric to mutation score |

Default: do not trust green CI on its own. Rely on the combined signal of mutation score, production observability, and chaos drills.
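
The default above could be expressed as a small gate, sketched here. The `Signals` fields, the 60% cutoff, and the returned labels are all illustrative assumptions; a real team would pick its own signals and thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signals:
    """Hypothetical snapshot of the signals named in the default rule."""
    ci_green: bool
    mutation_score: float            # percent, 0 to 100
    prod_errors_within_budget: bool  # from production observability
    chaos_drill_passed: bool


def trust_level(s: Signals, min_mutation_score: float = 60.0) -> str:
    """Green CI is necessary but not sufficient; list what is still missing."""
    if not s.ci_green:
        return "blocked"
    gaps: List[str] = []
    if s.mutation_score < min_mutation_score:
        gaps.append("mutation score")
    if not s.prod_errors_within_budget:
        gaps.append("prod observability")
    if not s.chaos_drill_passed:
        gaps.append("chaos drill")
    return "trusted" if not gaps else "green but unverified: " + ", ".join(gaps)


if __name__ == "__main__":
    print(trust_level(Signals(True, 35.0, True, True)))
    print(trust_level(Signals(True, 82.0, True, True)))
```

Returning the list of gaps, rather than a bare boolean, keeps the report actionable when the check is green but the other signals are not.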

🤖 LLM usage

When to use: test review (assertion-strength critique), mutation report triage, prod-trace replay generation. When not to use: relying on an LLM alone for test generation, which reproduces the same blind spots.

Anti-patterns

  • Coverage as quality: 100% line coverage with a 0% mutation kill rate.
  • Auto-retry on fail: hides the race condition until it becomes a prod incident.
  • Mock everything: the unit "passes" while the integration is broken.
  • LLM-only test suite: tests mirror the implementation, so they share its bugs.
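
The auto-retry anti-pattern can be illustrated with a deterministic stand-in for a race condition. Everything here is hypothetical: `make_flaky_check` fails on its first call and passes afterwards, `retry_until_green` models the anti-pattern, and `run_with_flake_report` is one possible alternative that keeps every attempt visible.

```python
from typing import Callable, Dict, List, Union


def make_flaky_check() -> Callable[[], bool]:
    """Return a check that fails once and then passes (simulated race)."""
    calls = {"n": 0}

    def check() -> bool:
        calls["n"] += 1
        return calls["n"] > 1

    return check


def retry_until_green(check: Callable[[], bool], max_attempts: int = 3) -> bool:
    # Anti-pattern: stops at the first success and discards earlier failures.
    return any(check() for _ in range(max_attempts))


def run_with_flake_report(
    check: Callable[[], bool], attempts: int = 3
) -> Dict[str, Union[bool, List[bool]]]:
    # Runs every attempt and reports mixed results instead of hiding them.
    results = [check() for _ in range(attempts)]
    return {
        "passed": all(results),
        "flaky": len(set(results)) == 2,
        "results": results,
    }


if __name__ == "__main__":
    print("retry until green:", retry_until_green(make_flaky_check()))
    print("flake report:", run_with_flake_report(make_flaky_check()))
```

The retry wrapper turns the failing first attempt into a green build; the report keeps it as a signal to investigate.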

🧪 Verification / Duplicates

  • Verified (Hillel Wayne "Test theater", Google Testing Blog "Just Say No to More End-to-End Tests").
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: CI theater anti-pattern, mutation testing, real integration |