id: wiki-2026-0508-codescene
title: CodeScene (Behavioral Code Analysis)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: CodeScene, behavioral code analysis, hotspot detection, code health, technical debt analytics
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: applied
tags: code-quality, codescene, behavioral-analysis, hotspot, technical-debt, refactoring, git-history
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: SaaS / on-prem; framework: CodeScene

CodeScene

In one line

"Detect hotspots by mining git history." This is not static analysis: hotspots sit at the intersection of churn × complexity, which makes refactoring priorities data-driven. Modern alternatives: SonarQube plus custom git scripts, Code Climate, Sourcegraph.

Key points

Differentiators (vs. SAST)

  • Behavioral, not static.
  • Requires 6+ months of git history.
  • Hotspot = churn (change frequency) × complexity.
  • Author patterns.

Core metrics

Code Health (1-10)

  • Combines defect risk, delivery speed, and predictability.
  • Alert when the score drops below 6.

Hotspot

  • Churn × complexity.
  • The top 5% of files carry roughly 80% of the maintenance burden.
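
The 5% / 80% figure is easy to check against your own repository once per-file scores exist. The sketch below assumes a hypothetical dict mapping each file path to its churn × complexity score; the function name `top_share` and the sample data are illustrative and not part of CodeScene.

```python
import math

def top_share(scores, top_fraction=0.05):
    """Fraction of the total hotspot score held by the highest-scoring files.

    `scores` maps file path -> churn * complexity (assumed input shape).
    """
    total = sum(scores.values())
    if total == 0:
        return 0.0
    # Always inspect at least one file, even in tiny repositories.
    k = max(1, math.ceil(len(scores) * top_fraction))
    top = sorted(scores.values(), reverse=True)[:k]
    return sum(top) / total

# Hypothetical data: 20 files, one of which dominates the score.
scores = {f"src/mod_{i}.py": 5 for i in range(19)}
scores["src/billing.py"] = 405
print(f"top 5% share: {top_share(scores):.0%}")
```

If the share is far below 80%, the burden is spread more evenly and a strict "hotspots first" ordering matters less.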

Knowledge Map

  • Which author dominates each file.
  • Bus factor.

Coupling

  • How often files change together.
  • Logical coupling.
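
Raw co-change counts favor files that change often anyway, so it can help to normalize them. The sketch below divides the shared commits by the average revision count of the two files. That normalization is an assumption made here for illustration; this note does not state CodeScene's exact formula. The input shape (a list of file sets, one per commit) and the `min_shared` cutoff are also hypothetical.

```python
from collections import Counter
from itertools import combinations

def coupling_degree(commits, min_shared=2):
    """Return {(file_a, file_b): degree} where degree is in (0, 1].

    degree = shared commits / average revisions of the two files.
    `commits` is an iterable of collections of file paths changed together.
    """
    revisions = Counter()  # file -> number of commits touching it
    shared = Counter()     # (file_a, file_b) -> number of shared commits
    for files in commits:
        files = sorted(set(files))
        revisions.update(files)
        shared.update(combinations(files, 2))

    result = {}
    for (a, b), n in shared.items():
        if n >= min_shared:  # ignore pairs seen together only by chance
            avg_revs = (revisions[a] + revisions[b]) / 2
            result[(a, b)] = n / avg_revs
    return result

# Hypothetical history of four commits.
commits = [
    {"api.py", "schema.py"},
    {"api.py", "schema.py"},
    {"api.py", "schema.py", "README.md"},
    {"api.py"},
]
degrees = coupling_degree(commits)
print(degrees)
```

A degree close to 1 means the two files almost never change alone, which is a candidate for merging them or extracting a shared abstraction.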

Limitations

  • Needs at least 6 months of git history.
  • Stale code (no churn) gives only a weak signal.
  • Enterprise pricing.
  • Learning curve.

Alternatives

Open-source / DIY

  • git log + a complexity tool.
  • lizard, scc.

SaaS competitors

  • SonarQube + history.
  • Code Climate Velocity.
  • Sourcegraph.
  • Pluralsight Flow.
  • LinearB / Axify (DORA-focused).

Applications

  1. Refactor priority: hotspots first.
  2. Onboarding: explain the high-churn areas.
  3. Architecture review: inspect coupling.
  4. Risk forecast: incident-prone areas.
  5. Bus factor: areas with a single owner.

💻 Patterns

Hotspot detection (DIY)

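import subprocess
from collections import Counter
from pathlib import Path

import lizard  # cyclomatic complexity (pip install lizard)

SOURCE_EXTS = ('.py', '.ts', '.js')

def hotspots(repo, since='6 months ago', top_n=20):
    # 1. Churn: number of commits touching each file.
    #    List-form arguments avoid shell quoting and injection problems.
    log = subprocess.run(
        ['git', '-C', str(repo), 'log', f'--since={since}',
         '--name-only', '--pretty=format:'],
        capture_output=True, text=True, check=True,
    ).stdout
    churn = Counter(f for f in log.splitlines() if f.endswith(SOURCE_EXTS))

    # 2. Complexity: summed cyclomatic complexity of each file's functions.
    complexity = {}
    for path in churn:
        full_path = Path(repo) / path
        if not full_path.is_file():  # deleted or renamed since that commit
            continue
        try:
            info = lizard.analyze_file(str(full_path))
        except (OSError, UnicodeDecodeError):
            continue  # unreadable file: skip it rather than abort the scan
        complexity[path] = sum(fn.cyclomatic_complexity for fn in info.function_list)

    # 3. Hotspot score = churn × complexity.
    scored = [(f, churn[f] * cc) for f, cc in complexity.items()]
    return sorted(scored, key=lambda x: -x[1])[:top_n]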

Knowledge map (bus factor)

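import subprocess
from collections import Counter

def knowledge_map(repo, since='1 year ago', exts=('.py', '.ts', '.js')):
    # Prefix author lines with a marker so they cannot be confused with paths.
    # (Testing only the extension would treat e.g. README.md as an author.)
    marker = '@@AUTHOR@@'
    log = subprocess.run(
        ['git', '-C', str(repo), 'log', f'--since={since}',
         f'--pretty=format:{marker}%an', '--name-only'],
        capture_output=True, text=True, check=True,
    ).stdout

    file_authors = {}  # file -> Counter(author -> number of commits)
    current_author = None
    for line in log.splitlines():
        if line.startswith(marker):
            current_author = line[len(marker):]
        elif line.endswith(exts):
            file_authors.setdefault(line, Counter())[current_author] += 1

    # Bus-factor risk: files where one author made more than 80% of the commits.
    risk_files = []
    for f, authors in file_authors.items():
        total = sum(authors.values())
        author, count = authors.most_common(1)[0]
        if count / total > 0.8:
            risk_files.append((f, author, count / total))

    return risk_files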

Logical coupling (changed-together)

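import subprocess
from collections import Counter
from itertools import combinations

def logical_coupling(repo, since='6 months ago', exts=('.py', '.ts', '.js')):
    # Use a marker unlikely to appear in a path; a bare 'COMMIT' would also
    # split on file names that contain that word.
    marker = '@@COMMIT@@'
    log = subprocess.run(
        ['git', '-C', str(repo), 'log', f'--since={since}',
         f'--pretty=format:{marker}', '--name-only'],
        capture_output=True, text=True, check=True,
    ).stdout

    coupling = Counter()
    for commit in log.split(marker):
        files = sorted({l.strip() for l in commit.splitlines()
                        if l.strip().endswith(exts)})
        # Count every unordered pair of files changed in the same commit.
        coupling.update(combinations(files, 2))

    return coupling.most_common(20)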

Code health score (proxy)

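from statistics import mean

import lizard

def health_score(file_path):
    """Simple proxy for a CodeScene-style health score (1-10), not CodeScene's own formula."""
    score = 10

    info = lizard.analyze_file(file_path)
    complexities = [fn.cyclomatic_complexity for fn in info.function_list]
    avg_cc = mean(complexities) if complexities else 0  # file without functions
    if avg_cc > 10: score -= 2
    if avg_cc > 20: score -= 2  # cumulative: -4 in total above 20

    line_count = info.nloc  # lines of code, comments not counted
    if line_count > 500: score -= 1
    if line_count > 1000: score -= 2

    longest_func = max((fn.length for fn in info.function_list), default=0)
    if longest_func > 50: score -= 1
    if longest_func > 100: score -= 2

    return max(1, score)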

CI integration (CodeScene API)

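# .github/workflows/codescene.yml
# NOTE: the action name and inputs below are as recorded in this note and are
# unverified; check CodeScene's current CI/CD documentation before use.
- name: CodeScene Delta Analysis
  uses: empear-analytics/codescene-pr-check@v1
  with:
    api-url: ${{ secrets.CODESCENE_API_URL }}
    api-user: ${{ secrets.CODESCENE_USER }}
    api-token: ${{ secrets.CODESCENE_TOKEN }}
    project-id: 'my-project'
    fail-on-decline: true  # fail the check when code health drops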

Quality gate

def quality_gate(pr_files):
    """Fail when any file in the PR has a health score below 6."""
    failures = []
    for f in pr_files:
        score = health_score(f)
        if score < 6:
            failures.append((f, score))
    
    if failures:
        return f'Quality gate failed: {failures}'
    return 'OK'
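
An absolute threshold alone does not catch a file that slides from 9 to 7. A DIY counterpart to a "fail on decline" check could compare scores on the target branch with scores on the PR branch. The sketch below assumes both sets of scores were already computed (for example with a proxy such as the one above) and passed in as dicts; `declined_files` and `exit_code` are hypothetical helper names.

```python
def declined_files(base_scores, head_scores, threshold=6):
    """Return (path, before, after) for files that dropped or sit below threshold.

    base_scores / head_scores: dict of path -> score on the target branch
    and on the PR branch respectively (assumed input shape).
    """
    flagged = []
    for path, after in sorted(head_scores.items()):
        before = base_scores.get(path)  # None for files that are new in the PR
        dropped = before is not None and after < before
        if dropped or after < threshold:
            flagged.append((path, before, after))
    return flagged

def exit_code(flagged):
    """CI steps fail on a non-zero exit status, not on a returned string."""
    return 1 if flagged else 0

# Hypothetical scores for a PR touching three files.
base = {"a.py": 8, "b.py": 9}
head = {"a.py": 7, "b.py": 9, "c.py": 5}
flagged = declined_files(base, head)
for path, before, after in flagged:
    print(f"{path}: {before} -> {after}")
```

In a CI script the last step would be `sys.exit(exit_code(flagged))`, so that the job fails whenever anything is flagged.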

Refactor priority dashboard

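from pathlib import Path

def health_distribution(repo, exts=('.py', '.ts', '.js')):
    """Bucket source files by proxy health score: <4, 4 to <6, >=6."""
    buckets = {'critical': 0, 'concerning': 0, 'ok': 0}
    for path in Path(repo).rglob('*'):
        if path.is_file() and path.suffix in exts:
            score = health_score(str(path))
            key = 'critical' if score < 4 else 'concerning' if score < 6 else 'ok'
            buckets[key] += 1
    return buckets

def refactor_dashboard(repo):
    return {
        'hotspots': hotspots(repo)[:10],
        'bus_factor_risks': knowledge_map(repo),
        'high_coupling': logical_coupling(repo)[:5],
        'health_distribution': health_distribution(repo),
    }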

🤔 Decision criteria

| Situation | Tool |
|---|---|
| Enterprise + budget | CodeScene SaaS |
| OSS / DIY | git script + lizard + custom dashboard |
| DORA + general | LinearB / Axify |
| Code search + ownership | Sourcegraph |
| Quality gate (PR) | SonarQube + custom |
| Bus factor only | git blame + script |

Default: small or OSS projects use the DIY route; enterprises use CodeScene or LinearB.

🔗 Graph

🤖 LLM usage

When to use: refactor prioritization, onboarding, architecture review, incident prevention.
When not to use: repositories with under 6 months of history; analysis of stale codebases.

Anti-patterns

  • Fixing every hotspot: work by priority instead.
  • Ignoring static analysis: issues in code that never churns get missed.
  • Trusting git history 100%: squash and rebase add noise.
  • Ignoring the bus factor: it is a critical risk.
  • No quality gate: health declines silently.
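
One way to reduce history noise is to drop very large commits (squash merges, mass reformatting, bulk renames) before computing churn or coupling. The sketch below is a mitigation assumed here, not something CodeScene is documented to do in this note. It parses the text produced by `git log --name-only --pretty=format:@@COMMIT@@`; the marker string, the function names, and the 30-file cutoff are all illustrative choices.

```python
def parse_commits(log_text, marker="@@COMMIT@@"):
    """Split marker-delimited `git log --name-only` output into file lists."""
    commits = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == marker:
            commits.append([])         # start of a new commit
        elif line and commits:
            commits[-1].append(line)   # a file path of the current commit
    return commits

def drop_bulk_commits(commits, max_files=30):
    """Discard empty commits and commits touching more than max_files files."""
    return [files for files in commits if 0 < len(files) <= max_files]

# Hypothetical log: one ordinary commit and one 50-file bulk commit.
sample_log = "\n".join(
    ["@@COMMIT@@", "api.py", "schema.py", "", "@@COMMIT@@"]
    + [f"src/file_{i}.py" for i in range(50)]
)
kept = drop_bulk_commits(parse_commits(sample_log))
print(kept)
```

The cutoff is a trade-off: too low and genuine cross-cutting changes disappear, too high and a single squash merge couples every file it touched.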

🧪 Verification / Duplicates

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: behavioral analysis + hotspot + DIY git script + CI integration |