---
id: wiki-2026-0508-codescene
title: CodeScene (Behavioral Code Analysis)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [CodeScene, behavioral code analysis, hotspot detection, code health, technical debt analytics]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: applied
tags: [code-quality, codescene, behavioral-analysis, hotspot, technical-debt, refactoring, git-history]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: SaaS / on-prem
  framework: CodeScene
---

# CodeScene

## One-line summary

> **"Detect hotspots by mining git history."** This is not static analysis: it looks at where churn and complexity intersect, which makes refactoring priority data-driven. Current alternatives: SonarQube plus a custom git script, Code Climate, Sourcegraph.

## Core

### What sets it apart (vs. SAST)

- Behavioral, not static.
- Requires 6+ months of git history.
- Hotspot = churn (change frequency) × complexity.
- Author patterns.

### Key metrics

#### Code Health (1-10)

- Combines defect risk, delivery speed, and predictability.
- Alert when the score falls below 6.

#### Hotspot

- Churn × complexity.
- The top 5% of files account for roughly 80% of the maintenance burden.

#### Knowledge Map

- Which author dominates each file.
- Bus factor.

#### Coupling

- How often files change together.
- Logical coupling.

### Limitations

- Needs at least 6 months of git history.
- Stale code (no churn) produces only a weak signal.
- Enterprise pricing.
- Learning curve.

### Alternatives

#### Open-source / DIY

- git log plus a complexity tool.
- lizard, scc.

#### SaaS competitors

- **SonarQube + history**.
- **Code Climate Velocity**.
- **Sourcegraph**.
- **Pluralsight Flow**.
- **LinearB / Axify** (DORA-focused).

### Applications

1. **Refactor priority**: start with the hotspots.
2. **Onboarding**: explain the high-churn areas.
3. **Architecture review**: inspect coupling.
4. **Risk forecast**: find incident-prone areas.
5. **Bus factor**: find areas with a single owner.
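The "top 5% of files, roughly 80% of the burden" figure under the Hotspot metric is a rule of thumb, so it is worth checking how concentrated churn actually is in a given repository. The following is a minimal sketch, assuming churn has already been collected as a mapping from file to commit count. `churn_concentration` and the sample data are hypothetical names introduced for illustration, not part of CodeScene.

```python
from collections import Counter

def churn_concentration(churn, top_fraction=0.05):
    """Share of all file changes that fall on the most-changed `top_fraction` of files."""
    counts = sorted(churn.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always inspect at least one file, even in very small repositories
    top_k = max(1, int(len(counts) * top_fraction))
    return sum(counts[:top_k]) / total

# Hypothetical churn counts (commits per file), not taken from a real repository
sample = Counter({'core/billing.py': 60, 'core/auth.py': 20, 'api/routes.py': 10})
sample.update({f'util/helper_{i}.py': 1 for i in range(17)})  # long tail of rarely touched files

print(f'{churn_concentration(sample):.0%} of changes hit the top 5% of files')
```

A high value suggests that a hotspot-first refactoring strategy will pay off. A flat distribution suggests churn alone will not single out a few files.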
## 💻 Patterns

### Hotspot detection (DIY)

```python
import subprocess
from collections import Counter

import lizard  # cyclomatic complexity


def hotspots(repo, since='6 months ago', top_n=20):
    # 1. Churn: number of commits touching each source file
    log = subprocess.check_output(
        ['git', '-C', repo, 'log', f'--since={since}', '--name-only', '--pretty=format:'],
        text=True,
    )
    churn = Counter(f for f in log.split('\n') if f.endswith(('.py', '.ts', '.js')))

    # 2. Complexity: summed cyclomatic complexity of all functions in the file
    complexity = {}
    for path in churn:
        try:
            info = lizard.analyze_file(f'{repo}/{path}')
            complexity[path] = sum(fn.cyclomatic_complexity for fn in info.function_list)
        except Exception:
            continue  # skip files that cannot be analyzed (e.g. deleted since the commit)

    # 3. Hotspot score = churn × complexity
    scored = [(f, churn[f] * complexity.get(f, 0)) for f in churn]
    return sorted(scored, key=lambda x: -x[1])[:top_n]
```

### Knowledge map (bus factor)

```python
def knowledge_map(repo, since='1 year ago'):
    # Prefix author lines with '@@' so they cannot be confused with file paths
    log = subprocess.check_output(
        ['git', '-C', repo, 'log', f'--since={since}', '--pretty=format:@@%an', '--name-only'],
        text=True,
    )
    file_authors = {}  # file -> Counter(author -> commits touching the file)
    current_author = None
    for line in log.split('\n'):
        if line.startswith('@@'):
            current_author = line[2:]
        elif line.endswith(('.py', '.ts', '.js')):
            file_authors.setdefault(line, Counter())[current_author] += 1

    # Bus-factor risk: a single author made more than 80% of the commits to a file
    risk_files = []
    for f, authors in file_authors.items():
        total = sum(authors.values())
        top_author, top_count = authors.most_common(1)[0]
        if top_count / total > 0.8:
            risk_files.append((f, top_author, top_count / total))
    return risk_files
```

### Logical coupling (changed-together)

```python
from itertools import combinations


def logical_coupling(repo, since='6 months ago'):
    marker = '@@COMMIT@@'  # unlikely to appear in a file path
    log = subprocess.check_output(
        ['git', '-C', repo, 'log', f'--since={since}', f'--pretty=format:{marker}', '--name-only'],
        text=True,
    )
    coupling = Counter()
    for commit in log.split(marker):
        files = sorted(l.strip() for l in commit.split('\n')
                       if l.strip().endswith(('.py', '.ts', '.js')))
        # Count every pair of source files changed in the same commit
        for pair in combinations(files, 2):
            coupling[pair] += 1
    return sorted(coupling.items(), key=lambda x: -x[1])[:20]
```

### Code health score (proxy)

```python
def health_score(file_path):
    """Simple proxy for a CodeScene-style score (1-10); not CodeScene's actual formula."""
    score = 10
    info = lizard.analyze_file(file_path)
    funcs = info.function_list

    # Average cyclomatic complexity (0 when the file has no functions)
    avg_cc = sum(fn.cyclomatic_complexity for fn in funcs) / len(funcs) if funcs else 0
    if avg_cc > 10: score -= 2
    if avg_cc > 20: score -= 2

    # File size in lines of code
    line_count = info.nloc
    if line_count > 500: score -= 1
    if line_count > 1000: score -= 2

    # Longest function in the file
    longest_func = max((fn.length for fn in funcs), default=0)
    if longest_func > 50: score -= 1
    if longest_func > 100: score -= 2

    return max(1, score)
```

### CI integration (CodeScene API)

```yaml
# .github/workflows/codescene.yml (step fragment)
# Action name and inputs are recorded as-is in this note; confirm them against the CodeScene docs.
- name: CodeScene Delta Analysis
  uses: empear-analytics/codescene-pr-check@v1
  with:
    api-url: ${{ secrets.CODESCENE_API_URL }}
    api-user: ${{ secrets.CODESCENE_USER }}
    api-token: ${{ secrets.CODESCENE_TOKEN }}
    project-id: 'my-project'
    fail-on-decline: true  # fail the check when code health drops
```

### Quality gate

```python
def quality_gate(pr_files):
    """Fail the PR when any changed file scores below 6."""
    failures = []
    for f in pr_files:
        score = health_score(f)
        if score < 6:
            failures.append((f, score))
    if failures:
        return f'Quality gate failed: {failures}'
    return 'OK'
```

### Refactor priority dashboard

```python
import os


def refactor_dashboard(repo):
    ranked = hotspots(repo, top_n=None)  # every recently changed source file
    # Health scores for the files that still exist in the working tree
    scores = [health_score(os.path.join(repo, path))
              for path, _ in ranked if os.path.isfile(os.path.join(repo, path))]
    return {
        'hotspots': ranked[:10],
        'bus_factor_risks': knowledge_map(repo),
        'high_coupling': logical_coupling(repo)[:5],
        # Bucket boundaries are an assumption: <4 critical, 4-5 concerning, >=6 ok
        'health_distribution': {
            'critical': sum(s < 4 for s in scores),
            'concerning': sum(4 <= s < 6 for s in scores),
            'ok': sum(s >= 6 for s in scores),
        },
    }
```

## 🤔 Decision criteria

| Situation | Tool |
|---|---|
| Enterprise + budget | CodeScene SaaS |
| OSS / DIY | git script + lizard + custom dashboard |
| DORA + general | LinearB / Axify |
| Code search + ownership | Sourcegraph |
| Quality gate (PR) | SonarQube + custom |
| Bus factor only | git blame + script |

**Default**: small teams and OSS projects go DIY. Enterprises use CodeScene or LinearB.
## 🔗 Graph

- Parent: [[Code-Quality]] · [[Refactoring]] · [[Engineering-Productivity]]
- Variants: [[Behavioral-Code-Analysis]] · [[Hotspot-Detection]] · [[Code-Health]]
- Applications: [[Bus-Factor]] · [[Logical-Coupling]] · [[Quality-Gate]]
- Adjacent: [[Code_Smells]] · [[Quality_Code_Review_Modern]] · [[Axify]] · [[SonarQube]] · [[Architecture-Anti-Patterns]]

## 🤖 LLM usage

**When**: refactor prioritization, onboarding, architecture review, incident prevention.

**When not**: repositories with less than 6 months of history; analysis of stale codebases.

## ❌ Anti-patterns

- **Fixing every hotspot**: prioritize instead.
- **Ignoring static analysis**: issues in code that never churns get missed.
- **Trusting git history 100%**: squashes and rebases add noise.
- **Ignoring bus factor**: it is a critical risk.
- **No quality gate**: code health declines silently.

## 🧪 Verification / Duplicates

- Verified (Adam Tornhill "Your Code as a Crime Scene", CodeScene docs).
- Trust level B.
- Related: [[Code_Smells]] · [[Quality_Code_Review_Modern]] · [[Axify]] · [[Architecture-Anti-Patterns]] · [[Asset-Specific-Knowledge]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: behavioral analysis + hotspot + DIY git script + CI integration |