2nd/10_Wiki/Topics/AI_and_ML/CodeScene.md
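koriweb d8a80f6272 chore(wiki): canonical normalization of dangling links (768 files / 1,200 links)
Replaced [[wikilinks]] that differed only in spelling (notation variants) with the canonical title of the target document,
reconnecting 1,200 broken links. Only title/filename normalization matches were applied; alias matching was
excluded because of the over-merge risk (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs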

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00

---
id: wiki-2026-0508-codescene
title: CodeScene (Behavioral Code Analysis)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [CodeScene, behavioral code analysis, hotspot detection, code health, technical debt analytics]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: applied
tags: [code-quality, codescene, behavioral-analysis, hotspot, technical-debt, refactoring, git-history]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: SaaS / on-prem
  framework: CodeScene
---
# CodeScene
## In one line
> **"Detect hotspots by mining git history."** Not static analysis: it looks at where churn and complexity intersect, which makes refactoring priorities data-driven. Modern alternatives: SonarQube + custom git scripts, Code Climate, Sourcegraph.
## Core
### Differentiators (vs. SAST)
- Behavioral, not static.
- Requires 6+ months of git history.
- Hotspot = churn (change frequency) × complexity.
- Author patterns.
### Key metrics
#### Code Health (1-10)
- Combines defect risk, delivery speed, and predictability.
- Alert when the score falls below 6.
#### Hotspot
- churn × complexity.
- Rule of thumb: the top 5% of files account for roughly 80% of the maintenance burden.
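
The 5% / 80% figure is a rule of thumb, so it is worth checking against your own repository. A minimal sketch, assuming you already have a mapping from file path to hotspot score (churn × complexity); the function name `top_share` and the sample numbers are hypothetical:

```python
def top_share(scores, top_fraction=0.05):
    """Share of the total hotspot score held by the top `top_fraction` of files.

    `scores` maps file path -> hotspot score (churn * complexity).
    """
    if not scores:
        return 0.0
    ranked = sorted(scores.values(), reverse=True)
    # Always look at one file or more so that tiny repos still give an answer.
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical scores for illustration only: 19 quiet files and one hotspot.
example = {f"file_{i}.py": 1 for i in range(19)}
example["core.py"] = 81
print(f"top 5% share: {top_share(example):.0%}")
```

If the share is far below 80%, the burden is spread more evenly and a top-N hotspot list is a weaker guide for prioritization.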
#### Knowledge Map
- Which author dominates each file.
- Bus factor.
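
Bus factor can be turned into a number per file or per module. A minimal sketch, assuming the common definition "the smallest number of authors whose combined commits exceed half of the total"; the 50% threshold and the name `bus_factor` are assumptions, not CodeScene's documented formula:

```python
from collections import Counter


def bus_factor(author_commits, threshold=0.5):
    """Smallest number of authors whose combined commits exceed `threshold` of the total.

    `author_commits` maps author name -> number of commits.
    """
    counts = Counter(author_commits)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk authors from most to least active until the threshold is passed.
    for rank, (_, commits) in enumerate(counts.most_common(), start=1):
        covered += commits
        if covered / total > threshold:
            return rank
    return len(counts)


# Hypothetical commit counts for illustration only.
print(bus_factor({"alice": 9, "bob": 1}))
print(bus_factor({"a": 3, "b": 3, "c": 3, "d": 1}))
```

A result of 1 means a single person holds most of the history for that code, which is the "lone owner" case to review first.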
#### Coupling
- How often files change together.
- Logical coupling.
### Limitations
- Needs at least 6 months of git history.
- Stale code (no churn) gives only a weak signal.
- Enterprise pricing.
- Learning curve.
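
The history requirement can be checked before running any analysis. A minimal sketch, assuming 6 months is approximated as 180 days; `first_commit_time` and `has_enough_history` are hypothetical helper names, and the git call uses only `git log --format=%ct`:

```python
import subprocess
from datetime import datetime, timedelta, timezone


def first_commit_time(repo):
    """Committer timestamp of the oldest commit reachable from HEAD."""
    out = subprocess.check_output(
        ["git", "-C", repo, "log", "--format=%ct"], text=True
    )
    # Take the minimum rather than the last line: commit dates are not
    # guaranteed to be monotonic in the log order.
    oldest = min(int(ts) for ts in out.split())
    return datetime.fromtimestamp(oldest, tz=timezone.utc)


def has_enough_history(first_commit, now=None, min_days=180):
    """True if the repo history spans `min_days` or more (assumed ~6 months)."""
    now = now or datetime.now(timezone.utc)
    return now - first_commit >= timedelta(days=min_days)
```

Usage would look like `has_enough_history(first_commit_time("path/to/repo"))`; a `False` result suggests that churn-based metrics will be too noisy to act on.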
### Alternatives
#### Open-source / DIY
- git log + a complexity tool.
- lizard, scc.
#### SaaS competitor
- **SonarQube + history**.
- **Code Climate Velocity**.
- **Sourcegraph**.
- **Pluralsight Flow**.
- **LinearB / Axify** (DORA-focused).
### Applications
1. **Refactor priority**: hotspots first.
2. **Onboarding**: explain the high-churn areas.
3. **Architecture review**: inspect coupling.
4. **Risk forecast**: incident-prone areas.
5. **Bus factor**: areas with a single owner.
## 💻 Patterns
### Hotspot detection (DIY)
```python
import os
import subprocess
from collections import Counter

import lizard  # third-party: cyclomatic complexity


def hotspots(repo, since='6 months ago', top_n=20):
    # 1. Churn: number of commits touching each file.
    #    Passing an argument list avoids shell quoting problems with `repo`.
    log = subprocess.check_output(
        ['git', '-C', repo, 'log', f'--since={since}', '--name-only', '--pretty=format:'],
        text=True,
    )
    churn = Counter(f for f in log.splitlines() if f.endswith(('.py', '.ts', '.js')))
    # 2. Complexity: summed cyclomatic complexity of all functions in the file.
    complexity = {}
    for path in churn:
        try:
            info = lizard.analyze_file(os.path.join(repo, path))
            complexity[path] = sum(fn.cyclomatic_complexity for fn in info.function_list)
        except Exception:
            continue  # file was deleted since the commit, or could not be parsed
    # 3. Hotspot score = churn × complexity.
    scored = [(f, churn[f] * complexity.get(f, 0)) for f in churn]
    return sorted(scored, key=lambda x: -x[1])[:top_n]
```
### Knowledge map (bus factor)
```python
import subprocess
from collections import Counter


def knowledge_map(repo, since='1 year ago'):
    # Prefix author lines with a marker so they cannot be mistaken for file paths.
    log = subprocess.check_output(
        ['git', '-C', repo, 'log', f'--since={since}', '--pretty=format:@@%an', '--name-only'],
        text=True,
    )
    file_authors = {}  # file → Counter(author → number of commits touching the file)
    current_author = None
    for line in log.splitlines():
        if line.startswith('@@'):
            current_author = line[2:]
        elif line.endswith(('.py', '.ts', '.js')):
            file_authors.setdefault(line, Counter())[current_author] += 1
    # Bus factor risk: files where one author made more than 80% of the commits.
    risk_files = []
    for f, authors in file_authors.items():
        total = sum(authors.values())
        author, count = authors.most_common(1)[0]
        if count / total > 0.8:
            risk_files.append((f, author, count / total))
    return risk_files
```
### Logical coupling (changed-together)
```python
import subprocess
from collections import Counter
from itertools import combinations


def logical_coupling(repo, since='6 months ago'):
    # '@@COMMIT' separates commits; a bare 'COMMIT' could also appear inside a file path.
    log = subprocess.check_output(
        ['git', '-C', repo, 'log', f'--since={since}', '--pretty=format:@@COMMIT', '--name-only'],
        text=True,
    )
    coupling = Counter()
    for commit in log.split('@@COMMIT'):
        files = sorted({
            line.strip() for line in commit.splitlines()
            if line.strip().endswith(('.py', '.ts'))
        })
        # Count every pair of files changed in the same commit.
        coupling.update(combinations(files, 2))
    return coupling.most_common(20)
```
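
Raw co-change counts favor files that change often anyway. One possible normalization, shown here as a sketch, divides the shared commits by the average number of revisions of the two files; this formula, the `min_shared` cutoff, and the name `coupling_degree` are assumptions rather than CodeScene's documented metric. The input is a list of commits, each given as a list of file paths:

```python
from collections import Counter
from itertools import combinations


def coupling_degree(commits, min_shared=2):
    """Rank file pairs by shared commits relative to their average revision count."""
    revisions = Counter()  # file -> number of commits touching it
    shared = Counter()     # (file_a, file_b) -> number of commits touching both
    for files in commits:
        unique = sorted(set(files))
        revisions.update(unique)
        shared.update(combinations(unique, 2))
    rows = []
    for (a, b), n in shared.items():
        if n < min_shared:
            continue  # ignore pairs that co-changed too rarely to mean anything
        average = (revisions[a] + revisions[b]) / 2
        rows.append((a, b, n, n / average))
    return sorted(rows, key=lambda row: -row[3])


# Hypothetical commit history for illustration only.
history = [
    ["a.py", "b.py"],
    ["a.py", "b.py"],
    ["a.py", "c.py"],
    ["a.py"],
    ["b.py", "c.py"],
]
for a, b, n, degree in coupling_degree(history):
    print(a, b, n, round(degree, 2))
```

A degree near 1.0 means the two files almost never change apart, which is a stronger signal than a high raw count between two busy files.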
### Code health score (proxy)
```python
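import lizard  # third-party: cyclomatic complexity


def health_score(file_path):
    """A simple proxy for a CodeScene-style health score (1-10)."""
    score = 10
    info = lizard.analyze_file(file_path)
    complexities = [fn.cyclomatic_complexity for fn in info.function_list]
    # Guard against files without functions (the mean of an empty list is undefined).
    avg_cc = sum(complexities) / len(complexities) if complexities else 0
    if avg_cc > 10: score -= 2
    if avg_cc > 20: score -= 2  # cumulative with the penalty above
    line_count = info.nloc
    if line_count > 500: score -= 1
    if line_count > 1000: score -= 2
    longest_func = max((fn.length for fn in info.function_list), default=0)
    if longest_func > 50: score -= 1
    if longest_func > 100: score -= 2
    return max(1, score)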
```
### CI integration (CodeScene API)
```yaml
# .github/workflows/codescene.yml
# Action name and inputs are as recorded in this note; check them against the current CodeScene docs.
- name: CodeScene Delta Analysis
  uses: empear-analytics/codescene-pr-check@v1
  with:
    api-url: ${{ secrets.CODESCENE_API_URL }}
    api-user: ${{ secrets.CODESCENE_USER }}
    api-token: ${{ secrets.CODESCENE_TOKEN }}
    project-id: 'my-project'
    fail-on-decline: true  # fail the check when code health drops
```
### Quality gate
```python
def quality_gate(pr_files):
    """Fail when any file in the PR has a health score below 6."""
    failures = []
    for f in pr_files:
        score = health_score(f)
        if score < 6:
            failures.append((f, score))
    if failures:
        return f'Quality gate failed: {failures}'
    return 'OK'
```
### Refactor priority dashboard
```python
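def refactor_dashboard(repo):
    # count_files_below / count_files_in_range / count_files_above are helpers
    # that this note does not define; they are expected to bucket files by health_score.
    return {
        'hotspots': hotspots(repo)[:10],
        'bus_factor_risks': knowledge_map(repo),
        'high_coupling': logical_coupling(repo)[:5],
        'health_distribution': {
            'critical': count_files_below(repo, 4),
            'concerning': count_files_in_range(repo, 4, 6),
            'ok': count_files_above(repo, 6),
        },
    }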
```
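
The dashboard relies on three counting helpers that are not defined in this note. A minimal sketch of the bucketing they imply, assuming health scores are already computed per file; the function name `health_distribution` and the boundary handling (4 ≤ score < 6 is "concerning", 6 and above is "ok", matching the quality gate's "below 6" rule) are assumptions:

```python
def health_distribution(scores, critical_below=4, ok_from=6):
    """Bucket files by health score.

    `scores` maps file path -> health score on the 1-10 scale.
    """
    buckets = {"critical": 0, "concerning": 0, "ok": 0}
    for score in scores.values():
        if score < critical_below:
            buckets["critical"] += 1
        elif score < ok_from:
            buckets["concerning"] += 1
        else:
            buckets["ok"] += 1
    return buckets


# Hypothetical scores for illustration only.
sample = {"a.py": 3, "b.py": 4, "c.py": 5, "d.py": 6, "e.py": 10}
print(health_distribution(sample))
```

Computing all three buckets in a single pass also avoids scoring every file three times, which matters when each score requires parsing the file.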
## 🤔 Decision criteria
| Situation | Tool |
|---|---|
| Enterprise + budget | CodeScene SaaS |
| OSS / DIY | git script + lizard + custom dashboard |
| DORA + general | LinearB / Axify |
| Code search + ownership | Sourcegraph |
| Quality gate (PR) | SonarQube + custom |
| Bus factor only | git blame + script |
**Default**: small / OSS = DIY. Enterprise = CodeScene or LinearB.
## 🔗 Graph
- Parent: [[Code-Quality]] · [[Refactoring_Best_Practices|Refactoring]]
- Variants: [[Behavioral-Code-Analysis]] · [[Hotspot-Detection]] · [[Code-Health]]
- Applications: [[Quality-Gate]]
- Adjacent: [[Code_Smells]] · [[Quality_Code_Review_Modern]] · [[Axify]] · [[SonarQube]] · [[Architecture Anti-patterns]]
## 🤖 LLM usage
**When**: refactoring priorities, onboarding, architecture review, incident prevention.
**When not**: repositories with less than 6 months of history; analysis of stale codebases.
## ❌ Anti-patterns
- **Fixing every hotspot**: prioritize instead.
- **Ignoring static analysis**: issues in code that rarely changes get missed.
- **Trusting git history 100%**: squashes and rebases add noise.
- **Ignoring bus factor**: a critical risk.
- **No quality gate**: health declines silently.
## 🧪 Verification / Duplicates
- Verified (Adam Tornhill "Your Code as a Crime Scene", CodeScene docs).
- Trust level B.
- Related: [[Code_Smells]] · [[Quality_Code_Review_Modern]] · [[Axify]] · [[Architecture Anti-patterns]] · [[Asset-Specific-Knowledge]].
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: behavioral analysis + hotspot + DIY git script + CI integration |