---
id: wiki-2026-0508-codescene
title: CodeScene (Behavioral Code Analysis)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [CodeScene, behavioral code analysis, hotspot detection, code health, technical debt analytics]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: applied
tags: [code-quality, codescene, behavioral-analysis, hotspot, technical-debt, refactoring, git-history]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: SaaS / on-prem
  framework: CodeScene
---
# CodeScene
## One-liner
> **"Mine git history to detect hotspots."** CodeScene is behavioral rather than static analysis: it looks at the intersection of churn and complexity, which makes refactoring priorities data-driven. Modern alternatives: SonarQube plus a custom git script, Code Climate, Sourcegraph.
## Core
### What sets it apart (vs. SAST)
- Behavioral, not static.
- Requires at least 6 months of git history.
- Hotspot = churn (change frequency) × complexity.
- Author patterns.
### Key metrics
#### Code Health (1-10)
- Combines defect risk, delivery speed, and predictability.
- Alert when the score drops below 6.
#### Hotspot
- Churn × complexity.
- As a rule of thumb, the top 5% of files carry roughly 80% of the maintenance burden.
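
The concentration claim can be checked against a repository's own numbers. A minimal sketch, assuming a list of `(path, score)` pairs such as churn × complexity per file; the function name `top_share` and the sample data are hypothetical:

```python
import math

def top_share(scored, fraction=0.05):
    """Share of the total hotspot score held by the top `fraction` of files.

    scored: list of (path, score) pairs, e.g. churn × complexity per file.
    """
    scores = sorted((s for _, s in scored), reverse=True)
    total = sum(scores)
    if total == 0:
        return 0.0
    k = max(1, math.ceil(len(scores) * fraction))  # always include at least one file
    return sum(scores[:k]) / total

# Hypothetical scores for 20 files: one dominant hotspot plus a long tail
example = [('core/engine.py', 800)] + [(f'lib/mod_{i}.py', 10) for i in range(19)]
print(f'{top_share(example):.0%}')  # 81%
```

A value far below the rule of thumb suggests the burden is spread evenly and a hotspot ranking will be less decisive.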
#### Knowledge Map
- Shows which author dominates each part of the code.
- Bus factor.
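
Bus factor has more than one definition. A minimal sketch using one common convention (an assumption here, not necessarily CodeScene's): the smallest number of authors whose commits together exceed half of the total. The function name and sample authors are hypothetical:

```python
from collections import Counter

def bus_factor(commits_by_author, threshold=0.5):
    """Smallest number of authors whose commits together exceed `threshold` of the total."""
    total = sum(commits_by_author.values())
    if total == 0:
        return 0
    covered = 0
    # Walk authors from most to least active until the threshold is crossed
    for n, (_, count) in enumerate(Counter(commits_by_author).most_common(), start=1):
        covered += count
        if covered / total > threshold:
            return n
    return len(commits_by_author)

print(bus_factor({'alice': 90, 'bob': 5, 'carol': 5}))  # 1
print(bus_factor({'alice': 30, 'bob': 30, 'carol': 40}))  # 2
```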
#### Coupling
- How often files change together.
- Logical coupling.
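
Raw co-change counts favor files that change often anyway. One common normalization (an assumption here, not necessarily CodeScene's formula) divides the number of shared commits by the average number of revisions of the two files. A minimal sketch on in-memory data; `coupling_degree` and `min_shared` are hypothetical names:

```python
from collections import Counter
from itertools import combinations

def coupling_degree(commits, min_shared=2):
    """commits: iterable of file-path collections, one per commit.

    Returns {(file_a, file_b): degree}, where degree is
    shared commits / average revisions of the two files.
    """
    revisions = Counter()
    shared = Counter()
    for files in commits:
        files = sorted(set(files))  # de-duplicate and fix the pair order
        revisions.update(files)
        shared.update(combinations(files, 2))
    return {
        pair: n / ((revisions[pair[0]] + revisions[pair[1]]) / 2)
        for pair, n in shared.items()
        if n >= min_shared  # ignore pairs that co-changed only by chance
    }

history = [['a.py', 'b.py'], ['a.py', 'b.py'], ['a.py'], ['c.py', 'a.py']]
print(coupling_degree(history))  # only ('a.py', 'b.py') survives, degree 2/3
```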
### Limitations
- Needs at least 6 months of git history.
- Weak signal for stale code (no churn).
- Enterprise pricing.
- Learning curve.
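
The history requirement can be checked before running any analysis. A minimal sketch, assuming 180 days as a stand-in for "6 months"; `commit_times` and `has_enough_history` are hypothetical helpers, and `%ct` is git's committer-date Unix timestamp placeholder:

```python
import subprocess
from datetime import datetime, timedelta, timezone

def commit_times(repo):
    """Commit timestamps (UTC) read from `git log`."""
    out = subprocess.check_output(
        ['git', '-C', repo, 'log', '--pretty=format:%ct'], text=True
    )
    return [datetime.fromtimestamp(int(t), tz=timezone.utc) for t in out.split()]

def has_enough_history(times, min_days=180):
    """True if the commits span at least `min_days`."""
    if not times:
        return False
    return (max(times) - min(times)).days >= min_days

# Usage on a real repository: has_enough_history(commit_times('/path/to/repo'))
now = datetime(2026, 5, 10, tzinfo=timezone.utc)
print(has_enough_history([now - timedelta(days=200), now]))  # True
print(has_enough_history([now - timedelta(days=30), now]))   # False
```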
### Alternatives
#### Open-source / DIY
- git log plus a complexity tool.
- lizard, scc.
#### SaaS competitors
- **SonarQube + history**.
- **Code Climate Velocity**.
- **Sourcegraph**.
- **Pluralsight Flow**.
- **LinearB / Axify** (DORA-focused).
### Applications
1. **Refactor priority**: hotspots first.
2. **Onboarding**: explain the high-churn areas.
3. **Architecture review**: inspect coupling.
4. **Risk forecast**: incident-prone areas.
5. **Bus factor**: areas with a lone owner.
## 💻 Patterns
### Hotspot detection (DIY)
```python
import subprocess
from collections import Counter
import lizard  # cyclomatic complexity

def hotspots(repo, since='6 months ago', top_n=20):
    # 1. Churn: number of commits touching each file
    log = subprocess.check_output(
        ['git', '-C', repo, 'log', f'--since={since}', '--name-only', '--pretty=format:'],
        text=True,
    )
    churn = Counter(f for f in log.splitlines() if f.endswith(('.py', '.ts', '.js')))
    # 2. Complexity: summed cyclomatic complexity per file
    complexity = {}
    for path in churn:
        try:
            info = lizard.analyze_file(f'{repo}/{path}')
            complexity[path] = sum(fn.cyclomatic_complexity for fn in info.function_list)
        except Exception:
            pass  # file was deleted or renamed since the commit, or cannot be parsed
    # 3. Hotspot score = churn × complexity
    scored = [(f, churn[f] * complexity.get(f, 0)) for f in churn]
    return sorted(scored, key=lambda x: -x[1])[:top_n]
```
### Knowledge map (bus factor)
```python
def knowledge_map(repo, since='1 year ago'):
    # Prefix author lines with '@@' so they cannot be mistaken for file paths
    log = subprocess.check_output(
        ['git', '-C', repo, 'log', f'--since={since}', '--pretty=format:@@%an', '--name-only'],
        text=True,
    )
    file_authors = {}  # file → Counter(author → commits touching the file)
    current_author = None
    for line in log.splitlines():
        if line.startswith('@@'):
            current_author = line[2:]
        elif line.endswith(('.py', '.ts', '.js')):
            file_authors.setdefault(line, Counter())[current_author] += 1
    # Bus-factor risk: files where one author made more than 80% of the commits
    risk_files = []
    for f, authors in file_authors.items():
        total = sum(authors.values())
        author, count = authors.most_common(1)[0]
        if count / total > 0.8:
            risk_files.append((f, author, count / total))
    return risk_files
```
### Logical coupling (changed-together)
```python
def logical_coupling(repo, since='6 months ago'):
    marker = '@@COMMIT@@'  # unlikely to appear in a file path
    log = subprocess.check_output(
        ['git', '-C', repo, 'log', f'--since={since}', f'--pretty=format:{marker}', '--name-only'],
        text=True,
    )
    coupling = Counter()
    for commit in log.split(marker):
        # Unique, sorted source files touched by this commit
        files = sorted({l.strip() for l in commit.splitlines() if l.strip().endswith(('.py', '.ts'))})
        for i, a in enumerate(files):
            for b in files[i + 1:]:
                coupling[(a, b)] += 1
    return sorted(coupling.items(), key=lambda x: -x[1])[:20]
```
### Code health score (proxy)
```python
import lizard

def health_score(file_path):
    """Simple proxy for a CodeScene-style score (not CodeScene's actual algorithm)."""
    score = 10
    info = lizard.analyze_file(file_path)
    ccs = [fn.cyclomatic_complexity for fn in info.function_list]
    avg_cc = sum(ccs) / len(ccs) if ccs else 0  # files without functions count as simple
    # Penalties are cumulative: avg_cc > 20 loses 4 points in total
    if avg_cc > 10: score -= 2
    if avg_cc > 20: score -= 2
    line_count = info.nloc
    if line_count > 500: score -= 1
    if line_count > 1000: score -= 2
    longest_func = max((fn.length for fn in info.function_list), default=0)
    if longest_func > 50: score -= 1
    if longest_func > 100: score -= 2
    return max(1, score)
```
### CI integration (CodeScene API)
```yaml
# .github/workflows/codescene.yml
- name: CodeScene Delta Analysis
  uses: empear-analytics/codescene-pr-check@v1
  with:
    api-url: ${{ secrets.CODESCENE_API_URL }}
    api-user: ${{ secrets.CODESCENE_USER }}
    api-token: ${{ secrets.CODESCENE_TOKEN }}
    project-id: 'my-project'
    fail-on-decline: true  # fail the check when code health drops
```
### Quality gate
```python
def quality_gate(pr_files):
    """Fail the PR if any changed file has a health score below 6."""
    failures = []
    for f in pr_files:
        score = health_score(f)
        if score < 6:
            failures.append((f, score))
    if failures:
        return f'Quality gate failed: {failures}'
    return 'OK'
```
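
A CI job usually needs a process exit code rather than a string. A minimal sketch, assuming health scores have already been computed per changed file (for example with `health_score` above); the function name `gate_exit_code` and the sample paths are hypothetical:

```python
def gate_exit_code(scores, threshold=6):
    """Map per-file health scores to a CI exit code.

    scores: dict of path -> health score (1-10).
    Returns (exit_code, failing); exit_code is 1 if any file is below threshold.
    """
    failing = sorted((path, s) for path, s in scores.items() if s < threshold)
    return (1 if failing else 0), failing

# Hypothetical scores for the files changed in a PR
code, failing = gate_exit_code({'api/views.py': 4, 'api/urls.py': 9})
for path, s in failing:
    print(f'FAIL {path}: health {s} < 6')
# In a CI script, finish with sys.exit(code) so the job fails when code == 1.
```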
### Refactor priority dashboard
```python
def refactor_dashboard(repo):
    # count_files_below / count_files_in_range / count_files_above are helpers
    # not defined in this note; they are assumed to bucket files by health_score.
    return {
        'hotspots': hotspots(repo)[:10],
        'bus_factor_risks': knowledge_map(repo),
        'high_coupling': logical_coupling(repo)[:5],
        'health_distribution': {
            'critical': count_files_below(repo, 4),
            'concerning': count_files_in_range(repo, 4, 6),
            'ok': count_files_above(repo, 6),
        },
    }
```
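
The `count_files_*` helpers are not defined in this note. One way to get the same three buckets is from a list of precomputed scores. A minimal sketch; the name `health_distribution` is hypothetical, and treating a score of exactly 6 as "ok" is an assumption chosen to match the quality gate's `score < 6` rule:

```python
def health_distribution(scores):
    """Bucket health scores (1-10) into the dashboard's three bands."""
    return {
        'critical': sum(1 for s in scores if s < 4),
        'concerning': sum(1 for s in scores if 4 <= s < 6),
        'ok': sum(1 for s in scores if s >= 6),
    }

# Hypothetical scores for seven files
print(health_distribution([2, 3, 4, 5, 6, 8, 10]))
# {'critical': 2, 'concerning': 2, 'ok': 3}
```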
## 🤔 Decision criteria
| Situation | Tool |
|---|---|
| Enterprise + budget | CodeScene SaaS |
| OSS / DIY | git script + lizard + custom dashboard |
| DORA + general | LinearB / Axify |
| Code search + ownership | Sourcegraph |
| Quality gate (PR) | SonarQube + custom |
| Bus factor only | git blame + script |
**Default**: small teams / OSS = DIY. Enterprise = CodeScene or LinearB.
## 🔗 Graph
- Parent: [[Code-Quality]] · [[Refactoring]] · [[Engineering-Productivity]]
- Variants: [[Behavioral-Code-Analysis]] · [[Hotspot-Detection]] · [[Code-Health]]
- Applications: [[Bus-Factor]] · [[Logical-Coupling]] · [[Quality-Gate]]
- Adjacent: [[Code_Smells]] · [[Quality_Code_Review_Modern]] · [[Axify]] · [[SonarQube]] · [[Architecture-Anti-Patterns]]
## 🤖 LLM usage
**When**: refactor prioritization, onboarding, architecture review, incident prevention.
**When not**: repositories with less than 6 months of history; analysis of stale codebases.
## ❌ Anti-patterns
- **Fixing every hotspot**: use the ranking to prioritize instead.
- **Ignoring static analysis**: misses issues in code that never churns.
- **Trusting git history 100%**: squashes and rebases add noise.
- **Ignoring bus factor**: a critical risk.
- **No quality gate**: code health declines silently.
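
One common way to dampen history noise from squash merges or mass reformatting is to ignore very large commits before counting churn or coupling. A minimal sketch on in-memory data; the function name `drop_bulk_commits` and the 30-file threshold are arbitrary assumptions:

```python
def drop_bulk_commits(commits, max_files=30):
    """Keep only commits that touch at most `max_files` files.

    commits: list of file-path lists, one per commit.
    """
    return [files for files in commits if len(files) <= max_files]

commits = [
    ['a.py', 'b.py'],                      # ordinary change
    [f'pkg/m{i}.py' for i in range(200)],  # squash merge or mass reformat
]
print(len(drop_bulk_commits(commits)))  # 1
```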
## 🧪 Verification / Duplicates
- Verified (Adam Tornhill, "Your Code as a Crime Scene"; CodeScene docs).
- Trust level B.
- Related: [[Code_Smells]] · [[Quality_Code_Review_Modern]] · [[Axify]] · [[Architecture-Anti-Patterns]] · [[Asset-Specific-Knowledge]].
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: behavioral analysis + hotspot + DIY git script + CI integration |