---
id: wiki-2026-0508-ai-기반-코드-분석-도구-ai-powered-code-a
title: AI-Powered Code Analysis Tools
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI 기반 코드 분석 도구, AI code analyzer, SAST AI, code analysis platform, codebase RAG]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: [ai-code-analysis, sast, security, code-review, mcp, codebase-rag, devsecops, technical-debt]
raw_sources: [Datacollector_MAC/out_wiki/AI 기반 코드 분석 도구]
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack:
  language: TS / Python / Rust
  framework: GitHub Actions / Sonar / Snyk / CodeRabbit / Greptile / Cursor / MCP
---

# AI-Powered Code Analysis Tools

## 📌 One-Line Insight (The Karpathy Summary)
> **LLM + AST + codebase RAG give deep, per-file context analysis**, combining SAST, behavioral analysis, and cross-repository analysis. Representative tools: **CodeRabbit (PR), Greptile (large codebases), Cursor / Claude Code (IDE), Sonar / Snyk (enterprise)**. Most organizations end up running a hybrid stack.

## 📖 Structured Knowledge (Synthesized Content)

### Capability layers

#### 1. Static analysis (AST)
- Parses each file into a syntax tree.
- Applies lint rules (ESLint, Pylint, clippy).
- Runs type checking.
- Measures cyclomatic complexity.

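To make the AST layer concrete, here is a minimal sketch that approximates cyclomatic complexity with Python's standard `ast` module. The set of node types counted as branches is an assumption chosen for illustration; production linters use more careful definitions.

```python
import ast

# Node types counted as decision points (an approximation of McCabe's metric).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension, ast.Assert)


def cyclomatic_complexity(source: str) -> dict[str, int]:
    """Return {function_name: approximate complexity}.

    Branches inside nested functions also count toward the enclosing function.
    """
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1  # a function with no branches has exactly one path
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # `a and b and c` adds two short-circuit paths
                    score += len(child.values) - 1
            result[node.name] = score
    return result


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(SAMPLE))  # {'classify': 5}
```

The sample scores 5: one base path, two `if` statements, one `for` loop, and one `and`.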
#### 2. Semantic analysis (LLM)
- Infers intent and context.
- Flags ambiguity.
- Recognizes language idioms.
- Recognizes architectural patterns.

#### 3. Cross-file analysis
- Dependency graph.
- Imports and exports.
- Call graph.
- Code Property Graph (CPG).

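As a rough illustration of cross-file analysis, the sketch below builds an import graph from in-memory module sources and then walks it backwards to find everything affected by a change. The module names and the `MODULES` sample are hypothetical; a real tool would also resolve relative imports and packages.

```python
import ast
from collections import defaultdict


def import_graph(modules: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the set of project modules it imports."""
    graph = {name: set() for name in modules}
    for name, source in modules.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            # Keep only edges that point at modules inside the project.
            graph[name].update(t for t in targets if t in modules)
    return graph


def dependents(graph: dict[str, set[str]], target: str) -> set[str]:
    """Return every module that imports `target`, directly or transitively."""
    reverse = defaultdict(set)
    for src, dsts in graph.items():
        for dst in dsts:
            reverse[dst].add(src)
    seen, stack = set(), [target]
    while stack:
        for parent in reverse[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


MODULES = {
    "db": "import sqlite3\n",
    "models": "import db\n",
    "api": "from models import User\nimport json\n",
    "cli": "import api\n",
}

graph = import_graph(MODULES)
print(sorted(dependents(graph, "db")))  # ['api', 'cli', 'models']
```

The reverse traversal is the part that answers "what breaks if this file changes", which is the question a dependency graph is usually built for.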
#### 4. Cross-repository (modern)
- Contracts between microservices.
- API consumers.
- Impact of shared-library changes.

#### 5. Behavioral analysis
- Git history.
- Hotspots (frequently changed files).
- Author concentration.
- Technical debt.

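Author concentration can be estimated from commit history alone. The sketch below assumes the history has already been exported as `(author, files)` pairs; the `COMMITS` sample and the names in it are made up for illustration.

```python
from collections import Counter, defaultdict


def author_concentration(commits):
    """Return {file: (top_author, share_of_commits)}.

    `commits` is an iterable of (author, [files touched]) pairs.
    A share close to 1.0 means one person owns nearly all changes to the file.
    """
    per_file = defaultdict(Counter)
    for author, files in commits:
        for path in files:
            per_file[path][author] += 1

    result = {}
    for path, counts in per_file.items():
        top_author, top_count = counts.most_common(1)[0]
        result[path] = (top_author, top_count / sum(counts.values()))
    return result


COMMITS = [
    ("ana", ["billing.py", "api.py"]),
    ("ana", ["billing.py"]),
    ("ben", ["api.py"]),
    ("ana", ["billing.py"]),
    ("cho", ["api.py"]),
    ("ben", ["api.py"]),
]

concentration = author_concentration(COMMITS)
print(concentration["billing.py"])  # ('ana', 1.0)
print(concentration["api.py"])      # ('ben', 0.5)
```

Files with a share near 1.0 are knowledge-silo candidates, which is one of the signals behavioral tools surface alongside hotspots.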
### Tool families

#### PR review (LLM-based)
| Tool | Strength |
|---|---|
| **CodeRabbit** | Per-PR summary and inline comments |
| **Greptile** | Context across large codebases |
| **Sourcery** | Per-commit refactoring suggestions |
| **Qodo** (formerly Codium) | Test generation |
| **Bito** | PR review |
| **Korbit** | DevSecOps focus |

#### IDE assist
| Tool | Strength |
|---|---|
| **Cursor** | AI-native IDE |
| **Claude Code** | Terminal CLI |
| **GitHub Copilot** | Most popular autocomplete |
| **Continue.dev** | Open source IDE plugin |
| **Tabnine** | Privacy / on-prem option |
| **Cody (Sourcegraph)** | Codebase graph |
| **Aider** | Git-aware CLI |

#### Static + AI hybrid
| Tool | Strength |
|---|---|
| **SonarQube + Sonar AI** | Enterprise SAST + AI |
| **Snyk Code** | Security + AI fix |
| **Semgrep** | Pattern-based + AI |
| **Veracode** | Enterprise security |
| **Checkmarx** | Enterprise SAST |
| **Corgea** | AI auto-fix focus |
| **GitHub Advanced Security** | CodeQL + AI |

#### Codebase intelligence
| Tool | Strength |
|---|---|
| **Sourcegraph** | Code search + graph |
| **Greptile** | LLM + codebase RAG |
| **Kodesage** | Legacy + Jira + DB integration |
| **Qodana** (JetBrains) | IDE-integrated |
| **CodeScene** | Behavioral analysis |
| **GitLoop** | Code Q&A bot |

### Modern techniques

#### MCP (Model Context Protocol)
- A standardized protocol introduced by Anthropic.
- Gives an LLM access to GitHub, the file system, and external tools.
- Supported natively by Cursor, Claude Desktop, and Cline.

#### Codebase RAG
- Embed each file or function.
- Retrieve the top-K chunks for each query.
- Pass the retrieved chunks to the LLM as context.

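The retrieval step can be sketched without any external service. The example below uses a toy bag-of-words "embedding" and cosine similarity purely to show the embed, rank, and take-top-K flow; the `CHUNKS` data is invented, and a real system would swap `embed` for a learned embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda name: cosine(q, embed(chunks[name])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: dict[str, str], k: int = 2) -> str:
    """Assemble retrieved chunks into LLM context, followed by the question."""
    context = "\n\n".join(f"# {name}\n{chunks[name]}"
                          for name in top_k(query, chunks, k))
    return f"{context}\n\nQuestion: {query}"


CHUNKS = {
    "auth.login": "def login(user, password): verify password hash and create session token",
    "billing.charge": "def charge(card, amount): call payment gateway and record invoice",
    "auth.logout": "def logout(session): delete session token",
}

print(top_k("delete the session token", CHUNKS))  # ['auth.logout', 'auth.login']
```

Only the retrieved chunks reach the model, so retrieval quality bounds answer quality regardless of how capable the LLM is.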
#### Code Property Graph (CPG)
- Merges the AST, control flow, and data flow into a single graph.
- Particularly strong for security analysis.
- Examples: Joern, Atom.

#### Taint analysis
- Marks user input as tainted.
- Tracks whether tainted data reaches a sensitive operation (a sink).
- Detects SQL injection, XSS, and SSRF.

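The source-to-sink idea can be shown with a deliberately small sketch over straight-line Python code. The source, sanitizer, and sink names (`input`, `escape`, `execute`) are assumptions for the example, and the analysis ignores branches, loops, and function boundaries that real taint engines handle.

```python
import ast

SOURCES = {"input"}      # calls that produce untrusted data (assumed names)
SANITIZERS = {"escape"}  # calls that neutralize taint (assumed names)
SINKS = {"execute"}      # sensitive operations (assumed names)


def _call_name(node):
    """Return the called function or method name, or None if not a call."""
    if isinstance(node, ast.Call):
        func = node.func
        if isinstance(func, ast.Name):
            return func.id
        if isinstance(func, ast.Attribute):
            return func.attr
    return None


def _is_tainted(expr, tainted):
    """True if expr depends on a source or on a tainted variable."""
    name = _call_name(expr)
    if name in SANITIZERS:
        return False  # stop here: sanitized data is considered clean
    if name in SOURCES:
        return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    return any(_is_tainted(child, tainted)
               for child in ast.iter_child_nodes(expr))


def find_taint_flows(source: str) -> list[int]:
    """Return line numbers where tainted data reaches a sink."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:  # top-level statements, in order
        for node in ast.walk(stmt):
            if _call_name(node) in SINKS and any(
                    _is_tainted(arg, tainted) for arg in node.args):
                findings.append(node.lineno)
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            target = stmt.targets[0].id
            if _is_tainted(stmt.value, tainted):
                tainted.add(target)
            else:
                tainted.discard(target)  # reassignment with clean data
    return findings


SAMPLE = '''\
name = input()
query = "SELECT * FROM users WHERE name = " + name
cursor.execute(query)
safe = "SELECT * FROM users WHERE name = " + escape(name)
cursor.execute(safe)
'''

print(find_taint_flows(SAMPLE))  # [3]
```

Line 3 is flagged because `query` inherits taint from `name`; line 5 is not, because the value passed through the sanitizer first.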
#### Auto-fix (LLM-generated)
- Generates a patch for each vulnerability.
- Attaches a confidence score to each fix.
- Requires human review for high-stakes changes.

### Deployment models

#### SaaS
- Runs in the vendor's cloud.
- Quick to start.
- Raises IP and privacy concerns.

#### On-premise
- Self-hosted.
- Suits enterprise and regulated environments.
- Supported by Sonar, Snyk, and Veracode.

#### Air-gapped
- Government and defense settings.
- Relies on a fine-tuned internal LLM.
- Options: Qodo, Kodesage, Fortify.

### Organizational pattern

#### Layer 1: IDE (real-time)
- Each developer uses Cursor or Copilot.
- Feedback arrives as the developer types.

#### Layer 2: Pre-commit (local)
- husky + lint-staged.
- ESLint, Prettier, type check.

#### Layer 3: CI / PR (automated)
- GitHub Actions / GitLab CI.
- CodeRabbit / Greptile.
- SAST (Snyk, Sonar).

#### Layer 4: Periodic deep scan
- Weekly or monthly.
- Codebase-wide.
- Dependency vulnerabilities.

### Limitations

#### Context window
- Review quality drops on large PRs (50+ files).
- Large monorepos are hard to cover.

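One common workaround for the context-window limit is to split a large PR into batches that each fit a token budget. The sketch below is a simple order-preserving greedy packer; the per-file token counts and the budget are hypothetical inputs that a caller would estimate with its own tokenizer.

```python
def batch_files(file_tokens: dict[str, int], budget: int) -> list[list[str]]:
    """Group files into review batches whose token totals stay within budget.

    Files are kept in their original order so related changes tend to stay
    together. A file larger than the budget gets a batch of its own; the
    caller still has to truncate or summarize it.
    """
    batches, current, used = [], [], 0
    for path, tokens in file_tokens.items():
        if tokens > budget:
            if current:
                batches.append(current)
                current, used = [], 0
            batches.append([path])
            continue
        if used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += tokens
    if current:
        batches.append(current)
    return batches


PR_FILES = {"a.py": 400, "b.py": 500, "c.py": 300, "d.py": 1500, "e.py": 100}

print(batch_files(PR_FILES, budget=1000))
# [['a.py', 'b.py'], ['c.py'], ['d.py'], ['e.py']]
```

Batching trades cross-file context for coverage: each batch is reviewed with less knowledge of the others, so cross-cutting issues may still be missed.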
#### False positive
- Alert fatigue.
- Manual tuning.

#### AI hallucination
- More likely with niche frameworks.
- Produces wrong fixes.
- LLM-as-judge is only a partial remedy.

#### Privacy / IP
- Cloud AI sends source code to the vendor.
- Enterprises often require self-hosting.

#### Cost
- LLM API calls.
- Compute (RAG indexing).
- Vendor licensing.

### ROI metrics

#### DORA
- Lead time.
- Deployment frequency.
- Change failure rate.
- MTTR.

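The four DORA metrics can be computed from a plain list of deployment records. This is a minimal sketch: the `Deployment` shape, the choice of median for lead time, and the sample data are all assumptions for illustration rather than a standard schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failure is resolved


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict:
    """Compute the four DORA metrics over a non-empty list of deployments."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at
                for d in failures if d.restored_at]
    mttr = (sum(restores, timedelta()) / len(restores)) if restores else None
    return {
        "lead_time_hours": median(lt.total_seconds() for lt in lead_times) / 3600,
        "deploys_per_day": len(deploys) / window_days,
        "change_failure_rate": len(failures) / len(deploys),
        "mttr_hours": mttr.total_seconds() / 3600 if mttr is not None else None,
    }


T0 = datetime(2026, 5, 1, 9, 0)
H = timedelta(hours=1)
DAY = timedelta(days=1)

DEPLOYS = [
    Deployment(T0, T0 + 2 * H),
    Deployment(T0, T0 + 4 * H, failed=True, restored_at=T0 + 5 * H),
    Deployment(T0 + DAY, T0 + DAY + 6 * H),
    Deployment(T0 + 2 * DAY, T0 + 2 * DAY + 8 * H, failed=True,
               restored_at=T0 + 2 * DAY + 11 * H),
]

print(dora_metrics(DEPLOYS, window_days=8))
```

Comparing these numbers before and after a tool rollout is more informative than a single snapshot, since other process changes also move them.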
#### Tool-specific
- AI suggestion accept rate.
- False positive rate.
- PR review time.
- Security findings.

#### Caveat (Goodhart)
- Any metric can be gamed.
- Tool adoption is not the same as outcome.

## 💻 Code Patterns

### CodeRabbit setup
```yaml
# .coderabbit.yaml
language: en
reviews:
  profile: chill
  high_level_summary: true
  request_changes_workflow: false

  # Skip build output and lock files
  path_filters:
    - '!**/dist/**'
    - '!**/*.lock'

  auto_review:
    enabled: true
    drafts: false

chat:
  auto_reply: true
```

### Greptile (codebase RAG)
```bash
# Illustrative workflow only: these command names are not verified against
# Greptile's actual CLI or API.

# Index codebase
greptile index https://github.com/org/repo

# Query
greptile ask "Where is user authentication implemented?"
```

### Cursor (IDE config)
Schematic only: the JSON shape below is an illustrative way to list project rules, not a verified Cursor file format.
```jsonc
// .cursor/rules
{
  "rules": [
    "Prefer functional components.",
    "Use TypeScript strict mode.",
    "No new dependencies without approval."
  ]
}
```

### Custom Semgrep rule
```yaml
rules:
  - id: ai-prompt-injection
    pattern-either:
      # String concatenation into the prompt
      - pattern: |
          $LLM.complete(... + $USER_INPUT + ...)
      # Template-literal interpolation into the prompt
      - pattern: |
          $LLM.complete(`...${$USER_INPUT}...`)
    message: |
      Prompt injection risk: user input concatenated into LLM prompt.
      Use parameterized template or input validation.
    severity: ERROR
    # The template-literal pattern is JavaScript/TypeScript syntax;
    # a Python variant (f-strings) would need its own rule.
    languages: [javascript, typescript]
```

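### MCP server (custom analysis tool)
```typescript
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';

// Hypothetical scanner: replace with a real analysis backend.
declare function scanSecurity(file: string): Promise<unknown[]>;

const server = new Server(
  { name: 'code-analyzer', version: '1.0.0' },
  { capabilities: { tools: {} } }, // declare that this server exposes tools
);

// Advertise the available tools to the client.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'find_security_issue',
      description: 'Scan a file for security issues',
      inputSchema: {
        type: 'object',
        properties: { file: { type: 'string' } },
        required: ['file'],
      },
    },
  ],
}));

// Dispatch tool calls; reject anything this server does not implement.
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name !== 'find_security_issue') {
    throw new Error(`Unknown tool: ${req.params.name}`);
  }
  const issues = await scanSecurity(String(req.params.arguments?.file));
  return { content: [{ type: 'text', text: JSON.stringify(issues) }] };
});

// Serve over stdio so an MCP client can launch this process directly.
await server.connect(new StdioServerTransport());
```
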
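### Codebase RAG (custom)
```python
import ast
from pathlib import Path

import lancedb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# The two helpers below are minimal stand-ins for the original's undefined
# walk_python_files / extract_functions; adapt them to your codebase.
def walk_python_files(repo_path: str):
    """Yield every .py file under repo_path."""
    yield from Path(repo_path).rglob("*.py")

def extract_functions(file: Path):
    """Yield (name, source) for each function definition in a file."""
    source = file.read_text(encoding="utf-8")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield node.name, ast.get_source_segment(source, node)

def index_codebase(repo_path: str):
    db = lancedb.connect("./codebase.db")
    chunks = []

    for file in walk_python_files(repo_path):
        for name, body in extract_functions(file):
            chunks.append({
                "file": str(file),
                "function": name,
                "code": body,
                # LanceDB searches the column named "vector" by default
                "vector": model.encode(body).tolist(),
            })

    # Overwrite so that re-indexing does not fail on an existing table
    db.create_table("code", data=chunks, mode="overwrite")

def query(question: str, k: int = 5):
    db = lancedb.connect("./codebase.db")
    table = db.open_table("code")

    q_emb = model.encode(question).tolist()
    return table.search(q_emb).limit(k).to_list()
```
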
### Auto-fix (with confidence gate)
```python
# post_comment, apply_fix, and commit are placeholders for your own
# PR and git integrations.
def auto_fix_pr(pr, suggestions):
    for s in suggestions:
        # Low confidence: leave a comment for human review
        if s.confidence < 0.95:
            post_comment(pr, s.file, s.line, s.suggestion)
            continue

        # High stakes (security, business-critical): never auto-apply
        if s.is_high_stakes:
            post_comment(pr, s.file, s.line, s.suggestion + ' (review needed)')
            continue

        # Auto-apply
        apply_fix(s.file, s.line, s.replacement)
        commit_message = f"AI auto-fix: {s.summary}\n\nSeverity: {s.severity}\nConfidence: {s.confidence}"
        commit(commit_message, author='bot')
```

### Behavioral hotspot detection
```python
from collections import defaultdict

import git  # GitPython

def find_hotspots(repo_path: str, branch: str = 'main'):
    repo = git.Repo(repo_path)

    # Commit count per file (churn)
    file_changes = defaultdict(int)
    for commit in repo.iter_commits(branch, max_count=1000):
        for file in commit.stats.files:
            file_changes[file] += 1

    # Complexity per file.
    # compute_cyclomatic_complexity is a placeholder to supply yourself;
    # it must tolerate paths that were deleted or renamed since the commit.
    file_complexity = {}
    for file in file_changes:
        file_complexity[file] = compute_cyclomatic_complexity(file)

    # Hotspot = high churn × high complexity
    hotspots = [
        {'file': f, 'churn': c, 'complexity': file_complexity.get(f, 0),
         'hotspot_score': c * file_complexity.get(f, 0)}
        for f, c in file_changes.items()
    ]
    return sorted(hotspots, key=lambda x: -x['hotspot_score'])[:20]
```

### CI integration (multi-tool)
```yaml
# .github/workflows/code-quality.yml
on: [pull_request]

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }

      # Static
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck

      # Security (the SNYK_TOKEN secret name is an assumption)
      - uses: snyk/actions/setup@master
      - run: snyk code test
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

      # AI review (CodeRabbit runs automatically on the PR; no step needed)

      # Test coverage
      - run: npm test -- --coverage
      - uses: codecov/codecov-action@v3

      # SonarCloud
      - uses: SonarSource/sonarcloud-github-action@master
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
```

### Evaluating AI review quality
```python
# Manual sample: compare AI findings with human review on recent PRs.
# recent_prs, ai_review, and get_human_review are placeholders for your
# own integrations; findings must be hashable so they can be compared as sets.
def eval_ai_review(num_samples=20):
    samples = []
    for pr in recent_prs(num_samples):
        ai_findings = set(ai_review(pr))
        human_issues = set(get_human_review(pr).issues)

        true_positive = len(ai_findings & human_issues)
        false_positive = len(ai_findings - human_issues)
        false_negative = len(human_issues - ai_findings)

        samples.append({
            'pr': pr.id,
            # max(..., 1) avoids division by zero when a side is empty
            'precision': true_positive / max(true_positive + false_positive, 1),
            'recall': true_positive / max(true_positive + false_negative, 1),
        })

    return samples
```

### Custom rule per team
```yaml
# .team/rules/api-pattern.yaml
- id: prefer-tRPC-over-REST
  pattern: |
    fetch('/api/...')
  message: |
    This codebase uses tRPC. Prefer trpc.* over fetch.
  severity: WARNING
```

### Scoping each auto-fix to its own PR
```ts
// Each auto-fix gets its own PR (never mixed into an existing PR).
// `git`, `applyFix`, and `openPR` are placeholders for your own helpers.
async function processSuggestion(suggestion) {
  const branch = `ai-fix/${suggestion.id}`;
  await git.checkoutBranch(branch);
  await applyFix(suggestion);
  await git.commit(`AI auto-fix: ${suggestion.summary}`);
  await git.push(branch);

  await openPR({
    title: `[AI Fix] ${suggestion.summary}`,
    body: `Severity: ${suggestion.severity}\nConfidence: ${suggestion.confidence}\n\n${suggestion.explanation}`,
    head: branch,
    base: 'main',
  });
}
```

## 🤔 Decision Criteria

| Situation | Recommended stack |
|---|---|
| Small startup | Cursor + CodeRabbit |
| Mid-size | Small-startup stack + Snyk Code |
| Enterprise | Sonar + Snyk + CodeRabbit + Cursor |
| Privacy / on-prem | Sonar self-host + ConnectAI / Continue.dev |
| Air-gapped | Qodo + internal LLM |
| Legacy / large monorepo | Greptile + Kodesage |
| Security-critical | Veracode + Snyk + Semgrep |
| Behavioral / debt | CodeScene |

**Default**: Cursor (IDE) + CodeRabbit (PR) + Snyk (security). Use a different tool at each layer.

## ⚠️ Contradictions & Updates
- **Tool consolidation vs best-of-breed**: Running multiple tools adds redundant overhead, but a single tool has limited coverage.
- **Cloud AI vs privacy**: Enterprises push for self-hosting.
- **Auto-fix hallucination**: Pushing unreviewed fixes to production is risky.
- **False-positive fatigue**: Developers start dismissing AI findings.
- **Rising cost**: LLM API charges accrue on every PR.
- **Unclear DORA improvement**: Studies show mixed evidence.

## 🔗 Knowledge Links (Graph)
- Parents: [[AI_코드_리뷰]] · [[Static-Analysis]] · [[CI/CD Pipeline & IDE Security Integration|DevSecOps]]
- Variants: [[CodeRabbit]] · [[Greptile]] · [[Cursor]] · [[Sonar]]
- Applications: [[Codebase-RAG]] · [[Code Property Graph]]
- Techniques: [[AST]] · [[Semgrep]] · [[CodeQL]] · [[Joern]]
- Applications: [[Behavioral-Code-Analysis]] · [[Technical_Debt|Technical-Debt]]
- Adjacent: [[Code Agent — Devin / Cursor / Claude Code]]

## 🤖 How to Use This Knowledge

**When to use this knowledge:**
- Selecting code analysis tools for an organization.
- Designing a CI / PR workflow.
- Building an enterprise SAST + AI hybrid.
- Building a codebase RAG.
- Writing an MCP server.

**When not to use it:**
- Detailed comparison of specific vendors (changes quickly).
- Detailed requirements of a specific compliance regime (SOC 2, etc.); consult an auditor.
- Very small projects (overkill).

## ❌ Anti-Patterns
- **Single tool only**: leaves gaps at other layers.
- **Every tool**: redundancy and cost.
- **Auto-fix without review**: hallucinations reach production.
- **Cloud AI + sensitive code**: IP leak.
- **No false-positive feedback loop**: alert fatigue.
- **Gaming tool metrics**: adoption is not outcome.
- **Ignoring behavioral analysis**: hotspots stay invisible.

## 🧪 Validation
- **Information status:** verified (concept-level).
- **Source trust level:** B (vendor docs, GitHub Octoverse, Stanford CodeX research).
- **Review reason:** Manual cleanup. Vendors and tools change roughly every six months.

## 🧬 Duplicate Check
- **Existing similar documents:** [[AI_코드_리뷰]] (related), [[AI_Powered_Code_Analysis]] (similar, possibly duplicate).
- **Action:** KEEP (focused on tool landscape).
- **Reason:** Provides a broader survey of tools.

## 🕓 Changelog
| Date | Change | Action | Trust level |
|------|--------|--------|-------------|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: added capability layers, tool families, organizational pattern, code, and anti-patterns | UPDATE | B |