---
id: wiki-2026-0508-ai-기반-코드-분석-도구-ai-powered-code-a
title: AI-Powered Code Analysis Tools
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI 기반 코드 분석 도구, AI code analyzer, SAST AI, code analysis platform, codebase RAG]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: [ai-code-analysis, sast, security, code-review, mcp, codebase-rag, devsecops, technical-debt]
raw_sources: [Datacollector_MAC/out_wiki/AI 기반 코드 분석 도구]
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack:
  language: TS / Python / Rust
  framework: GitHub Actions / Sonar / Snyk / CodeRabbit / Greptile / Cursor / MCP
---
# AI-Powered Code Analysis Tools
## 📌 One-Line Insight (The Karpathy Summary)
> **Deep, context-aware analysis of each file by combining LLMs, ASTs, and codebase RAG.** Covers SAST, behavioral analysis, and cross-repository analysis. **CodeRabbit (PR), Greptile (large codebases), Cursor / Claude Code (IDE), Sonar / Snyk (enterprise)**. Most organizations run a hybrid stack.
## 📖 Synthesized Content
### Capability layers
#### 1. Static analysis (AST)
- Parses each file into a syntax tree.
- Applies lint rules (ESLint, Pylint, clippy).
- Runs type checking.
- Measures cyclomatic complexity.
#### 2. Semantic analysis (LLM)
- Infers intent and context.
- Flags ambiguous code.
- Recognizes idioms.
- Identifies architectural patterns.
#### 3. Cross-file analysis
- Dependency graph.
- Imports / exports.
- Call graph.
- Code Property Graph (CPG).
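The import/export layer of cross-file analysis can be sketched as a module dependency graph. This Python sketch makes three assumptions:

- Module sources are already loaded into a dict keyed by dotted module name.
- Only absolute imports are handled. Relative imports and `from pkg import submodule` are not resolved.
- The change-impact set is found by walking the graph backwards.

```python
import ast

def import_graph(modules: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the in-repo modules it imports (absolute imports only)."""
    graph: dict[str, set[str]] = {name: set() for name in modules}
    for name, source in modules.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                targets = [node.module]
            else:
                continue
            # Keep only edges to modules that belong to this codebase
            graph[name].update(t for t in targets if t in modules)
    return graph

def impacted_by(graph: dict[str, set[str]], changed: str) -> set[str]:
    """Return every module that transitively imports `changed`."""
    impacted: set[str] = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for module, deps in graph.items():
            if current in deps and module not in impacted:
                impacted.add(module)
                frontier.append(module)
    return impacted
```

`impacted_by` is the piece a PR reviewer cares about: it lists the modules whose behavior may change when one file is edited.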
#### 4. Cross-repository (modern)
- Contracts between microservices.
- API consumers.
- Impact of changes to shared libraries.
#### 5. Behavioral analysis
- Git history.
- Hotspots (frequently changed files).
- Author concentration.
- Technical debt.
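Author concentration is usually computed per file from git history. This is a minimal sketch. It assumes the (file, author) pairs have already been extracted, for example from `git log --format=%an --name-only`. For each file it reports the top author and the share of changes they made. Values close to 1.0 flag knowledge silos.

```python
from collections import Counter, defaultdict

def author_concentration(changes: list[tuple[str, str]]) -> dict[str, tuple[str, float]]:
    """Map each file to (top author, fraction of that file's changes made by them)."""
    per_file: dict[str, Counter] = defaultdict(Counter)
    for path, author in changes:
        per_file[path][author] += 1
    result: dict[str, tuple[str, float]] = {}
    for path, counts in per_file.items():
        author, top = counts.most_common(1)[0]
        result[path] = (author, top / sum(counts.values()))
    return result
```

Sorting the result by share, then cross-checking against churn, surfaces files that are both busy and understood by a single person.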
### Tool families
#### PR review (LLM-based)
| Tool | Strength |
|---|---|
| **CodeRabbit** | Summary and review comments on every PR |
| **Greptile** | Context across large codebases |
| **Sourcery** | Refactoring suggestions per commit |
| **Qodo** (formerly Codium) | Test generation |
| **Bito** | PR review |
| **Korbit** | DevSecOps focus |
#### IDE assist
| Tool | Strength |
|---|---|
| **Cursor** | AI-native IDE |
| **Claude Code** | Terminal CLI |
| **GitHub Copilot** | Most popular autocomplete |
| **Continue.dev** | Open source IDE plugin |
| **Tabnine** | Privacy / on-prem option |
| **Cody (Sourcegraph)** | Codebase graph |
| **Aider** | Git-aware CLI |
#### Static + AI hybrid
| Tool | Strength |
|---|---|
| **SonarQube + Sonar AI** | Enterprise SAST + AI |
| **Snyk Code** | Security + AI fix |
| **Semgrep** | Pattern-based + AI |
| **Veracode** | Enterprise security |
| **Checkmarx** | Enterprise SAST |
| **Corgea** | AI auto-fix focus |
| **GitHub Advanced Security** | CodeQL + AI |
#### Codebase intelligence
| Tool | Strength |
|---|---|
| **Sourcegraph** | Code search + graph |
| **Greptile** | LLM + codebase RAG |
| **Kodesage** | Legacy + Jira + DB integration |
| **Qodana** (JetBrains) | IDE-integrated |
| **CodeScene** | Behavioral analysis |
| **GitLoop** | Code Q&A bot |
### Modern techniques
#### MCP (Model Context Protocol)
- A standardized protocol introduced by Anthropic.
- Gives LLMs access to GitHub, the file system, and external tools.
- Supported natively by Cursor, Claude Desktop, and Cline.
#### Codebase RAG
- Embed each file or function.
- Retrieve the top-K most similar chunks for a query.
- Pass them to the LLM as context.
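The embed, retrieve, and prompt loop can be shown without any ML dependency. In this sketch a bag-of-identifiers vector stands in for a real embedding model. That substitution is an assumption made only to keep the example self-contained. Cosine similarity picks the top-K chunks that go into the LLM prompt.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: counts of lower-cased identifier tokens (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(question: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda name: cosine(q, embed(chunks[name])), reverse=True)[:k]

def build_prompt(question: str, chunks: dict[str, str], k: int = 2) -> str:
    """Put the retrieved code into the LLM context ahead of the question."""
    context = "\n\n".join(f"# {name}\n{chunks[name]}" for name in top_k(question, chunks, k))
    return f"Answer using only this code:\n\n{context}\n\nQuestion: {question}"
```

In a real pipeline `embed` would call an embedding model, and the vectors would live in a vector store, but the retrieval contract stays the same.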
#### Code Property Graph (CPG)
- Combines the AST, control flow, and data flow into a single graph.
- Particularly well suited to security analysis.
- Examples: Joern, Atom.
#### Taint analysis
- User input is marked as tainted.
- The analysis checks whether tainted data reaches a sensitive operation (sink).
- Detects SQL injection, XSS, and SSRF.
#### Auto-fix (LLM-generated)
- Generates a patch for each vulnerability.
- Attaches a confidence score.
- Requires human review for high-stakes changes.
### Deployment models
#### SaaS
- Runs in the vendor's cloud.
- Quick to start.
- Raises IP / privacy concerns.
#### On-premise
- Self-hosted.
- Suited to enterprise / regulated environments.
- Supported by Sonar, Snyk, and Veracode.
#### Air-gapped
- Government / defense.
- Relies on an internally hosted, possibly fine-tuned LLM.
- Examples: Qodo, Kodesage, Fortify.
### Organizational pattern
#### Layer 1: IDE (real-time)
- Developers use Cursor / Copilot.
- Feedback on every keystroke.
#### Layer 2: Pre-commit (local)
- husky + lint-staged.
- ESLint, Prettier, type check.
#### Layer 3: CI / PR (automated)
- GitHub Actions / GitLab CI.
- CodeRabbit / Greptile.
- SAST (Snyk, Sonar).
#### Layer 4: Periodic deep scan
- Weekly / monthly.
- Codebase-wide.
- Dependency vulnerabilities.
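The dependency part of a periodic scan comes down to matching pinned versions against advisories. This minimal sketch uses a hand-written advisory list with hypothetical package names and placeholder IDs, and it handles plain numeric versions only. In practice the data comes from a feed such as OSV or the GitHub Advisory Database. Version parsing should follow each ecosystem's rules, for example PEP 440 via the `packaging` library.

```python
from typing import NamedTuple

class Advisory(NamedTuple):
    package: str
    fixed_in: str      # first version that contains the fix
    advisory_id: str

def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'MAJOR.MINOR.PATCH' into a comparable tuple (plain numeric versions only)."""
    return tuple(int(part) for part in version.split("."))

def find_vulnerable(pins: dict[str, str], advisories: list[Advisory]) -> list[tuple[str, str, str]]:
    """Return (package, pinned version, advisory id) for every pin older than its fix."""
    hits = []
    for adv in advisories:
        pinned = pins.get(adv.package)
        if pinned is not None and parse_version(pinned) < parse_version(adv.fixed_in):
            hits.append((adv.package, pinned, adv.advisory_id))
    return hits
```

Comparing tuples rather than strings matters. As strings, `"1.10.0" < "1.4.2"` is true, but as versions 1.10.0 is newer.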
### Limitations
#### Context window
- Review quality drops on large PRs (50+ files).
- Large monorepos are hard to cover.
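A common mitigation for large PRs is to split the diff into batches that fit a token budget and review each batch separately. This sketch rests on two assumptions:

- Token counts use a rough 4-characters-per-token heuristic, not any model's real tokenizer.
- The budget value is illustrative.

Files larger than the budget are never split. Each one gets its own batch.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token."""
    return max(1, len(text) // 4)

def batch_files(diffs: dict[str, str], budget: int = 8000) -> list[list[str]]:
    """Pack per-file diffs into batches whose estimated size stays within `budget`."""
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    # Largest files first, next-fit packing: start a new batch when the next file won't fit
    for path in sorted(diffs, key=lambda p: estimate_tokens(diffs[p]), reverse=True):
        cost = estimate_tokens(diffs[path])
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch becomes one review request. Grouping files by directory before packing would keep related changes in the same context.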
#### False positive
- Alert fatigue.
- Manual tuning.
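Manual tuning works best as a feedback loop. Record whether developers dismiss each finding, then demote rules that are mostly dismissed. In this sketch the thresholds (at least 20 triaged findings, more than 80% dismissed) are hypothetical, and the rule IDs in the usage are illustrative.

```python
from collections import defaultdict

def noisy_rules(
    feedback: list[tuple[str, bool]],
    min_samples: int = 20,
    max_dismiss_rate: float = 0.8,
) -> list[str]:
    """Return rule IDs dismissed too often to stay blocking.

    feedback: (rule_id, dismissed) pairs collected from PR triage.
    """
    totals: dict[str, int] = defaultdict(int)
    dismissed: dict[str, int] = defaultdict(int)
    for rule_id, was_dismissed in feedback:
        totals[rule_id] += 1
        dismissed[rule_id] += int(was_dismissed)
    return sorted(
        rule for rule, n in totals.items()
        if n >= min_samples and dismissed[rule] / n > max_dismiss_rate
    )
```

The `min_samples` floor keeps a rule from being demoted after a handful of unlucky dismissals.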
#### AI hallucination
- Weak on niche frameworks.
- Can propose wrong fixes.
- LLM-as-judge only partially mitigates this.
#### Privacy / IP
- Cloud AI sends your code to the vendor.
- Enterprises often require self-hosting.
#### Cost
- LLM API call.
- Compute (RAG indexing).
- Vendor licensing.
### ROI metrics
#### DORA
- Lead time.
- Deployment frequency.
- Change failure rate.
- MTTR.
#### Tool-specific
- AI suggestion accept rate.
- False positive rate.
- PR review time.
- Number of security findings.
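The tool-specific numbers come from the same triage data. This sketch assumes each AI finding is logged with two hypothetical flags, accepted and marked false positive. It also assumes PR review times are available in hours.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Finding:
    accepted: bool         # developer applied or acted on the suggestion
    false_positive: bool   # developer marked the finding as wrong

def tool_metrics(findings: list[Finding], review_hours: list[float]) -> dict[str, float]:
    n = len(findings)
    return {
        "accept_rate": sum(f.accepted for f in findings) / n if n else 0.0,
        "false_positive_rate": sum(f.false_positive for f in findings) / n if n else 0.0,
        "median_review_hours": median(review_hours) if review_hours else 0.0,
    }
```

A rising accept rate with flat review time is a signal worth investigating, not proof of impact.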
#### Caveat (Goodhart's law)
- Any metric can be gamed.
- Outcomes ≠ tool adoption.
## 💻 Code Patterns
### CodeRabbit setup
```yaml
# .coderabbit.yaml
language: en
reviews:
  profile: chill
  high_level_summary: true
  request_changes_workflow: false
  path_filters:
    - '!**/dist/**'
    - '!**/*.lock'
  auto_review:
    enabled: true
    drafts: false
chat:
  auto_reply: true
```
### Greptile (codebase RAG)
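```bash
# Illustrative only: Greptile is mainly used through its web app and REST API;
# these CLI command names are assumptions, not a documented interface.
# Index codebase
greptile index https://github.com/org/repo
# Query
greptile ask "Where is user authentication implemented?"
```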
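### Cursor (IDE config)
```markdown
---
description: Team coding conventions
alwaysApply: true
---
<!-- Project rule saved as .cursor/rules/conventions.mdc (Markdown with YAML frontmatter) -->
- Prefer functional components.
- Use TypeScript strict mode.
- No new dependencies without approval.
```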
### Custom Semgrep rule
```yaml
rules:
  - id: ai-prompt-injection
    pattern-either:
      - pattern: |
          $LLM.complete(... + $USER_INPUT + ...)
      - pattern: |
          $LLM.complete(`...${$USER_INPUT}...`)
    message: |
      Prompt injection risk: user input concatenated into LLM prompt.
      Use parameterized template or input validation.
    severity: ERROR
    # Template-literal patterns only parse in JS/TS, so the rule is scoped to those languages
    languages: [javascript, typescript]
```
### MCP server (custom analysis tool)
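```typescript
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';

// Hypothetical analyzer: replace with a real SAST call.
declare function scanSecurity(file: string): Promise<unknown[]>;

const server = new Server(
  { name: 'code-analyzer', version: '1.0.0' },
  { capabilities: { tools: {} } }, // advertise tool support to clients
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'find_security_issue',
      description: 'Scan a file for security issues',
      inputSchema: {
        type: 'object',
        properties: { file: { type: 'string' } },
        required: ['file'],
      },
    },
  ],
}));

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name !== 'find_security_issue') {
    throw new Error(`Unknown tool: ${req.params.name}`);
  }
  const issues = await scanSecurity(String(req.params.arguments?.file));
  return { content: [{ type: 'text', text: JSON.stringify(issues) }] };
});

// Serve over stdio so clients such as Claude Desktop can launch the server
await server.connect(new StdioServerTransport());
```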
### Codebase RAG (custom)
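```python
import ast
from pathlib import Path

import lancedb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def iter_functions(repo_path: str):
    """Yield (file, function name, source) for every Python function in the repo."""
    for path in Path(repo_path).rglob("*.py"):
        source = path.read_text(encoding="utf-8", errors="ignore")
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip files that do not parse
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                yield str(path), node.name, ast.get_source_segment(source, node) or ""

def index_codebase(repo_path: str, db_path: str = "./codebase.db"):
    db = lancedb.connect(db_path)
    chunks = [
        {
            "file": file,
            "function": name,
            "code": code,
            # LanceDB searches the column named "vector" by default
            "vector": model.encode(code).tolist(),
        }
        for file, name, code in iter_functions(repo_path)
    ]
    db.create_table("code", data=chunks, mode="overwrite")

def query(question: str, k: int = 5, db_path: str = "./codebase.db"):
    table = lancedb.connect(db_path).open_table("code")
    return table.search(model.encode(question).tolist()).limit(k).to_list()
```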
### Auto-fix (with confidence gate)
```python
# post_comment, apply_fix and commit are hypothetical helpers for your VCS / review API.
def auto_fix_pr(pr, suggestions, threshold: float = 0.95):
    for s in suggestions:
        if s.confidence < threshold:
            post_comment(pr, s.file, s.line, s.suggestion)  # low confidence: human review
            continue
        if s.is_high_stakes:  # security or business-critical: never auto-apply
            post_comment(pr, s.file, s.line, s.suggestion + " (review needed)")
            continue
        # High confidence and low stakes: apply automatically, one commit per fix
        apply_fix(s.file, s.line, s.replacement)
        commit_message = (
            f"AI auto-fix: {s.summary}\n\n"
            f"Severity: {s.severity}\nConfidence: {s.confidence}"
        )
        commit(commit_message, author="bot")
```
### Behavioral hotspot detection
```python
from collections import defaultdict
from pathlib import Path

import git  # GitPython

def find_hotspots(repo_path: str, branch: str = "main", top_n: int = 20):
    repo = git.Repo(repo_path)
    # Churn: number of commits touching each file (last 1000 commits)
    file_changes = defaultdict(int)
    for commit in repo.iter_commits(branch, max_count=1000):
        for file in commit.stats.files:
            file_changes[file] += 1
    # Complexity of each file that still exists in the working tree
    # (compute_cyclomatic_complexity is a hypothetical helper, e.g. built on radon or lizard)
    file_complexity = {
        f: compute_cyclomatic_complexity(Path(repo_path) / f)
        for f in file_changes
        if (Path(repo_path) / f).is_file()
    }
    # Hotspot = high churn × high complexity
    hotspots = [
        {
            "file": f,
            "churn": c,
            "complexity": file_complexity.get(f, 0),
            "hotspot_score": c * file_complexity.get(f, 0),
        }
        for f, c in file_changes.items()
    ]
    return sorted(hotspots, key=lambda x: -x["hotspot_score"])[:top_n]
```
### CI integration (multi-tool)
```yaml
# .github/workflows/code-quality.yml
on: [pull_request]
jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - run: npm ci
      # Static
      - run: npm run lint
      - run: npm run typecheck
      # Security (the Snyk CLI authenticates with SNYK_TOKEN)
      - uses: snyk/actions/setup@master
      - run: snyk code test
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
      # AI review: CodeRabbit runs as a GitHub App, so no step is needed here
      # Test coverage
      - run: npm test -- --coverage
      - uses: codecov/codecov-action@v3
      # SonarQube
      - uses: SonarSource/sonarcloud-github-action@master
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
```
### Evaluating AI review quality
```python
# Manual sampling: compare AI findings against human review on recent PRs.
# recent_prs, ai_review and get_human_review are hypothetical helpers.
def eval_ai_review(num_samples: int = 20):
    samples = []
    for pr in recent_prs(num_samples):
        ai_findings = set(ai_review(pr))
        human_issues = set(get_human_review(pr).issues)
        true_positive = len(ai_findings & human_issues)
        false_positive = len(ai_findings - human_issues)
        false_negative = len(human_issues - ai_findings)
        samples.append({
            "pr": pr.id,
            "precision": true_positive / max(true_positive + false_positive, 1),
            "recall": true_positive / max(true_positive + false_negative, 1),
        })
    return samples
```
### Custom rule per team
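```yaml
# .team/rules/api-pattern.yaml (Semgrep rule format)
rules:
  - id: prefer-tRPC-over-REST
    patterns:
      - pattern: fetch($URL, ...)
      - metavariable-regex:
          metavariable: $URL
          regex: ^['"`]/api/
    message: |
      This codebase uses tRPC. Prefer trpc.* over fetch.
    severity: WARNING
    languages: [javascript, typescript]
```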
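### Auto-fix in a dedicated PR
```ts
import { simpleGit } from 'simple-git';

// Each auto-fix gets its own branch and PR instead of being mixed into a human PR.
interface Suggestion { id: string; summary: string; severity: string; confidence: number; explanation: string }

// Hypothetical helpers: applyFix edits the working tree; openPR could wrap Octokit's pulls.create.
declare function applyFix(s: Suggestion): Promise<void>;
declare function openPR(pr: { title: string; body: string; head: string; base: string }): Promise<void>;

const git = simpleGit();

async function processSuggestion(suggestion: Suggestion) {
  const branch = `ai-fix/${suggestion.id}`;
  await git.fetch('origin', 'main');
  await git.checkoutBranch(branch, 'origin/main'); // fresh branch from main
  await applyFix(suggestion);
  await git.add('.');
  await git.commit(`AI auto-fix: ${suggestion.summary}`);
  await git.push('origin', branch);
  await openPR({
    title: `[AI Fix] ${suggestion.summary}`,
    body: `Severity: ${suggestion.severity}\nConfidence: ${suggestion.confidence}\n\n${suggestion.explanation}`,
    head: branch,
    base: 'main',
  });
}
```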
## 🤔 Decision Criteria
| Situation | Recommended stack |
|---|---|
| Small startup | Cursor + CodeRabbit |
| Mid-size | Cursor + CodeRabbit + Snyk Code |
| Enterprise | Sonar + Snyk + CodeRabbit + Cursor |
| Privacy / on-prem | Sonar self-host + ConnectAI / Continue.dev |
| Air-gapped | Qodo + internal LLM |
| Legacy / large monorepo | Greptile + Kodesage |
| Security-critical | Veracode + Snyk + Semgrep |
| Behavioral / debt | CodeScene |
**Default**: Cursor (IDE) + CodeRabbit (PR) + Snyk (security), with a different tool for each layer.
## ⚠️ Contradictions & Updates
- **Tool consolidation vs best-of-breed**: running many tools adds redundant overhead, while a single tool has blind spots.
- **Cloud AI vs privacy**: enterprises are pushing for self-hosting.
- **Auto-fix hallucination**: risk of shipping a wrong fix to production.
- **AI false-positive fatigue**: developers start dismissing findings.
- **Rising cost**: LLM API spend on every PR.
- **Unclear DORA improvement**: studies show mixed evidence.
## 🔗 Knowledge Links (Graph)
- Parent: [[AI_코드_리뷰]] · [[Static-Analysis]] · [[CI_CD 파이프라인 및 IDE 통합 보안|DevSecOps]]
- Variants: [[CodeRabbit]] · [[Greptile]] · [[Cursor]] · [[Sonar]]
- Applications: [[Codebase-RAG]] · [[Code-Property-Graph]]
- Techniques: [[AST]] · [[Semgrep]] · [[CodeQL]] · [[Joern]]
- Applications: [[Behavioral-Code-Analysis]] · [[Technical_Debt|Technical-Debt]]
- Adjacent: [[AI-Code-Agent-Patterns]]
## 🤖 LLM Usage Hints (How to Use This Knowledge)
**When to use this knowledge:**
- Selecting code analysis tools for an organization.
- Designing a CI / PR workflow.
- Combining SAST and AI in an enterprise.
- Building a codebase RAG.
- Writing an MCP server.
**When not to use it:**
- Detailed comparisons of specific vendors (the landscape changes quickly).
- Detailed requirements of specific compliance regimes such as SOC 2 (ask an auditor).
- Very small projects (overkill).
## ❌ Anti-Patterns
- **Relying on a single tool**: leaves gaps in the other layers.
- **Adopting every tool**: redundancy and cost.
- **Auto-fix without review**: hallucinated fixes reach production.
- **Cloud AI + sensitive code**: IP leak.
- **No false-positive feedback loop**: alert fatigue.
- **Gaming tool metrics**: outcomes ≠ adoption.
- **Ignoring behavioral analysis**: hotspots stay invisible.
## 🧪 Validation
- **Status:** verified (concept-level).
- **Source trust:** B (vendor docs, GitHub Octoverse, Stanford CodeX research).
- **Review note:** Manual cleanup. Vendors and tools change roughly every six months.
## 🧬 Duplicate Check
- **Existing similar documents:** [[AI_코드_리뷰]] (related), [[AI_Powered_Code_Analysis]] (similar, possibly a duplicate).
- **Action:** KEEP (focused on the tool landscape).
- **Reason:** Broader survey of tools.
## 🕓 Changelog
| Date | Change | Action | Confidence |
|------|-----------|-----------|--------|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: added capability layers, tool families, organizational pattern, code, and anti-patterns | UPDATE | B |