
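koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)

Replaced [[wiki links]] that differed only in spelling (notational variants) with the target document's canonical title, reconnecting 1,200 broken links. Only normalized title/filename matches were applied; alias matching was excluded because of the risk of over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>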

2026-06-08 12:24:15 +09:00

15 KiB


id, title, category, status, canonical_id, aliases, duplicate_of, source_trust_level, confidence_score, verification_status, tags, raw_sources, last_reinforced, github_commit, inferred_by, tech_stack

title

AI-Powered Code Analysis Tools

📌 One-Line Insight (The Karpathy Summary)

Combine LLMs, ASTs, and codebase RAG to analyze each file with deep context, and layer SAST, behavioral analysis, and cross-repository analysis on top. Typical picks are CodeRabbit (PR review), Greptile (large codebases), Cursor / Claude Code (IDE), and Sonar / Snyk (enterprise). Most organizations end up running a hybrid stack.

📖 Structured Knowledge (Synthesized Content)

Capability layers

1. Static analysis (AST)

Parses each file into a syntax tree.
Applies lint rules (ESLint, Pylint, clippy).
Runs type checks.
Measures cyclomatic complexity.
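You can compute the complexity metric in the last bullet directly from the AST. Below is a minimal sketch that uses only Python's standard `ast` module. It approximates McCabe complexity as 1 plus the number of branching constructs. The set of node types counted is a simplifying assumption; dedicated tools such as radon or lizard cover more constructs.

```python
import ast

# Constructs that each add one decision point in this simplified McCabe count.
_DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> dict:
    """Approximate cyclomatic complexity (1 + decision points) per function."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        score = 1
        # ast.walk also descends into nested functions, so their branches are
        # counted toward the enclosing function as well (a simplification).
        for child in ast.walk(node):
            if isinstance(child, _DECISION_NODES):
                score += 1
            elif isinstance(child, ast.BoolOp):
                # `a and b and c` adds one path per extra operand.
                score += len(child.values) - 1
        scores[node.name] = score
    return scores
```

Functions that score above a team threshold are candidates for refactoring. McCabe originally suggested 10 as that threshold, but it is a convention, not a hard rule.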

2. Semantic analysis (LLM)

Infers intent and context.
Flags ambiguity.
Recognizes idioms.
Identifies architectural patterns.

3. Cross-file analysis

Builds the dependency graph.
Tracks imports and exports.
Builds the call graph.
Combines these into a Code Property Graph (CPG).
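Import statements alone are enough to approximate a cross-file dependency graph. This sketch uses only the standard library. It builds a module-level import graph for an in-memory set of modules, then walks the graph backwards to find every module affected by a change. The module names are hypothetical. Relative and dynamic imports are ignored.

```python
import ast
from collections import defaultdict


def import_graph(modules: dict) -> dict:
    """modules maps a module name to its source; returns module -> internal modules it imports."""
    graph = {}
    for name, source in modules.items():
        deps = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)  # relative-import levels are ignored here
        # Keep only edges into this codebase; stdlib and third-party imports are dropped.
        graph[name] = {d for d in deps if d in modules}
    return graph


def impacted_by(graph: dict, changed: str) -> set:
    """All modules that import `changed` directly or transitively (its blast radius)."""
    importers = defaultdict(set)
    for module, deps in graph.items():
        for dep in deps:
            importers[dep].add(module)
    seen, stack = set(), [changed]
    while stack:
        for parent in importers[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

The same reverse walk, applied across repositories and services, is what cross-repository impact analysis scales up.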

4. Cross-repository (modern)

Checks contracts between microservices.
Finds API consumers.
Assesses the impact of changes to shared libraries.

5. Behavioral analysis

Mines git history.
Finds hotspots (frequently changed files).
Measures author concentration.
Surfaces technical debt.
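Commit metadata is enough to derive author concentration per file. This is a minimal sketch. It assumes the input is a list of `(file, author)` pairs, one per commit that touched the file; you could extract these with `git log --format=%an --name-only`. The file and author names below are hypothetical.

```python
from collections import Counter, defaultdict


def author_concentration(touches):
    """touches: iterable of (file_path, author) pairs, one per commit touching the file."""
    per_file = defaultdict(Counter)
    for path, author in touches:
        per_file[path][author] += 1
    report = {}
    for path, counts in per_file.items():
        top_author, top_count = counts.most_common(1)[0]
        report[path] = {
            "authors": len(counts),
            "top_author": top_author,
            # Share of commits by the single most active author (1.0 = one-person file).
            "top_share": top_count / sum(counts.values()),
        }
    return report
```

A file with `top_share` close to 1.0 and high churn is a knowledge silo. Each team decides what counts as "close".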

Tool families

PR review (LLM-based)

Tool	Strength
CodeRabbit	PR summaries + inline comments
Greptile	Context across large codebases
Sourcery	Refactoring suggestions per commit
Qodo (formerly Codium)	Test generation
Bito	PR review
Korbit	DevSecOps focus

IDE assist

Tool	Strength
Cursor	AI-native IDE
Claude Code	Terminal CLI
GitHub Copilot	Most popular autocomplete
Continue.dev	Open source IDE plugin
Tabnine	Privacy / on-prem option
Cody (Sourcegraph)	Codebase graph
Aider	Git-aware CLI

Static + AI hybrid

Tool	Strength
SonarQube + Sonar AI	Enterprise SAST + AI
Snyk Code	Security + AI fix
Semgrep	Pattern-based + AI
Veracode	Enterprise security
Checkmarx	Enterprise SAST
Corgea	AI auto-fix focus
GitHub Advanced Security	CodeQL + AI

Codebase intelligence

Tool	Strength
Sourcegraph	Code search + graph
Greptile	LLM + codebase RAG
Kodesage	Legacy + Jira + DB integration
Qodana (JetBrains)	IDE-integrated
CodeScene	Behavioral analysis
GitLoop	Code Q&A bot

Modern techniques

MCP (Model Context Protocol)

A standardized protocol introduced by Anthropic.
Gives an LLM access to GitHub, the file system, and other external tools.
Natively supported by Cursor, Claude Desktop, and Cline.
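MCP messages are JSON-RPC 2.0 under the hood. The sketch below shows what a client sends to invoke a tool (`tools/call`) and how it reads the text parts of the result. It omits the transport (stdio or HTTP), the initialization handshake, and `tools/list` discovery. The tool name `find_security_issue` is the hypothetical tool from the MCP server example under Code Patterns.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


def text_from_result(message: str) -> str:
    """Join the text parts of a tools/call response; raise on protocol or tool errors."""
    msg = json.loads(message)
    if "error" in msg:  # JSON-RPC level error (e.g. unknown method)
        raise RuntimeError(msg["error"].get("message", "unknown JSON-RPC error"))
    result = msg["result"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("tool reported an error")
    return "\n".join(p["text"] for p in result["content"] if p.get("type") == "text")
```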

Codebase RAG

Embeds each file or function.
Retrieves the top-K chunks for each query.
Adds the retrieved chunks to the LLM's context.
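You can illustrate the index-then-retrieve loop without an embedding model. This sketch substitutes bag-of-words vectors over identifier tokens (split on snake_case and camelCase) plus cosine similarity for learned embeddings. The chunk ids and code snippets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Split camelCase, then keep lowercase alphanumeric runs (this also splits snake_case).
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts: a crude stand-in for a learned embedding."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: dict, k: int = 5) -> list:
    """Return the ids of the k chunks most similar to the question, best first."""
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda cid: cosine(q, vectorize(chunks[cid])), reverse=True)
    return ranked[:k]
```

The returned chunks would then be pasted into the LLM prompt. Replacing `vectorize` with a real embedding model keeps the same structure.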

Code Property Graph (CPG)

Merges the AST, control flow, and data flow into one graph.
Better suited to security analysis than the AST alone.
Examples: Joern, Atom.

Taint analysis

Marks user input as tainted.
Tracks whether tainted data reaches a sensitive operation (a sink).
Detects SQL injection, XSS, and SSRF.
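A deliberately tiny taint tracker makes the three bullets above concrete. It assumes straight-line code with no branches, loops, or cross-function flow. It treats `input()` as the only source and any call named `execute` or `system` as a sink; this configuration is hypothetical. It does not model sanitizers. Tools such as CodeQL, Semgrep, or Joern do this inter-procedurally.

```python
import ast

# Hypothetical configuration: which calls produce user input and which are sensitive.
SOURCES = {"input"}
SINKS = {"execute", "system"}


def _call_name(call: ast.Call):
    func = call.func
    if isinstance(func, ast.Attribute):
        return func.attr        # cursor.execute(...) -> "execute"
    if isinstance(func, ast.Name):
        return func.id          # input(...) -> "input"
    return None


def _is_tainted(expr: ast.AST, tainted: set) -> bool:
    """An expression is tainted if it reads a tainted variable or calls a source."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if isinstance(node, ast.Call) and _call_name(node) in SOURCES:
            return True
    return False


def find_taint_flows(source: str) -> list:
    """Straight-line taint tracking over top-level statements; returns (line, sink) pairs."""
    tainted = set()
    findings = []
    for stmt in ast.parse(source).body:
        # Check sinks using the taint state from before this statement's assignment.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and _call_name(node) in SINKS:
                if any(_is_tainted(arg, tainted) for arg in node.args):
                    findings.append((stmt.lineno, _call_name(node)))
        if isinstance(stmt, ast.Assign):
            names = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
            if _is_tainted(stmt.value, tainted):
                tainted.update(names)             # taint propagates through assignment
            else:
                tainted.difference_update(names)  # overwritten with clean data
    return findings
```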

Auto-fix (LLM-generated)

Generates a patch for the vulnerability.
Attaches a confidence score.
Requires human review for high-stakes changes.

Deployment models

SaaS

Runs in the vendor's cloud.
Quick to start.
Raises IP / privacy concerns.

On-premise

Self-hosted.
Suits enterprise / regulated environments.
Supported by Sonar, Snyk, and Veracode.

Air-gapped

For government / defense.
Uses an internal LLM, often fine-tuned.
Examples: Qodo, Kodesage, Fortify.

Organizational pattern

Layer 1: IDE (real-time)

Each developer uses Cursor / Copilot.
Feedback on every keystroke.

Layer 2: Pre-commit (local)

husky + lint-staged.
ESLint, Prettier, type check.

Layer 3: CI / PR (automated)

GitHub Actions / GitLab CI.
CodeRabbit / Greptile.
SAST (Snyk, Sonar).

Layer 4: Periodic deep scan

Weekly or monthly.
Codebase-wide.
Dependency vulnerabilities.

Limitations

Context window

Review quality drops on large PRs (50+ files).
Large monorepos are hard to cover.
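A common mitigation is to split a large PR into batches that each fit the model's context. This is a minimal sketch. It assumes a rough estimate of 4 characters per token (real tokenizers vary) and per-file diffs keyed by path. The file names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for source code.
    return max(1, len(text) // 4)


def batch_diff(file_diffs: dict, budget: int) -> list:
    """Greedily pack per-file diffs, in order, into batches under a token budget.

    A single diff larger than the budget still gets its own batch; it would need
    to be split further (e.g. by hunk) before sending it to a model.
    """
    batches, current, used = [], [], 0
    for path, diff in file_diffs.items():
        cost = estimate_tokens(diff)
        if current and used + cost > budget:
            batches.append(current)   # close the full batch and start a new one
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        batches.append(current)
    return batches
```

The trade-off is that issues spanning two batches can be missed. Placing related files (for example, files in the same directory) in the same batch reduces that risk.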

False positive

Causes alert fatigue.
Requires manual tuning.
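Manual tuning works better with data behind it. Below is a minimal sketch of a false-positive feedback loop. It assumes each finding's outcome is logged as `(rule_id, was_dismissed)`. The rule ids and thresholds (`min_samples`, `max_dismiss_rate`) are hypothetical and need tuning for each team.

```python
from collections import Counter


def noisy_rules(feedback, min_samples: int = 20, max_dismiss_rate: float = 0.5) -> dict:
    """Return {rule_id: dismiss_rate} for rules developers dismiss too often."""
    total = Counter()
    dismissed = Counter()
    for rule_id, was_dismissed in feedback:
        total[rule_id] += 1
        if was_dismissed:
            dismissed[rule_id] += 1
    rates = {rule: dismissed[rule] / n for rule, n in total.items()}
    # Require enough samples so a handful of dismissals does not condemn a rule.
    return {
        rule: rate
        for rule, rate in rates.items()
        if total[rule] >= min_samples and rate > max_dismiss_rate
    }
```

Treat flagged rules as candidates for re-tuning or narrowing, not automatic deletion. A rule that developers often dismiss may still catch rare, serious bugs.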

AI hallucination

Struggles with niche frameworks.
Proposes wrong fixes.
LLM-as-judge only partially mitigates this.

Privacy / IP

Cloud AI tools send your code to the vendor.
Many enterprises therefore require self-hosting.

Cost

LLM API call.
Compute (RAG indexing).
Vendor licensing.

ROI metrics

DORA

Lead time.
Deployment frequency.
Change failure rate.
MTTR.
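Below is a minimal sketch that computes the four DORA metrics from deployment records. The `Deployment` record shape is an assumption. So is measuring time to restore from the failing deploy; incident tooling usually records when the failure was actually detected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    deployed_at: datetime
    commit_times: List[datetime]              # commit time of each change shipped
    failed: bool = False                      # caused a failure in production
    restored_at: Optional[datetime] = None    # when service was restored, if it failed


def dora_metrics(deploys: List[Deployment], period_days: float) -> dict:
    lead_times = [d.deployed_at - c for d in deploys for c in d.commit_times]
    failures = [d for d in deploys if d.failed]
    # Simplification: time to restore is measured from the deploy that caused the failure.
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "lead_time_median": median(lead_times) if lead_times else None,
        "deploys_per_day": len(deploys) / period_days,
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
        "mttr": sum(restores, timedelta()) / len(restores) if restores else None,
    }
```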

Tool-specific

AI suggestion accept rate.
False positive rate.
PR review time.
Number of security findings.
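Suggestion logs are enough to compute the tool-specific metrics. This is a minimal sketch. It assumes each suggestion is labeled `accepted`, `dismissed`, or `ignored`, that reviewers can flag false positives (the labels are hypothetical), and that PR review times are recorded in hours.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Suggestion:
    status: str            # hypothetical labels: "accepted", "dismissed", or "ignored"
    false_positive: bool   # a reviewer marked the finding as wrong


def tool_metrics(suggestions: list, review_hours: list) -> dict:
    answered = [s for s in suggestions if s.status != "ignored"]
    accepted = sum(s.status == "accepted" for s in answered)
    flagged = sum(s.false_positive for s in suggestions)
    return {
        # Ignored suggestions are left out of the accept-rate denominator (a design choice).
        "accept_rate": accepted / len(answered) if answered else 0.0,
        "false_positive_rate": flagged / len(suggestions) if suggestions else 0.0,
        "median_review_hours": median(review_hours) if review_hours else None,
    }
```

Excluding ignored suggestions from the accept rate is a design choice; include them if silence should count as rejection. Per the Goodhart caveat, track these metrics alongside outcome metrics.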

Caveat (Goodhart's law)

Every metric can be gamed.
Better outcomes ≠ higher tool adoption.

💻 Code Patterns

CodeRabbit setup

```yaml
# .coderabbit.yaml
language: en
reviews:
  profile: chill
  high_level_summary: true
  request_changes_workflow: false

  path_filters:
    - '!**/dist/**'
    - '!**/*.lock'

  auto_review:
    enabled: true
    drafts: false

chat:
  auto_reply: true
```

Greptile (codebase RAG)

```shell
# Illustrative commands: check Greptile's current docs for the exact CLI / HTTP API.
# Index codebase
greptile index https://github.com/org/repo

# Query
greptile ask "Where is user authentication implemented?"
```

Cursor (IDE project rule, e.g. .cursor/rules/conventions.mdc)

```markdown
---
description: Team coding conventions
alwaysApply: true
---
- Prefer functional components.
- Use TypeScript strict mode.
- No new dependencies without approval.
```

Custom Semgrep rule

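```yaml
rules:
  - id: ai-prompt-injection
    # Taint mode: only flags data that actually flows from a source to a sink.
    mode: taint
    # Example sources (assumptions): adapt to where user input enters your app.
    pattern-sources:
      - pattern: input(...)
      - pattern: flask.request.$ATTR.get(...)
    # Any client call named `complete` is treated as an LLM prompt sink.
    pattern-sinks:
      - pattern: $LLM.complete(...)
    message: |
      Prompt injection risk: user input flows into an LLM prompt.
      Use a parameterized template or validate the input first.
    severity: ERROR
    # Sources are language-specific; JS/TS needs its own rule.
    languages: [python]
```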

MCP server (custom analysis tool)

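```typescript
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';

// Placeholder: plug in a real scanner (Semgrep, CodeQL, an LLM call, ...).
async function scanSecurity(file: string): Promise<string[]> {
  return [];
}

const server = new Server(
  { name: 'code-analyzer', version: '1.0.0' },
  // The tools capability must be declared before registering tool handlers.
  { capabilities: { tools: {} } },
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'find_security_issue',
      description: 'Scan a file for security issues',
      inputSchema: {
        type: 'object',
        properties: { file: { type: 'string' } },
        required: ['file'],
      },
    },
  ],
}));

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name === 'find_security_issue') {
    const issues = await scanSecurity(String(req.params.arguments?.file));
    return { content: [{ type: 'text', text: JSON.stringify(issues) }] };
  }
  throw new Error(`Unknown tool: ${req.params.name}`);
});

// Serve over stdio so MCP clients (Claude Desktop, Cursor, ...) can launch it.
await server.connect(new StdioServerTransport());
```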

Codebase RAG (custom)

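```python
import ast
from pathlib import Path

import lancedb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')


def extract_functions(path: Path):
    """Yield (name, source) for every function defined in a Python file."""
    source = path.read_text(encoding='utf-8')
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield node.name, ast.get_source_segment(source, node)


def index_codebase(repo_path: str):
    db = lancedb.connect("./codebase.db")
    chunks = []

    for path in Path(repo_path).rglob("*.py"):
        for name, body in extract_functions(path):
            chunks.append({
                "file": str(path),
                "function": name,
                "code": body,
                # LanceDB searches the column named "vector" by default.
                "vector": model.encode(body),
            })

    # Overwrite so re-indexing does not fail on an existing table.
    db.create_table("code", data=chunks, mode="overwrite")


def query(question: str, k: int = 5):
    db = lancedb.connect("./codebase.db")
    table = db.open_table("code")

    q_emb = model.encode(question)
    return table.search(q_emb).limit(k).to_list()
```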

Auto-fix (with confidence gate)

```python
# post_comment, apply_fix and commit are placeholders for your Git host / VCS integration.
CONFIDENCE_THRESHOLD = 0.95  # below this, a human decides


def auto_fix_pr(pr, suggestions):
    for s in suggestions:
        if s.confidence < CONFIDENCE_THRESHOLD:
            post_comment(pr, s.file, s.line, s.suggestion)   # human review
            continue

        if s.is_high_stakes:   # security, business-critical
            post_comment(pr, s.file, s.line, s.suggestion + ' (review needed)')
            continue

        # Auto-apply only high-confidence, low-stakes fixes
        apply_fix(s.file, s.line, s.replacement)
        commit_message = f"AI auto-fix: {s.summary}\n\nSeverity: {s.severity}\nConfidence: {s.confidence}"
        commit(commit_message, author='bot')
```

Behavioral hotspot detection

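```python
import os
from collections import defaultdict

import git  # GitPython


def find_hotspots(repo_path: str, branch: str = 'main', top_n: int = 20):
    repo = git.Repo(repo_path)

    # Churn: number of commits touching each file
    file_changes = defaultdict(int)
    for commit in repo.iter_commits(branch, max_count=1000):
        for file in commit.stats.files:
            file_changes[file] += 1

    # Complexity of each file that still exists (deleted files score 0).
    # compute_cyclomatic_complexity(path) -> int is an assumed helper (e.g. built on radon or lizard).
    file_complexity = {}
    for file in file_changes:
        path = os.path.join(repo_path, file)
        if os.path.isfile(path):
            file_complexity[file] = compute_cyclomatic_complexity(path)

    # Hotspot = high churn × high complexity
    hotspots = [
        {'file': f, 'churn': c, 'complexity': file_complexity.get(f, 0),
         'hotspot_score': c * file_complexity.get(f, 0)}
        for f, c in file_changes.items()
    ]
    return sorted(hotspots, key=lambda x: -x['hotspot_score'])[:top_n]
```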

CI integration (multi-tool)

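```yaml
# .github/workflows/code-quality.yml
on: [pull_request]

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }   # full history for Sonar blame / new-code detection

      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci

      # Static
      - run: npm run lint
      - run: npm run typecheck

      # Security
      - uses: snyk/actions/setup@master
      - run: snyk code test
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

      # AI review (CodeRabbit runs as a GitHub App; no step needed)

      # Test coverage
      - run: npm test -- --coverage
      - uses: codecov/codecov-action@v3

      # SonarQube
      - uses: SonarSource/sonarcloud-github-action@master
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
```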

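Evaluating AI review quality

```python
# Manual sample: compare AI findings against human review on recent PRs.
# recent_prs, ai_review and get_human_review are placeholders; findings must be
# hashable and normalized (e.g. (file, line, rule)) for the set comparison to work.
def eval_ai_review(num_samples=20):
    samples = []
    for pr in recent_prs(num_samples):
        ai_findings = set(ai_review(pr))
        human_issues = set(get_human_review(pr).issues)

        true_positive = len(ai_findings & human_issues)
        false_positive = len(ai_findings - human_issues)
        false_negative = len(human_issues - ai_findings)

        samples.append({
            'pr': pr.id,
            'precision': true_positive / max(len(ai_findings), 1),
            'recall': true_positive / max(len(human_issues), 1),
            'false_positive': false_positive,
            'false_negative': false_negative,
        })

    return samples
```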

Custom rule per team

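```yaml
# .team/rules/api-pattern.yaml (Semgrep rule format)
rules:
  - id: prefer-trpc-over-rest
    patterns:
      - pattern: fetch($URL, ...)
      - metavariable-regex:
          metavariable: $URL
          regex: '.*/api/'
    message: |
      This codebase uses tRPC. Prefer trpc.* over fetch('/api/...').
    severity: WARNING
    languages: [typescript, javascript]
```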

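Auto-fix in its own PR

```javascript
import { simpleGit } from 'simple-git';

const git = simpleGit();

// Each auto-fix gets its own branch and PR (never mixed into a human's PR),
// so it can be reviewed and reverted on its own.
async function processSuggestion(suggestion) {
  const branch = `ai-fix/${suggestion.id}`;
  await git.checkout('main');
  await git.checkoutLocalBranch(branch);   // create the fix branch from main
  await applyFix(suggestion);              // placeholder: writes the patch to disk
  await git.add('.');                      // stage the files applyFix changed
  await git.commit(`AI auto-fix: ${suggestion.summary}`);
  await git.push('origin', branch);

  // openPR is a placeholder for your Git host's API (GitHub, GitLab, ...)
  await openPR({
    title: `[AI Fix] ${suggestion.summary}`,
    body: `Severity: ${suggestion.severity}\nConfidence: ${suggestion.confidence}\n\n${suggestion.explanation}`,
    head: branch,
    base: 'main',
  });
}
```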

🤔 Decision Criteria

Situation	Recommended stack
Small startup	Cursor + CodeRabbit
Mid-size	+ Snyk Code
Enterprise	Sonar + Snyk + CodeRabbit + Cursor
Privacy / on-prem	Sonar self-host + ConnectAI / Continue.dev
Air-gapped	Qodo + internal LLM
Legacy / large monorepo	Greptile + Kodesage
Security-critical	Veracode + Snyk + Semgrep
Behavioral / debt	CodeScene

Default: Cursor (IDE) + CodeRabbit (PR) + Snyk (security), with a different tool at each layer.

⚠️ Contradictions & Updates

Tool consolidation vs best-of-breed: running many tools adds redundant overhead, while a single tool leaves gaps.
Cloud AI vs privacy: enterprises are pushing for self-hosting.
Auto-fix hallucination: wrong fixes risk reaching production.
False-positive fatigue: developers start dismissing AI findings.
Rising cost: LLM API spend on every PR.
Unclear DORA improvement: studies report mixed evidence.

🔗 Knowledge Links (Graph)

Parents: AI_코드_리뷰 · Static-Analysis · CI/CD Pipeline & IDE Security Integration
Variants: CodeRabbit · Greptile · Cursor · Sonar
Applications: Codebase-RAG · Code Property Graph
Techniques: AST · Semgrep · CodeQL · Joern
Applications: Behavioral-Code-Analysis · Technical_Debt
Adjacent: Code Agent — Devin / Cursor / Claude Code

🤖 LLM Usage Hints (How to Use This Knowledge)

When to use this knowledge:

Choosing code analysis tools for an organization.
Designing CI / PR workflows.
Combining SAST and AI in an enterprise.
Building a codebase RAG.
Writing an MCP server.

When not to use it:

Detailed comparisons of specific vendors (offerings change quickly).
Detailed requirements of a specific compliance regime (SOC 2, etc.); defer to an auditor.
Very small projects (overkill).

❌ Anti-Patterns

A single tool only: leaves gaps at some layers.
Every tool at once: redundancy + cost.
Auto-fix without review: hallucinated fixes reach production.
Cloud AI + sensitive code: IP leak.
No false-positive feedback loop: alert fatigue.
Gaming tool metrics: better outcomes ≠ higher adoption.
Ignoring behavioral analysis: hotspots stay invisible.

🧪 Validation Status

Status: verified (concept level).
Source trust level: B (vendor docs, GitHub Octoverse, Stanford CodeX research).
Review note: manual cleanup. Vendors and tools change roughly every six months.

🧬 Duplicate Check

Existing similar documents: AI_코드_리뷰 (related), AI_Powered_Code_Analysis (similar; possibly a duplicate).
Resolution: KEEP (focused on the tool landscape).
Reason: broader survey of tools.

🕓 Changelog

Date	Change	Action	Confidence
2026-05-08	P-Reinforce Phase 1 normalization	UPDATE	A
2026-05-09	Manual cleanup: added capability layers, tool families, organizational patterns, code, and anti-patterns	UPDATE	B
