---
id: wiki-2026-0508-llm-based-code-analysis
title: LLM-based Code Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI Code Review, LLM Code Review, AI-augmented Static Analysis]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [llm, code-review, static-analysis, ai-tooling, devx]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: { language: any, framework: claude/gpt/cursor/cody }
---

# LLM-based Code Analysis

## In one line
> **"The LLM looks at intent."** The AST covers syntax; the LLM covers semantics and naming. A real review needs both layers combined.

## Core concepts
### Two layers
- **Deterministic** (AST/SAST): ESLint, Semgrep, CodeQL. Covers taint, null, and type checks.
- **Probabilistic** (LLM): Claude/GPT. Covers naming, design, "why does this function exist?", and architectural smells.
- The two are **complementary**. An LLM alone produces a flood of false positives; an AST alone cannot see intent.
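
One way to combine the two layers is to treat deterministic findings as authoritative and keep an LLM finding only where no deterministic tool already flagged the same location. The sketch below assumes a hypothetical `Finding` record and `merge_findings` helper; neither comes from Semgrep, CodeQL, or any LLM SDK.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    source: str   # "sast" or "llm" (hypothetical labels)
    message: str

def merge_findings(sast, llm):
    """Keep every SAST finding; add LLM findings only for locations SAST did not flag."""
    covered = {(f.file, f.line) for f in sast}
    extra = [f for f in llm if (f.file, f.line) not in covered]
    # Sort by location so the merged report reads top to bottom per file
    return sorted(list(sast) + extra, key=lambda f: (f.file, f.line))
```

Deduplicating on `(file, line)` is a deliberately coarse choice; a real pipeline might also compare rule categories before dropping an LLM finding.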

### Applications
1. **PR review bot**: diff → LLM → comment
2. **Refactor suggestions**: proposals such as "this function should be split"
3. **Semantic code search**: Sourcegraph Cody, natural-language queries like "where is auth validated?"
4. **Doc generation**: function → automatic docstring
5. **Bug hunt**: "is there a race condition in this code?"
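
For the doc-generation case, a cheap deterministic pass can pick out which functions actually need a docstring before any LLM call is made. A minimal sketch using only the standard `ast` module; the helper name `docstring_candidates` is hypothetical.

```python
import ast

def docstring_candidates(src):
    """Return (name, source) pairs for functions in src that lack a docstring."""
    tree = ast.parse(src)
    return [
        (node.name, ast.get_source_segment(src, node))
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and ast.get_docstring(node) is None
    ]
```

Each returned source segment can then be sent to the model with a prompt asking for a docstring, so functions that are already documented cost nothing.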

## 💻 Patterns

### Pattern 1: PR review with Claude
```python
# Invoked by the .github/workflows/claude-review.yml workflow
import anthropic, os
from github import Github

def review_pr(pr_number):
    gh = Github(os.environ["GH_TOKEN"])
    pr = gh.get_repo(os.environ["REPO"]).get_pull(pr_number)
    diff = pr.get_files()
    # Files without a patch (e.g. binary files) are skipped
    diff_text = "\n".join(f"{f.filename}\n{f.patch}" for f in diff if f.patch)
    if not diff_text:
        return  # nothing reviewable in this PR

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2000,
        system="You are a senior reviewer. Comment only on real issues. Skip nits.",
        messages=[{"role": "user", "content": f"Review this diff:\n{diff_text}"}],
    )
    # Post the review as a single top-level PR comment
    pr.create_issue_comment(msg.content[0].text)
```

### Pattern 2: AST + LLM hybrid
```python
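import ast

def find_long_functions(src, max_lines=50):
    """Return function nodes that span more than max_lines lines."""
    tree = ast.parse(src)
    return [n for n in ast.walk(tree)
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
            and (n.end_lineno - n.lineno) > max_lines]

# The AST narrows down candidates → the LLM analyzes intent
with open("app.py") as f:
    src = f.read()  # keep the source around; get_source_segment needs it
for fn in find_long_functions(src):
    snippet = ast.get_source_segment(src, fn)
    # ask_llm is a placeholder for your own LLM client wrapper
    ask_llm(f"Why is this function long? Should it be split?\n{snippet}")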
```

### Pattern 3: Cursor / Continue inline review
```jsonc
// .cursor/rules
// Illustrative schema only: check the Cursor / Continue docs for the actual rule file format.
{
  "review": {
    "trigger": "on_save",
    "prompt": "Flag: missing null check, magic number, leaky abstraction. Be terse."
  }
}
```

### Pattern 4: Sourcegraph Cody semantic search
```bash
# CLI
cody chat -m "find where the user session is validated"
# → ranks files by semantic match, not grep
```

### Pattern 5: Cost guard for LLM review
```python
# Large PRs go file by file, small ones in a single call
def chunk_strategy(diff_lines):
    if diff_lines < 200: return "single"
    if diff_lines < 1000: return "per_file"
    return "summary_only"  # very large PRs get a high-level summary only
```
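
The strategy names above can drive how prompts are actually assembled. The sketch below repeats `chunk_strategy` so it runs on its own and adds a hypothetical `build_prompts` helper that takes `(filename, patch)` pairs, such as those collected in Pattern 1; the prompt wording is an assumption.

```python
def chunk_strategy(diff_lines):
    if diff_lines < 200:
        return "single"
    if diff_lines < 1000:
        return "per_file"
    return "summary_only"

def build_prompts(files):
    """files: list of (filename, patch) pairs. Returns the prompts to send."""
    total_lines = sum(patch.count("\n") + 1 for _, patch in files)
    strategy = chunk_strategy(total_lines)
    if strategy == "single":
        body = "\n".join(f"{name}\n{patch}" for name, patch in files)
        return [f"Review this diff:\n{body}"]
    if strategy == "per_file":
        return [f"Review this file's diff:\n{name}\n{patch}" for name, patch in files]
    # summary_only: send file names only, never the full patches
    names = "\n".join(name for name, _ in files)
    return [f"Summarize the intent of this change at a high level. Files touched:\n{names}"]
```

With this shape, the number of LLM calls is one for small and very large PRs and one per file in between, which keeps cost roughly proportional to what a reviewer would read anyway.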

### Pattern 6: Prompt for naming smell
```
You are reviewing variable/function names. Flag ONLY:
- Unclear (data, info, tmp, x)
- Lying (getUser that mutates)
- Inconsistent with rest of codebase
Output JSON: [{file, line, suggestion}]
```
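
Models do not always return clean JSON, so the output of a prompt like this is worth validating before it is turned into comments. A minimal sketch, assuming the `file`, `line`, `suggestion` keys requested above; `parse_findings` is a hypothetical helper.

```python
import json

REQUIRED_KEYS = {"file", "line", "suggestion"}

def parse_findings(raw):
    """Extract the JSON array from raw model output and keep only well-formed items."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        items = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    # Drop entries that are missing any of the requested keys
    return [i for i in items if isinstance(i, dict) and REQUIRED_KEYS <= i.keys()]
```

Returning an empty list on malformed output means a bad response produces no comments rather than noisy ones.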

### Pattern 7: Reject auto-merge if LLM finds blocker
```yaml
- name: LLM gate
  run: python review.py --severity-threshold blocker
  # exit 1 if any "blocker" found
```
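
The `review.py` script itself is not shown in this note. One possible shape for its gating logic is sketched below; the severity scale, the `severity` field on each finding, and the function names are all assumptions.

```python
import argparse

SEVERITY_ORDER = ["nit", "minor", "major", "blocker"]  # hypothetical scale
RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}

def should_fail(findings, threshold):
    """True if any finding is at or above the threshold severity."""
    limit = RANK[threshold]
    # Unknown or missing severities are treated as the lowest level
    return any(RANK.get(f.get("severity"), 0) >= limit for f in findings)

def main(argv=None, findings=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--severity-threshold", choices=SEVERITY_ORDER, default="blocker")
    args = parser.parse_args(argv)
    findings = findings or []  # in a real script these come from the LLM review step
    return 1 if should_fail(findings, args.severity_threshold) else 0

# In review.py the result would become the process exit code: sys.exit(main())
```

Treating unknown severities as the lowest level makes the gate fail open; a stricter pipeline could map them to `blocker` instead.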

## Decision criteria

| Situation | Approach |
|---|---|
| Type/null/taint detection | AST/SAST (deterministic) |
| Design / naming / intent | LLM |
| Both needed | Hybrid (AST picks candidates → LLM analyzes) |
| Large PR (>1k lines) | Summary only; per-file review makes cost explode |
| Security critical | CodeQL primary, LLM secondary |

**Default**: Semgrep + Claude review bot; only blockers block the PR.
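
The table can be read as a small routing function. The sketch below is one possible encoding; the `check_kind` values and returned labels are hypothetical names chosen for illustration, and the precedence between rows is an assumption the table does not state.

```python
def choose_approach(check_kind, diff_lines=0, security_critical=False):
    """Map a review situation to an approach, following the decision table."""
    if security_critical:
        return "codeql_primary_llm_secondary"
    if check_kind in {"type", "null", "taint"}:
        return "sast"  # deterministic checks do not depend on PR size
    if diff_lines > 1000:
        return "llm_summary_only"
    if check_kind in {"design", "naming", "intent"}:
        return "llm"
    return "hybrid"
```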

## 🔗 Graph
- Parents: [[Code_Review]], [[Static_Analysis]]
- Variants: [[Cursor]]
- Applications: [[CI_CD_Pipeline|CI_CD]]
- Adjacent: [[LLM_Ops_and_Tuning]], [[Prompt_Engineering]]

## 🤖 When to use an LLM
**When**: intent/design review, naming, refactor suggestions, natural-language code search.
**When not**: security-critical checks (CodeQL/Semgrep first), deterministic verification (type checker), hot-path latency.

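## ❌ Anti-patterns
- Trusting LLM output 100% → flood of false positives, reviewer fatigue
- LLM only, no AST → cost explosion, deterministic checks missed
- Commenting even on nits → lower signal-to-noise ratio
- Whole diff in a single prompt → context limit, cost
- Sending code that contains unredacted secrets from a public repo to an LLM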
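
To guard against the last anti-pattern, the diff can pass through a redaction step before it reaches the model. The sketch below uses a deliberately small, hypothetical pattern set; it is not a substitute for a dedicated secret scanner.

```python
import re

# (pattern, replacement) pairs; illustrative only, real scanners cover far more formats
SECRET_RULES = [
    # PEM-style private key blocks
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"),
     "[REDACTED]"),
    # AWS access key ID shape: AKIA followed by 16 uppercase letters or digits
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED]"),
    # Quoted values assigned to secret-looking names; the name itself is kept
    (re.compile(r"(?i)((?:api[_-]?key|secret|token|password)\s*[:=]\s*)(['\"])[^'\"]{8,}\2"),
     r"\1[REDACTED]"),
]

def redact(text):
    """Replace likely secrets in text before it is sent to an LLM."""
    for pattern, replacement in SECRET_RULES:
        text = pattern.sub(replacement, text)
    return text
```

In Pattern 1 this would wrap the diff, for example `redact(diff_text)`, just before the prompt is built.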

## 🧪 Verification / Duplicates
- Verified (Anthropic Claude API, Cursor docs, Sourcegraph Cody, Semgrep). Trust level A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — hybrid AST+LLM, PR review bot patterns |