| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-llm-based-code-analysis | LLM-based Code Analysis | 10_Wiki/Topics | verified | self | | none | A | 0.9 | applied | | | 2026-05-10 | pending | |
LLM-based Code Analysis
One-line summary
"An LLM sees intent." An AST covers syntax; an LLM covers semantics and naming. Only the two layers combined make a real review.
Key points
Two layers
- Deterministic (AST/SAST): ESLint, Semgrep, CodeQL for taint, null, and type checks
- Probabilistic (LLM): Claude/GPT for naming, design, "why does this function exist?", and architectural smells
- The two are complementary. An LLM alone produces a flood of false positives; an AST alone cannot see intent.
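One way to picture the two layers working together is a single report in which deterministic findings always survive and LLM findings fill in only where the deterministic layer is silent. The sketch below is a minimal illustration under that assumption; the `Finding` type, its fields, and `merge_findings` are hypothetical names, not part of any tool listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One review finding. All fields are illustrative assumptions."""
    file: str
    line: int
    message: str
    layer: str  # "deterministic" (AST/SAST) or "probabilistic" (LLM)


def merge_findings(deterministic, probabilistic):
    """Merge both layers into one report.

    Deterministic findings come first. An LLM finding is dropped when a
    deterministic finding already covers the same file and line, so the
    report does not repeat itself.
    """
    covered = {(f.file, f.line) for f in deterministic}
    advisory = [f for f in probabilistic if (f.file, f.line) not in covered]
    ordered = sorted(deterministic, key=lambda f: (f.file, f.line))
    ordered += sorted(advisory, key=lambda f: (f.file, f.line))
    return ordered
```

Dropping an LLM finding that overlaps a deterministic one is a design choice made for this sketch; a team could equally keep both and let the reviewer decide.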
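Applications
- PR review bot: diff → LLM → comment
- Refactor suggestions: for example, "this function should be split"
- Semantic code search: Sourcegraph Cody, natural-language queries such as "where is auth validated?"
- Doc generation: function → automatic docstring
- Bug hunt: "does this code have a race condition?"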
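The doc-generation item can be sketched with the standard `ast` module: find functions that have no docstring, then build one prompt per function. This is a minimal sketch, not a full tool; `functions_missing_docstrings`, `docstring_prompt`, and the prompt wording are assumptions, and the LLM call itself is left out.

```python
import ast


def functions_missing_docstrings(src):
    """Yield (name, source) for each function in src that has no docstring."""
    tree = ast.parse(src)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                yield node.name, ast.get_source_segment(src, node)


def docstring_prompt(function_source):
    """Build the prompt text; the wording is an assumption, tune as needed."""
    return (
        "Write a concise docstring for this Python function. "
        "Return only the docstring text.\n\n" + function_source
    )
```

Filtering with the AST first means the LLM is only asked about functions that actually need a docstring, which keeps cost proportional to the gap rather than to the size of the file.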
💻 Patterns
Pattern 1: PR review with Claude
```python
# Triggered from .github/workflows/claude-review.yml
import os

import anthropic
from github import Github


def review_pr(pr_number):
    """Post a single LLM review comment on the given pull request."""
    gh = Github(os.environ["GH_TOKEN"])
    pr = gh.get_repo(os.environ["REPO"]).get_pull(pr_number)
    # Files without a textual patch (e.g. binaries) are skipped.
    diff_text = "\n".join(
        f"{f.filename}\n{f.patch}" for f in pr.get_files() if f.patch
    )

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2000,
        system="You are a senior reviewer. Comment only on real issues. Skip nits.",
        messages=[{"role": "user", "content": f"Review this diff:\n{diff_text}"}],
    )
    pr.create_issue_comment(msg.content[0].text)
```
Pattern 2: AST + LLM hybrid
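```python
import ast


def find_long_functions(src, max_lines=50):
    """Return function nodes that span more than max_lines lines."""
    tree = ast.parse(src)
    return [n for n in ast.walk(tree)
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
            and (n.end_lineno - n.lineno) > max_lines]


# The AST narrows down the candidates; the LLM then analyzes intent.
with open("app.py") as f:
    src = f.read()
for fn in find_long_functions(src):
    snippet = ast.get_source_segment(src, fn)
    # ask_llm is a placeholder for an LLM call such as the one in Pattern 1.
    ask_llm(f"Why is this function long? Should it be split?\n{snippet}")
```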
Pattern 3: Cursor / Continue inline review
```json
// .cursor/rules
// Illustrative shape only; check the Cursor docs for the current rules format.
{
  "review": {
    "trigger": "on_save",
    "prompt": "Flag: missing null check, magic number, leaky abstraction. Be terse."
  }
}
```
Pattern 4: Sourcegraph Cody semantic search
```bash
# CLI
cody chat "Find where the user session is validated"
# → ranks files by semantic match, not grep
```
Pattern 5: Cost guard for LLM review
```python
# Large PRs are reviewed file by file; small ones in a single request.
def chunk_strategy(diff_lines):
    if diff_lines < 200:
        return "single"
    if diff_lines < 1000:
        return "per_file"
    return "summary_only"  # very large PRs get only a high-level summary
```
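The thresholds above can be wired into request building roughly as follows. This is a sketch under stated assumptions: `build_review_requests` and its `patches` argument (a dict of filename to patch text) are hypothetical, and the prompt wording is illustrative.

```python
def chunk_strategy(diff_lines):
    """Same thresholds as Pattern 5."""
    if diff_lines < 200:
        return "single"
    if diff_lines < 1000:
        return "per_file"
    return "summary_only"


def build_review_requests(patches):
    """Turn {filename: patch_text} into a list of prompt strings.

    The number of prompts returned is the number of LLM calls the review
    would make, which is what the cost guard is meant to bound.
    """
    total = sum(len(patch.splitlines()) for patch in patches.values())
    strategy = chunk_strategy(total)
    if strategy == "single":
        body = "\n".join(f"{name}\n{patch}" for name, patch in patches.items())
        return [f"Review this diff:\n{body}"]
    if strategy == "per_file":
        return [f"Review this diff:\n{name}\n{patch}"
                for name, patch in patches.items()]
    # summary_only: send file names and sizes, not the patches themselves.
    listing = "\n".join(f"{name}: {len(patch.splitlines())} diff lines"
                        for name, patch in patches.items())
    return [f"Summarize the scope and risk of this large PR:\n{listing}"]
```

For a very large PR the function sends only a file listing, so the request size stays small no matter how big the diff is.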
Pattern 6: Prompt for naming smell
You are reviewing variable/function names. Flag ONLY:
- Unclear (data, info, tmp, x)
- Lying (getUser that mutates)
- Inconsistent with rest of codebase
Output JSON: [{file, line, suggestion}]
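Because the prompt asks for JSON, the reply should be validated before anything is posted to a PR. The sketch below assumes the reply is a bare JSON array of `{file, line, suggestion}` objects as requested; `parse_naming_findings` is a hypothetical helper name.

```python
import json


def parse_naming_findings(raw):
    """Parse the model's reply into a list of {file, line, suggestion} dicts.

    Returns an empty list when the reply is not a JSON array, and skips
    items that lack a required key or carry the wrong type.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    findings = []
    for item in data:
        if not isinstance(item, dict):
            continue
        file = item.get("file")
        line = item.get("line")
        suggestion = item.get("suggestion")
        line_ok = isinstance(line, int) and not isinstance(line, bool)
        if isinstance(file, str) and line_ok and isinstance(suggestion, str):
            findings.append({"file": file, "line": line, "suggestion": suggestion})
    return findings
```

Treating an unparseable reply as "no findings" keeps the bot quiet rather than noisy when the model ignores the output format.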
Pattern 7: Reject auto-merge if LLM finds blocker
```yaml
- name: LLM gate
  run: python review.py --severity-threshold blocker
  # exit 1 if any "blocker" found
```
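The source does not show `review.py`, so the following is only a guess at its gating logic. It assumes findings are stored as a JSON list of objects with a `severity` field; the severity scale, the `--findings` flag, and the `findings.json` path are all hypothetical.

```python
import argparse
import json

# Assumed severity scale, lowest to highest; the source only names "blocker".
SEVERITIES = ["nit", "minor", "major", "blocker"]


def exit_code(findings, threshold):
    """Return 1 if any finding is at or above threshold, else 0."""
    limit = SEVERITIES.index(threshold)
    for finding in findings:
        severity = finding.get("severity")
        if severity in SEVERITIES and SEVERITIES.index(severity) >= limit:
            return 1
    return 0


def main(argv=None):
    """Parse CLI flags, load findings, and return the process exit code."""
    parser = argparse.ArgumentParser(description="Fail CI on severe LLM findings.")
    parser.add_argument("--severity-threshold", choices=SEVERITIES,
                        default="blocker")
    parser.add_argument("--findings", default="findings.json",
                        help="hypothetical path to the LLM findings file")
    args = parser.parse_args(argv)
    with open(args.findings) as f:
        findings = json.load(f)
    return exit_code(findings, args.severity_threshold)
```

A real script would end by calling `sys.exit(main())`, so that a non-zero return value fails the workflow step and blocks auto-merge.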
Decision criteria
| Situation | Approach |
|---|---|
| Type/null/taint detection | AST/SAST (deterministic) |
| Design / naming / intent | LLM |
| Both needed | Hybrid (AST candidates → LLM analysis) |
| Large PR (>1k lines) | Summary only; per-file review would blow up cost |
| Security critical | CodeQL primary, LLM secondary |
Default: Semgrep + Claude review bot, with only blockers allowed to block a PR.
🔗 Graph
- Parent: Code_Review, Static_Analysis
- Variant: Cursor
- Application: CI_CD_Pipeline
- Adjacent: LLM_Ops_and_Tuning, Prompt_Engineering
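🤖 LLM usage
When to use: intent/design review, naming, refactor suggestions, natural-language code search. When not to use: security-critical checks (prefer CodeQL/Semgrep), deterministic verification (type checker), hot-path latency.
❌ Anti-patterns
- Trusting LLM output 100% → flood of false positives, reviewer fatigue
- LLM only, without AST → cost blow-up, missed deterministic checks
- Commenting even on nits → lower signal-to-noise ratio
- Whole diff in a single prompt → context limit, cost
- Sending public-repo code that contains unredacted secrets to an LLM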
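For the last anti-pattern, a redaction pass can run over the diff before it leaves the CI runner. The sketch below is deliberately small and only illustrates the idea; the three patterns and the `redact` helper are assumptions, and a dedicated secret scanner would be the safer choice in practice.

```python
import re

PLACEHOLDER = "[REDACTED]"

# Illustrative rules only: (compiled pattern, replacement).
RULES = [
    # Strings shaped like an AWS access key ID.
    (re.compile(r"AKIA[0-9A-Z]{16}"), PLACEHOLDER),
    # PEM private key blocks, including the body between the markers.
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?"
                r"-----END [A-Z ]*PRIVATE KEY-----", re.DOTALL), PLACEHOLDER),
    # Quoted values assigned to names that look secret-bearing; the name
    # and quotes are kept so the diff still reads naturally.
    (re.compile(r"(?i)\b(\w*(?:api_?key|secret|token|password)\w*)"
                r"(\s*[:=]\s*)(['\"])[^'\"\n]+\3"),
     r"\1\2\3" + PLACEHOLDER + r"\3"),
]


def redact(text):
    """Return text with every match of the rules above replaced."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Calling `redact(diff_text)` just before the prompt is built keeps the rest of the review flow unchanged.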
🧪 Verification / duplicates
- Verified (Anthropic Claude API, Cursor docs, Sourcegraph Cody, Semgrep). Trust level A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: hybrid AST+LLM, PR review bot patterns |