| id | title | category | status | source_trust_level | verification_status | created_at | updated_at | tags | tech_stack | applied_in | aliases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ai-code-agent-patterns | Code Agent: Devin / Cursor / Claude Code | Coding | draft | B | conceptual | 2026-05-09 | 2026-05-09 | | | | |
Code Agent
AI that writes and modifies code. Examples: Cursor (IDE), Claude Code (CLI), Devin (autonomous), Aider (open source). SWE-bench is the common eval.
📖 Core concepts
- File read / edit / run tools.
- Plan + execute + verify.
- Sandbox (E2B / Daytona).
- Tests are the ground truth.
💻 Code patterns
Aider (open source CLI)
pip install aider-chat
cd my-repo
aider --model claude-opus-4-7
# Command (typed at the aider prompt)
> add a /health endpoint to server.ts
# → automatic edit + commit.
→ Git-aware. Every change = one commit.
Cursor (IDE)
- Composer (multi-file).
- Cmd-K (inline).
- Tab autocomplete.
- @file / @docs / @web reference.
→ VS Code fork.
Claude Code (CLI)
claude
# → terminal-based agent.
→ Anthropic's native CLI agent.
Tool composition
- read_file
- write_file
- edit_file (precise diff)
- bash (run command)
- search (grep / file)
- web (fetch)
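The tool list above can be sketched as a small registry that maps tool names to plain functions. This is a minimal illustration, not the implementation of any product named here: the `TOOLS` dict, the `dispatch` helper, and the `{"name": ..., "args": {...}}` call shape are all assumptions made for the example.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict


def read_file(path: str) -> str:
    """Return the full text of a file."""
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    """Create or overwrite a file and report how many characters were written."""
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars to {path}"


def bash(command: str, timeout_s: int = 30) -> str:
    """Run a shell command and return its exit code plus combined output."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout_s
    )
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


# Hypothetical registry: tool name -> Python callable.
TOOLS: Dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "write_file": write_file,
    "bash": bash,
}


def dispatch(tool_call: dict) -> str:
    """Execute one tool call of the assumed form {"name": ..., "args": {...}}."""
    name = tool_call["name"]
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return TOOLS[name](**tool_call["args"])
    except Exception as exc:
        # Report the failure back to the model as text instead of crashing the loop.
        return f"error: {type(exc).__name__}: {exc}"
```

Returning errors as strings lets the model see what went wrong and retry with a corrected call.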
Plan → Execute
1. User: "Add OAuth login".
2. Plan: 5 steps (read existing auth, add provider, route, ...).
3. Execute: for each step → read / edit / test.
4. Verify: tests pass?
5. Commit.
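The plan, execute, verify sequence above can be sketched as a loop that retries a step until verification passes or a retry limit is hit. `run_plan`, `RunLog`, and the callback signatures are hypothetical names for illustration; in a real agent `execute` would call the model and its tools, and `verify` would run the test suite.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunLog:
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None  # set when a step never verified


def run_plan(
    steps: List[str],
    execute: Callable[[str, int], None],  # (step, attempt number) -> None
    verify: Callable[[], bool],           # e.g. "do the tests pass?"
    max_retries: int = 2,
) -> RunLog:
    """Execute each step, verify after every attempt, stop at the first step that cannot be verified."""
    log = RunLog()
    for step in steps:
        for attempt in range(max_retries + 1):
            execute(step, attempt)
            if verify():
                log.completed.append(step)
                break
        else:
            # Every attempt failed verification: stop instead of building on a broken state.
            log.failed_step = step
            return log
    return log
```

Committing would happen only when `failed_step` is `None`, which keeps unverified work out of history.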
Test-driven
1. Read existing test.
2. Add new test (RED).
3. Implement (GREEN).
4. Refactor.
5. Verify (re-run).
→ TDD with AI.
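One detail worth enforcing in the cycle above is that the new test really fails before the implementation exists; otherwise it proves nothing. A minimal sketch, with `tdd_cycle` and its three callbacks as hypothetical stand-ins for agent actions:

```python
from typing import Callable


def tdd_cycle(
    add_test: Callable[[], None],
    implement: Callable[[], None],
    run_tests: Callable[[], bool],
) -> str:
    """Run one red/green cycle and report how it ended."""
    add_test()
    if run_tests():
        # The new test passed without any implementation, so it does not test the new behavior.
        return "invalid-red"
    implement()
    if not run_tests():
        return "still-red"
    return "green"
```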
File reading strategy
- Direct path: fast.
- Search (grep): fuzzy.
- Symbol search: function / class.
- Recent change: git log.
→ Context is limited. Retrieve precisely.
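As a rough illustration of the grep and symbol-search strategies above, the sketch below scans a directory with a regular expression and caps the number of hits so the results fit in a limited context. The function names and the `max_hits` cap are assumptions for the example, and the symbol pattern only recognizes Python-style `def` / `class` definitions.

```python
import re
from pathlib import Path
from typing import List, Tuple

Hit = Tuple[str, int, str]  # (relative path, line number, stripped line)


def grep(root: str, pattern: str, glob: str = "*.py", max_hits: int = 20) -> List[Hit]:
    """Return up to max_hits matching lines, in sorted path order."""
    regex = re.compile(pattern)
    hits: List[Hit] = []
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
                if len(hits) >= max_hits:
                    return hits  # stop early to respect the context budget
    return hits


def find_symbol(root: str, name: str, glob: str = "*.py") -> List[Hit]:
    """Find definitions (not usages) of a function or class by name."""
    return grep(root, rf"^\s*(def|class)\s+{re.escape(name)}\b", glob)
```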
Edit precision
Two strategies:
1. Full file rewrite: wasteful for large files.
2. Diff / patch: precise + cheap.
→ Aider / Claude Code use diffs.
Cursor uses a mix.
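One common way to get a precise edit is an exact-match search and replace that refuses to act when the target text is missing or ambiguous. The sketch below shows that idea and renders the result as a unified diff with the standard library; it is an illustration only, not the actual edit format of any tool named above, and `apply_edit` / `render_diff` are hypothetical names.

```python
import difflib


def apply_edit(source: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of old with new, or raise."""
    count = source.count(old)
    if count == 0:
        raise ValueError("old text not found")
    if count > 1:
        raise ValueError(f"old text is ambiguous ({count} matches)")
    return source.replace(old, new, 1)


def render_diff(before: str, after: str, path: str) -> str:
    """Render the change as a unified diff for review or logging."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
```

Only the small `old` / `new` pair has to be generated by the model, which is why this style costs far fewer tokens than rewriting the whole file.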
Sandbox execution
- Test run (E2B / Modal).
- Build verify.
- Lint / type check.
→ Real verification.
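The verification steps above amount to running commands and reading exit codes. The sketch below does this with a local subprocess and a timeout. Note the assumption: a plain subprocess provides no isolation, so this stands in for a real sandbox (such as the hosted ones mentioned above) only for illustration, and `run_check` / `run_all` are hypothetical helpers.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class CheckResult:
    command: List[str]
    exit_code: Optional[int]  # None when the command timed out
    timed_out: bool
    output: str

    @property
    def passed(self) -> bool:
        return self.exit_code == 0


def run_check(command: Sequence[str], cwd: str, timeout_s: float = 60.0) -> CheckResult:
    """Run one command (test, build, lint) and capture its result."""
    try:
        proc = subprocess.run(
            list(command), cwd=cwd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return CheckResult(list(command), None, True, "")
    return CheckResult(list(command), proc.returncode, False, proc.stdout + proc.stderr)


def run_all(commands: Sequence[Sequence[str]], cwd: str, timeout_s: float = 60.0) -> List[CheckResult]:
    """Run checks in order and stop at the first failure."""
    results: List[CheckResult] = []
    for command in commands:
        result = run_check(command, cwd, timeout_s)
        results.append(result)
        if not result.passed:
            break
    return results
```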
SWE-bench
- 2,294 real GitHub issues.
- "Fix this issue" given repo + issue.
- Pass = both the new tests and the existing tests pass.
→ State of the art (approximate; figures depend on the benchmark variant):
- 2023: 2-3% pass.
- 2024: 30-50% pass.
- 2026: 60%+ pass.
→ Rising every year.
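The pass criterion above can be written as a small predicate. In the SWE-bench dataset the two test groups are labeled FAIL_TO_PASS (tests the fix must turn green) and PASS_TO_PASS (tests that must stay green). The function names and the dict-based result format below are assumptions for illustration, not the official harness.

```python
from typing import Dict, Iterable, List


def is_resolved(
    test_passed: Dict[str, bool],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """An issue counts as resolved only if new and existing tests all pass."""
    new_ok = all(test_passed.get(name, False) for name in fail_to_pass)
    old_ok = all(test_passed.get(name, False) for name in pass_to_pass)
    return new_ok and old_ok


def resolve_rate(outcomes: List[bool]) -> float:
    """Fraction of issues resolved, the headline percentage reported for a benchmark run."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

A test missing from the results is treated as failed, so a crashed run cannot count as a pass.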
Devin (autonomous)
- Large tasks take hours.
- Browser + code + plan.
- Human review is required.
→ "AI software engineer".
Limitations
- Long context (large codebase) = the agent loses track.
- Very new / niche technologies.
- Mathematical / algorithmic.
- Multi-step refactor.
- Production debugging (real-time).
→ Humans are still needed.
Best practice (for users)
- Small task: agent.
- Big task: human plan + agent execute.
- Critical / security: human review.
- Tests are the baseline.
- Commit to Git often (enables rollback).
Prompt guide
✓ "Fix the bug at users.ts:42 (missing null check for an empty list)".
✗ "Make better".
✓ "Add /health endpoint returning {status: 'ok'}".
✗ "Add monitoring".
→ Be specific and unambiguous.
Multi-file edit
"Refactor all UserService.* call to NewUserService".
→ The agent:
1. Search for all callers.
2. Edit each file.
3. Run test.
4. Iterate.
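The search-and-edit part of the steps above can be sketched as a whole-word rename across files. This is a text-based approximation under stated assumptions: it is not scope-aware (a real refactor would use the language's own tooling), `rename_symbol` is a hypothetical helper, and the tests still have to be run afterwards.

```python
import re
from pathlib import Path
from typing import List


def rename_symbol(root: str, old: str, new: str, glob: str = "*.ts") -> List[str]:
    """Rename whole-word occurrences of old to new; return the files that changed."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    changed: List[str] = []
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        updated, replacements = pattern.subn(new, text)
        if replacements:
            path.write_text(updated, encoding="utf-8")
            changed.append(str(path.relative_to(root)))
    return changed
```

The word boundary keeps identifiers that merely contain the old name, such as `MyUserService`, untouched.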
Code review by AI
PR review:
- Read diff.
- Comment specific.
- Suggest improvement.
- Find bug.
→ CodeRabbit / Greptile / Sourcery.
Production agent
- Agentic IDE (Cursor, Windsurf, Zed AI).
- CLI (Claude Code, Aider).
- Autonomous (Devin, by Cognition).
- PR review (CodeRabbit).
- Inline (Copilot, Tabnine).
MCP (Model Context Protocol)
// Open standard introduced by Anthropic.
// MCP servers expose capabilities such as file, git, run.
// Any agent (client) speaks the same protocol.
→ Cursor / Claude Desktop / Cline support it natively.
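MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. The sketch below only builds such a request as a string; the tool name `read_file` and its `path` argument are hypothetical, since each server defines its own tools, and transport and initialization are left out.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # JSON-RPC requests need a unique id per connection


def tool_call_request(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)
```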
Eval
Internal:
- Unit test pass rate.
- Refactor preservation.
- Code style consistency.
Public:
- SWE-bench, HumanEval, BigCodeBench.
Multi-agent (subagent)
Main agent:
- Plan + coordinate.
Sub-agent:
- Search file.
- Run test.
- Refactor specific.
→ e.g. Anthropic's Claude Code subagents / multi-agent setups.
Memory
- Conversation context.
- File contents.
- Git history.
- Test results.
→ Memory budget. Compress.
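A simple form of the compression mentioned above is to keep the most recent messages that fit a token budget and replace the rest with a placeholder. The sketch assumes a crude estimate of about four characters per token, which is only a heuristic; `fit_to_budget` is a hypothetical helper, and a real agent might summarize the dropped messages instead of omitting them.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude size estimate: roughly 4 characters per token (heuristic, not a real tokenizer)."""
    return max(1, len(text) // 4)


def fit_to_budget(messages: List[str], budget_tokens: int) -> List[str]:
    """Keep the newest messages that fit the budget; mark how many older ones were dropped."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    dropped = len(messages) - len(kept)
    kept.reverse()  # restore chronological order
    if dropped:
        # The placeholder itself is not counted against the budget in this sketch.
        kept.insert(0, f"[{dropped} earlier messages omitted]")
    return kept
```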
Pitfalls
- The agent gets stuck in an infinite loop.
- Cost explosion (a large task burns $ every hour).
- No tests = silent breakage.
- No Git = no rollback.
- No human review = production bugs.
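The first two pitfalls above can be contained with hard limits checked on every agent step. A minimal sketch, where `RunGuard`, `AgentStopped`, and the default limits are hypothetical choices for illustration:

```python
from dataclasses import dataclass


class AgentStopped(RuntimeError):
    """Raised when a run exceeds one of its limits."""


@dataclass
class RunGuard:
    max_steps: int = 50
    max_cost_usd: float = 5.0
    max_repeats: int = 3  # identical consecutive actions suggest a loop
    steps: int = 0
    cost_usd: float = 0.0
    last_action: str = ""
    repeats: int = 0

    def record(self, action: str, cost_usd: float) -> None:
        """Call once per agent step, before executing the action."""
        self.steps += 1
        self.cost_usd += cost_usd
        self.repeats = self.repeats + 1 if action == self.last_action else 1
        self.last_action = action
        if self.steps > self.max_steps:
            raise AgentStopped("step limit reached")
        if self.cost_usd > self.max_cost_usd:
            raise AgentStopped("cost limit reached")
        if self.repeats >= self.max_repeats:
            raise AgentStopped(f"same action repeated {self.repeats} times")
```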
Workflow integration
1. GitHub issue → agent.
2. PR open → agent review.
3. Test fail → agent fix.
4. Human approve → merge.
→ AI + human loop.
Privacy
- Code is sent to the model provider's API servers.
- Local model (Ollama) = privacy.
- Code-specialized open-weight models (CodeLlama).
→ Sensitive code = local / private cloud.
Cost
1 day Claude Code: $5-50 (tokens).
Cursor Pro: $20 / month.
Devin: $500 / month.
Aider + own API: $10-100 / month.
→ The productivity gain justifies the cost.
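For token-billed tools, the daily figure above is just token counts multiplied by per-million-token prices. The sketch below shows the arithmetic; the prices and token counts in the usage line are hypothetical placeholders, not any provider's actual rates.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost = input tokens * input price + output tokens * output price (prices per 1M tokens)."""
    return (
        input_tokens / 1_000_000 * usd_per_million_input
        + output_tokens / 1_000_000 * usd_per_million_output
    )


# Hypothetical day: 2M input tokens and 200k output tokens at assumed prices of $3 and $15 per 1M.
daily_cost = estimate_cost_usd(2_000_000, 200_000, 3.0, 15.0)
```

Input tokens usually dominate because the agent re-reads files and history on every step.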
Future
- Increasingly autonomous.
- Test-first AI.
- Long-context (1M+ token).
- Multi-modal (UI screenshot → code).
- Fine-tuned per-codebase.
🤔 Decision criteria
| Task | Recommendation |
|---|---|
| Quick edit | Cursor / Copilot |
| Multi-file | Cursor Composer / Claude Code |
| Autonomous | Devin (review required) |
| Open / CLI | Aider |
| PR review | CodeRabbit / Greptile |
| Test write | Any agent + test |
| Refactor | Aider / Cursor |
❌ Anti-patterns
- Agent only, no review: production bugs.
- No test: silent break.
- No git commit: no rollback possible.
- Vague prompt: bad output.
- Long-running without supervision: runaway cost / loops.
- Sensitive code sent to a public model: leak.
🤖 LLM usage hints
- Cursor / Claude Code are the modern defaults.
- Test = ground truth.
- Aider is the git-aware open-source option.
- MCP is the standard protocol.