---
id: wiki-2026-0508-codebase-onboarding
title: Codebase Onboarding
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Developer Onboarding, Code Onboarding, New Hire Ramp-up]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [engineering-management, devex, documentation, llm-tools, productivity]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python / TypeScript
  framework: Claude Code / Cursor / Sourcegraph
---

# Codebase Onboarding

## One-line summary

> **"Minimize the time it takes a new engineer to ship a first PR; building a mental model of the codebase is the single biggest leverage point."**

In the 2026 era of LLM-augmented onboarding, tools such as Claude Code, Cursor, and Sourcegraph Cody can collapse the first-week ramp-up from what historically took weeks down to days. The triplet of documentation, tooling, and a buddy system remains critical.

## Core

### Four phases
- **Day 0–1: Environment.** Clone the repo, build it, and get the tests green.
- **Day 2–5: Map.** Learn the system architecture, ownership boundaries, and glossary.
- **Week 2: First PR.** Contribute a small bug fix or documentation change.
- **Month 1: Ownership.** Lead a feature and participate in on-call.

### Friction sources (Microsoft 2024 study)
- Tribal knowledge (60% of blockers)
- Stale documentation (45%)
- Build / dev-env setup (30%)
- Implicit code conventions (28%)
- Domain language gaps (25%)

### Applications
1. New-hire ramp-up: accelerating from 5 days to 1.5 days (LLM-assisted).
2. Acquisition integration: onboarding onto an acquired team's codebase.
3. Open source: the first-time contributor experience.
4. Inner-source: reducing friction for cross-team contributions.

## 💻 Patterns

### CLAUDE.md / AGENTS.md (LLM context primer)

```markdown
# Project Context

## Stack
- Backend: Python 3.13, FastAPI, Postgres 16, Redis 7
- Frontend: Next.js 15 (App Router), React 19
- Infra: AWS, Pulumi, GitHub Actions

## Conventions
- Async-first; no sync DB calls in handlers.
- Tests: pytest, ≥85% coverage required.
- Commits: conventional commits (feat/fix/chore).

## Domain glossary
- "Account" = billing entity (≠ User)
- "Workspace" = collaboration scope
- "Project" = single deployment unit

## Key files
- `apps/api/main.py`: FastAPI entry
- `packages/db/schema.sql`: canonical schema
- `infra/pulumi/`: IaC

## Onboarding tasks
1. Run `make bootstrap` then `make test`.
2. Read `docs/architecture.md`.
3. Pick a "good-first-issue" label.
```

### `make bootstrap` (one-command setup)

```makefile
.PHONY: bootstrap test lint

# Install the toolchain and dependencies, start local services, apply migrations.
# The mise installer typically places the binary in ~/.local/bin; make sure
# that directory is on PATH before the `mise install` step runs.
bootstrap:
	@command -v mise >/dev/null || curl https://mise.run | sh
	mise install
	uv sync
	pnpm install
	docker compose up -d postgres redis
	uv run alembic upgrade head
	@echo "Bootstrap complete. Run 'make test' to verify."

# Run backend tests (stop at first failure, with coverage) and frontend tests.
test:
	uv run pytest -x --cov=src
	pnpm test

# Lint backend and frontend code.
lint:
	uv run ruff check .
	pnpm lint
```

### Architecture Decision Records (ADR)

```markdown
# ADR-007: Why we chose Postgres over MongoDB

Date: 2024-11-12
Status: Accepted

## Context
We need transactional consistency for billing.

## Decision
Postgres 16 with row-level security.
## Consequences
+ Strong ACID for money flows
- Schema migrations require Alembic discipline
```

### LLM-powered code map

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def codebase_summary(repo_files: list[str]) -> str:
    """Generate an onboarding-friendly codebase map using the prompt cache."""
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=4000,
        system=[
            {"type": "text", "text": "You are an onboarding assistant."},
            # The repo listing is the large, stable part of the prompt, so it is
            # marked cacheable and can be reused across follow-up questions.
            {
                "type": "text",
                "text": "\n".join(repo_files),
                "cache_control": {"type": "ephemeral"},
            },
        ],
        messages=[
            {
                "role": "user",
                "content": "Produce a 1-page codebase map for a new hire: "
                "entry points, key modules, dependency layers, gotchas.",
            }
        ],
    )
    return response.content[0].text
```

### Buddy system + PR mentoring

```python
from dataclasses import dataclass


@dataclass
class OnboardingPlan:
    new_hire: str
    buddy: str
    week1_tasks: list[str]
    week2_tasks: list[str]

    def daily_standup_questions(self) -> list[str]:
        """Questions the buddy asks at each daily check-in."""
        return [
            "What was the most confusing part of yesterday?",
            "What is your one goal for today?",
            "Is anything blocking you?",
        ]
```

### "First PR by EOD-1" success metric

```python
from statistics import median


def first_pr_metrics(hires: list[dict]) -> dict:
    """Lead indicator of onboarding health."""
    return {
        "median_days_to_first_pr": median(h["days_to_first_pr"] for h in hires),
        "median_days_to_first_merge": median(h["days_to_first_merge"] for h in hires),
        # Fraction (0 to 1) of hires still active after 30 days.
        "30d_active_pct": sum(h["still_active_30d"] for h in hires) / len(hires),
    }
```

## Decision criteria

| Situation | Approach |
|---|---|
| Greenfield team | Heavy CLAUDE.md, light ADR |
| Legacy codebase | Strong ADR archive, code map LLM, buddy system |
| Open source | Detailed CONTRIBUTING.md, good-first-issue queue |
| Acquired team | Pair programming weeks 1-2, glossary front-loaded |
| Remote-first | Async docs first, video walkthroughs |

**Default**: for a modern team, CLAUDE.md + `make bootstrap` + buddy + first-PR-by-EOD-3.
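
The file-based parts of that default can be checked mechanically. The following is a minimal sketch, assuming the layout used in the CLAUDE.md example on this page (`CLAUDE.md` or `AGENTS.md` at the repo root, a `Makefile` with a `bootstrap` target, and `docs/architecture.md`). The function names `onboarding_baseline` and `missing_items` are hypothetical, and the buddy assignment and first-PR target are process items that a file check cannot cover.

```python
import re
import tempfile
from pathlib import Path


def onboarding_baseline(repo: Path) -> dict[str, bool]:
    """Report which baseline onboarding artifacts exist in `repo`.

    The checked paths are assumptions taken from the examples on this page.
    """
    makefile = repo / "Makefile"
    # A target counts as present if some line starts with "bootstrap:".
    has_bootstrap = makefile.is_file() and bool(
        re.search(
            r"^bootstrap\s*:",
            makefile.read_text(encoding="utf-8"),
            re.MULTILINE,
        )
    )
    return {
        "llm_primer": any(
            (repo / name).is_file() for name in ("CLAUDE.md", "AGENTS.md")
        ),
        "bootstrap_target": has_bootstrap,
        "architecture_doc": (repo / "docs" / "architecture.md").is_file(),
    }


def missing_items(report: dict[str, bool]) -> list[str]:
    """Return the names of baseline items that are absent, sorted."""
    return sorted(name for name, present in report.items() if not present)


if __name__ == "__main__":
    # Demo against a throwaway directory rather than a real repository.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "CLAUDE.md").write_text("# Project Context\n", encoding="utf-8")
        (root / "Makefile").write_text(
            ".PHONY: bootstrap\nbootstrap:\n\techo ok\n", encoding="utf-8"
        )
        print(missing_items(onboarding_baseline(root)))
```

Run in CI or as a pre-onboarding checklist step, a non-empty result from `missing_items` flags a gap before the new hire's first day.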

## 🔗 Graph
- Parent: [[Team Culture & Onboarding (팀 문화 및 온보딩)]]
- Variants: [[Program Comprehension Strategies]] · [[Codebase_Maps_&_Interactive_Tours]]
- Applications: [[Pull_Request_and_Issue_Tracking]]
- Adjacent: [[GIT_PROTOCOL]] · [[Process_Reflection_Template]]

## 🤖 LLM usage

**When to use**: codebase summary generation, onboarding doc audits (gap detection), new-hire Q&A bot.

**When not to use**: do not try to fully replace tribal knowledge with an LLM; the buddy system is still essential.

## ❌ Anti-patterns
- **"Read the code"**: not an excuse for missing docs. Make every entry point explicit.
- **Stale README**: a bootstrap step that no longer works is a first-day blocker.
- **Trial-by-fire**: assigning a large, critical task in week 1 amplifies burnout.
- **Single buddy bottleneck**: onboarding stalls when the only buddy is on vacation.

## 🧪 Verification / duplicates
- Verified (Microsoft 2024 *Developer Velocity Lab*; Stripe *Increment Magazine* onboarding issue).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: phases, CLAUDE.md, bootstrap, ADR, LLM code map |