---
id: wiki-2026-0508-codebase-onboarding-guide
title: Codebase Onboarding Guide
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [P-REINFORCE-WIKI-DEV-ONBOARDING-GUIDE, 온보딩 가이드, Codebase Onboarding, 시스템 파악, 멘탈 모델 구축]
duplicate_of: none
source_trust_level: A
confidence_score: 1.0
tags: [Onboarding, Knowledge_Sharing, System_Analysis, Developer_Experience, Collaboration]
raw_sources: [Datacollector_Export_2026-05-02]
last_reinforced: 2026-05-02
github_commit: pending
tech_stack:
  language: unspecified
  framework: unspecified
---

# [[Codebase Onboarding Guide]]

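## 1. Overview
Codebase onboarding is the process by which a new developer learns the structure and behavior of an unfamiliar system and becomes an effective contributor. Reading millions of lines of code in one pass is not realistic. A strategic approach works better: map the system's core terrain first, then expand your knowledge incrementally.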

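## 2. Four-Step Onboarding Workflow
1.  **Inventory**: Identify the project's identity and tech stack from its build tools, package manager, and top-level directory layout.
2.  **Entry Points**: Locate where the application starts (main function, API router, CLI handler, etc.).
3.  **Tracing**: Follow one concrete request end to end as it passes through the system, is processed, and is persisted.
4.  **Boundaries**: Identify the touchpoints between modules (APIs, interfaces) and pin down each component's role and responsibilities.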
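Step 4 can be partly mechanized for a Python codebase. The sketch below is a minimal illustration, assuming each top-level directory under the source root is one module boundary: it parses every file with the standard-library `ast` module and counts imports that cross from one top-level package into another, giving a first list of touchpoints worth reading. `cross_package_imports` is a hypothetical helper; relative and dynamic imports are ignored.

```python
import ast
from collections import Counter
from pathlib import Path


def cross_package_imports(src_root: str) -> Counter:
    """Count (importer_package, imported_package) edges between top-level packages.

    Assumption: every top-level directory under src_root is one boundary.
    Relative imports (`from . import x`) are skipped to keep the sketch short.
    """
    root = Path(src_root)
    local_packages = {p.name for p in root.iterdir() if p.is_dir()}
    edges: Counter = Counter()
    for path in root.rglob("*.py"):
        importer = path.relative_to(root).parts[0]
        if importer not in local_packages:
            continue  # loose scripts at the root belong to no package
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError:
            continue  # e.g. files for another Python version
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                targets = [node.module]
            else:
                continue
            for target in targets:
                imported = target.split(".")[0]
                # Count only edges into *other* local packages; stdlib/third-party are ignored.
                if imported in local_packages and imported != importer:
                    edges[(importer, imported)] += 1
    return edges
```

Edges that point both ways between two packages (A imports B and B imports A) are a good first question to bring to the team: they often mark a boundary that is blurrier than the directory layout suggests.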

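## 3. Core Learning Strategies
- **Combine top-down and bottom-up**: Cross-check the end-to-end flow seen from the business-value side against the technical constraints visible in the database schema.
- **Start with small tasks**: Begin with low-risk work such as fixing a documentation typo, changing UI text, or writing a simple unit test, and expand your knowledge of the system safely from there.
- **Use dynamic analysis**: Don't stop at reading static code; run the system locally and observe its runtime behavior directly with a debugger (breakpoints) and logs.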
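Besides breakpoints, a quick way to observe runtime behavior is to record which functions actually run, and in what order. A minimal sketch using the standard `sys.settrace` hook, which delivers a `"call"` event each time a new Python frame starts; `record_calls` and the sample functions are hypothetical. A real debugger (`pdb`, IDE breakpoints) shows far more; this captures only the call sequence.

```python
import sys
from typing import Any, Callable


def record_calls(entry: Callable[..., Any], *args: Any, prefix: str = "") -> tuple:
    """Run entry(*args); return (result, names of Python functions entered, in order).

    Only function names starting with `prefix` are kept, to filter out library noise.
    """
    calls: list[str] = []

    def tracer(frame, event, arg):
        # The global trace function gets a "call" event whenever a new Python frame starts.
        if event == "call" and frame.f_code.co_name.startswith(prefix):
            calls.append(frame.f_code.co_name)
        return None  # no per-line tracing inside each frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = entry(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer (e.g. a debugger) was active
    return result, calls


# Hypothetical code under study
def handle_signup(email: str) -> dict:
    return save(validate(email))


def validate(email: str) -> str:
    return email.strip().lower()


def save(user: str) -> dict:
    return {"saved": user}


if __name__ == "__main__":
    print(record_calls(handle_signup, " Ada@Example.com "))
```

Comparing the recorded sequence with the call graph you inferred from reading the code is a cheap way to catch wrong assumptions early.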

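## 4. Trade-offs and Caveats
- **Beware of perfectionism**: Don't wait until you understand all of the code before starting work; connect fragmented pieces of information as you go and start by writing code that actually runs.
- **Accept that documentation is incomplete**: Comments and docs are likely out of sync with the implementation, so always treat the actual code and test results as the source of truth.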
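One cheap way to let the code arbitrate is to make documentation executable. A minimal sketch with Python's standard `doctest` module: the docstring example below is deliberately stale (it assumes `rate` is a percentage while the code expects a fraction), and running it reports the mismatch instead of letting a reader trust it. `apply_discount` and `count_stale_examples` are hypothetical names.

```python
import doctest


def apply_discount(price: float, rate: float) -> float:
    """Return the price after a discount.

    Hypothetical stale documentation: this example was written when `rate`
    was a percentage, but the code now expects a fraction (0.1 == 10%).

    >>> apply_discount(100.0, 10)
    90.0
    """
    return price * (1 - rate)


def count_stale_examples(func) -> int:
    """Run the docstring examples of `func`; return how many disagree with the code."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for test in finder.find(func, globs={func.__name__: func}):
        failed += runner.run(test, out=lambda _report: None).failed  # discard the diff report
    return failed


if __name__ == "__main__":
    print(count_stale_examples(apply_discount))  # non-zero means the docs drifted
```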

## 5. Knowledge Links (Related)
- [[Router_Implementation]]: How to analyze the router as a system entry point.
- [[C4_Modeling_Framework]]: A standard model for visualizing what you learn during onboarding.

## 🧪 Validation Status
- **Status**: Verified
- **Source trust level**: A
- **Review rationale**: Establishes practical guidelines for making new team members productive early and lowering the cost of knowledge transfer.

## 📌 One-Line Insight
> **"4 steps (Inventory → Entry Points → Tracing → Boundaries) + a small first task"**. Reject perfectionism and start by connecting fragmented information. Modern variant: LLM-aided onboarding (RAG over the repo).

## 📖 Key Points (Brief)
- The four-step workflow.
- Cross-check top-down against bottom-up.
- Start with a small, low-risk task.
- Dynamic analysis (debugger, logs).
- Docs are incomplete; code and tests are the source of truth.

## 🤖 LLM Usage
**When**: A new joiner. A codebase migration. Technical due diligence for an acquisition. LLM-aided onboarding with RAG.
**When not**: A single-script project.

## 🔗 Knowledge Links
- Adjacent: [[Asset-Specific-Knowledge]] · [[C4_Model]] · [[Software Architecture Styles]] · [[Bounded Contexts (DDD)]] · [[CodeScene]]

## 💻 Patterns

### Day 1 inventory
```bash
# Identify the stack from build/package manifests
ls package.json pyproject.toml go.mod Cargo.toml pom.xml 2>/dev/null
ls Dockerfile docker-compose.yml .github/workflows/ 2>/dev/null
ls README.md docs/ ARCHITECTURE.md 2>/dev/null
ls -la  # include hidden config files (dotfiles)

# Size and per-language line counts (skip vendored dependencies)
cloc --exclude-dir=node_modules,.git .

# Directory tree, top 2 levels
tree -L 2 -I 'node_modules|.git'
```
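If `cloc` is not installed, a rough per-extension breakdown is enough for day 1. A minimal Python sketch (standard library only) that counts files and non-blank lines per file extension while skipping `node_modules` and `.git`. Unlike `cloc`, it does not separate comments from code; `language_breakdown` and the skip list are assumptions to adapt.

```python
from collections import defaultdict
from pathlib import Path

SKIP_DIRS = {"node_modules", ".git"}  # assumption: add build/, dist/, .venv/ etc. for your repo


def language_breakdown(root: str = ".") -> dict[str, tuple[int, int]]:
    """Return {extension: (file_count, non_blank_lines)}, largest line count first."""
    base = Path(root)
    files = defaultdict(int)
    lines = defaultdict(int)
    for path in base.rglob("*"):
        if not path.is_file() or SKIP_DIRS.intersection(path.relative_to(base).parts):
            continue
        key = path.suffix or path.name  # suffix-less files such as 'Dockerfile' keep their name
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        files[key] += 1
        lines[key] += sum(1 for line in text.splitlines() if line.strip())
    ranked = sorted(lines, key=lambda k: lines[k], reverse=True)
    return {key: (files[key], lines[key]) for key in ranked}


if __name__ == "__main__":
    for ext, (n_files, n_lines) in language_breakdown().items():
        print(f"{ext:>15} {n_files:>6} files {n_lines:>8} lines")
```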

### Find entry points
```bash
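# Search for main / index entry points
grep -rn "if __name__" --include="*.py" .   # Python
grep -rn "func main" --include="*.go" .     # Go
grep -rn "fn main" --include="*.rs" .       # Rust
find . \( -name "index.ts" -o -name "main.ts" -o -name "server.ts" \) -not -path "*/node_modules/*"

# Routing definitions (grep's --include does not expand {py,ts,js}, so list each glob)
grep -rn "@app.route\|app.get\|router.get" --include="*.py" --include="*.ts" --include="*.js" --exclude-dir=node_modules .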
```
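Grep finds entry points by convention; manifests often declare them outright. A minimal Python sketch, assuming the project uses `package.json` (`main`, `bin`, `scripts.start`) and/or `pyproject.toml` (`[project.scripts]`). It reads TOML with `tomllib`, which is in the standard library from Python 3.11, and skips `pyproject.toml` on older versions. `declared_entry_points` is a hypothetical helper, and other ecosystems (Go `cmd/`, Rust `[[bin]]`) are not covered.

```python
import json
from pathlib import Path

try:
    import tomllib  # standard library in Python 3.11+
except ImportError:  # older Python: pyproject.toml is skipped
    tomllib = None


def declared_entry_points(root: str = ".") -> dict[str, str]:
    """Collect entry points that manifests declare explicitly, keyed by a short label."""
    base = Path(root)
    found: dict[str, str] = {}
    pkg = base / "package.json"
    if pkg.is_file():
        data = json.loads(pkg.read_text(encoding="utf-8"))
        if "main" in data:
            found["npm main"] = data["main"]
        bins = data.get("bin", {})
        if isinstance(bins, str):  # "bin": "./cli.js" shorthand: command name = package name
            bins = {data.get("name", "bin"): bins}
        for name, target in bins.items():
            found[f"npm bin {name}"] = target
        if "start" in data.get("scripts", {}):
            found["npm start"] = data["scripts"]["start"]
    pyproject = base / "pyproject.toml"
    if tomllib is not None and pyproject.is_file():
        with pyproject.open("rb") as fh:  # tomllib requires a binary file
            data = tomllib.load(fh)
        for name, target in data.get("project", {}).get("scripts", {}).items():
            found[f"python script {name}"] = target
    return found
```

Declared entry points and grep hits should agree; when a `main` points at a file that grep never surfaces, that is usually a build output (e.g. `dist/`) whose source you still need to locate.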

### Trace one request (E2E)
```
1. HTTP request → router.
2. → controller / handler.
3. → service / use case.
4. → repository / data layer.
5. → DB query.
6. → response back up.

Set a breakpoint at each layer.
Observe how the data is transformed at each layer.
```
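The six hops above can be made concrete with a self-contained toy stack: one function or class per layer, a log line at each hop, and the layer numbers from the list in comments. Everything here (`router`, `signup_handler`, `SignupService`, `UserRepository`, the `users` table) is hypothetical, and an in-memory SQLite database stands in for the real DB. In a real codebase you would place breakpoints or log statements at the equivalent spots.

```python
import logging
import sqlite3

log = logging.getLogger("trace")


class UserRepository:  # 4. repository / data layer
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

    def insert(self, email: str) -> int:
        cur = self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))  # 5. DB query
        log.info("repository: inserted row id=%s", cur.lastrowid)
        return cur.lastrowid


class SignupService:  # 3. service / use case
    def __init__(self, repo: UserRepository):
        self.repo = repo

    def signup(self, email: str) -> dict:
        normalized = email.strip().lower()
        log.info("service: %r -> %r", email, normalized)  # transformation observed here
        return {"id": self.repo.insert(normalized), "email": normalized}


def signup_handler(body: dict, service: SignupService) -> tuple:  # 2. controller / handler
    log.info("handler: body=%s", body)
    user = service.signup(body["email"])
    return 201, user  # 6. response travels back up


def router(method: str, path: str, body: dict, service: SignupService) -> tuple:  # 1. router
    log.info("router: %s %s", method, path)
    if (method, path) == ("POST", "/signup"):
        return signup_handler(body, service)
    return 404, {"error": "not found"}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    service = SignupService(UserRepository(sqlite3.connect(":memory:")))
    print(router("POST", "/signup", {"email": " Ada@Example.com "}, service))
```

Reading the log lines in order reproduces the request path, and each line shows what that layer received or produced.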

### Small first task (low-risk)
```
1. Fix a typo in README / docs.
2. Update outdated dependency (patch).
3. Add a unit test for existing function.
4. Improve error message clarity.
5. Add a log line.
6. Refactor a small private function.

→ Learn the full PR → review → merge cycle.
```
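For task 3, the first test can be tiny and still teach you the project's test runner and CI. A minimal sketch using the standard-library `unittest`; `format_price` stands in for an existing function you found while tracing and is hypothetical, as are its expected behaviors. In a real repo, import the function instead of defining it, and follow the project's existing test framework (pytest, Jest, etc.).

```python
import unittest


def format_price(cents: int, currency: str = "USD") -> str:
    """Existing function under test (hypothetical)."""
    if cents < 0:
        raise ValueError("cents must be non-negative")
    return f"{currency} {cents // 100}.{cents % 100:02d}"


class FormatPriceTest(unittest.TestCase):
    def test_formats_whole_and_fractional_parts(self):
        self.assertEqual(format_price(1999), "USD 19.99")

    def test_pads_single_digit_cents(self):
        self.assertEqual(format_price(105, "EUR"), "EUR 1.05")

    def test_rejects_negative_amounts(self):
        with self.assertRaises(ValueError):
            format_price(-1)
```

Run it with `python -m unittest <test_module>`; a test that pins current behavior like this also documents it for the next newcomer.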

### LLM-aided onboarding RAG
```python
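from pathlib import Path

from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings  # assumption: any embedding/chat model pair works
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Index code + ADRs + README (commit history would need its own loader, not shown).
# pathlib globs do not expand {ts,py}, so list one pattern per extension.
patterns = ['src/**/*.ts', 'src/**/*.py', 'docs/**/*.md', 'adr/*.md', 'README.md']
docs = [Document(page_content=p.read_text(encoding='utf-8', errors='ignore'), metadata={'source': str(p)})
        for pattern in patterns for p in Path('.').glob(pattern) if p.is_file()]
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)
vectordb = Chroma.from_documents(splitter.split_documents(docs), OpenAIEmbeddings(),
                                 persist_directory='./onboarding-rag')
llm = ChatOpenAI()

def ask(q: str, k: int = 5) -> str:
    hits = vectordb.similarity_search(q, k=k)
    context = '\n\n'.join(f"[{d.metadata['source']}]\n{d.page_content}" for d in hits)
    return llm.invoke(f"Answer from this context only:\n{context}\n\nQuestion: {q}").content

# Example questions
print(ask("Where does authentication happen?"))
print(ask("What's the data flow for a user signup?"))
print(ask("Why was Postgres chosen over MongoDB? (ADR)"))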
```
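`chunk_size` and `chunk_overlap` control how files are cut before embedding. To make the two parameters concrete, here is a simplified fixed-window chunker in plain Python. It is not what `RecursiveCharacterTextSplitter` does internally (that splitter prefers to break on separators such as blank lines, newlines, and spaces); it only shows how the overlap carries context from one chunk into the next. The function name `chunk` is hypothetical.

```python
def chunk(text: str, size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` characters; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("require 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once a window would contain only text already covered by the previous one.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


print(chunk("0123456789", size=4, overlap=1))  # ['0123', '3456', '6789']
```

A larger overlap lowers the chance that a function signature and its body land in different chunks, at the cost of more (partly duplicated) embeddings.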

### Architecture map (output)
```
Codebase: [name]
Stack: [TS / Node / Postgres / Redis]
Entry points:
  - HTTP API: [src/server.ts:42]
  - CLI: [src/cli.ts:15]
  - Worker: [src/worker.ts]
Layers (Clean Arch):
  - domain/  (entity, value object)
  - application/  (use case)
  - infrastructure/  (db, http, queue)
External deps:
  - Stripe: [src/infrastructure/stripe/]
  - Slack: [src/infrastructure/slack/]
Open questions:
  - How does retry work for Stripe webhook?
  - Why is OrderService split across 2 modules?
```
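Keeping the map as structured data makes it easy to version next to the code and to prune open questions as they get answered, which also serves as the question journal recommended below. A minimal sketch, assuming you track the same fields as the template above; the `ArchitectureMap` class and its `render` output (close to, but not identical with, the template layout) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ArchitectureMap:
    codebase: str
    stack: list[str]
    entry_points: dict[str, str] = field(default_factory=dict)   # kind -> "path[:line]"
    layers: dict[str, str] = field(default_factory=dict)         # directory -> responsibility
    external_deps: dict[str, str] = field(default_factory=dict)  # service -> directory
    open_questions: list[str] = field(default_factory=list)

    def answer(self, question: str) -> None:
        """Drop a question once answered (record the answer in an ADR or doc)."""
        self.open_questions.remove(question)

    def render(self) -> str:
        """Render the map as indented plain text in the spirit of the template."""
        lines = [f"Codebase: {self.codebase}", f"Stack: [{' / '.join(self.stack)}]"]
        for title, items in (("Entry points", self.entry_points),
                             ("Layers", self.layers),
                             ("External deps", self.external_deps)):
            lines.append(f"{title}:")
            lines += [f"  - {key}: [{value}]" for key, value in items.items()]
        lines.append("Open questions:")
        lines += [f"  - {q}" for q in self.open_questions]
        return "\n".join(lines)
```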

## 🤔 Decision Criteria
| Situation | Approach |
|---|---|
| Tiny (<10 files) | Read everything |
| Small (<500 files) | 4-step workflow + small task |
| Large (10K+ files) | RAG-aided, one bounded context at a time |
| Unknown legacy | CodeScene hotspot analysis first |
| Greenfield | Walkthrough with the code owner |

**Default**: 4-step workflow + RAG + a small task within week 1.

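## ❌ Anti-patterns
- **Perfectionism ("read everything first")**: leads to paralysis.
- **No small task**: nothing is actually learned by doing.
- **Trusting docs 100%**: docs go stale.
- **No question journal**: questions get forgotten.
- **Skipping dynamic analysis**: your mental model drifts from what actually happens at runtime.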

## 🕓 Change History
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: 4-step workflow + inventory / RAG / small-task code |