A Practical Guide to Codebase Onboarding

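1. Overview

Codebase onboarding is the process by which a new developer learns the structure and behavior of an unfamiliar system well enough to become an effective contributor. Rather than trying to read millions of lines of code in one pass, it calls for a strategic approach: map the core terrain of the system first, then expand that knowledge incrementally.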

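2. The Four-Step Onboarding Workflow

Inventory: identify the project's identity and tech stack from its build tools, package manager, and top-level directory layout.
Entry Points: identify where the application starts (main function, API router, CLI handlers, and so on).
Tracing: follow one specific request end to end as it passes through the system, is processed, and is stored.
Boundaries: identify the contact points between modules (APIs, interfaces) and distinguish the role and responsibility of each component.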

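3. Core Learning Strategies

Combine top-down and bottom-up: cross-check the overall flow, understood in terms of business value, against the technical constraints, understood from the database schema.
Start with small tasks: begin with low-risk work such as fixing documentation typos, changing UI text, or writing simple unit tests, and expand your knowledge of the system safely from there.
Use dynamic analysis: go beyond reading code statically; run the system locally and observe its runtime behavior directly with a debugger (breakpoints) and logs.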

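4. Trade-offs and Caveats

Guard against perfectionism: do not wait until you understand all the code before starting work; connect fragmentary information as you go and start by writing code that runs.
Accept that documentation is incomplete: comments and docs are likely out of sync with the implementation, so always treat the actual code and test results as the final source of truth.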
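
One way to treat code and tests as the source of truth is a characterization test: before changing anything, pin down what a function actually does, including behavior the docs never mention. The sketch below is illustrative only; `normalize_username` is a hypothetical stand-in for an existing function whose docstring is assumed to be incomplete.

```python
import unittest


def normalize_username(raw: str) -> str:
    """Hypothetical existing function; its docs only say it 'trims and lowercases'."""
    return raw.strip().lower().replace(" ", "_")


class NormalizeUsernameCharacterization(unittest.TestCase):
    def test_documented_behavior(self):
        # What the (assumed) documentation promises.
        self.assertEqual(normalize_username("  Ada "), "ada")

    def test_undocumented_behavior(self):
        # Not in the docs: inner spaces become underscores.
        self.assertEqual(normalize_username("Ada Lovelace"), "ada_lovelace")
```

Saved in a test module, this can be run with `python -m unittest`. If a later refactor breaks the second test, that is a signal the undocumented behavior was load-bearing.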

Router_Implementation: how to analyze the router as the system's entry point.
C4_Modeling_Framework: a standard model for visualizing what you learn during onboarding.

🧪 Validation Status

Information status: Verified
Source trust level: A
Review rationale: to establish practical guidelines that make new team members productive early and lower the cost of knowledge transfer.

📌 One-Line Insight

"Four steps (Inventory → Entry Points → Tracing → Boundaries) plus a small first task." Reject perfectionism and start by connecting fragmented pieces of information. A modern addition: LLM-aided onboarding (RAG over the repository).

📖 Key Points (Brief)

Follow the four-step workflow.
Cross-check top-down against bottom-up.
Start with small, low-risk tasks.
Use dynamic analysis (debugger, logs).
Docs are incomplete; code and tests are the source of truth.

🤖 Using LLMs

When: new joiners; codebase migrations; technical due diligence for an acquisition; LLM-aided onboarding with RAG.
When not: single-script projects.

🔗 Related Knowledge

Adjacent: Asset-Specific-Knowledge · C4_Model · Software Architecture Styles · Bounded Contexts (DDD) · CodeScene

💻 Patterns

Day 1 inventory

# Identify the stack
ls package.json pyproject.toml go.mod Cargo.toml pom.xml 2>/dev/null
ls Dockerfile docker-compose.yml .github/workflows/ 2>/dev/null
ls README.md docs/ ARCHITECTURE.md 2>/dev/null
ls -la  # hidden config files

# Size and language breakdown
cloc .  # lines of code per language

# Directory tree (top 2 levels)
tree -L 2 -I node_modules
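
Where `cloc` or `tree` are not installed, the same inventory can be approximated with the Python standard library. This is a minimal sketch: the manifest names come from the commands above, while the ecosystem labels and the function names `detect_stack` and `count_by_extension` are illustrative assumptions.

```python
from pathlib import Path

# Manifest file -> ecosystem label (labels are illustrative).
MANIFESTS = {
    "package.json": "Node.js",
    "pyproject.toml": "Python",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
    "pom.xml": "Java (Maven)",
}


def detect_stack(root: str = ".") -> list[str]:
    """Return the ecosystems whose manifest file exists directly under root."""
    base = Path(root)
    return [label for manifest, label in MANIFESTS.items() if (base / manifest).is_file()]


def count_by_extension(
    root: str = ".", skip: tuple[str, ...] = ("node_modules", ".git")
) -> dict[str, int]:
    """Count files per extension, ignoring vendored and VCS directories."""
    counts: dict[str, int] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and not any(part in skip for part in path.parts):
            ext = path.suffix or "(none)"
            counts[ext] = counts.get(ext, 0) + 1
    # Most common extensions first.
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))
```

Calling `detect_stack()` and `count_by_extension()` from the repository root gives a rough first answer to "what is this project built with?". File counts are a cruder measure than the per-language line counts `cloc` reports.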

Find entry points

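# Search for main / index entry points
grep -rn "if __name__" --include="*.py" .  # Python entry
grep -rn "func main" --include="*.go" .    # Go
grep -rn "fn main" --include="*.rs" .      # Rust
find . \( -name "index.ts" -o -name "main.ts" -o -name "server.ts" \) -not -path "*/node_modules/*"

# Routing (braces inside a quoted --include are not expanded, so repeat the flag per extension)
grep -rn "@app\.route\|app\.get\|router\.get" --include="*.py" --include="*.ts" --include="*.js" .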
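
The same search can be collected into one list of (file, line, text) hits, which is convenient for pasting into an architecture map. The sketch below reuses the patterns from the grep commands above; the function name `find_entry_points` and the mapping of patterns to file extensions are assumptions made for illustration.

```python
import re
from pathlib import Path

# Extension -> pattern, mirroring the grep searches above.
ENTRY_PATTERNS = {
    ".py": re.compile(r"if __name__|@app\.route"),
    ".go": re.compile(r"func main"),
    ".rs": re.compile(r"fn main"),
    ".ts": re.compile(r"app\.get|router\.get"),
    ".js": re.compile(r"app\.get|router\.get"),
}


def find_entry_points(root: str = ".") -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for every likely entry point."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        pattern = ENTRY_PATTERNS.get(path.suffix)
        if pattern is None or not path.is_file() or "node_modules" in path.parts:
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

These are textual heuristics, so expect false positives (for example, `app.get` inside a comment); the results are candidates to open in an editor, not a definitive list.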

Trace one request (E2E)

1. HTTP request → router.
2. → controller / handler.
3. → service / use case.
4. → repository / data layer.
5. → DB query.
6. → response back up.

Set a breakpoint at each layer.
Observe how the data is transformed at each layer.
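
When attaching a debugger is impractical, a temporary logging decorator can show the same layer-by-layer transformations. This is a minimal sketch; the `traced` decorator and the three functions standing in for handler, service, and repository are hypothetical, not part of any real codebase.

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("trace")


def traced(layer: str):
    """Log the arguments and return value of a function, tagged with its layer."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            log.debug("-> [%s] %s args=%r kwargs=%r", layer, fn.__name__, args, kwargs)
            result = fn(*args, **kwargs)
            log.debug("<- [%s] %s returned %r", layer, fn.__name__, result)
            return result
        return wrapper
    return decorator


# Hypothetical three-layer request path standing in for real code.
@traced("repository")
def find_user(user_id: int) -> dict:
    return {"id": user_id, "name": "Ada"}


@traced("service")
def get_profile(user_id: int) -> dict:
    user = find_user(user_id)
    return {"display_name": user["name"].upper()}


@traced("handler")
def handle_get_profile(raw_id: str) -> dict:
    return {"status": 200, "body": get_profile(int(raw_id))}
```

Calling `handle_get_profile("7")` logs one entry and one exit line per layer, so the change from the string `"7"` to the integer `7` to the final response body is visible in order. Remove the decorators before opening a PR.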

Small first task (low-risk)

1. Fix a typo in README / docs.
2. Update outdated dependency (patch).
3. Add a unit test for existing function.
4. Improve error message clarity.
5. Add a log line.
6. Refactor a small private function.

→ Use these tasks to learn the PR → review → merge cycle.

LLM-aided onboarding RAG

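from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings  # assumption: any LangChain embeddings / chat model can be swapped in
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Index the codebase, ADRs, and README (commit history would need a separate export step).
# Python globbing does not expand braces such as *.{ts,py}, so each pattern is listed separately.
sources = ['src/**/*.ts', 'src/**/*.py', 'docs/**/*.md', 'adr/*.md', 'README.md']

def load_docs(patterns):
    docs = []
    for pattern in patterns:
        docs += DirectoryLoader('.', glob=pattern, loader_cls=TextLoader).load()
    return docs

splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)
chunks = splitter.split_documents(load_docs(sources))
vectordb = Chroma.from_documents(chunks, OpenAIEmbeddings(), persist_directory='./onboarding-rag')
llm = ChatOpenAI()

def ask(q):
    # Retrieve the 5 most similar chunks and pass them to the model as context.
    context = "\n\n".join(d.page_content for d in vectordb.similarity_search(q, k=5))
    return llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {q}").content

# Example questions
ask("Where does authentication happen?")
ask("What's the data flow for a user signup?")
ask("Why was Postgres chosen over MongoDB? (ADR)")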

Architecture map (output)

Codebase: [name]
Stack: [TS / Node / Postgres / Redis]
Entry points:
  - HTTP API: [src/server.ts:42]
  - CLI: [src/cli.ts:15]
  - Worker: [src/worker.ts]
Layers (Clean Arch):
  - domain/  (entity, value object)
  - application/  (use case)
  - infrastructure/  (db, http, queue)
External deps:
  - Stripe: [src/infrastructure/stripe/]
  - Slack: [src/infrastructure/slack/]
Open questions:
  - How does retry work for Stripe webhook?
  - Why is OrderService split across 2 modules?

🤔 Decision Criteria

Situation	Approach
Tiny (<10 files)	Read all
Small (<500 files)	4-step + small task
Large (10K+ files)	RAG-aided + bounded-context-by-bounded-context
Legacy unknown	CodeScene hotspot first
Greenfield	Owner walkthrough

Default: the four-step workflow + RAG + a small task within week 1.
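
The table can be read as a simple lookup. The sketch below encodes it under two assumptions that the table itself does not state: the situational rows (greenfield, legacy) take precedence over size, and the unspecified range between 500 and 10K files falls back to the default. The function name `pick_approach` is hypothetical.

```python
def pick_approach(
    file_count: int, legacy_unknown: bool = False, greenfield: bool = False
) -> str:
    """Map a codebase's rough size and situation to an onboarding approach."""
    # Assumption: situational rows win over size-based rows.
    if greenfield:
        return "Owner walkthrough"
    if legacy_unknown:
        return "CodeScene hotspot first"
    if file_count < 10:
        return "Read all"
    if file_count < 500:
        return "4-step + small task"
    if file_count >= 10_000:
        return "RAG-aided + bounded-context-by-bounded-context"
    # Assumption: the table leaves 500 to 10K files unspecified; use the default.
    return "4-step + RAG + small task within week 1"
```

For example, `pick_approach(120)` returns the four-step approach, while `pick_approach(120, legacy_unknown=True)` returns the hotspot-first approach regardless of size.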

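❌ Anti-Patterns

Perfectionism about reading everything: leads to paralysis.
No small task: no real learning happens.
Trusting the docs 100%: they go stale.
No question journal: open questions get forgotten.
Skipping dynamic analysis: your mental model drifts from the actual runtime behavior.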

🕓 Change History

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup: four-step workflow plus inventory / RAG / small-task code
