---
id: wiki-2026-0508-software-architecture-recovery
title: Software Architecture Recovery
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Architecture Recovery, Reverse Architecting]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [architecture, reverse-engineering, legacy, static-analysis]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: networkx
---

# Software Architecture Recovery

## One-line summary

> **"Inferring an architectural model from source code."** The goal is to understand a legacy system whose documentation is lost or outdated. As of 2026, static analysis augmented with an LLM (Claude Opus 4.7, GPT-5) is the dominant approach: dependency graph + clustering + LLM-named module summaries.

## Core concepts

### Phases

1. **Extraction**: parse source code, build files, and config into entities (file, class, module).
2. **Abstraction**: build the dependency graph, call graph, and data-flow view.
3. **Clustering**: community detection (Louvain, label propagation) and LLM semantic grouping.
4. **Presentation**: C4 diagram, dependency matrix, ADR.

### Techniques

- **Static**: AST parsing, import graph (madge, jdeps, pyan).
- **Dynamic**: trace logs, profilers, distributed tracing (OTel).
- **Hybrid**: merge static and runtime call data.
- **LLM-augmented**: feed each module's README/code to an LLM to get a summary and an architecture description.

### Applications

1. Legacy modernization assessment.
2. Microservice decomposition planning.
3. Onboarding new engineers.

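
The hybrid technique listed above (merging static and runtime call data) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `merge_edges`, the three edge labels, and the sample service names are hypothetical, and both inputs are assumed to be already reduced to `(caller, callee)` pairs at the same granularity.

```python
from collections import Counter


def merge_edges(static_edges, runtime_counts):
    """Combine static dependencies with observed runtime call counts.

    static_edges:   iterable of (caller, callee) pairs from import/AST analysis
    runtime_counts: mapping of (caller, callee) -> number of observed calls
    Returns a dict mapping each edge to {"source": ..., "calls": ...}.
    """
    static = set(static_edges)
    # Only edges that were actually observed at least once count as runtime edges.
    observed = {edge for edge, n in runtime_counts.items() if n > 0}
    merged = {}
    for edge in static | observed:
        if edge in static and edge in observed:
            source = "both"
        elif edge in static:
            source = "static-only"   # declared in code, never seen in traces
        else:
            source = "runtime-only"  # seen in traces, not visible statically
        merged[edge] = {"source": source, "calls": runtime_counts.get(edge, 0)}
    return merged


# Hypothetical sample data, for illustration only.
static_edges = [("web", "api"), ("api", "db"), ("api", "legacy_utils")]
runtime_counts = Counter({("web", "api"): 120, ("api", "db"): 95, ("api", "queue"): 7})

merged = merge_edges(static_edges, runtime_counts)
for edge, info in sorted(merged.items()):
    print(edge, info)
```

Under these assumptions, `static-only` edges are candidates for a dead-code or rare-path review, while `runtime-only` edges point at dependencies that static analysis tends to miss (for example, calls made through reflection or over the network).
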
## 💻 Patterns

### Python: import graph extraction

```python
import ast
from pathlib import Path

import networkx as nx

SRC = Path("src")  # assumed to be the import root of the project
G = nx.DiGraph()
for path in SRC.rglob("*.py"):
    # Module name relative to the import root, e.g. src/pkg/a.py -> "pkg.a"
    mod = ".".join(path.relative_to(SRC).with_suffix("").parts)
    tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):  # import a.b
            for alias in node.names:
                G.add_edge(mod, alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            G.add_edge(mod, node.module)  # from a.b import c (absolute imports only)
```

### JavaScript: madge dependency graph

```bash
npx madge --image graph.svg --extensions ts,tsx src/
npx madge --circular src/   # detect cycles
```

### Java: jdeps

```bash
# Class-level dependencies, followed transitively
jdeps -verbose:class -recursive app.jar > deps.txt
# Who depends on the payment package?
jdeps --inverse --package com.acme.payment app.jar
```

### Community detection (Louvain)

```python
from networkx.algorithms.community import louvain_communities

# G is the import graph built above; Louvain is run on its undirected view.
# resolution > 1 favors smaller communities; seed makes the result reproducible.
modules = louvain_communities(G.to_undirected(), resolution=1.2, seed=42)
for i, m in enumerate(modules):
    print(f"Module {i}: {sorted(m)[:5]}...")
```

### LLM-augmented module naming (Claude Opus 4.7)

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def name_module(files: list[str], code_snippets: list[str]) -> str:
    """Ask the model for a short name and one-line responsibility for a cluster."""
    snippets = "\n\n".join(code_snippets)
    msg = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=200,
        messages=[{"role": "user", "content":
            f"Files: {files}\n\nSnippets:\n{snippets}\n\n"
            "Give a 3-word module name + 1-line responsibility."}],
    )
    return msg.content[0].text
```

### Runtime trace → architecture (OpenTelemetry)

```python
# Aggregate spans into a service-level call graph
from collections import Counter

def fetch_traces(service: str, since: str):
    """Placeholder: query your tracing backend and yield span objects
    that expose .service and .parent (None for root spans)."""
    raise NotImplementedError

edges = Counter()
for span in fetch_traces(service="checkout", since="24h"):
    # Count only calls that cross a service boundary
    if span.parent and span.parent.service != span.service:
        edges[(span.parent.service, span.service)] += 1
# Top edges = primary architectural connections
```

### C4 diagram emission (Structurizr DSL)

```dsl
workspace {
  model {
    user = person "Customer"
    sys = softwareSystem "Shop" {
      web = container "Web"
      api = container "API"
      db = container "Postgres"
    }
    user -> web "uses"
    web -> api "REST"
    api -> db "JDBC"
  }
}
```

## Decision criteria

| Situation | Approach |
|---|---|
| Small monolith (<100k LoC) | Static import graph + manual review |
| Distributed microservices | Distributed tracing (OTel) + service map |
| Legacy COBOL/Java enterprise | Commercial tools (Lattix, Structure101) |
| Quick high-level overview | LLM (Opus 4.7) on README + top-level dirs |
| Decomposition planning | Static + dynamic + LLM hybrid |

**Default**: static import graph (madge / pyan / jdeps) → Louvain clustering → LLM naming → C4 diagram.

## 🔗 Graph

- Parent: [[Software Architecture]]
- Application: [[Legacy Modernization]]
- Adjacent: [[C4 Model (Architecture Documentation)]] · [[Dependency Analysis]] · [[Static Analysis]]

## 🤖 LLM usage

**When to use**: onboarding onto an undocumented codebase, drafting a modernization plan, detecting dependency cycles.
**When not to use**: the current architecture is already well documented, so reading the ADRs is enough.

## ❌ Anti-patterns

- **Recovered = correct**: the inferred architecture reflects how the system evolved historically, not the ideal design. Validate it with the team.
- **Static only for distributed system**: the runtime topology is lost.
- **LLM hallucination**: a module name can sound plausible and still be wrong. Verify it against the code.

## 🧪 Verification / duplicates

- Verified (Garlan & Schmerl SAR research, 2002–2024; SEI architecture reconstruction guides).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: recovery techniques with LLM-augmented analysis |