id: wiki-2026-0508-software-architecture-recovery
title: Software Architecture Recovery
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Architecture Recovery, Reverse Architecting
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: architecture, reverse-engineering, legacy, static-analysis
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: python (language), networkx (framework)

Software Architecture Recovery

In one line

"Inferring an architectural model from source code." The goal is to understand a legacy system whose documentation is lost or outdated. As of 2026, static analysis augmented by an LLM (Claude Opus 4.7, GPT-5) is the dominant approach: dependency graph + clustering + LLM-named module summaries.

Key points

Phases

  1. Extraction: parse source code, build files, and config into entities (file, class, module).
  2. Abstraction: derive the dependency graph, call graph, and data flow.
  3. Clustering: community detection (Louvain, label propagation) and LLM-based semantic grouping.
  4. Presentation: C4 diagram, dependency matrix, ADR.
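
The extraction phase can be sketched with the standard `ast` module. This is a minimal illustration, not a full extractor: the `Entity` record and the `extract_entities` helper are names made up for this example, and only top-level classes and functions are collected.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One recovered entity (hypothetical record shape for this sketch)."""
    kind: str    # "module", "class", or "function"
    name: str    # qualified name, e.g. "billing.Invoice"
    lineno: int  # line where the entity starts (0 for the module itself)


def extract_entities(module_name: str, source: str) -> list[Entity]:
    """Parse one source file and list its module, top-level classes and functions."""
    tree = ast.parse(source)
    entities = [Entity("module", module_name, 0)]
    for node in tree.body:  # top-level statements only
        if isinstance(node, ast.ClassDef):
            entities.append(Entity("class", f"{module_name}.{node.name}", node.lineno))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entities.append(Entity("function", f"{module_name}.{node.name}", node.lineno))
    return entities


# Hypothetical sample source
sample = "class Invoice:\n    pass\n\ndef total(items):\n    return sum(items)\n"
for entity in extract_entities("billing", sample):
    print(entity.kind, entity.name, entity.lineno)
```

Nested classes and methods are skipped on purpose; walking the whole tree with `ast.walk` would pick them up at the cost of noisier entity lists.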

Techniques

  • Static: AST parsing, import graphs (madge, jdeps, pyan).
  • Dynamic: trace logs, profilers, distributed tracing (OTel).
  • Hybrid: merge the static graph with runtime call data.
  • LLM-augmented: feed each module's README and code to an LLM to get a module summary and an architecture description.
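
The hybrid technique amounts to labelling each edge by which analysis saw it. A minimal sketch, assuming static edges arrive as a set of `(source, target)` pairs and runtime data as a `Counter` of call counts; `merge_edges` and the sample names are hypothetical.

```python
from collections import Counter


def merge_edges(static_edges: set[tuple[str, str]],
                runtime_calls: Counter) -> dict[tuple[str, str], dict]:
    """Label every edge by which analysis observed it."""
    observed = {edge for edge, n in runtime_calls.items() if n > 0}
    merged = {}
    for edge in static_edges | observed:
        if edge in static_edges and edge in observed:
            origin = "both"
        elif edge in static_edges:
            origin = "static-only"   # declared but never exercised in the traces
        else:
            origin = "runtime-only"  # e.g. reflection, plugins, dynamic dispatch
        merged[edge] = {"origin": origin, "calls": runtime_calls.get(edge, 0)}
    return merged


# Hypothetical sample data
static_edges = {("web", "api"), ("api", "db"), ("api", "legacy_report")}
runtime_calls = Counter({("web", "api"): 1200, ("api", "db"): 950, ("api", "cache"): 400})

for edge, info in sorted(merge_edges(static_edges, runtime_calls).items()):
    print(edge, info)
```

Static-only edges are candidates for dead code, while runtime-only edges point at dependencies the static analysis cannot see. Both depend on how representative the trace window is.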

Applications

  1. Legacy modernization assessment.
  2. Microservice decomposition planning.
  3. Onboarding new engineers.

💻 Patterns

Python: extracting the import graph

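import ast
import os
import networkx as nx

G = nx.DiGraph()
for root, _, files in os.walk("src"):
    for f in files:
        if not f.endswith(".py"):
            continue
        path = os.path.join(root, f)
        with open(path, encoding="utf-8") as fh:
            tree = ast.parse(fh.read(), filename=path)
        # src/pkg/mod.py -> pkg.mod, so node names match import targets
        mod = os.path.relpath(path, "src").removesuffix(".py").replace(os.sep, ".")
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    G.add_edge(mod, alias.name)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # Relative imports keep their bare module name here
                G.add_edge(mod, node.module)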

JavaScript/TypeScript: madge dependency graph

npx madge --image graph.svg --extensions ts,tsx src/   # render the graph (needs Graphviz)
npx madge --circular --extensions ts,tsx src/          # detect cycles
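
The `--circular` check has a direct analogue on a networkx graph such as the Python import graph above. A minimal sketch: `import_cycles` is a hypothetical helper, and it reports groups of mutually reachable modules (strongly connected components) rather than every individual cycle path.

```python
import networkx as nx


def import_cycles(graph: nx.DiGraph) -> list[list[str]]:
    """Return groups of modules that are mutually reachable (import cycles)."""
    cycles = []
    for component in nx.strongly_connected_components(graph):
        if len(component) > 1:
            cycles.append(sorted(component))
    # A module importing itself is a cycle of length one.
    cycles.extend([node] for node in nx.nodes_with_selfloops(graph))
    return sorted(cycles)


# Hypothetical module names
G_demo = nx.DiGraph([("app.orders", "app.billing"),
                     ("app.billing", "app.orders"),
                     ("app.orders", "app.db")])
print(import_cycles(G_demo))
```

Grouping by component keeps the report short on tangled codebases, where enumerating every simple cycle can grow very large.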

Java: jdeps

# class-level dependencies, followed transitively
jdeps -verbose:class -recursive app.jar > deps.txt
# which classes depend on the com.acme.payment package?
jdeps --inverse --package com.acme.payment app.jar

Community detection (Louvain)

import networkx as nx
from networkx.algorithms.community import louvain_communities
modules = louvain_communities(G.to_undirected(), resolution=1.2, seed=42)
for i, m in enumerate(modules):
    print(f"Module {i}: {sorted(m)[:5]}...")
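
Once clusters exist, a dependency matrix between them is one of the presentation outputs listed earlier. A minimal sketch, assuming edges are `(source, target)` pairs and communities are sets of node names as returned by `louvain_communities`; `cluster_dependencies` is a hypothetical helper.

```python
from collections import Counter


def cluster_dependencies(edges, communities):
    """Count dependency edges between clusters; intra-cluster edges are skipped."""
    cluster_of = {node: i for i, members in enumerate(communities) for node in members}
    counts = Counter()
    for src, dst in edges:
        a, b = cluster_of.get(src), cluster_of.get(dst)
        if a is not None and b is not None and a != b:
            counts[(a, b)] += 1
    return counts


# Hypothetical sample data
communities = [{"web.views", "web.forms"}, {"core.models", "core.db"}]
edges = [("web.views", "web.forms"), ("web.views", "core.models"),
         ("web.forms", "core.models"), ("core.models", "core.db")]
print(cluster_dependencies(edges, communities))
```

With the variables from the snippets above, the call would be `cluster_dependencies(G.edges(), modules)`. Edges in both directions between two clusters suggest the boundary may be drawn in the wrong place.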

LLM-augmented module naming (Claude Opus 4.7)

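from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def name_module(files: list[str], code_snippets: list[str]) -> str:
    # Join lists into plain text so the prompt does not contain Python list reprs
    file_list = "\n".join(files)
    snippets = "\n\n".join(code_snippets)
    msg = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=200,
        messages=[{"role": "user", "content":
            f"Files:\n{file_list}\n\nSnippets:\n{snippets}\n\n"
            "Give a 3-word module name + 1-line responsibility."}],
    )
    return msg.content[0].text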

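Runtime trace → architecture (OpenTelemetry)

# Aggregate spans into a service-level call graph.
# Assumes `spans` is a list of dicts with "span_id", "parent_id" and "service"
# keys exported from a tracing backend (this record shape is hypothetical).
from collections import Counter

def service_edges(spans):
    by_id = {s["span_id"]: s for s in spans}
    edges = Counter()
    for s in spans:
        parent = by_id.get(s.get("parent_id"))
        if parent and parent["service"] != s["service"]:
            edges[(parent["service"], s["service"])] += 1
    return edges
# Top edges (edges.most_common()) = primary architectural connections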
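
"Top edges" needs a cut-off to be actionable. One possible rule, shown as a sketch: keep the edges that carry at least a given share of all cross-service calls. The `primary_edges` helper, the 5% default, and the sample counts are assumptions for illustration.

```python
from collections import Counter


def primary_edges(edges: Counter, min_share: float = 0.05) -> list[tuple[tuple[str, str], float]]:
    """Return (edge, share) pairs whose share of all cross-service calls is at least min_share."""
    total = sum(edges.values())
    if total == 0:
        return []
    ranked = [(edge, count / total) for edge, count in edges.most_common()]
    return [(edge, share) for edge, share in ranked if share >= min_share]


# Hypothetical call counts
sample_edges = Counter({("web", "checkout"): 900,
                        ("checkout", "payment"): 80,
                        ("checkout", "email"): 20})
for (caller, callee), share in primary_edges(sample_edges, min_share=0.05):
    print(f"{caller} -> {callee}: {share:.0%}")
```

A share-based threshold hides rare but critical paths such as nightly batch jobs, so dropped edges are worth a manual look before they are left off a diagram.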

C4 diagram emission (Structurizr DSL)

workspace {
  model {
    user = person "Customer"
    sys = softwareSystem "Shop" {
      web = container "Web"
      api = container "API"
      db = container "Postgres"
    }
    user -> web "uses"
    web -> api "REST"
    api -> db "JDBC"
  }
}
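
A recovered graph can be turned into a workspace like the one above mechanically. A minimal sketch, assuming edges are `(source, target, label)` triples and that names contain no double quotes; `to_identifier` and `emit_structurizr` are hypothetical helpers, and only the DSL constructs already used above are emitted.

```python
import re


def to_identifier(name: str) -> str:
    """Turn a display name into a DSL-safe identifier (letters, digits, underscores)."""
    ident = re.sub(r"\W+", "_", name).strip("_").lower()
    return ident if ident and not ident[0].isdigit() else f"c_{ident}"


def emit_structurizr(system: str, edges: list[tuple[str, str, str]]) -> str:
    """Render containers and relationships as a Structurizr DSL workspace."""
    names = sorted({n for src, dst, _ in edges for n in (src, dst)})
    lines = ["workspace {", "  model {", f'    sys = softwareSystem "{system}" {{']
    lines += [f'      {to_identifier(n)} = container "{n}"' for n in names]
    lines.append("    }")
    lines += [f'    {to_identifier(s)} -> {to_identifier(d)} "{label}"' for s, d, label in edges]
    lines += ["  }", "}"]
    return "\n".join(lines)


print(emit_structurizr("Shop", [("Web", "API", "REST"), ("API", "Postgres", "JDBC")]))
```

The sketch does not guard against two names mapping to the same identifier or against a container named `sys`; a real generator would need to deduplicate them.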

Decision criteria

| Situation | Approach |
| --- | --- |
| Small monolith (<100k LoC) | Static import graph + manual review |
| Distributed microservices | Distributed tracing (OTel) + service map |
| Legacy COBOL/Java enterprise | Commercial tools (Lattix, Structure101) |
| Quick high-level overview | LLM (Opus 4.7) on README + top-level dirs |
| Decomposition planning | Static + dynamic + LLM hybrid |

Default: static import graph (madge / pyan / jdeps) → Louvain clustering → LLM naming → C4 diagram.

🔗 Graph

🤖 LLM usage

When: onboarding onto an undocumented codebase, planning a modernization, detecting dependency cycles.

When not: the architecture is current and well documented; reading the ADRs is enough.

Anti-patterns

  • Recovered = correct: the inferred architecture reflects how the system evolved, not the ideal design. Validate it with the team.
  • Static only for a distributed system: the runtime topology is lost.
  • LLM hallucination: a module name can be plausible without being correct. Verify it against the code.

🧪 Verification / Duplicates

  • Verified (Garlan & Schmerl SAR research, 2002-2024; SEI architecture reconstruction guides).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: recovery techniques with LLM-augmented analysis |