---
id: wiki-2026-0508-abstract-syntax-tree
title: Abstract Syntax Tree (AST)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AST, Syntax Tree, Parse Tree (informal)]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [compiler, parsing, ast, language-tooling, static-analysis]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python/JavaScript/Rust
  framework: ast/Babel/swc/tree-sitter
---

# Abstract Syntax Tree (AST)

## One-line summary
> **"The tree shape of source code, with the syntactic noise stripped."** An AST is the parser's output: the token sequence converted into a hierarchical tree of nodes (FunctionDecl, BinaryExpr, ...). It is the foundation of compilers, linters, formatters, codemods, and LLM code generation, and as of 2026 it serves as the structural ground-truth layer for LLM agentic coding.

## Core concepts

### AST vs Concrete Syntax Tree (CST)
- **CST (parse tree)**: retains every token (parentheses, semicolons, whitespace).
- **AST**: keeps only the semantically meaningful nodes and drops the cosmetic ones.
- Modern formatters (Prettier, rustfmt) work on a CST-like (lossless) representation, because they must keep comments and layout information; compilers and codemods work on the AST.

### Node anatomy
- **type**: `BinaryExpression`, `FunctionDeclaration`, ...
- **children**: structured fields (`left`, `right`, `body`, `params`).
- **location**: `start`/`end` byte offsets plus line/column, used for error messages and source maps.

### Typical pipeline
1. **Lex** → token stream.
2. **Parse** → AST.
3. **Analyze** (type checking, scope resolution).
4. **Transform** (optimize, lower).
5. **Emit** (codegen, print).

### Applications
1. Compilers (rustc, tsc, clang): the AST sits upstream of the IR.
2. Linters (ESLint, ruff, clippy): each rule is an AST pattern match.
3. Formatters (Prettier, Black, gofmt).
4. Codemods (jscodeshift, ts-morph, libcst): large-scale refactors.
5. LLM agentic coding (Claude Opus 4.7 with tree-sitter grounding).
6. Static analysis / SAST (Semgrep, CodeQL).
7. IDEs (LSP, syntax highlighting, jump-to-definition).
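
The node anatomy (type, child fields, location) and the "lint rule = AST pattern match" idea above can be seen together in a small sketch using Python's stdlib `ast`. The helper names `describe` and `find_eval_calls`, and the sample source, are made up for illustration; this is a toy rule, not how any particular linter is implemented.

```python
import ast

# Sample input, invented for this sketch.
SOURCE = """\
def handler(payload):
    total = 1 + 2 * 3
    return eval(payload)
"""


def describe(node: ast.AST) -> str:
    """Summarise one node: its type, its named child fields, and its start position."""
    fields = [name for name, _ in ast.iter_fields(node)]
    line = getattr(node, "lineno", None)
    col = getattr(node, "col_offset", None)
    return f"{type(node).__name__} fields={fields} at {line}:{col}"


def find_eval_calls(tree: ast.AST) -> list[tuple[int, int]]:
    """Toy lint rule: report (line, column) of every direct call to `eval`."""
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        ):
            hits.append((node.lineno, node.col_offset))
    return hits


tree = ast.parse(SOURCE)
func = tree.body[0]
print(describe(func))         # type, field names, and location of the FunctionDef
print(find_eval_calls(tree))  # [(3, 11)]
```

The rule only matches the bare name `eval`; an aliased or attribute-qualified call would need scope analysis, which is the "Analyze" step of the pipeline rather than pure pattern matching.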

## 💻 Patterns

### Python `ast`: visit + transform
```python
import ast

src = "x = 1 + 2 * 3"
tree = ast.parse(src)


class ConstFold(ast.NodeTransformer):
    """Fold binary operations whose operands are both constants."""

    def visit_BinOp(self, node: ast.BinOp):
        self.generic_visit(node)  # fold children first, so 2 * 3 becomes 6 before 1 + 6
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                # Evaluate only this constant-only subexpression.
                value = eval(compile(ast.Expression(body=node), "<const-fold>", "eval"))
                return ast.copy_location(ast.Constant(value=value), node)
            except Exception:
                pass  # e.g. division by zero: leave the node unfolded
        return node


new = ast.fix_missing_locations(ConstFold().visit(tree))
print(ast.unparse(new))  # x = 7
```

### tree-sitter (multi-language, incremental)
```python
from tree_sitter import Language, Parser
import tree_sitter_python as tspy

PY = Language(tspy.language())
parser = Parser(PY)

tree = parser.parse(b"def add(a, b):\n    return a + b\n")
root = tree.root_node
for n in root.children:
    # Each node exposes its type and (row, column) start/end points.
    print(n.type, n.start_point, n.end_point)
```

### Babel codemod (JS/TS)
```js
import { parse } from "@babel/parser";
import traverse from "@babel/traverse";
import generate from "@babel/generator";

// Note: under native Node ESM the default import may arrive as a module object,
// in which case `traverse.default` / `generate.default` is the callable.
const ast = parse(`var x = 1;`, { sourceType: "module" });

traverse(ast, {
  VariableDeclaration(path) {
    // Naive rewrite: a real codemod would first check that the binding is never reassigned.
    if (path.node.kind === "var") path.node.kind = "const";
  },
});

console.log(generate(ast).code); // const x = 1;
```

### Rust `syn`: proc-macro style code generation
```rust
// Assumed Cargo dependencies: syn (with the "full" feature) and quote.
use quote::quote;
use syn::{parse_quote, ItemFn};

fn main() {
    // Parse a function item into a typed syntax tree.
    let f: ItemFn = parse_quote! {
        fn greet() { println!("hi"); }
    };
    let name = &f.sig.ident;

    // Re-emit the function plus a generated impl. `Greeter` is a placeholder trait.
    let out = quote! {
        #f
        impl Greeter for () {
            fn name() -> &'static str { stringify!(#name) }
        }
    };
    println!("{out}");
}
```

### Pattern match (Semgrep-style)
```yaml
rules:
  - id: dangerous-eval
    pattern: eval($X)
    message: avoid eval
    languages: [python]
    severity: ERROR
```

### LLM-grounded edit (2026)
```python
# The LLM emits a scoped edit anchored to an AST node instead of a free-form rewrite.
# `apply_ast_edit` and the `node_path` syntax are hypothetical, shown for illustration only.
edit = {
    "file": "app.py",
    "node_path": "Module/FunctionDef[name=handler]/body[2]",
    "replace_with": "return JSONResponse({'ok': True})",
}
apply_ast_edit(edit)  # the edited file can be re-parsed to confirm it is still syntactically valid
```

## Decision criteria
| Situation | Tool |
|---|---|
| Single-language Python script tooling | `ast` (stdlib) |
| Multi-language, incremental (editor) | tree-sitter |
| JS/TS large codemod | jscodeshift / ts-morph |
| Python lossless refactor (preserves comments) | LibCST |
| Compiler frontend, type-aware codemod | language native (rustc API, tsc API) |
| Cross-repo security scan | Semgrep / CodeQL |

**Default**: for cross-language tooling, tree-sitter. For Python only, `ast` + LibCST.

## 🔗 Graph
- Variant: [[Concrete Syntax Tree]]
- Applications: [[Linter]] · [[Static Analysis]]
- Adjacent: [[Visitor Pattern]]

## 🤖 LLM usage
**When**: codemods, lint rules, syntactic validation of LLM output, IDE refactors.
**When not**: cases where a trivial regex match is sufficient (e.g. finding `TODO`).

## ❌ Anti-patterns
- **Parsing code with regex**: breaks on nested constructs, string literals, and comments. Use an AST.
- **Mutating while iterating**: changing a parent during child traversal corrupts the walk. Use the transformer pattern (return a new node).
- **Losing source locations**: error messages become useless. Preserve locations on rewritten nodes.
- **Trusting a print round-trip**: unparsing is lossy (whitespace, comments). Use LibCST or Prettier.

## 🧪 Verification / duplicates
- Verified (Aho et al. *Dragon Book* 2nd ed; Python ast docs; tree-sitter docs 2025; Babel handbook).
- Trust level: A.
- `AST(Abstract_Syntax_Tree).md` redirects to this page.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: canonical AST document; added tree-sitter and LLM-grounded edit |