Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
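Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Normalized link names: fixed broken links, pointed links directly at redirect targets, and mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections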

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


---
id: wiki-2026-0508-deep-grammar
title: Deep Grammar
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [deep grammar, generative grammar, Chomsky hierarchy, universal grammar, syntactic structures]
duplicate_of: none
source_trust_level: A
confidence_score: 0.88
verification_status: applied
tags: [linguistics, chomsky, generative-grammar, syntax, nlp, formal-language]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: NLP / Formal Language
  applicable_to: [Linguistics, Compiler, NLP, LLM]
---
# Deep Grammar
## One-line summary
> **"The underlying structure of every surface sentence."** Chomsky's generative grammar: a finite set of rules produces an infinite set of sentences. Deep structure (meaning) maps to surface structure (form). Modern view: LLMs learn grammar implicitly, with no explicit grammar.
## Key points
### Chomsky hierarchy
1. **Type 0** (recursively enumerable): recognized by Turing machines.
2. **Type 1** (context-sensitive): e.g. a^n b^n c^n.
3. **Type 2** (context-free): the syntax of most programming languages.
4. **Type 3** (regular): classical regular expressions.
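
The gap between the lower levels can be made concrete with three tiny recognizers. This is a minimal sketch; the function names are illustrative and not taken from any library. A regex is enough for the regular language (ab)*, a^n b^n needs one matched count (context-free), and a^n b^n c^n needs two matched counts, which no context-free grammar can express.

```python
import re

def is_regular_ab_star(s: str) -> bool:
    """Type 3: (ab)* is regular, so a plain regex (finite automaton) suffices."""
    return re.fullmatch(r"(ab)*", s) is not None

def is_anbn(s: str) -> bool:
    """Type 2: a^n b^n requires matching one count (a pushdown stack)."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def is_anbncn(s: str) -> bool:
    """Type 1: a^n b^n c^n requires matching two counts, beyond any CFG."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n

print(is_regular_ab_star("abab"), is_anbn("aabb"), is_anbncn("aabbcc"))
print(is_anbn("aab"), is_anbncn("aabbc"))
```

The recognizers are written directly in Python for brevity; they only illustrate which strings belong to each language, not how an automaton of the corresponding class would be built.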
### Deep vs. surface
- **Deep structure**: the underlying representation that encodes meaning.
- **Surface structure**: the spoken or written form.
- **Transformation**: maps one to the other, e.g. active ↔ passive.
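
As a toy illustration of the distinction above, one underlying representation can be rendered as two surface forms. The `DeepStructure` class and the two rendering functions are hypothetical names for this sketch, and the representation is far simpler than anything in transformational grammar.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeepStructure:
    agent: str            # who performs the action
    verb_past: str        # simple past form, e.g. "chased"
    verb_participle: str  # past participle, e.g. "chased"
    patient: str          # who undergoes the action

def to_active(d: DeepStructure) -> str:
    """Surface form with the agent as subject."""
    return f"{d.agent} {d.verb_past} {d.patient}"

def to_passive(d: DeepStructure) -> str:
    """Surface form with the patient as subject (passive transformation)."""
    return f"{d.patient} was {d.verb_participle} by {d.agent}"

d = DeepStructure("the dog", "chased", "chased", "the cat")
print(to_active(d))   # the dog chased the cat
print(to_passive(d))  # the cat was chased by the dog
```

Both sentences come from the same `DeepStructure` value, which is the point of the deep/surface split: the meaning is fixed while the form varies.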
### Universal grammar (UG)
- An innate language faculty (Chomsky).
- Parameter setting, e.g. head-initial vs. head-final word order.
- A critical period for acquisition.
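
The head-direction parameter mentioned above can be pictured as a single boolean that changes how a head and its complement are ordered. This is only a toy sketch with a hypothetical `linearize` function; it makes no claim about how parameters are actually acquired.

```python
def linearize(head: str, complement: str, head_initial: bool) -> str:
    """Order a head and its complement according to the head-direction parameter."""
    return f"{head} {complement}" if head_initial else f"{complement} {head}"

# English-like setting: the verb precedes its object.
print(linearize("read", "the book", head_initial=True))   # read the book

# Japanese-like setting: the verb follows its object.
print(linearize("yonda", "hon o", head_initial=False))    # hon o yonda
```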
### Modern stance
- **Pre-LLM**: explicit rules (CFG, dependency grammar).
- **Post-LLM**: implicit; the transformer learns syntactic regularities through attention.
- **Hybrid**: LLM plus grammar constraints applied at decoding time.
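
The hybrid approach can be sketched without any ML library: at each step the grammar (here a small hand-written DFA) decides which tokens are legal, and the model's scores only choose among those. Everything below is an assumption for illustration, including the `DFA` table, the `toy_model` stand-in for an LLM, and the target pattern of two digits, a hyphen, and two digits.

```python
from typing import Callable, Dict

DIGITS = "0123456789"

# Hypothetical DFA for the pattern: digit digit "-" digit digit.
# Maps state -> {allowed token: next state}.
DFA: Dict[int, Dict[str, int]] = {
    0: {d: 1 for d in DIGITS},
    1: {d: 2 for d in DIGITS},
    2: {"-": 3},
    3: {d: 4 for d in DIGITS},
    4: {d: 5 for d in DIGITS},
}
ACCEPT = 5

def constrained_greedy(score: Callable[[str], Dict[str, float]]) -> str:
    """Greedy decoding where tokens the grammar forbids are masked out."""
    state, out = 0, ""
    while state != ACCEPT:
        allowed = DFA[state]   # tokens the grammar permits in this state
        scores = score(out)    # unconstrained model preferences
        # Mask: only allowed tokens compete; pick the best-scoring one.
        token = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        out += token
        state = allowed[token]
    return out

def toy_model(prefix: str) -> Dict[str, float]:
    """Stand-in for an LLM: it prefers 'x', which the grammar never allows."""
    scores = {t: 0.0 for t in DIGITS + "-"}
    scores["x"] = 10.0
    scores["7"] = 1.0
    return scores

print(constrained_greedy(toy_model))  # 77-77
```

The model's favourite token `x` never appears because it is never in the allowed set, so the output is valid by construction. Real constrained-decoding libraries apply the same masking idea to logits over a subword vocabulary.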
### Applications
1. **Parsing**: syntax trees.
2. **Compiler**: BNF / EBNF.
3. **NLP**: POS tagging, dependency parsing.
4. **Code completion**: grammar-guided LLMs.
5. **DSL**: ANTLR / Tree-sitter.
6. **Constrained decoding**: forcing LLM output to match a JSON schema.
## 💻 Patterns
### CFG with NLTK
```python
import nltk
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | Det N PP
VP -> V NP | V NP PP
PP -> P NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat' | 'park'
V -> 'saw' | 'chased'
P -> 'in' | 'with'
""")
parser = nltk.ChartParser(grammar)
# The PP "in the park" can attach to the NP or to the VP, so two trees are printed.
for tree in parser.parse('the dog saw a cat in the park'.split()):
    tree.pretty_print()
```
### Dependency parsing (spaCy)
```python
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp("The cat sat on the mat.")
for token in doc:
    # token text, dependency label, and the head token it attaches to
    print(f'{token.text:10} {token.dep_:10} {token.head.text}')
```
### Tree-sitter grammar (DSL)
```javascript
module.exports = grammar({
  name: 'mylang',
  rules: {
    source_file: $ => repeat($._statement),
    _statement: $ => choice($.assignment, $.function_call),
    assignment: $ => seq($.identifier, '=', $._expression),
    identifier: $ => /[a-zA-Z_][a-zA-Z0-9_]*/,
    // ...
  }
});
```
### Constrained LLM decoding (grammar-guided)
```python
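# Written against the older outlines API (models.transformers, generate.regex/json);
# newer releases may expose different entry points.
from outlines import models, generate
from pydantic import BaseModel

model = models.transformers('gpt2')

# Regex constraint: the output must match YYYY-MM-DD
generator = generate.regex(model, r'\d{4}-\d{2}-\d{2}')
print(generator('Date: '))

# JSON schema constraint derived from a pydantic model
class User(BaseModel):
    name: str
    age: int

gen = generate.json(model, User)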
```
### PEG parser
```python
# parsimonious
from parsimonious.grammar import Grammar
grammar = Grammar(r"""
expr = term (("+" / "-") term)*
term = factor (("*" / "/") factor)*
factor = number / "(" expr ")"
number = ~"[0-9]+"
""")
# The grammar has no whitespace rule, so the input must not contain spaces.
tree = grammar.parse("3+4*2")
```
### Chomsky-Normal-Form CYK
```python
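def cyk(words, grammar):
    """Return True if `words` can be derived from 'S'.

    `grammar` is a list of (lhs, rhs) pairs in Chomsky normal form, where rhs
    is a tuple: (terminal,) or (nonterminal, nonterminal).
    """
    n = len(words)
    if n == 0:
        return False  # no S -> empty rule is supported here
    # table[i][j] holds the nonterminals that derive words[i..j]
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Length-1 spans: match terminal rules
    for i, w in enumerate(words):
        for lhs, rhs in grammar:
            if rhs == (w,):
                table[i][i].add(lhs)
    # Longer spans: try every split point k and every binary rule
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for lhs, rhs in grammar:
                    if len(rhs) == 2 and rhs[0] in table[i][k] and rhs[1] in table[k+1][j]:
                        table[i][j].add(lhs)
    return 'S' in table[0][n-1]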
```
## Decision criteria
| Situation | Approach |
|---|---|
| Programming language | CFG / PEG |
| NLP parsing | Dependency (spaCy) |
| LLM output structure | Constrained decoding |
| Custom DSL | Tree-sitter |
| Compiler frontend | ANTLR / yacc |
| Linguistics research | UG / minimalist |
**Default**: in the LLM era, rely on the transformer's implicit grammar and add constrained decoding for critical outputs.
## 🔗 Graph
- Variants: [[Generative-Grammar]] · [[Universal-Grammar]]
- Applications: [[Domain-Specific-Languages]] · [[NLP]]
- Adjacent: [[Transformer_Architecture_and_LLM_Foundations|LLM]]
## 🤖 LLM usage
**When**: syntactic analysis; grammar-guided generation; DSL design.
**When not**: free-form text generation; plain zero-shot LLM use.
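## ❌ Anti-patterns
- **Over-rigid grammar**: loses the flexibility that makes the LLM useful.
- **Ignore ambiguity**: a sentence can have multiple parses.
- **Deep ≠ semantic**: modern theory keeps underlying syntactic structure separate from semantic interpretation.
- **No constraint at decode**: leads to invalid output.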
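
The ambiguity point above can be checked by counting parses instead of only testing membership. The following is a sketch under stated assumptions: the `BINARY` rules and `LEXICON` are a hypothetical toy grammar in Chomsky normal form, and `count_parses` is a CYK variant that stores tree counts rather than sets.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (lhs, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": ["Det"], "a": ["Det"],
    "dog": ["N"], "cat": ["N"], "park": ["N"],
    "saw": ["V"], "in": ["P"],
}

def count_parses(words, start="S"):
    """Count distinct parse trees for `words` rooted in `start`."""
    n = len(words)
    if n == 0:
        return 0
    # chart[i][j][A] = number of trees deriving words[i..j] from nonterminal A
    chart = [[defaultdict(int) for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        for tag in LEXICON.get(w, []):
            chart[i][i][tag] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for lhs, left_sym, right_sym in BINARY:
                    left = chart[i][k].get(left_sym, 0)
                    right = chart[k + 1][j].get(right_sym, 0)
                    if left and right:
                        chart[i][j][lhs] += left * right
    return chart[0][n - 1].get(start, 0)

print(count_parses("the dog saw a cat".split()))              # 1
print(count_parses("the dog saw a cat in the park".split()))  # 2
```

The second sentence has two parses because the prepositional phrase can attach to "a cat" or to the verb phrase; a pipeline that keeps only the first parse silently picks one reading.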
## 🧪 Verification / duplicates
- Verified (Chomsky, formal language theory).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Chomsky hierarchy + NLTK / spaCy / tree-sitter / constrained decode code |