---
id: wiki-2026-0508-deep-grammar
title: Deep Grammar
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [deep grammar, generative grammar, Chomsky hierarchy, universal grammar, syntactic structures]
duplicate_of: none
source_trust_level: A
confidence_score: 0.88
verification_status: applied
tags: [linguistics, chomsky, generative-grammar, syntax, nlp, formal-language]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: NLP / Formal Language
  applicable_to: [Linguistics, Compiler, NLP, LLM]
---

# Deep Grammar

## One-liner
> **"The underlying structure beneath every surface sentence."** Chomsky's generative grammar: a finite set of rules produces an unbounded set of sentences. Deep structure (the abstract syntactic form that feeds interpretation) ↔ surface structure (the realized form). Modern view: LLMs learn grammar implicitly, with no explicit rule set.

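## Core

### Chomsky hierarchy
1. **Type 0** (Recursively enumerable): recognized by Turing machines.
2. **Type 1** (Context-sensitive): recognized by linear-bounded automata, e.g. a^n b^n c^n.
3. **Type 2** (Context-free): recognized by pushdown automata, e.g. most programming-language syntax.
4. **Type 3** (Regular): recognized by finite automata, e.g. classical regular expressions (no backreferences).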
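
The levels of the hierarchy can be made concrete with three tiny recognizers. This is a minimal sketch; the function names are illustrative, and the recognizers use plain string comparison rather than the automata named above.

```python
import re

def is_a_star_b_star(s: str) -> bool:
    """Type 3: a*b* is regular, so a classical regular expression suffices."""
    return re.fullmatch(r"a*b*", s) is not None

def is_anbn(s: str) -> bool:
    """Type 2: a^n b^n is context-free but not regular,
    because matching the counts needs unbounded memory."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def is_anbncn(s: str) -> bool:
    """Type 1: a^n b^n c^n is context-sensitive; no context-free grammar generates it."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n

print(is_a_star_b_star("aab"))  # True
print(is_anbn("aab"))           # False
print(is_anbncn("aabbcc"))      # True
```

Each language here is a subset of the one a weaker formalism would over-accept: every a^n b^n string matches a*b*, but not the other way round.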

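### Deep vs surface
- **Deep structure**: the abstract syntactic representation from which meaning is interpreted.
- **Surface structure**: the spoken / written form.
- **Transformation**: a rule mapping one to the other, e.g. relating active and passive.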
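
A toy illustration of one deep structure yielding two surface forms. This is a sketch under heavy simplification: the `DeepStructure` record and the `VERB_FORMS` mini-lexicon are hypothetical, and real transformational analyses operate on trees, not flat records.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeepStructure:
    agent: str
    verb: str      # base form, looked up in VERB_FORMS
    patient: str

# Hypothetical mini-lexicon: base form -> (past tense, past participle)
VERB_FORMS = {"chase": ("chased", "chased"), "see": ("saw", "seen")}

def active(d: DeepStructure) -> str:
    """Surface form with the agent as subject."""
    past, _ = VERB_FORMS[d.verb]
    return f"{d.agent} {past} {d.patient}"

def passive(d: DeepStructure) -> str:
    """Surface form with the patient promoted to subject."""
    _, participle = VERB_FORMS[d.verb]
    return f"{d.patient} was {participle} by {d.agent}"

d = DeepStructure(agent="the dog", verb="see", patient="a cat")
print(active(d))   # the dog saw a cat
print(passive(d))  # a cat was seen by the dog
```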

### Universal grammar (UG)
- Innate language faculty (Chomsky).
- Parameter setting, e.g. head-initial vs head-final.
- Critical period for language acquisition.
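
The head-direction parameter can be pictured as a single boolean applied uniformly to a phrase tree. This is a rough sketch: the `linearize` helper and the English-word example tree are illustrative assumptions, not a model of any actual language.

```python
def linearize(node, head_initial: bool) -> list[str]:
    """Flatten a phrase tree into word order.

    A node is either a word (str) or a (head, complement) pair.
    The parameter decides whether heads precede or follow complements.
    """
    if isinstance(node, str):
        return [node]
    head, complement = node
    h = linearize(head, head_initial)
    c = linearize(complement, head_initial)
    return h + c if head_initial else c + h

# verb > complementizer > verb > object
tree = ("said", ("that", ("ate", "apples")))

print(" ".join(linearize(tree, head_initial=True)))   # said that ate apples
print(" ".join(linearize(tree, head_initial=False)))  # apples ate that said
```

The same hierarchical structure gives an English-like order with the parameter set one way and a Japanese-like verb-final order with it set the other way.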

### Modern stance
- **Pre-LLM**: explicit rules (CFG, dependency grammar).
- **Post-LLM**: implicit; the transformer learns syntactic regularities through attention, with no hand-written rules.
- **Hybrid**: LLM + grammar constraints applied at decoding time.
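
The hybrid approach comes down to masking: at each step, tokens the grammar forbids are removed before choosing the next token. Below is a minimal sketch with a stand-in scorer. `fake_logits`, `VOCAB`, and the `PATTERN` template are all hypothetical; a real system would take logits from a model and derive the allowed set from a grammar or automaton.

```python
VOCAB = list("0123456789-") + ["hello"]
PATTERN = "dddd-dd-dd"  # toy grammar: d = any digit, '-' = literal hyphen

def allowed(position: int) -> set[str]:
    """Tokens the toy grammar permits at this output position."""
    return set("0123456789") if PATTERN[position] == "d" else {"-"}

def fake_logits(prefix: str) -> dict[str, float]:
    """Stand-in for a language model; it always prefers the invalid token 'hello'."""
    scores = {tok: float(i) for i, tok in enumerate(VOCAB)}
    scores["hello"] = 100.0
    return scores

def constrained_greedy() -> str:
    out = ""
    for pos in range(len(PATTERN)):
        logits = fake_logits(out)
        legal = allowed(pos)
        # Mask: consider only grammar-legal tokens, then take the best-scoring one.
        out += max(legal, key=lambda t: logits[t])
    return out

print(constrained_greedy())  # 9999-99-99
```

Without the mask, greedy decoding over `VOCAB` would pick `hello` at every step; with it, the output always fits the template even though the scorer never favors a valid token.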

### Applications
1. **Parsing**: syntax trees.
2. **Compiler**: BNF / EBNF.
3. **NLP**: POS tagging, dependency parsing.
4. **Code completion**: grammar-guided LLM.
5. **DSL**: ANTLR / Tree-sitter.
6. **Constrained decoding**: LLM output conforming to a JSON schema.
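
For the compiler case, an EBNF grammar maps almost mechanically onto a recursive-descent parser, with one method per rule. This is a minimal sketch; the toy arithmetic grammar in the comment and the `Parser` / `evaluate` names are assumptions for illustration.

```python
import re

# Assumed toy EBNF:
#   expr   = term   { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = NUMBER | "(" expr ")" ;

def tokenize(src: str) -> list[str]:
    # Note: characters outside the token set are silently skipped.
    return re.findall(r"\d+|[-+*/()]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):       # { ("+" | "-") term }
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):       # { ("*" | "/") factor }
            op, rhs = self.take(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token: {tok!r}")
        return int(tok)

def evaluate(src: str):
    parser = Parser(tokenize(src))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return value

print(evaluate("3 + 4 * 2"))    # 11
print(evaluate("(3 + 4) * 2"))  # 14
```

Operator precedence falls out of the rule nesting: `term` binds tighter than `expr` because `expr` calls `term`, not the other way round.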

## 💻 Patterns

### CFG with NLTK
```python
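import nltk

# Toy grammar; terminals are quoted, nonterminals are bare symbols
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N | Det N PP
    VP -> V NP | V NP PP
    PP -> P NP
    Det -> 'the' | 'a'
    N -> 'dog' | 'cat' | 'park'
    V -> 'saw' | 'chased'
    P -> 'in' | 'with'
""")
parser = nltk.ChartParser(grammar)
# The sentence is ambiguous: the PP can attach to the VP or to the object NP,
# so the parser yields two trees.
for tree in parser.parse('the dog saw a cat in the park'.split()):
    tree.pretty_print()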
```

### Dependency parsing (spaCy)
```python
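import spacy

# Requires the model to be installed first: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
doc = nlp("The cat sat on the mat.")
# One row per token: text, dependency label, and the head it attaches to
for token in doc:
    print(f'{token.text:10} {token.dep_:10} {token.head.text}')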
```

### Tree-sitter grammar (DSL)
```javascript
module.exports = grammar({
  name: 'mylang',
  rules: {
    source_file: $ => repeat($._statement),
    _statement: $ => choice($.assignment, $.function_call),
    assignment: $ => seq($.identifier, '=', $._expression),
    identifier: $ => /[a-zA-Z_][a-zA-Z0-9_]*/,
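    // ... remaining rules elided; function_call and _expression are referenced
    // above and must be defined before the grammar will generate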
  }
});
```

### Constrained LLM decoding (grammar-guided)
```python
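from outlines import models, generate
from pydantic import BaseModel

# Written against the pre-1.0 outlines API (models / generate modules);
# newer releases may expose a different interface.
model = models.transformers('gpt2')

# Regex constraint: output must match the NNNN-NN-NN digit layout
generator = generate.regex(model, r'\d{4}-\d{2}-\d{2}')
print(generator('Date: '))

# JSON schema constraint: output must parse into a User
class User(BaseModel):
    name: str
    age: int

gen = generate.json(model, User)
print(gen('Describe a user: '))  # illustrative prompt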
```

### PEG parser
```python
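# parsimonious (PEG parser library)
from parsimonious.grammar import Grammar

# Arithmetic grammar; `ws` absorbs optional spaces between tokens.
# Without it, the input "3 + 4 * 2" fails to parse.
grammar = Grammar(r"""
    expr     = term (ws ("+" / "-") ws term)*
    term     = factor (ws ("*" / "/") ws factor)*
    factor   = number / ("(" ws expr ws ")")
    number   = ~"[0-9]+"
    ws       = ~" *"
""")
tree = grammar.parse("3 + 4 * 2")
print(tree)  # parse tree rooted at `expr`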
```

### Chomsky-Normal-Form CYK
```python
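def cyk(words, grammar):
    """CYK recognizer for a grammar in Chomsky normal form.

    `grammar` is a list of (lhs, rhs) pairs, where rhs is a 1-tuple holding
    a terminal or a 2-tuple of nonterminals. The start symbol is 'S'.
    """
    n = len(words)
    if n == 0:
        return False  # avoids indexing an empty table; no rule here derives the empty string
    # table[i][j] holds the nonterminals that derive words[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Length-1 spans: terminal rules A -> w
    for i, w in enumerate(words):
        for lhs, rhs in grammar:
            if rhs == (w,):
                table[i][i].add(lhs)
    # Longer spans: try every split point k with binary rules A -> B C
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for lhs, rhs in grammar:
                    if len(rhs) == 2 and rhs[0] in table[i][k] and rhs[1] in table[k+1][j]:
                        table[i][j].add(lhs)
    return 'S' in table[0][n-1]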
```

## Decision criteria
| Situation | Approach |
|---|---|
| Programming language | CFG / PEG |
| NLP parsing | Dependency (spaCy) |
| LLM output structure | Constrained decoding |
| Custom DSL | Tree-sitter |
| Compiler frontend | ANTLR / yacc |
| Linguistics research | UG / minimalist |

**Default**: in the LLM era, rely on the transformer's implicit grammar and add constrained decoding for critical outputs.

## 🔗 Graph
- Variants: [[Generative-Grammar]] · [[Universal-Grammar]]
- Applications: [[Domain-Specific-Languages]] · [[NLP]]
- Adjacent: [[Transformer_Architecture_and_LLM_Foundations|LLM]]

## 🤖 LLM usage
**When**: syntactic analysis, grammar-guided generation, DSL design.
**When not**: free-form text, zero-shot LLM use.

## ❌ Anti-patterns
- **Over-rigid grammar**: throws away the LLM's flexibility.
- **Ignoring ambiguity**: a sentence can have multiple parses.
- **Equating deep structure with semantics**: modern theories treat the two as separate.
- **No constraint at decode time**: risks invalid output.
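
Ambiguity can be measured rather than ignored. The sketch below extends the CYK idea to count parse trees instead of returning a yes/no answer. `count_parses` and the small CNF grammar are illustrative assumptions.

```python
from collections import defaultdict

def count_parses(words, grammar, start="S"):
    """Count distinct parse trees for `words` under a CNF grammar.

    `grammar` is a list of (lhs, rhs) pairs; rhs is a 1-tuple (terminal)
    or a 2-tuple (two nonterminals).
    """
    n = len(words)
    if n == 0:
        return 0
    # table[i][j][A] = number of ways A derives words[i..j] (inclusive)
    table = [[defaultdict(int) for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        for lhs, rhs in grammar:
            if rhs == (w,):
                table[i][i][lhs] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for lhs, rhs in grammar:
                    if len(rhs) == 2:
                        left = table[i][k].get(rhs[0], 0)
                        right = table[k + 1][j].get(rhs[1], 0)
                        table[i][j][lhs] += left * right
    return table[0][n - 1].get(start, 0)

# PP-attachment ambiguity, written directly in Chomsky normal form
GRAMMAR = [
    ("S", ("NP", "VP")), ("VP", ("V", "NP")), ("VP", ("VP", "PP")),
    ("NP", ("NP", "PP")), ("PP", ("P", "NP")),
    ("NP", ("I",)), ("NP", ("stars",)), ("NP", ("telescopes",)),
    ("V", ("saw",)), ("P", ("with",)),
]

print(count_parses("I saw stars with telescopes".split(), GRAMMAR))  # 2
```

The two parses correspond to "with telescopes" modifying the seeing (VP attachment) or the stars (NP attachment).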

## 🧪 Verification / Duplicates
- Verified (Chomsky, formal language theory).
- Trust level A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Chomsky hierarchy + NLTK / spaCy / tree-sitter / constrained decode code |