2nd/10_Wiki/Topics/AI_and_ML/Deep-Grammar.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
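Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / incomplete stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed redirects straight at their targets, concept mapping (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections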

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-deep-grammar
title: Deep Grammar
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: deep grammar, generative grammar, Chomsky hierarchy, universal grammar, syntactic structures
duplicate_of: none
source_trust_level: A
confidence_score: 0.88
verification_status: applied
tags: linguistics, chomsky, generative-grammar, syntax, nlp, formal-language
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: NLP / Formal Language
  applicable_to: Linguistics, Compiler, NLP, LLM

Deep Grammar

One-liner

"The underlying structure behind every surface sentence." Chomsky's generative grammar: a finite set of rules generates an infinite set of sentences. Deep structure (meaning) ↔ surface structure (form). Modern view: LLMs learn grammar implicitly, with no explicit grammar rules.

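Core ideas

Chomsky hierarchy

  1. Type 0 (recursively enumerable): unrestricted grammars; recognized by Turing machines.
  2. Type 1 (context-sensitive): e.g. a^n b^n c^n; recognized by linear-bounded automata.
  3. Type 2 (context-free): e.g. a^n b^n; the syntax of most programming languages; recognized by pushdown automata.
  4. Type 3 (regular): e.g. a*b*; regular expressions; recognized by finite automata.

The classes nest strictly: Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0.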
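To make the hierarchy concrete, here is a minimal sketch that recognizes one sample language for each of the three lower levels: a*b* (regular), a^n b^n (context-free), and a^n b^n c^n (context-sensitive). The function names are hypothetical. A regex is enough for the first. The second needs unbounded memory, which the code models with an explicit stack the way a pushdown automaton would. The third is checked by direct counting, because a single stack cannot match three blocks.

```python
import re

def is_regular_ab(s: str) -> bool:
    """Type 3 example, a*b*: a regex (finite automaton) is enough."""
    return re.fullmatch(r"a*b*", s) is not None

def is_anbn(s: str) -> bool:
    """Type 2 example, a^n b^n (n >= 0): needs unbounded memory, modeled as a stack."""
    stack, i = [], 0
    while i < len(s) and s[i] == "a":   # push one symbol per 'a'
        stack.append("a")
        i += 1
    while i < len(s) and s[i] == "b":   # pop one symbol per 'b'
        if not stack:
            return False
        stack.pop()
        i += 1
    return i == len(s) and not stack    # all input consumed and stack empty

def is_anbncn(s: str) -> bool:
    """Type 1 example, a^n b^n c^n (n >= 0): beyond one stack, checked here by counting."""
    n, rem = divmod(len(s), 3)
    return rem == 0 and s == "a" * n + "b" * n + "c" * n
```

For instance, `is_regular_ab("aab")` accepts while `is_anbn("aab")` rejects: the regex cannot tell whether the counts match, the stack can.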

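Deep vs. surface structure

  • Deep structure: the underlying representation that carries meaning.
  • Surface structure: the spoken or written form.
  • Transformations: rules mapping deep structure to surface structure, e.g. active ↔ passive.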
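A toy illustration of one deep structure feeding two transformations. The `Deep` record, the verb tables, and the function names are hypothetical, and this is not a model of transformational grammar; it only shows that an active and a passive sentence can share one underlying "who did what to whom".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Deep:
    """Toy deep structure: predicate plus its arguments (hypothetical encoding)."""
    agent: str
    verb: str      # base form, e.g. "chase"
    patient: str

# Hypothetical inflection tables, just large enough for the example.
PAST = {"chase": "chased", "see": "saw"}
PARTICIPLE = {"chase": "chased", "see": "seen"}

def active(d: Deep) -> str:
    """Active transformation: agent in subject position."""
    return f"the {d.agent} {PAST[d.verb]} the {d.patient}"

def passive(d: Deep) -> str:
    """Passive transformation: patient promoted to subject, agent in a by-phrase."""
    return f"the {d.patient} was {PARTICIPLE[d.verb]} by the {d.agent}"
```

Both `active(Deep("dog", "chase", "cat"))` and `passive(Deep("dog", "chase", "cat"))` start from the same record; only the surface form differs.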

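Universal grammar (UG)

  • An innate language faculty shared by all humans (Chomsky).
  • Parameter setting: languages differ in a small set of parameters, e.g. head-initial (English) vs. head-final (Japanese, Korean).
  • Critical period: the hypothesis that language is acquired most readily within a limited window early in life.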
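A toy sketch of the head parameter, under a deliberately simplified assumption: a phrase is a (head, complement) pair, and one global flag decides whether heads precede or follow their complements. The encoding and the `linearize` name are hypothetical.

```python
from typing import Union

# Toy encoding (hypothetical): a phrase is either a word or a (head, complement) pair.
Phrase = Union[str, tuple]

def linearize(p: Phrase, head_initial: bool) -> list:
    """Flatten a phrase into word order under one global head-direction parameter."""
    if isinstance(p, str):
        return [p]
    head, comp = p
    h = linearize(head, head_initial)
    c = linearize(comp, head_initial)
    return h + c if head_initial else c + h

# VP "sat [on mat]": the verb heads the VP, the preposition heads the PP.
vp = ("sat", ("on", "mat"))
```

With the flag on, `vp` comes out as "sat on mat" (English-like order); with it off, as "mat on sat", the noun + postposition + verb order of Korean or Japanese. One parameter flips the order at every level of the tree at once.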

Modern stance

  • Pre-LLM: explicit rules (CFG, dependency grammar).
  • Post-LLM: implicit grammar, learned through the transformer's attention.
  • Hybrid: an LLM plus grammar constraints applied during decoding.

Applications

  1. Parsing: building syntax trees.
  2. Compilers: grammars written in BNF / EBNF.
  3. NLP: POS tagging, dependency parsing.
  4. Code completion: grammar-guided LLMs.
  5. DSLs: ANTLR / Tree-sitter.
  6. Constrained decoding: forcing LLM output to match a JSON schema.
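A minimal sketch of how a compiler front end can turn an EBNF rule set into a recursive-descent parser: one method per rule, and each `{ ... }` repetition becomes a `while` loop. The arithmetic grammar, the `Parser` class, and `evaluate` are assumptions for illustration, not taken from a specific compiler.

```python
import re

# Toy EBNF grammar (assumed for illustration):
#   expr   = term   { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = number | "(" expr ")" ;
TOKEN = re.compile(r"\s*(\d+|[()+\-*/])")

def tokenize(src: str) -> list:
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    """One method per EBNF rule; each { ... } repetition becomes a while loop."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.eat(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.eat(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        tok = self.eat()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)

def evaluate(src: str):
    parser = Parser(tokenize(src))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return value
```

Operator precedence falls out of the rule nesting (`term` sits below `expr`), and the loops make `-` left-associative, so `evaluate("10 - 2 - 3")` gives 5.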

💻 Patterns

CFG with NLTK

import nltk
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N | Det N PP
    VP -> V NP | V NP PP
    PP -> P NP
    Det -> 'the' | 'a'
    N -> 'dog' | 'cat' | 'park'
    V -> 'saw' | 'chased'
    P -> 'in' | 'with'
""")
parser = nltk.ChartParser(grammar)
# 'in the park' can attach to the NP ('a cat') or to the VP ('saw'), so two trees print
for tree in parser.parse('the dog saw a cat in the park'.split()):
    tree.pretty_print()

Dependency parsing (spaCy)

import spacy
nlp = spacy.load('en_core_web_sm')  # install first: python -m spacy download en_core_web_sm
doc = nlp("The cat sat on the mat.")
for token in doc:
    print(f'{token.text:10} {token.dep_:10} {token.head.text}')
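A dependency parse is one head pointer per token, which is what the spaCy loop above prints via `token.head`. The sketch below turns such pointers into a printable tree. The analysis of "The cat sat on the mat." is hand-written in spaCy's English label style as an assumption, not captured model output; the root points to itself, as spaCy's root token does. `children` and `render` are hypothetical helper names.

```python
# Hand-written, illustrative analysis (not real spaCy output):
# (text, dependency label, index of head token); the root is its own head.
PARSE = [
    ("The", "det", 1), ("cat", "nsubj", 2), ("sat", "ROOT", 2),
    ("on", "prep", 2), ("the", "det", 5), ("mat", "pobj", 3), (".", "punct", 2),
]

def children(parse):
    """Invert head pointers into child lists; return (root index, children map)."""
    kids = {i: [] for i in range(len(parse))}
    root = None
    for i, (_, _, head) in enumerate(parse):
        if head == i:
            root = i
        else:
            kids[head].append(i)
    return root, kids

def render(parse, node, kids, depth=0):
    """Depth-first, indented listing of the subtree rooted at `node`."""
    text, dep, _ = parse[node]
    lines = ["  " * depth + f"{text} ({dep})"]
    for child in kids[node]:
        lines += render(parse, child, kids, depth + 1)
    return lines
```

Usage: `root, kids = children(PARSE)` followed by `print("\n".join(render(PARSE, root, kids)))` prints "sat (ROOT)" first, with "cat", "on", and "." indented beneath it.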

Tree-sitter grammar (DSL)

module.exports = grammar({
  name: 'mylang',
  rules: {
    source_file: $ => repeat($._statement),
    _statement: $ => choice($.assignment, $.function_call),
    assignment: $ => seq($.identifier, '=', $._expression),
    identifier: $ => /[a-zA-Z_][a-zA-Z0-9_]*/,
    // ...
  }
});

Constrained LLM decoding (grammar-guided)

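# outlines 0.x API; newer releases changed this interface
from outlines import models, generate
model = models.transformers('gpt2')

# Regex constraint: output must match YYYY-MM-DD
generator = generate.regex(model, r'\d{4}-\d{2}-\d{2}')
print(generator('Date: '))

# JSON-schema constraint derived from a Pydantic model
from pydantic import BaseModel
class User(BaseModel):
    name: str
    age: int
gen = generate.json(model, User)
user = gen('Return a user as JSON: ')  # returns a User instance in outlines 0.x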
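The core idea behind constrained decoding is masking: at each step, tokens the grammar or automaton would reject are removed before the model's choice is taken. Libraries such as outlines compile a regex into a finite-state machine over the tokenizer's vocabulary; the toy sketch below does not reproduce their code. It assumes a hypothetical character-level vocabulary, a fixed-length template equivalent to the date regex, and fake scoring functions standing in for an LLM.

```python
import random

# Toy stand-ins (hypothetical): a character-level vocabulary and fake "models".
VOCAB = list("0123456789-ab ")
TEMPLATE = "DDDD-DD-DD"  # D = any digit; same language as the regex \d{4}-\d{2}-\d{2}

def allowed(pos: int, tok: str) -> bool:
    """Finite-state check: may `tok` appear at output position `pos`?"""
    if pos >= len(TEMPLATE):
        return False
    slot = TEMPLATE[pos]
    return tok.isdigit() if slot == "D" else tok == slot

def constrained_decode(score) -> str:
    out = ""
    while len(out) < len(TEMPLATE):
        scores = score(out)                                  # unconstrained score per token
        legal = [t for t in VOCAB if allowed(len(out), t)]   # mask illegal tokens
        out += max(legal, key=scores.__getitem__)            # greedy pick among legal ones
    return out

def random_model(seed: int = 0):
    rng = random.Random(seed)
    return lambda prefix: {t: rng.random() for t in VOCAB}

def letter_loving_model(prefix):
    # Unconstrained, this "model" would always emit "a"; the mask rules that out.
    return {t: (1.0 if t == "a" else 0.0) for t in VOCAB}
```

Whatever the scorer prefers, the output always has the date shape: `constrained_decode(letter_loving_model)` can only choose among digits and the hyphen, so it yields `"0000-00-00"` (ties go to the first legal token).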

PEG parser

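# parsimonious: rules do not skip whitespace implicitly, so `_` consumes it explicitly
from parsimonious.grammar import Grammar
grammar = Grammar(r"""
    expr     = term (_ ("+" / "-") _ term)*
    term     = factor (_ ("*" / "/") _ factor)*
    factor   = number / ("(" _ expr _ ")")
    number   = ~"[0-9]+"
    _        = ~"\s*"
""")
tree = grammar.parse("3 + 4 * 2")  # the whole input must match, else a ParseError is raised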
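PEG alternatives (the `/` in the grammar above) are ordered: the first alternative that succeeds is taken and never revisited, unlike the `|` of a CFG. A minimal sketch with hypothetical combinator names (`lit`, `seq`, `choice`, `full_match`) shows the consequence.

```python
# Minimal PEG combinators (hypothetical names). Each parser takes (text, pos)
# and returns the new position on success, or None on failure.
def lit(s):
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def seq(*parsers):
    def run(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return run

def choice(*parsers):
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def run(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return run

def full_match(parser, text):
    return parser(text, 0) == len(text)
```

`full_match(choice(lit("a"), lit("ab")), "ab")` is False: "a" succeeds first and consumes one character, and the parser never goes back to try "ab". The CFG rule `X -> a | ab` would accept "ab". Reordering to `choice(lit("ab"), lit("a"))` fixes it, which is why longer alternatives usually come first in PEG grammars.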

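CYK parsing (Chomsky Normal Form)

def cyk(words, grammar, start='S'):
    # grammar: list of (lhs, rhs) rules in Chomsky Normal Form, where rhs is
    # either (terminal,) or (NonTerminal, NonTerminal)
    n = len(words)
    if n == 0:
        return False  # epsilon rules are not handled in this sketch
    # table[i][j] = set of nonterminals that derive words[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        for lhs, rhs in grammar:
            if rhs == (w,):
                table[i][i].add(lhs)
    # Fill longer spans from shorter ones, trying every split point k
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for lhs, rhs in grammar:
                    if len(rhs) == 2 and rhs[0] in table[i][k] and rhs[1] in table[k+1][j]:
                        table[i][j].add(lhs)
    return start in table[0][n-1]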

Decision criteria

| Situation | Approach |
|---|---|
| Programming language | CFG / PEG |
| NLP parsing | Dependency parsing (spaCy) |
| LLM output structure | Constrained decoding |
| Custom DSL | Tree-sitter |
| Compiler frontend | ANTLR / yacc |
| Linguistics research | UG / Minimalist Program |

Default: in the LLM era, rely on the transformer's implicit grammar and add constrained decoding for critical outputs.

🔗 Graph

🤖 Using with LLMs

When to use: syntactic analysis, grammar-guided generation, DSL design. When not to: free-form text, plain zero-shot LLM use.

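Anti-patterns

  • Over-rigid grammar: throws away the LLM's flexibility.
  • Ignoring ambiguity: one sentence can have several valid parses; consider all of them.
  • Equating deep structure with semantics: the modern view treats them separately.
  • No constraint at decoding time: the model can emit invalid output.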
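On ambiguity: the CYK recognizer in the Patterns section only answers yes or no. A small variant can count derivations instead. This sketch binarizes the NLTK toy grammar by hand into Chomsky Normal Form; the helper nonterminals `X1` and `X2` and the `count_parses` name are hypothetical.

```python
from collections import defaultdict

# Hypothetical CNF version of the NLTK toy grammar:
# "NP -> Det N PP" and "VP -> V NP PP" are binarized via X1 and X2.
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"), ("NP", "X1", "PP"), ("X1", "Det", "N"),
    ("VP", "V", "NP"), ("VP", "X2", "PP"), ("X2", "V", "NP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"}, "park": {"N"},
    "saw": {"V"}, "chased": {"V"},
    "in": {"P"}, "with": {"P"},
}

def count_parses(words, start="S"):
    """CYK variant: count derivations per span instead of recording membership."""
    n = len(words)
    # chart[i][j][A] = number of ways nonterminal A derives words[i..j]
    chart = [[defaultdict(int) for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        for tag in LEXICON.get(w, ()):
            chart[i][i][tag] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for lhs, left, right in BINARY:
                    ways = chart[i][k][left] * chart[k + 1][j][right]
                    if ways:
                        chart[i][j][lhs] += ways
    return chart[0][n - 1][start] if n else 0
```

`count_parses("the dog saw a cat in the park".split())` returns 2, one per attachment of "in the park", matching the two trees the NLTK ChartParser prints. An unambiguous sentence such as "the dog saw a cat" returns 1.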

🧪 Verification / Duplicates

  • Verified (Chomsky, formal language theory).
  • Confidence: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Chomsky hierarchy + NLTK / spaCy / tree-sitter / constrained decoding code |