"The underlying structure of every surface sentence." Chomsky's generative grammar: a finite set of rules produces an unbounded set of sentences. Deep structure (meaning) ↔ surface structure (form). Modern view: LLMs learn grammar implicitly, with no explicit grammar.
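The "finite rules, unbounded sentences" claim can be illustrated with a minimal sketch. The toy grammar below is a hypothetical example, not taken from the source. A single recursive rule (`VP -> thinks that S`) is enough to make the number of derivable sentences grow as the depth limit rises.

```python
# Hypothetical toy grammar: five rule sets, one of them recursive through S.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"], ["thinks", "that", "S"]],  # recursion: VP -> ... S
    "N": [["dog"], ["cat"]],
    "V": [["saw"]],
}

def generate(symbol, depth):
    """Yield word lists derivable from `symbol` using at most `depth` nested rule applications."""
    if symbol not in RULES:      # terminal word
        yield [symbol]
        return
    if depth == 0:               # depth budget exhausted for a nonterminal
        return
    for rhs in RULES[symbol]:
        yield from expand_seq(rhs, depth - 1)

def expand_seq(symbols, depth):
    """Yield every concatenation of expansions of a right-hand side."""
    if not symbols:
        yield []
        return
    for head in generate(symbols[0], depth):
        for tail in expand_seq(symbols[1:], depth):
            yield head + tail

if __name__ == "__main__":
    for depth in (4, 6, 8):
        sentences = list(generate("S", depth))
        print(depth, len(sentences), " ".join(sentences[-1]))
```

The rule set never changes. Only the depth limit does, and the sentence count keeps growing with it.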
Key points
Chomsky hierarchy
Type 0 (Recursively enumerable): recognized by a Turing machine.
Type 1 (Context-sensitive): e.g. a^n b^n c^n.
Type 2 (Context-free): e.g. the syntax of most programming languages.
Type 3 (Regular): e.g. classical regular expressions.
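To make the levels concrete, here is a small sketch of hand-written recognizers for one language at each of the three lower levels. The function names are illustrative assumptions, and these are recognizers rather than grammars. The point is the extra memory each level needs: none beyond a finite state for `a*b*`, a stack for a^n b^n, and two matched counts for a^n b^n c^n.

```python
import re

def is_regular_ab(s: str) -> bool:
    """Type 3: a*b* needs no counting, so a regular expression suffices."""
    return re.fullmatch(r"a*b*", s) is not None

def is_anbn(s: str) -> bool:
    """Type 2: a^n b^n needs a stack (here a list) to match each b to an a."""
    stack = []
    i = 0
    while i < len(s) and s[i] == "a":
        stack.append("a")
        i += 1
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()
        i += 1
    # Accept only if all input was consumed and every a was matched.
    return i == len(s) and not stack

def is_anbncn(s: str) -> bool:
    """Type 1: a^n b^n c^n needs two matched counts, beyond a single stack."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n

if __name__ == "__main__":
    for word in ["aabb", "aab", "aabbcc", "aabbc"]:
        print(word, is_regular_ab(word), is_anbn(word), is_anbncn(word))
```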
Deep vs surface structure
Deep structure: the meaning representation.
Surface structure: the spoken or written form.
Transformation: maps one onto the other, e.g. active ↔ passive.
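The active ↔ passive relation can be sketched very roughly as two renderings of the same underlying record. The `Clause` class and its fields are hypothetical. Real transformational analyses operate on trees, so this is only a toy illustration of "one deep structure, several surface forms".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clause:
    """Hypothetical stand-in for a deep structure: who did what to whom."""
    agent: str
    verb_past: str        # simple past form, e.g. "chased"
    verb_participle: str  # past participle, e.g. "chased"
    patient: str

def active(c: Clause) -> str:
    """Surface form 1: agent in subject position."""
    return f"{c.agent} {c.verb_past} {c.patient}"

def passive(c: Clause) -> str:
    """Surface form 2: patient promoted to subject, agent in a by-phrase."""
    return f"{c.patient} was {c.verb_participle} by {c.agent}"

if __name__ == "__main__":
    clause = Clause("the dog", "chased", "chased", "the cat")
    print(active(clause))
    print(passive(clause))
```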
Universal grammar (UG)
An innate language faculty (Chomsky).
Parameter setting (e.g. head-initial vs head-final).
Critical period for language acquisition.
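The head-direction parameter can be sketched as a single boolean that changes word order while the structure stays the same. The tree encoding and the `linearize` helper below are assumptions made for illustration. English is broadly head-initial, while Japanese and Korean are broadly head-final.

```python
def linearize(tree, head_initial: bool):
    """Flatten a tree into a word list.

    `tree` is either a word (str) or a (head, complement) pair.
    The single parameter `head_initial` decides which side the head goes on.
    """
    if isinstance(tree, str):
        return [tree]
    head, complement = tree
    h = linearize(head, head_initial)
    c = linearize(complement, head_initial)
    return h + c if head_initial else c + h

# Same structure for both settings: verb head with a PP complement,
# where the PP is an adposition head with a noun complement.
vp = ("lives", ("in", "Tokyo"))

if __name__ == "__main__":
    print(" ".join(linearize(vp, head_initial=True)))   # English-like order
    print(" ".join(linearize(vp, head_initial=False)))  # Japanese-like order
```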
Modern stance
Pre-LLM: explicit rules (CFG, dependency grammar).
Post-LLM: implicit; syntax is learned by the transformer's attention rather than written down.
Hybrid: LLM + grammar constraints applied at decoding time.
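The hybrid approach can be sketched without a real model: at each step, mask out every token the grammar forbids, then pick the best-scoring token that remains. Everything below is a toy assumption. `fake_scores` stands in for LLM logits, and `TEMPLATE` stands in for a grammar or regex automaton.

```python
import random
import re

VOCAB = list("0123456789-abc")   # toy vocabulary, one character per token
TEMPLATE = "dddd-dd-dd"          # d = any digit, '-' = literal dash
DIGITS = set("0123456789")

def allowed(position: int) -> set:
    """Tokens the constraint permits at this position."""
    return DIGITS if TEMPLATE[position] == "d" else {"-"}

def fake_scores(prefix: str, rng: random.Random) -> dict:
    """Stand-in for model logits: random scores, ignoring the prefix."""
    return {tok: rng.random() for tok in VOCAB}

def constrained_decode(seed: int = 0) -> str:
    """Greedy decoding restricted to tokens the template allows."""
    rng = random.Random(seed)
    out = ""
    while len(out) < len(TEMPLATE):
        scores = fake_scores(out, rng)
        legal = sorted(allowed(len(out)))             # mask: drop illegal tokens
        out += max(legal, key=lambda t: scores[t])    # argmax over the rest
    return out

if __name__ == "__main__":
    text = constrained_decode()
    print(text, bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", text)))
```

The output always matches the date pattern, whatever the scores are. The constraint guarantees the form, and the model only chooses among legal continuations.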
Applications
Parsing: syntax trees.
Compilers: BNF / EBNF.
NLP: POS tagging, dependency parsing.
Code completion: grammar-guided LLMs.
DSLs: ANTLR / Tree-sitter.
Constrained decoding: LLM output that conforms to a JSON schema.
💻 Patterns
CFG with NLTK
import nltk

# Toy CFG. A PP can attach to either NP or VP, so some sentences are ambiguous.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | Det N PP
VP -> V NP | V NP PP
PP -> P NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat' | 'park'
V -> 'saw' | 'chased'
P -> 'in' | 'with'
""")

parser = nltk.ChartParser(grammar)

# Print every parse tree the grammar licenses for the sentence.
for tree in parser.parse('the dog saw a cat in the park'.split()):
    tree.pretty_print()
Dependency parsing (spaCy)
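import spacy

# Requires the model to be installed: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
doc = nlp("The cat sat on the mat.")

# One row per token: word, dependency label, head word.
for token in doc:
    print(f'{token.text:10}{token.dep_:10}{token.head.text}')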
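Constrained decoding (outlines)
# Uses the outlines `generate` interface as written in the source;
# newer outlines releases may expose a different API.
from outlines import models, generate
from pydantic import BaseModel

model = models.transformers('gpt2')

# Regex constraint: the output must match YYYY-MM-DD.
generator = generate.regex(model, r'\d{4}-\d{2}-\d{2}')
print(generator('Date: '))

# JSON schema constraint: the output must conform to the User model.
class User(BaseModel):
    name: str
    age: int

gen = generate.json(model, User)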
PEG parser
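# parsimonious
from parsimonious.grammar import Grammar

# PEG for integer arithmetic. parsimonious does not skip whitespace on its
# own, so an explicit `ws` rule is needed to accept "3 + 4 * 2".
grammar = Grammar(r"""
expr   = term (ws ("+" / "-") ws term)*
term   = factor (ws ("*" / "/") ws factor)*
factor = number / ("(" ws expr ws ")")
number = ~"[0-9]+"
ws     = ~" *"
""")

tree = grammar.parse("3 + 4 * 2")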