"매 input sequence → output sequence — 매 길이 다른 변환". 매 Sutskever (2014) RNN encoder-decoder → Bahdanau (2015) attention → Vaswani (2017) Transformer 의 진화. 매 2026: 거의 모든 generative LLM (GPT, Claude, Gemini) 이 매 decoder-only seq2seq, 매 T5/BART 같은 encoder-decoder 는 specific task (번역, summarization fine-tune) 에 잔존.
Key points
Architecture family
RNN encoder-decoder (2014): of historical interest; prone to vanishing gradients on long sequences, no attention.
Attention seq2seq (2015): learns the alignment between source and target positions, which produced a jump in translation quality.
Transformer encoder-decoder (2017): self-attention, parallelizable across positions. T5, BART, mT5.
Decoder-only (2018+): GPT family. The dominant pattern for LLMs.
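The practical difference between the last two entries can be sketched with attention weights alone. The snippet below is a minimal pure-Python illustration, not any library's implementation: the helper names (`softmax`, `attention_weights`) and the toy vectors are made up for this example, and it uses the Transformer's scaled dot-product scoring rather than Bahdanau's additive scoring. With `causal=False` every position attends to every other position, as in an encoder; with `causal=True` each position sees only itself and earlier positions, as in a decoder-only model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, causal=False):
    """Return one row of attention weights per query position.

    queries, keys: lists of float vectors of the same dimension.
    causal=True masks keys that come after the query position.
    """
    dim = len(keys[0])
    rows = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # future position: masked out
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(dim))  # scaled dot product
        rows.append(softmax(scores))
    return rows

# Toy "token" vectors (hypothetical values, chosen only for illustration).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

full = attention_weights(x, x)                 # encoder-style: sees all positions
causal = attention_weights(x, x, causal=True)  # decoder-only style: sees the past

for row in causal:
    print([round(w, 3) for w in row])
```

In the causal case the first row puts all of its weight on position 0, because nothing else is visible to it yet. That same restriction is what lets a decoder-only model be trained on next-token prediction for every position in parallel.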
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the pretrained T5 tokenizer and encoder-decoder model.
tok = T5Tokenizer.from_pretrained("t5-base")
m = T5ForConditionalGeneration.from_pretrained("t5-base")

# T5 selects the task through a text prefix on the input.
inp = tok("translate English to German: Hello world", return_tensors="pt").input_ids
print(tok.decode(m.generate(inp)[0], skip_special_tokens=True))
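What a call like `generate` does in its simplest configuration (no sampling, a single beam) can be sketched as a greedy loop: score the candidates for the next token, append the best one, and stop at the end-of-sequence token. The version below is a toy under stated assumptions, not the Transformers implementation. `greedy_decode`, `toy_scores`, `TOY_TARGET`, and the `<s>` / `</s>` markers are hypothetical names for this example, and the scorer is a stand-in that replays a fixed target instead of running a neural network.

```python
def greedy_decode(next_token_scores, source, bos, eos, max_len=20):
    """Autoregressive greedy decoding.

    next_token_scores(source, prefix) must return {token: score}.
    Decoding stops at eos or after max_len generated tokens.
    """
    output = [bos]
    for _ in range(max_len):
        scores = next_token_scores(source, output)
        best = max(scores, key=scores.get)  # greedy: take the top-scoring token
        output.append(best)
        if best == eos:
            break
    return output[1:]  # drop the start marker

# Hypothetical stand-in for a trained model: it replays a fixed target.
TOY_TARGET = ["Hallo", "Welt", "</s>"]
VOCAB = ["Hallo", "Welt", "Hello", "world", "</s>"]

def toy_scores(source, prefix):
    step = len(prefix) - 1  # number of tokens generated so far
    want = TOY_TARGET[min(step, len(TOY_TARGET) - 1)]
    return {tok: (1.0 if tok == want else 0.0) for tok in VOCAB}

source = ["translate", "English", "to", "German:", "Hello", "world"]
result = greedy_decode(toy_scores, source, bos="<s>", eos="</s>")
print(result)
```

The output length is set by when the end-of-sequence token appears, not by the input length, which is the "lengths can differ" property that defines seq2seq.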