2nd/10_Wiki/Topics/AI_and_ML/Sequence-to-Sequence-Models.md
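koriweb d8a80f6272 chore(wiki): canonical normalization of dangling links (768 files / 1,200 links)
Replaced [[wikilinks]] that differed only in name (spelling variants) with the canonical title of the target document,
reconnecting 1,200 broken links. Only title/filename normalization matches were applied; alias matching was
excluded because of the risk of over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs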

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00


id: wiki-2026-0508-sequence-to-sequence-models
title: Sequence to Sequence Models
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: seq2seq, Encoder-Decoder, Sequence Modeling, Sequence-to-Sequence
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: architecture, nlp, transformer, encoder-decoder
raw_sources: (empty)
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: python; framework: PyTorch / Transformers

Sequence to Sequence Models

One-line summary

"Input sequence → output sequence: a transformation in which the two lengths may differ." The line of evolution runs from the Sutskever (2014) RNN encoder-decoder, through Bahdanau (2015) attention, to the Vaswani (2017) Transformer. As of 2026, almost every generative LLM (GPT, Claude, Gemini) is a decoder-only seq2seq model, while encoder-decoder models such as T5/BART persist for specific tasks (translation, fine-tuned summarization).

Key points

Architecture family

  • RNN encoder-decoder (2014): historical; vanishing gradients, no attention.
  • Attention seq2seq (2015): learns alignment; a jump in translation quality.
  • Transformer encoder-decoder (2017): self-attention, parallelizable. T5, BART, mT5.
  • Decoder-only (2018+): GPT family. The dominant pattern for LLMs.
  • Encoder-only (BERT): classification/embedding, not generation.
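
One way to read the family list above is through which positions each token may attend to. The sketch below is a minimal pure-Python illustration, not taken from any library; `attention_mask` is a hypothetical helper name. Encoder-style attention sees the whole sequence, while decoder-style attention is causal (lower-triangular).

```python
def attention_mask(n, causal):
    """Return an n x n matrix; mask[i][j] is True if position i may attend to j."""
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]

# Encoder-only models and the encoder side of enc-dec: every position sees every other.
encoder_mask = attention_mask(4, causal=False)
# Decoder-only models and the decoder side of enc-dec: no looking at future positions.
decoder_mask = attention_mask(4, causal=True)

for row in decoder_mask:
    print("".join("x" if ok else "." for ok in row))
# x...
# xx..
# xxx.
# xxxx
```

In an encoder-decoder model the decoder additionally cross-attends to all encoder outputs, which this sketch leaves out.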

Core components

  • Tokenizer (BPE, SentencePiece, tiktoken).
  • Embedding + positional encoding (RoPE and ALiBi are the 2026 standard).
  • Self-attention / cross-attention.
  • Teacher forcing for training, autoregressive decoding for inference.
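
The last bullet can be sketched as follows. This is a toy illustration under stated assumptions: the `BOS`/`EOS` ids, `teacher_forcing_pair`, `autoregressive_decode`, and `toy_next_token` are all hypothetical names, and the "model" is a hand-written stub rather than a trained network. Training feeds the gold prefix and predicts the next gold token; inference feeds the model's own previous outputs.

```python
BOS, EOS = 1, 2  # hypothetical special-token ids

def teacher_forcing_pair(target_ids):
    """Build (decoder_input, labels) from one gold target sequence.

    decoder_input is the target shifted right by one; labels[i] is the
    token the decoder should predict after reading decoder_input[: i + 1].
    """
    decoder_input = [BOS] + target_ids
    labels = target_ids + [EOS]
    return decoder_input, labels

def autoregressive_decode(next_token, max_new_tokens=10):
    """Inference loop: append the predicted token and feed the prefix back in."""
    generated = [BOS]
    for _ in range(max_new_tokens):
        tok = next_token(generated)
        if tok == EOS:
            break
        generated.append(tok)
    return generated[1:]  # drop BOS

def toy_next_token(prefix):
    """Stand-in for a trained model: emits 10, 11, 12 and then EOS."""
    last = prefix[-1]
    if last == BOS:
        return 10
    return last + 1 if last < 12 else EOS

dec_in, labels = teacher_forcing_pair([10, 11, 12])
print(dec_in, labels)                         # [1, 10, 11, 12] [10, 11, 12, 2]
print(autoregressive_decode(toy_next_token))  # [10, 11, 12]
```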

Decoding strategies

  • Greedy / Beam search: for deterministic tasks.
  • Sampling (temperature, top-p, top-k, min-p): for creative tasks.
  • Speculative / Medusa: inference acceleration.
  • Constrained / structured (JSON schema): for tool use.
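
The sampling bullet can be made concrete with a small stdlib-only sketch. The function names (`softmax`, `filter_top_k_top_p`, `sample`) are hypothetical, and the order of operations (temperature, then top-k, then top-p, then renormalize) is one common convention assumed here, not the definition used by any particular library. min-p is omitted.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p; return renormalized {token: prob}."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    dist = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]
greedy = max(range(len(logits)), key=logits.__getitem__)
print(greedy, sample(logits, top_k=1))  # top_k=1 reduces to greedy: 0 0
print(sample(logits, temperature=0.8, top_p=0.9, rng=random.Random(0)))
```

Lower temperatures sharpen the distribution toward the greedy choice; smaller `top_p` or `top_k` values cut off the low-probability tail before sampling.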

Applications

  1. Machine translation (NLLB, M2M-100).
  2. Summarization (BART, Pegasus).
  3. Code generation (Claude Code, Copilot).
  4. Speech (Whisper encoder + decoder).
  5. Image captioning, VQA (multimodal seq2seq).

💻 Patterns

Tiny Transformer encoder-decoder

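import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder; positional encoding is omitted for brevity."""
    def __init__(self, vocab, d=256, nhead=4, nl=4):
        super().__init__()
        self.emb_s = nn.Embedding(vocab, d)  # source token embedding
        self.emb_t = nn.Embedding(vocab, d)  # target token embedding
        self.tx = nn.Transformer(d, nhead, nl, nl, batch_first=True)
        self.out = nn.Linear(d, vocab)       # project back to vocabulary logits

    def forward(self, src, tgt):
        # Causal mask: each target position may only attend to earlier positions
        mask = self.tx.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        return self.out(self.tx(self.emb_s(src), self.emb_t(tgt), tgt_mask=mask))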

HF Transformers (T5)

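from transformers import T5ForConditionalGeneration, T5Tokenizer

# T5Tokenizer needs the sentencepiece package installed
tok = T5Tokenizer.from_pretrained("t5-base")
m = T5ForConditionalGeneration.from_pretrained("t5-base")
# T5 selects the task through a text prefix in the input
inp = tok("translate English to German: Hello world", return_tensors="pt").input_ids
out = m.generate(inp, max_new_tokens=32)  # cap the output length explicitly
print(tok.decode(out[0], skip_special_tokens=True))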

Decoder-only generation (Claude API)

import anthropic
c = anthropic.Anthropic()
msg = c.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize: ..."}],
)
print(msg.content[0].text)

Beam search decode

out = model.generate(input_ids, num_beams=4, length_penalty=0.6,
                     no_repeat_ngram_size=3, max_new_tokens=128)
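
To show what `num_beams` and `length_penalty` do conceptually, here is a toy beam search in pure Python. It is a sketch under assumptions: `beam_search` and `toy_model` are hypothetical, the probability table is made up for illustration, and the length-normalized score (summed log-probability divided by length raised to `length_penalty`) is assumed to mirror the usual convention rather than reproduce any library's internals.

```python
import math

def beam_search(step_logprobs, num_beams=2, max_new_tokens=3, eos="<eos>", length_penalty=1.0):
    """step_logprobs(prefix_tuple) -> {token: logprob}. Returns (tokens, summed_logprob)."""
    beams = [((), 0.0)]  # (token tuple, summed log-probability)
    finished = []
    for _ in range(max_new_tokens):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_logprobs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:num_beams]:
            # Hypotheses ending in eos stop growing; the rest stay on the beam.
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at the length limit
    return max(finished, key=lambda c: c[1] / (len(c[0]) ** length_penalty))

def toy_model(prefix):
    """Made-up next-token distributions where the greedy first step is a trap."""
    table = {
        (): {"A": 0.5, "B": 0.4, "<eos>": 0.1},
        ("A",): {"A": 0.4, "B": 0.3, "<eos>": 0.3},
        ("B",): {"A": 0.9, "B": 0.05, "<eos>": 0.05},
    }
    return {tok: math.log(p) for tok, p in table[prefix].items()}

print(beam_search(toy_model, num_beams=1, max_new_tokens=2)[0])  # ('A', 'A'), p = 0.20
print(beam_search(toy_model, num_beams=2, max_new_tokens=2)[0])  # ('B', 'A'), p = 0.36
```

With a single beam the search is greedy and commits to "A" first; with two beams it keeps "B" alive long enough to find the more probable sequence.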

Streaming

with c.messages.stream(model="claude-opus-4-7", max_tokens=512,
                       messages=msgs) as s:
    for text in s.text_stream:
        print(text, end="", flush=True)

KV cache reuse

# pkv is None on the first step; afterwards inputs holds only the newest token
out = model(**inputs, use_cache=True, past_key_values=pkv)
pkv = out.past_key_values  # reuse at the next step
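
The idea behind the cache can be sketched without any framework. The toy below is an assumption-laden illustration: single-head attention, with the key/value/query projections taken to be the identity, and hypothetical function names. It shows that appending one key/value per step gives the same outputs as recomputing the whole prefix, while doing far fewer projections.

```python
import math

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi / z * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]

def decode_no_cache(xs):
    """Re-project keys/values for the whole prefix at every step."""
    outs, n_projections = [], 0
    for t in range(1, len(xs) + 1):
        keys = values = [list(x) for x in xs[:t]]  # toy projection: K = V = x
        n_projections += t
        outs.append(attend(xs[t - 1], keys, values))
    return outs, n_projections

def decode_with_cache(xs):
    """Project only the newest token and append it to the cached keys/values."""
    keys, values, outs, n_projections = [], [], [], 0
    for x in xs:
        keys.append(list(x))
        values.append(list(x))
        n_projections += 1
        outs.append(attend(x, keys, values))
    return outs, n_projections

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full, cost_full = decode_no_cache(xs)
cached, cost_cached = decode_with_cache(xs)
print(cost_full, cost_cached)  # 6 3
```

For a sequence of length L the uncached variant performs L(L+1)/2 projections against L with the cache; the gap grows quadratically with sequence length.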

Decision criteria

Situation | Approach
General LLM task | decoder-only (Claude, GPT)
Specific translation/summarization fine-tune | T5/BART encoder-decoder
Embedding / classification | encoder-only (BERT family)
Speech-to-text | Whisper-style enc-dec
Long sequences, low cost | Mamba / Hybrid seq2seq

Default: decoder-only LLM via API.

🔗 Graph

🤖 LLM usage

When to use: the task can be defined as an input → output transformation; an API call is enough. When not to: pure classification, where an encoder + head is cheaper.

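Anti-patterns

  • Greedy for creative tasks: leads to repetition. Use sampling.
  • No cache: every decoding step recomputes attention over the whole prefix, roughly O(L²) per step instead of O(L) with a cache. A KV cache is essential.
  • Train from scratch: almost always the wrong choice. Fine-tune or prompt.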

🧪 Verification / duplicates

  • Verified (Sutskever 2014, Vaswani 2017, HF Transformers docs).
  • Trust level A.

🕓 Changelog

Date | Change
2026-05-08 | Phase 1
2026-05-10 | Manual cleanup: full seq2seq family 2026