
koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)

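Replaced [[wiki links]] that differed only in spelling (notation variants) with the target document's canonical title,
reconnecting 1,200 broken links. Only normalized title/filename matches were applied; alias matching was
excluded because of the risk of over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.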
Tool: Datacollect/scripts/link_reconcile_apply.mjs

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-08 12:24:15 +09:00

4.9 KiB


id, title, category, status, canonical_id, aliases, duplicate_of, source_trust_level, confidence_score, verification_status, tags, raw_sources, last_reinforced, github_commit, tech_stack

title

Sequence to Sequence Models

One-liner

"Input sequence → output sequence: a transformation between sequences that can differ in length." The lineage runs from the Sutskever et al. (2014) RNN encoder-decoder through Bahdanau et al. (2015) attention to the Vaswani et al. (2017) Transformer. As of 2026, almost every generative LLM (GPT, Claude, Gemini) is a decoder-only seq2seq model, while encoder-decoder models such as T5/BART survive in specific tasks (translation, summarization fine-tuning).

Core

Architecture family

RNN encoder-decoder (2014): historical. The whole source is squeezed into one fixed-length vector, long-range dependencies suffer from vanishing gradients, and there is no attention.
Attention seq2seq (2015): learns a soft alignment between source and target; a big jump in translation quality.
Transformer encoder-decoder (2017): self-attention, parallelizable training. T5, BART, mT5.
Decoder-only (2018+): GPT family. The dominant pattern for LLMs.
Encoder-only (BERT): classification/embedding, not generation.
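The families above differ mainly in which positions each token may attend to. A minimal pure-Python sketch, assuming masks are lists of booleans where `True` means "may attend"; the function names are illustrative, not from any library, and conventions differ across libraries (in PyTorch's `nn.MultiheadAttention`, a boolean `True` means *blocked*).

```python
def full_mask(n):
    """Encoder-only (BERT-style) self-attention: every position sees every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    """Decoder self-attention (decoder-only, or the decoder half of enc-dec):
    position i may attend only to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def cross_mask(tgt_len, src_len):
    """Encoder-decoder cross-attention: every target position sees the whole source."""
    return [[True] * src_len for _ in range(tgt_len)]


def show(mask):
    """Render a mask as text: 'x' = may attend, '.' = blocked."""
    return "\n".join("".join("x" if m else "." for m in row) for row in mask)


if __name__ == "__main__":
    print(show(causal_mask(4)))
```

A decoder-only model uses only `causal_mask`; an encoder-decoder uses `full_mask` in the encoder, `causal_mask` in decoder self-attention, and `cross_mask` between them.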

Core components

Tokenizer (BPE, SentencePiece, tiktoken).
Embedding + positional encoding (RoPE, ALiBi; standard as of 2026).
Self-attention / cross-attention.
Teacher forcing for training, autoregressive decoding for inference.
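Teacher forcing and autoregressive decoding feed the decoder differently. A minimal pure-Python sketch, assuming hypothetical token ids `BOS = 1` and `EOS = 2` and a stand-in `next_token(prefix)` callable in place of a trained decoder: training pairs the right-shifted gold target with next-token labels, while inference feeds the model's own outputs back in one step at a time.

```python
from typing import Callable, List

BOS, EOS = 1, 2  # hypothetical special-token ids


def teacher_forcing_pair(target: List[int]):
    """Training: decoder input = gold target shifted right (BOS prepended);
    label at each position = the next gold token. All positions are scored
    in parallel under a causal mask, never using the model's own predictions."""
    dec_in = [BOS] + target
    labels = target + [EOS]
    return dec_in, labels


def greedy_decode(next_token: Callable[[List[int]], int], max_len: int = 10) -> List[int]:
    """Inference: append the model's own prediction until EOS or max_len."""
    seq = [BOS]
    for _ in range(max_len):
        tok = next_token(seq)
        if tok == EOS:
            break
        seq.append(tok)
    return seq[1:]  # strip BOS
```

Real models choose their own start token; T5, for example, starts the decoder with its pad token.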

Decoding strategies

Greedy / beam search: for deterministic tasks.
Sampling (temperature, top-p, top-k, min-p): for creative tasks.
Speculative decoding / Medusa: inference speed-up.
Constrained / structured decoding (JSON schema): for tool use.
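The sampling knobs above are transformations of the next-token distribution applied before drawing a token. A minimal pure-Python sketch over a `{token: probability}` dict; the function names are illustrative, and real libraries (e.g. Hugging Face `generate`) implement these as logits processors with their own edge-case handling.

```python
import math
import random
from typing import Dict

Dist = Dict[str, float]


def softmax(logits: Dist, temperature: float = 1.0) -> Dist:
    """Temperature < 1 sharpens the distribution, > 1 flattens it."""
    scaled = {t: v / temperature for t, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in scaled.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def renorm(p: Dist) -> Dist:
    z = sum(p.values())
    return {t: v / z for t, v in p.items()}


def top_k(p: Dist, k: int) -> Dist:
    """Keep only the k most probable tokens."""
    keep = sorted(p, key=p.get, reverse=True)[:k]
    return renorm({t: p[t] for t in keep})


def top_p(p: Dist, threshold: float) -> Dist:
    """Nucleus: smallest set of top tokens whose cumulative mass reaches threshold."""
    keep, cum = {}, 0.0
    for t in sorted(p, key=p.get, reverse=True):
        keep[t] = p[t]
        cum += p[t]
        if cum >= threshold:
            break
    return renorm(keep)


def min_p(p: Dist, ratio: float) -> Dist:
    """Keep tokens whose probability is at least ratio * (top probability)."""
    cutoff = ratio * max(p.values())
    return renorm({t: v for t, v in p.items() if v >= cutoff})


def sample(p: Dist, rng=random) -> str:
    tokens, weights = zip(*p.items())
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Filters compose, e.g. `sample(top_p(softmax(logits, temperature=0.7), 0.9))`. Greedy decoding is the limit `top_k(p, 1)`.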

Applications

Machine translation (NLLB, M2M-100).
Summarization (BART, Pegasus).
Code generation (Claude Code, Copilot).
Speech (Whisper encoder + decoder).
Image captioning, VQA (multimodal seq2seq).

💻 Patterns

Tiny Transformer encoder-decoder

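import torch
import torch.nn as nn
class Seq2Seq(nn.Module):
    def __init__(self, vocab, d=256, nhead=4, nl=4, max_len=512):
        super().__init__()
        self.emb_s = nn.Embedding(vocab, d)
        self.emb_t = nn.Embedding(vocab, d)
        self.pos = nn.Embedding(max_len, d)  # learned positions: nn.Transformer adds none itself
        self.tx = nn.Transformer(d, nhead, nl, nl, batch_first=True)
        self.out = nn.Linear(d, vocab)
    def forward(self, src, tgt):
        # src: (B, S) source ids; tgt: (B, T) right-shifted target ids (teacher forcing)
        ps = self.pos(torch.arange(src.size(1), device=src.device))
        pt = self.pos(torch.arange(tgt.size(1), device=tgt.device))
        # causal mask: decoder position t must not attend to positions > t
        mask = self.tx.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        h = self.tx(self.emb_s(src) + ps, self.emb_t(tgt) + pt, tgt_mask=mask)
        return self.out(h)  # (B, T, vocab) next-token logits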

HF Transformers (T5)

from transformers import T5ForConditionalGeneration, T5Tokenizer  # T5Tokenizer needs the sentencepiece package
tok = T5Tokenizer.from_pretrained("t5-base")
m = T5ForConditionalGeneration.from_pretrained("t5-base")
# T5 is text-to-text: the task is selected by a prefix in the input string
inp = tok("translate English to German: Hello world", return_tensors="pt").input_ids
print(tok.decode(m.generate(inp, max_new_tokens=40)[0], skip_special_tokens=True))

Decoder-only generation (Claude API)

import anthropic
c = anthropic.Anthropic()
msg = c.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize: ..."}],
)
print(msg.content[0].text)

Beam search decode

out = model.generate(input_ids, num_beams=4, length_penalty=0.6,
                     no_repeat_ngram_size=3, max_new_tokens=128)
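What `num_beams=4` does can be sketched in a few lines. A minimal pure-Python beam search, assuming a hypothetical `log_probs(prefix)` function that returns `{token: log-probability}` for the next token and a hypothetical `EOS` marker; finished beams are ranked by `score / len ** alpha`, which is roughly how Hugging Face applies `length_penalty`, but this is not its exact implementation (no `no_repeat_ngram_size`, no early stopping rules).

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical end-of-sequence marker


def beam_search(
    log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    num_beams: int = 4,
    max_len: int = 10,
    alpha: float = 0.6,
) -> List[str]:
    """Keep the num_beams best unfinished prefixes (by summed log-prob) at every step."""
    beams = [((), 0.0)]  # (tokens, summed log-prob)
    finished = []        # (length-normalized score, tokens)
    for _ in range(max_len):
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in log_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == EOS:
                finished.append((score / len(seq) ** alpha, seq))
            else:
                beams.append((seq, score))
            if len(beams) == num_beams:
                break
        if not beams:
            break
    # beams still open at max_len also compete
    finished += [(s / len(q) ** alpha, q) for q, s in beams]
    best = max(finished, key=lambda f: f[0])[1]
    return [t for t in best if t != EOS]


# Hypothetical toy model where greedy is myopic: "a" wins step 1,
# but "b" then "</s>" has the higher total probability (0.36 vs 0.24).
TOY = {
    (): {"a": math.log(0.6), "b": math.log(0.4)},
    ("a",): {EOS: math.log(0.4), "x": math.log(0.3), "y": math.log(0.3)},
    ("b",): {EOS: math.log(0.9), "z": math.log(0.1)},
}


def toy(prefix):
    return TOY.get(prefix, {EOS: 0.0})


if __name__ == "__main__":
    print(beam_search(toy, num_beams=1), beam_search(toy, num_beams=2))
```

With `num_beams=1` the search degenerates to greedy and returns `["a"]`; with two beams it keeps `"b"` alive long enough to find the better sequence.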

Streaming

msgs = [{"role": "user", "content": "Summarize: ..."}]
with c.messages.stream(model="claude-opus-4-7", max_tokens=512,
                       messages=msgs) as s:
    for text in s.text_stream:  # yields text deltas as they arrive
        print(text, end="", flush=True)

KV cache reuse

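pkv, ids = None, inputs["input_ids"]
for _ in range(max_new_tokens):  # greedy loop; model, inputs, max_new_tokens come from context
    out = model(input_ids=ids, past_key_values=pkv, use_cache=True)
    pkv, ids = out.past_key_values, out.logits[:, -1:].argmax(-1)  # reuse the cache; feed only the new token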
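Why reusing the cache is exact: under causal attention the keys and values of earlier positions never change, so each step only needs to project the new token and attend over the cached K/V. A minimal single-head, single-layer pure-Python sketch, assuming toy weight matrices `wq`, `wk`, `wv` (hypothetical, not from any model); it counts K/V projections to show the O(L²) vs O(L) gap. In a real decoder, the per-layer K/V tensors are what `past_key_values` carries.

```python
import math
from typing import List

Vec = List[float]
Mat = List[Vec]


def matvec(w: Mat, x: Vec) -> Vec:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Scaled dot-product attention of one query over the given keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))]


def decode_no_cache(xs: List[Vec], wq: Mat, wk: Mat, wv: Mat):
    """Step t re-projects K and V for every prefix token 0..t."""
    outs, projections = [], 0
    for t in range(len(xs)):
        keys = [matvec(wk, x) for x in xs[: t + 1]]
        values = [matvec(wv, x) for x in xs[: t + 1]]
        projections += 2 * (t + 1)
        outs.append(attend(matvec(wq, xs[t]), keys, values))
    return outs, projections


def decode_with_cache(xs: List[Vec], wq: Mat, wk: Mat, wv: Mat):
    """Step t projects only the new token and appends its k/v to the cache."""
    k_cache, v_cache, outs, projections = [], [], [], 0
    for x in xs:
        k_cache.append(matvec(wk, x))
        v_cache.append(matvec(wv, x))
        projections += 2
        outs.append(attend(matvec(wq, x), k_cache, v_cache))
    return outs, projections
```

Both functions return identical outputs; for L tokens the uncached version performs L(L+1) K/V projections and the cached one 2L.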

Decision criteria

Situation	Approach
General LLM task	decoder-only (Claude, GPT)
Specific translation/summarization fine-tune	T5/BART encoder-decoder
Embedding / classification	encoder-only (BERT family)
Speech-to-text	Whisper-style enc-dec
Long sequences, low cost	Mamba / Hybrid seq2seq

Default: decoder-only LLM via API.

🔗 Graph

Parent: Deep Learning · NLP
Variants: Transformer · Selective State Space Models (Mamba) · Encoder-Decoder
Applications: Summarization · Code-Generation
Adjacent: Attention Mechanism · Tokenization

🤖 Using LLMs

When: the task can be framed as an input → output transformation and a single API call is enough. When not: pure classification, where an encoder plus a classification head is cheaper.

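❌ Anti-patterns

Greedy decoding for creative tasks: produces repetition. Use sampling instead.
No KV cache: every step re-processes the whole prefix, so generating L tokens touches O(L²) token positions (O(L³) attention work) instead of O(L) (O(L²) attention work). Always enable the KV cache.
Training from scratch: almost always the wrong choice. Fine-tune or prompt instead.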
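The cost behind the "No cache" anti-pattern as a worked count, assuming one forward pass per generated token and ignoring the prompt: without a cache, step t re-processes all t prefix positions (about t² attention scores); with a cache, it processes only the newest position, whose query attends over t cached keys.

```latex
\begin{aligned}
\text{no cache:}\quad & \sum_{t=1}^{L} t = \frac{L(L+1)}{2} = O(L^2)\ \text{positions},
  \qquad \sum_{t=1}^{L} t^2 = \frac{L(L+1)(2L+1)}{6} = O(L^3)\ \text{scores} \\
\text{KV cache:}\quad & \sum_{t=1}^{L} 1 = L = O(L)\ \text{positions},
  \qquad \sum_{t=1}^{L} t = O(L^2)\ \text{scores}
\end{aligned}
```

For L = 1000 that is 500,500 token positions without a cache versus 1,000 with one.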

🧪 Verification / duplicates

Verified (Sutskever 2014, Vaswani 2017, HF Transformers docs).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — full seq2seq family 2026
