---
id: wiki-2026-0508-tokenization-strategies
title: Tokenization Strategies
category: 10_Wiki/Topics
status: duplicate
canonical_id: tokenization-subword-processing
duplicate_of: "[[Tokenization & Subword Processing]]"
aliases: []
source_trust_level: A
confidence_score: 0.9
verification_status: redirected
tags: [duplicate, tokenization, nlp, bpe]
last_reinforced: 2026-05-10
github_commit: pending
---

# Tokenization Strategies

> **This document is a duplicate of [[Tokenization & Subword Processing]].** It redirects to the canonical document.

## Key Summary

- Covers the subword tokenization strategies BPE, WordPiece, SentencePiece, and Unigram LM.
- The canonical document covers algorithm details, vocabulary-size tradeoffs, and multilingual considerations.
- Tokenizers noted as of 2026: tiktoken (OpenAI), the Claude tokenizer, and the Llama 3 tokenizer (128K vocabulary).

## 🔗 Graph

- Parent: [[Tokenization & Subword Processing]] (canonical)

## 🕓 Change History

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Marked as duplicate; redirected to the canonical document |