---
id: html-charsets
title: "HTML Charsets"
category: "Frontend"
status: "draft"
verification_status: "conceptual"
canonical_id: ""
aliases: ["HTML character sets", "HTML encoding", "character encoding", "UTF-8", "ASCII", "ANSI", "ISO-8859-1", "charset meta"]
duplicate_of: ""
source_trust_level: "B"
confidence_score: 0.88
created_at: 2026-06-23
updated_at: 2026-06-23
review_reason: ""
merge_history: []
tags: ["html", "web", "frontend", "charset", "encoding", "utf-8", "w3schools"]
raw_sources: ["https://www.w3schools.com/html/html_charset.asp"]
applied_in: []
github_commit: ""
---

# [[HTML Charsets]]

## 🎯 One-line insight

To display an HTML page correctly, a browser must know which **character set (encoding)** the page uses. Modern HTML declares it with `<meta charset="UTF-8">`, and UTF-8, which covers nearly all of the world's characters, is the recommended choice and the HTML5 default. [S1]

## 🧠 Core concepts

- **The charset attribute**: proper display requires the browser to know the page's character encoding, declared via `<meta charset="UTF-8">`. [S1]
- **UTF-8 is recommended**: it covers almost all of the world's characters and symbols and is the default character set in HTML5. [S1] Browsers do not reliably assume UTF-8 when a page declares nothing, so the declaration should still be written out.
- **Historical progression of encodings**: ASCII → ANSI (Windows-1252) → ISO-8859-1 (HTML 4 default) → UTF-8 (modern default). Each later set keeps ASCII unchanged at code points 0–127. [S1]
- **Shared low range**: across ASCII, ANSI, ISO-8859-1, and UTF-8, code points 0–127 are identical. [S1]

## 🧩 Extracted patterns

- **HTML5 charset declaration**: `<meta charset="UTF-8">`. [S1]
- **HTML4 charset declaration**: `<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">`. [S1]
- **Backward-compatibility pattern**: newer encodings keep ASCII (0–127) intact and extend the upper range. [S1]

## 📖 Details

**The HTML charset Attribute**

To display an HTML page correctly, a web browser must know the character set used in the page.
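
To see why the browser needs this information, the following minimal Python sketch decodes the same bytes under three different charset assumptions. The sample string and the `decode_as` helper are hypothetical and chosen only for illustration; a real browser's decoding pipeline is more involved.

```python
# Hypothetical illustration: one byte sequence read under three charset guesses.
text = "café €5"

# Bytes as they would be stored for a page saved as UTF-8.
payload = text.encode("utf-8")


def decode_as(data: bytes, charset: str) -> str:
    """Decode bytes as if `charset` had been declared (undecodable bytes become U+FFFD)."""
    return data.decode(charset, errors="replace")


as_utf8 = decode_as(payload, "utf-8")      # correct assumption: text round-trips
as_cp1252 = decode_as(payload, "cp1252")   # wrong assumption: garbled characters
as_ascii = decode_as(payload, "ascii")     # wrong assumption: replacement characters

print(as_utf8)
print(as_cp1252)
print(as_ascii)
```

Only the UTF-8 decoding reproduces the original string. The Windows-1252 guess turns each multi-byte character into two or three unrelated characters, which is the kind of garbling a missing or wrong charset declaration tends to produce.
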
Developers are encouraged to use UTF-8, which covers almost all of the characters and symbols in the world. The standard declaration is: [S1]

```html
<meta charset="UTF-8">
```

**The ASCII Character Set**

ASCII was the first character encoding standard. It defines 128 characters: English letters (a–z, A–Z), digits (0–9), special characters such as `! $ + - ( ) @ < > . # ?`, and a set of non-printing control codes. [S1]

**The ANSI Character Set (Windows-1252)**

ANSI (Windows-1252) was the original Windows character set. Its layout: [S1]

- Characters 0–127 match ASCII.
- Characters 128–159 contain printable special characters, such as the euro sign and curly quotation marks.
- Characters 160–255 match ISO-8859-1 and the Unicode code points of the same numbers.

HTML5 declaration: [S1]

```html
<meta charset="Windows-1252">
```

**The ISO-8859-1 Character Set**

ISO-8859-1 was the default character set for HTML 4. It is an 8-bit set with 256 positions: [S1]

- Characters 0–127 are identical to ASCII.
- Characters 128–159 hold no printable characters.
- Characters 160–255 match ANSI and the Unicode code points of the same numbers.

HTML 4 syntax: [S1]

```html
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
```

HTML 5 syntax: [S1]

```html
<meta charset="ISO-8859-1">
```

**The UTF-8 Character Set**

UTF-8 character coverage, expressed as Unicode code points rather than byte values: [S1]

- Values 0–127 match ASCII and are stored as the same single bytes.
- Values 128–159 hold no printable characters (they are control codes).
- Values 160–255 are the same characters as in ANSI / ISO-8859-1, but UTF-8 stores each of them as two bytes.
- From value 256 onward, UTF-8 continues with more than 10,000 additional characters [S1]; Unicode as a whole defines well over 100,000, all of which UTF-8 can encode.

**Encoding comparison summary** [S1]

| Character set | Range 0–127 | Range 128–159 | Range 160–255 | Beyond 255 |
|---|---|---|---|---|
| ASCII | 128 characters (letters, digits, punctuation, control codes) | n/a | n/a | n/a |
| ANSI (Windows-1252) | Same as ASCII | Printable special characters | Same as ISO-8859-1 and Unicode | n/a |
| ISO-8859-1 | Same as ASCII | No printable characters | Same as ANSI and Unicode | n/a |
| UTF-8 (by code point) | Same as ASCII, one byte each | No printable characters | Same characters as ANSI / ISO-8859-1, two bytes each | 10,000+ characters from 256 onward |

The page also references many UTF-8 character-set categories, including Basic Latin, Latin Extended A–E, IPA Extensions, Spacing Modifiers, Diacritical Marks, General Punctuation, Super/Subscript, and Braille.
[S1]

## 🛠️ Applied in summary

The canonical applied case is the single `<meta charset="UTF-8">` declaration that should appear in the head of essentially every modern HTML page. No external project or commit applications were found in the source.

## 💻 Code patterns

Declare UTF-8 in HTML5 (recommended):

```html
<meta charset="UTF-8">
```

Legacy HTML4 ISO-8859-1 declaration:

```html
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
```

## ⚖️ Comparison & decision criteria

- **ASCII**: only 128 characters; the original encoding, insufficient for international text. [S1]
- **ANSI (Windows-1252)**: the original Windows set; adds printable special characters in 128–159, but is Windows-specific. [S1]
- **ISO-8859-1**: the HTML 4 default; 256 positions, still too limited for global content. [S1]
- **UTF-8**: the recommended choice and the HTML5 default; covers almost all of the world's characters and is backward-compatible with ASCII at 0–127. Use UTF-8 unless a legacy constraint forces otherwise. [S1]

## ⚖️ Contradictions & updates

No contradictions found in the source. The historical default shifted from ISO-8859-1 (HTML 4) to UTF-8 (HTML5), reflecting the move toward universal Unicode support. [S1]

## ✅ Verification status and reliability

- **Status:** draft
- **Verification stage:** conceptual (can be promoted to applied/validated once a real applied case is found)
- **Source trust level:** B (W3Schools: widely used educational reference, not a primary standards body)
- **Confidence score:** 0.88
- **Duplicate check result:** Newly created (New discovery)

## 🔗 Knowledge Graph

- **Parent/root:** [[HTML Tutorial]]
- **Related concepts:** [[HTML Emojis]], [[HTML Symbols]], [[HTML URL Encode]], [[HTML Head]]
- **Reference context:** Referenced whenever defining how a page's text is encoded, especially for international characters and emojis.
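
The range claims in this note, and the point about international characters and emojis, can be checked with a short Python sketch. It assumes Python's `cp1252` and `latin-1` codecs as stand-ins for ANSI (Windows-1252) and ISO-8859-1; the `encoded_lengths` helper and the sample characters are hypothetical and exist only for this illustration.

```python
# Compare how the four charsets discussed in this note encode the same characters.
CHARSETS = ["ascii", "cp1252", "latin-1", "utf-8"]


def encoded_lengths(ch: str) -> dict:
    """Map each charset to the byte length of `ch`, or None if it cannot encode it."""
    lengths = {}
    for name in CHARSETS:
        try:
            lengths[name] = len(ch.encode(name))
        except UnicodeEncodeError:
            lengths[name] = None
    return lengths


# Code points 0-127: every charset produces the same single byte.
low_range_identical = all(
    len({chr(cp).encode(name) for name in CHARSETS}) == 1 for cp in range(128)
)

# Code points 160-255: one identical byte in Windows-1252 and ISO-8859-1,
# but two bytes in UTF-8.
high_range_matches = all(
    chr(cp).encode("cp1252") == chr(cp).encode("latin-1") == bytes([cp])
    and len(chr(cp).encode("utf-8")) == 2
    for cp in range(160, 256)
)

print(low_range_identical, high_range_matches)
for sample in ["A", "é", "€", "한", "😀"]:
    print(repr(sample), encoded_lengths(sample))
```

In this sketch only UTF-8 can encode all five samples. The euro sign exists in Windows-1252 (in the 128–159 range) but not in ISO-8859-1, and the Korean syllable and the emoji fall outside every 8-bit set, which is the practical reason to prefer UTF-8.
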
## 📚 Sources

- [S1] W3Schools, "HTML Charsets", https://www.w3schools.com/html/html_charset.asp

## 📝 Change history

- 2026-06-23: Initial draft synthesized from the W3Schools "HTML Charsets" page (Astra wiki-curation, P-Reinforce v3.1 format).