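koriweb 9609c04755 docs(10_Wiki): W3Schools wiki conversion (HTML/CSS/JavaScript core)
Converted the W3Schools tutorials into wiki pages in P-Reinforce v3.1 format (English body, Korean/English section headers).
- Topic_HTML: 59 documents (tutorials and examples; reference/meta pages excluded)
- Topic_CSS: 190 documents (main tutorial plus all of Advanced/Flexbox/Grid/RWD)
- Topic_JavaScript: 120 documents (core language; Temporal, detailed DOM, BOM, Web APIs, AJAX, jQuery, Graphics, etc. to follow)
Each folder includes a 00_INDEX.md (MOC). Code is copied verbatim; unverified items are marked "Not found in source".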

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 19:21:18 +09:00

---
id: html-charsets
title: "HTML Charsets"
category: "Frontend"
status: "draft"
verification_status: "conceptual"
canonical_id: ""
aliases: ["HTML character sets", "HTML encoding", "character encoding", "UTF-8", "ASCII", "ANSI", "ISO-8859-1", "charset meta"]
duplicate_of: ""
source_trust_level: "B"
confidence_score: 0.88
created_at: 2026-06-23
updated_at: 2026-06-23
review_reason: ""
merge_history: []
tags: ["html", "web", "frontend", "charset", "encoding", "utf-8", "w3schools"]
raw_sources: ["https://www.w3schools.com/html/html_charset.asp"]
applied_in: []
github_commit: ""
---
# [[HTML Charsets]]
## 🎯 한 줄 통찰 (One-line insight)
To display an HTML page correctly, a browser must know which **character set (encoding)** the page uses; modern HTML declares it with `<meta charset="UTF-8">`, and UTF-8, which covers nearly all the world's characters, is the recommended and default choice. [S1]
## 🧠 핵심 개념 (Core concepts)
- **The charset attribute**: proper display requires the browser to know the page's character encoding, declared via `<meta charset="...">`. [S1]
- **UTF-8 is recommended**: it can encode every Unicode character, so it covers almost all of the characters and symbols in the world, and it is the default character set in HTML5. [S1]
- **Historical progression of encodings**: ASCII → ANSI (Windows-1252) → ISO-8859-1 (HTML 4 default) → UTF-8 (modern default). Each later set extends the range above ASCII rather than replacing it. [S1]
- **Shared low range**: across ASCII, ANSI, ISO-8859-1, and UTF-8, code points 0–127 are identical, so text limited to those characters reads the same under any of the four. [S1]
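
The shared low range can be checked mechanically. Below is a minimal Python sketch, assuming Python's built-in codecs `ascii`, `cp1252` (Windows-1252), `iso-8859-1`, and `utf-8` are faithful stand-ins for the four character sets; `ENCODINGS`, `low_range_is_shared`, and `sample` are hypothetical names chosen for illustration.

```python
# Assumption: Python's built-in codecs stand in for the four HTML charsets.
ENCODINGS = ("ascii", "cp1252", "iso-8859-1", "utf-8")


def low_range_is_shared(encodings=ENCODINGS) -> bool:
    """True if every code point 0-127 encodes to the single byte of the same value in each encoding."""
    for code_point in range(128):
        expected = bytes([code_point])
        for name in encodings:
            if chr(code_point).encode(name) != expected:
                return False
    return True


print(low_range_is_shared())  # True

# Consequence: ASCII-only text yields identical bytes under every one of these charsets.
sample = "Hello, <World> #1!"
print(len({sample.encode(name) for name in ENCODINGS}))  # 1
```

Because the check passes, a page containing only ASCII characters is byte-for-byte identical in all four encodings. That is why a wrong charset declaration often goes unnoticed until the first accented letter or symbol appears.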
## 🧩 추출된 패턴 (Extracted patterns)
- **HTML5 charset declaration**: `<meta charset="UTF-8">`. [S1]
- **HTML4 charset declaration**: `<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">`. [S1]
- **Backward-compatibility pattern**: newer encodings keep ASCII (0–127) intact and extend the upper range. [S1]
## 📖 세부 내용 (Details)
**The HTML charset Attribute**
To display an HTML page correctly, a web browser must know the character set used in the page. Developers are encouraged to use UTF-8, which covers almost all of the characters and symbols in the world. The standard declaration is: [S1]
```html
<meta charset="UTF-8">
```
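
To see why the browser must know the charset, consider what happens when bytes are decoded with the wrong one. This is a minimal Python sketch, assuming a page saved as UTF-8 but read as Windows-1252 (Python's `cp1252` codec); the variable names are illustrative only.

```python
# The bytes a UTF-8 editor would save for the word "café".
page_bytes = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'

# Decoded with the charset the page was actually saved in: correct text.
correct = page_bytes.decode("utf-8")

# Decoded as Windows-1252 (a wrong or missing declaration): each of the two
# UTF-8 bytes for 'é' (0xC3, 0xA9) is read as a separate character.
garbled = page_bytes.decode("cp1252")

print(correct == "caf\u00e9")        # True
print(garbled == "caf\u00c3\u00a9")  # True: "cafÃ©", the classic mojibake
print(len(correct), len(garbled))    # 4 5
```

The garbled form is what visitors see when `<meta charset>` names a different encoding from the one the file was actually saved in.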
**The ASCII Character Set**
ASCII was the first character encoding standard (also called a character set) used on the web. It defined 128 characters: English letters (a–z, A–Z), digits (0–9), special characters such as `! $ + - ( ) @ < > . # ?`, and a set of non-printing control codes. [S1]
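
The ASCII range can be explored directly. A minimal Python sketch, assuming Python's `ascii` codec and `str.isprintable()` are acceptable proxies for the standard; `fits_in_ascii` is a hypothetical helper:

```python
# ASCII defines code points 0-127: 95 printable characters (space through "~")
# plus 33 control codes (0-31 and 127).
printable = [chr(c) for c in range(128) if chr(c).isprintable()]
controls = [c for c in range(128) if not chr(c).isprintable()]


def fits_in_ascii(text: str) -> bool:
    """Return True if text can be encoded with the 7-bit ASCII character set."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(len(printable), len(controls))             # 95 33
print(fits_in_ascii("! $ + - ( ) @ < > . # ?"))  # True
print(fits_in_ascii("caf\u00e9"))                # False: U+00E9 is 233, outside 0-127
```

On Python 3.7 and later, the built-in `str.isascii()` performs the same check as `fits_in_ascii` without the exception handling.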
**The ANSI Character Set (Windows-1252)**
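ANSI (Windows-1252) was the first Windows character set. Its layout: [S1]
- Characters 0–127 match ASCII.
- Characters 128–159 contain Windows-specific special characters (for example, the euro sign and curly quotation marks).
- Characters 160–255 are the same characters as UTF-8 code points 160–255.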
HTML5 declaration: [S1]
```html
<meta charset="Windows-1252">
```
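
The three-part Windows-1252 layout can be checked byte by byte. A minimal Python sketch, assuming Python's `cp1252` codec matches the character set described above. This codec treats five bytes in the 128–159 block as undefined, which the sketch reports rather than hides. `cp1252_char` is a hypothetical helper.

```python
from typing import Optional


def cp1252_char(byte_value: int) -> Optional[str]:
    """Decode one byte as Windows-1252; None if Python's cp1252 codec leaves it undefined."""
    try:
        return bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        return None


# 0-127: identical to ASCII.
same_as_ascii = all(cp1252_char(b) == chr(b) for b in range(128))

# 128-159: Windows-specific characters such as the euro sign (0x80) and curly quotes.
special = {b: cp1252_char(b) for b in range(128, 160)}
undefined = [hex(b) for b, ch in special.items() if ch is None]

# 160-255: the same characters as Unicode (and therefore UTF-8) code points 160-255.
same_as_unicode = all(cp1252_char(b) == chr(b) for b in range(160, 256))

print(same_as_ascii, same_as_unicode)  # True True
print(hex(ord(special[0x80])))         # 0x20ac (EURO SIGN)
print(undefined)                       # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```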
**The ISO-8859-1 Character Set**
ISO-8859-1 was the default character set for HTML 4, supporting 256 characters: [S1]
- Characters 0–127 are identical to ASCII.
- Characters 128–159 are unused.
- Characters 160–255 match ANSI and UTF-8.
HTML 4 syntax: [S1]
```html
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
```
HTML 5 syntax: [S1]
```html
<meta charset="ISO-8859-1">
```
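
ISO-8859-1 has a particularly simple mapping: each byte value is also the Unicode code point of the character it represents. A minimal Python sketch, assuming Python's `iso-8859-1` codec (an alias of `latin-1`) models the character set; the variable names are illustrative only.

```python
# ISO-8859-1 maps each byte 0-255 to the Unicode code point with the same number.
all_bytes = bytes(range(256))
text = all_bytes.decode("iso-8859-1")

identity = [ord(ch) for ch in text] == list(range(256))

# 128-159 decode to C1 control characters, none of which is printable,
# which is why the range is listed as "unused" for visible text.
c1_printable = any(ch.isprintable() for ch in text[128:160])

# Anything above 255 cannot be represented at all.
try:
    "\u20ac".encode("iso-8859-1")  # EURO SIGN, U+20AC
    euro_ok = True
except UnicodeEncodeError:
    euro_ok = False

print(len(text), identity, c1_printable, euro_ok)  # 256 True False False
```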
**The UTF-8 Character Set**
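UTF-8 character coverage: [S1]
- Values 0–127 match ASCII, byte for byte.
- Code points 128–159 are unused for visible text (Unicode assigns them to C1 control codes).
- Code points 160–255 are the same characters as in ANSI / ISO-8859-1, although UTF-8 stores each of them in two bytes rather than one.
- From value 256 onward, UTF-8 continues through the rest of Unicode; the source describes this as more than 10,000 further characters, and the Unicode code space itself extends to U+10FFFF.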
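
The byte cost of each range becomes visible when characters are encoded. UTF-8 uses 1 byte for 0–127, 2 bytes up to U+07FF, 3 bytes up to U+FFFF, and 4 bytes up to U+10FFFF. The minimal Python sketch below needs Python 3.8 or later for the `bytes.hex` separator argument; the sample characters are arbitrary choices.

```python
# UTF-8 is variable-length: the code point decides how many bytes are used.
samples = [
    "A",           # U+0041, range 0-127: 1 byte, identical to ASCII
    "\u00e9",      # U+00E9, range 160-255: same character as ISO-8859-1, but 2 bytes
    "\u20ac",      # U+20AC, beyond 255: 3 bytes
    "\U0001F600",  # U+1F600 (an emoji), beyond U+FFFF: 4 bytes
]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")

# Expected output:
# U+0041: 1 byte(s) -> 41
# U+00E9: 2 byte(s) -> c3 a9
# U+20AC: 3 byte(s) -> e2 82 ac
# U+1F600: 4 byte(s) -> f0 9f 98 80
```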
**Encoding comparison summary** [S1]

| Character set | Range 0–127 | Range 128–159 | Range 160–255 | Beyond 255 |
|---|---|---|---|---|
| ASCII | 128 characters (letters, digits, punctuation, control codes) | Not defined | Not defined | Not defined |
| ANSI (Windows-1252) | Same as ASCII | Windows-specific special characters | Same as UTF-8 | Not defined (single-byte set) |
| ISO-8859-1 | Same as ASCII | Unused | Same as ANSI/UTF-8 | Not defined (single-byte set) |
| UTF-8 | Same as ASCII | Unused | Same as ANSI/ISO-8859-1 (stored as 2 bytes) | Rest of Unicode; 10,000+ characters from 256 onward |

The page also references many UTF-8 character-set categories, including Basic Latin, Latin Extended A–E, IPA Extensions, Spacing Modifiers, Diacritical Marks, General Punctuation, Super/Subscript, and Braille. [S1]
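
The comparison table lines up characters, not bytes. Encoding one character from each column makes that distinction visible. A minimal Python sketch, assuming Python's codecs stand in for the four charsets; `CHARSETS` and `encode_in_each` are hypothetical names.

```python
from typing import Dict, Optional

CHARSETS = ("ascii", "cp1252", "iso-8859-1", "utf-8")


def encode_in_each(ch: str) -> Dict[str, Optional[str]]:
    """Hex bytes for ch in each charset, or None if that charset cannot represent it."""
    result: Dict[str, Optional[str]] = {}
    for name in CHARSETS:
        try:
            result[name] = ch.encode(name).hex()
        except UnicodeEncodeError:
            result[name] = None
    return result


# One sample per table column: 0-127, a Windows-1252 special character, 160-255.
for ch in ("A", "\u20ac", "\u00e9"):
    print(f"U+{ord(ch):04X}", encode_in_each(ch))
# U+0041 {'ascii': '41', 'cp1252': '41', 'iso-8859-1': '41', 'utf-8': '41'}
# U+20AC {'ascii': None, 'cp1252': '80', 'iso-8859-1': None, 'utf-8': 'e282ac'}
# U+00E9 {'ascii': None, 'cp1252': 'e9', 'iso-8859-1': 'e9', 'utf-8': 'c3a9'}
```

The euro sign shows the subtlety. Windows-1252 stores it as byte 0x80 inside the 128–159 block, but Unicode places it at U+20AC. As a result, UTF-8 needs three bytes for it and ISO-8859-1 cannot represent it at all.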
## 🛠️ 적용 사례 (Applied in summary)
The canonical applied case is the single `<meta charset="UTF-8">` declaration that should appear in the head of essentially every modern HTML page. No external project/commit applications found in the source.
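
A declaration only helps if it appears early and matches the bytes actually saved. The HTML Living Standard requires the element containing the character encoding declaration to be serialized completely within the first 1024 bytes of the document. A minimal Python sketch with a hypothetical page body; writing `body` to disk (for example with `pathlib.Path.write_bytes`) is left out.

```python
# Hypothetical page content; the <meta charset> is the first element in <head>.
page = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Caf\u00e9 menu</title>
</head>
<body>
<p>Caf\u00e9 au lait: \u20ac3</p>
</body>
</html>
"""

DECLARATION = b'<meta charset="UTF-8">'

# Encode with the same charset the page declares; these are the bytes to save.
body = page.encode("utf-8")

# The declaration must end within the first 1024 bytes of the document.
start = body.find(DECLARATION)
declared_early = start != -1 and start + len(DECLARATION) <= 1024

print(start, declared_early)  # 40 True
```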
## 💻 코드 패턴 (Code patterns)
Declare UTF-8 in HTML5 (recommended):
```html
<meta charset="UTF-8">
```
Legacy HTML4 ISO-8859-1 declaration:
```html
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
```
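
Both declaration forms carry the charset name after `charset=`, so a rough extractor can handle them together. This is a simplified Python sketch, not the HTML standard's encoding-sniffing (prescan) algorithm. It ignores byte order marks, HTTP `Content-Type` headers, and comments. `declared_charset` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches the charset name in either declaration form:
#   <meta charset="UTF-8">
#   <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
_CHARSET_RE = re.compile(
    r"""<meta\s[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_charset(html: str) -> Optional[str]:
    """Return the first charset named in a <meta> tag, lower-cased, or None if absent."""
    match = _CHARSET_RE.search(html)
    return match.group(1).lower() if match else None


print(declared_charset('<head><meta charset="UTF-8"></head>'))  # utf-8
print(declared_charset('<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'))  # iso-8859-1
print(declared_charset("<p>No declaration here</p>"))  # None
```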
## ⚖️ 비교 및 선택 기준 (Comparison & decision criteria)
- **ASCII**: only 128 characters (Latin letters, digits, punctuation, and control codes); the original web encoding, insufficient for international text. [S1]
- **ANSI (Windows-1252)**: the first Windows set; adds special characters in 128–159 but is Windows-specific. [S1]
- **ISO-8859-1**: the HTML 4 default; 256 characters, still limited for global content. [S1]
- **UTF-8**: the recommended choice and the HTML5 default; covers almost all of the world's characters and is backward-compatible with ASCII at 0–127. Use UTF-8 unless a legacy constraint forces otherwise. [S1]
## ⚖️ 모순 및 업데이트 (Contradictions & updates)
No contradictions found in the source. The historical default shifted from ISO-8859-1 (HTML 4) to UTF-8 (HTML5), reflecting the move toward universal Unicode support. [S1]
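## ✅ 검증 상태 및 신뢰도 (Verification status & confidence)
- **Status:** draft
- **Verification stage:** conceptual (can be promoted to applied/validated once a real-world application is found)
- **Source trust level:** B (W3Schools: a widely used educational reference, not a primary standards body)
- **Confidence score:** 0.88
- **Duplicate check result:** newly created (New discovery)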
## 🔗 지식 그래프 (Knowledge Graph)
- **Parent/root:** [[HTML Tutorial]]
- **Related concepts:** [[HTML Emojis]], [[HTML Symbols]], [[HTML URL Encode]], [[HTML Head]]
- **Reference context:** Referenced whenever defining how a page's text is encoded, especially for international characters and emojis.
## 📚 출처 (Sources)
- [S1] W3Schools — HTML Charsets — https://www.w3schools.com/html/html_charset.asp
## 📝 변경 이력 (Change history)
- 2026-06-23: Initial draft synthesized from the W3Schools "HTML Charsets" page (Astra wiki-curation, P-Reinforce v3.1 format).