2nd/10_Wiki/Topics/Data-Schema.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
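Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: broken-link fixes, direct redirect resolution, concept mapping (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections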

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


---
id: wiki-2026-0508-data-schema
title: Data Schema
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Schema Design, Data Modeling, Schema Definition]
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: [data, schema, database, modeling, validation]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: SQL/JSON/TS
  framework: Postgres / Avro / Zod / JSON Schema
---
# Data Schema
## In one line
> **"The core of a schema: structure + constraints + evolution + a contract between producers and consumers."** The idea starts with Codd's relational model (1970), goes through the schemaless / NoSQL backlash of the 2000s, and in the 2020s returns to type safety (Zod, TypeScript-first ORMs, dbt contracts) once the limits of schema-on-read were recognized. As of 2026, schema-as-code with version-aware evolution is the standard.
## Core concepts
### Schema layers
- **Conceptual**: the ERD, describing business entities.
- **Logical**: normalized tables (BCNF/3NF).
- **Physical**: indexes, partitions, storage.
- **API/Wire**: JSON Schema, Avro, Protobuf, GraphQL.
- **Validation**: Zod, Pydantic, Joi (runtime).
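### Evolution principles
- **Backward compatible**: a reader on the new schema can read data written with the old schema. Only add fields that are nullable or have a default.
- **Forward compatible**: a reader on the old schema can read data written with the new schema, because it ignores unknown fields.
- **Full compatible**: both of the above.
- **SemVer for schemas**: a breaking change means a major version bump.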
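The compatibility rules above can be sketched as a small check. This is a simplified model, not the algorithm of any particular schema registry: `Field`, `can_read`, and `compatibility` are hypothetical names, a schema is just a mapping from field name to type, and types must match exactly (real formats such as Avro also allow some type promotions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    type: str
    has_default: bool = False  # nullable-with-default counts as having a default


Schema = dict[str, Field]


def can_read(reader: Schema, writer: Schema) -> bool:
    """True if code using `reader` can decode data produced with `writer`."""
    for name, field in reader.items():
        written = writer.get(name)
        if written is None:
            # The reader expects a field the writer never produced:
            # only acceptable when the reader can fall back to a default.
            if not field.has_default:
                return False
        elif written.type != field.type:
            return False
    # Fields present only in the writer are ignored by the reader.
    return True


def compatibility(old: Schema, new: Schema) -> str:
    backward = can_read(reader=new, writer=old)  # new code, old data
    forward = can_read(reader=old, writer=new)   # old code, new data
    if backward and forward:
        return "full"
    if backward:
        return "backward"
    if forward:
        return "forward"
    return "none"


old = {"id": Field("string"), "email": Field("string")}
print(compatibility(old, {**old, "phone": Field("string", has_default=True)}))  # full
print(compatibility(old, {**old, "phone": Field("string")}))                    # forward
print(compatibility(old, {"id": Field("string")}))                              # backward
```

In this model, adding a field without a default is only forward compatible, which is why the rule of adding nullable or defaulted fields matters.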
### Applications
1. Database schema (Postgres, MySQL, BigQuery).
2. Event streaming (Kafka + Schema Registry).
3. API contracts (OpenAPI, GraphQL, tRPC).
4. Data lake / lakehouse (Iceberg, Delta Lake schema).
5. Form validation (frontend + backend shared via Zod).
## 💻 Patterns
### Postgres schema with constraints
```sql
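CREATE TABLE users (
    -- gen_random_uuid() is built in from PostgreSQL 13; older versions need the pgcrypto extension
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    -- loose sanity check only, not full address validation
    email      TEXT NOT NULL UNIQUE CHECK (email ~* '^.+@.+\..+$'),
    name       TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    status     TEXT NOT NULL CHECK (status IN ('active','suspended','deleted'))
);
-- Partial index: only rows with status = 'active' are indexed
CREATE INDEX users_status_idx ON users(status) WHERE status = 'active';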
```
### Zod (TypeScript) — runtime + static type
```ts
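import { z } from "zod";

// One definition yields both a runtime validator and a static type.
export const User = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  name: z.string().min(1).max(100),
  age: z.number().int().min(0).max(150).optional(),
  status: z.enum(["active", "suspended", "deleted"]),
});
export type User = z.infer<typeof User>;

// Sample input standing in for any untrusted value, e.g. a request body.
const jsonInput: unknown = JSON.parse(
  '{"id":"123e4567-e89b-12d3-a456-426614174000","email":"a@example.com","name":"Ada","status":"active"}'
);
const parsed = User.parse(jsonInput); // throws ZodError on invalid input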
```
### JSON Schema (language-agnostic)
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["id", "email"],
  "properties": {
    "id": { "type": "string", "format": "uuid" },
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 0 }
  },
  "additionalProperties": false
}
```
### Avro schema (Kafka)
```json
{
  "type": "record",
  "name": "UserCreated",
  "namespace": "com.example.events",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "age", "type": ["null", "int"], "default": null }
  ]
}
```
### Protobuf (gRPC)
```proto
syntax = "proto3";
package user.v1;

message User {
  string id = 1;
  string email = 2;
  string name = 3;
  optional int32 age = 4;
  // proto3 requires the first enum value to be 0; it is also the default when the field is unset.
  enum Status {
    ACTIVE = 0;
    SUSPENDED = 1;
    DELETED = 2;
  }
  Status status = 5;
}
```
### Pydantic v2 (Python)
```python
from typing import Literal
from uuid import UUID

from pydantic import BaseModel, EmailStr, Field  # EmailStr needs the email-validator package


class User(BaseModel):
    id: UUID
    email: EmailStr
    name: str = Field(min_length=1, max_length=100)
    age: int | None = Field(default=None, ge=0, le=150)
    status: Literal["active", "suspended", "deleted"]
```
### Migration (Alembic / Drizzle)
```python
from alembic import op
import sqlalchemy as sa

def upgrade():
    # Backward compatible: a nullable column is safe for existing rows and writers.
    op.add_column("users",
                  sa.Column("phone", sa.String(20), nullable=True))
```
### Iceberg schema evolution (lakehouse)
```sql
ALTER TABLE catalog.db.users ADD COLUMN phone STRING;
ALTER TABLE catalog.db.users RENAME COLUMN nm TO name; -- metadata-only in Iceberg: columns are tracked by ID, not by name
```
## Decision criteria
| Situation | Approach |
|---|---|
| OLTP RDBMS | Strict SQL DDL + migrations |
| Event streaming | Avro + Schema Registry |
| Microservice gRPC | Protobuf |
| Frontend form + backend | Zod (shared) |
| Data lake | Iceberg / Delta with schema evolution |
| Document store | JSON Schema validation in app |
**Default**: strict schema-first. Do not make schemaless the default, because the debt accumulates.
## 🔗 Graph
- Parent: [[Database Design]] · [[Data Modeling]]
- Variants: [[JSON Schema]] · [[Avro]] · [[Protobuf]]
- Applications: [[API Design]] · [[Event-Driven Architecture]]
- Adjacent: [[Schema Migration]] · [[TypeScript 타입 시스템 (TypeScript Type System)|Type Systems]]
## 🤖 LLM usage
**When to use**: drafting schemas from natural-language requirements, generating migrations, explaining schema diffs, and cross-format conversion (Postgres ↔ Avro ↔ Protobuf).
**When not to**: never let an LLM run production migrations on its own. Review and a dry run are mandatory.
## ❌ Anti-patterns
- **Schema-on-read everything**: the cost is pushed onto every consumer, and chaos follows.
- **Breaking changes without versioning**: consumer outages.
- **Storing JSON blobs in a JSON column without structure**: a query nightmare.
- **No NOT NULL / no FK / no CHECK**: reduces the DB to dumb storage.
- **Reusing field IDs in protobuf**: wire incompatibility.
- **Adding a required field**: violates backward compatibility.
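The field-ID reuse anti-pattern is easy to miss when a field is deleted in one release and its number is handed out again later. A minimal sketch of a check over schema history follows; `reused_numbers` is a hypothetical helper, each version is modelled as a plain mapping from field number to field name, and a pure rename is also flagged here even though renames alone do not change the wire format.

```python
def reused_numbers(versions: list[dict[int, str]]) -> dict[int, set[str]]:
    """Return field numbers that were bound to more than one name over time."""
    seen: dict[int, set[str]] = {}
    for fields in versions:
        for number, name in fields.items():
            seen.setdefault(number, set()).add(name)
    return {number: names for number, names in seen.items() if len(names) > 1}


v1 = {1: "id", 2: "email", 3: "name"}
v2 = {1: "id", 2: "email"}              # field 3 removed
v3 = {1: "id", 2: "email", 3: "phone"}  # number 3 handed out again

print(reused_numbers([v1, v2, v3]))
```

Comparing only adjacent versions (v2 against v3) would not catch this, which is why the check walks the whole history. In Protobuf itself, marking a retired number as `reserved` lets the compiler reject the reuse.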
## 🧪 Verification / Duplicates
- Verified (Codd 1970, Kleppmann "Designing Data-Intensive Applications", Confluent Schema Registry docs, JSON Schema 2020-12, Iceberg spec).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — schema layers + evolution + 2026 tooling |