---
id: wiki-2026-0508-data-schema
title: Data Schema
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Schema Design, Data Modeling, Schema Definition]
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: [data, schema, database, modeling, validation]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: SQL/JSON/TS
  framework: Postgres / Avro / Zod / JSON Schema
---

# Data Schema

## In one line

> **"The core of a schema: structure + constraints + evolution + a contract between producers and consumers."** It started with Codd's relational model in 1970, went through the schemaless / NoSQL backlash of the 2000s, and, once the limits of schema-on-read were recognized in the 2020s, returned to type safety (Zod, TypeScript-first ORMs, dbt contracts). As of 2026, schema-as-code with version-aware evolution is the standard approach.

## Core concepts

### Schema layers

- **Conceptual**: the ERD, describing business entities.
- **Logical**: normalized tables (BCNF/3NF).
- **Physical**: indexes, partitions, storage.
- **API/Wire**: JSON Schema, Avro, Protobuf, GraphQL.
- **Validation**: Zod, Pydantic, Joi (runtime).

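### Evolution principles

- **Backward compatible**: add only nullable fields or fields with a default, so a reader on the new schema can still read data written with the old schema.
- **Forward compatible**: readers ignore unknown fields, so a reader on the old schema can read data written with the new schema.
- **Full compatible**: both of the above.
- **SemVer for schemas**: a breaking change means a major version bump.
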
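The compatibility rules can be checked mechanically. The following is a minimal sketch, not a substitute for a schema registry's checker. It reads "backward" as new reader / old data and "forward" as old reader / new data, the convention used by Kleppmann and the Confluent docs. `FieldSpec`, `can_read`, and `classify` are hypothetical names, types match only on exact string equality, and readers are assumed to ignore unknown fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical field description used only for this sketch."""
    type_name: str
    has_default: bool = False  # True for nullable fields or fields with a default


Schema = dict[str, FieldSpec]


def can_read(reader: Schema, writer: Schema) -> bool:
    """True if code on `reader` can decode data written with `writer`.

    Fields that exist only in `writer` are assumed to be ignored.
    """
    for name, spec in reader.items():
        written = writer.get(name)
        if written is None:
            # The data lacks this field: readable only if the reader has a default.
            if not spec.has_default:
                return False
        elif written.type_name != spec.type_name:
            return False
    return True


def classify(old: Schema, new: Schema) -> str:
    """Classify the change from `old` to `new`."""
    backward = can_read(reader=new, writer=old)  # new code, old data
    forward = can_read(reader=old, writer=new)   # old code, new data
    if backward and forward:
        return "full"
    if backward:
        return "backward"
    if forward:
        return "forward"
    return "breaking"
```

Under these assumptions, adding a field with a default classifies as `full`, adding a required field as `forward` only, removing a required field as `backward` only, and changing a field's type as `breaking`.
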
### Applications

1. Database schema (Postgres, MySQL, BigQuery).
2. Event streaming (Kafka + Schema Registry).
3. API contracts (OpenAPI, GraphQL, tRPC).
4. Data lake / lakehouse (Iceberg, Delta Lake schema).
5. Form validation (frontend + backend shared via Zod).

## 💻 Patterns

### Postgres schema with constraints

```sql
CREATE TABLE users (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email      TEXT NOT NULL UNIQUE CHECK (email ~* '^.+@.+\..+$'),
  name       TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  status     TEXT NOT NULL CHECK (status IN ('active','suspended','deleted'))
);

-- Partial index: only rows with status = 'active' are indexed.
CREATE INDEX users_status_idx ON users(status) WHERE status = 'active';
```

### Zod (TypeScript): runtime + static type

```ts
import { z } from "zod";

export const User = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  name: z.string().min(1).max(100),
  age: z.number().int().min(0).max(150).optional(),
  status: z.enum(["active", "suspended", "deleted"]),
});
export type User = z.infer<typeof User>;

// Illustrative input; in practice this is any untrusted value (e.g. a request body).
const jsonInput: unknown = JSON.parse(
  '{"id":"123e4567-e89b-12d3-a456-426614174000","email":"ada@example.com","name":"Ada","status":"active"}'
);
const parsed = User.parse(jsonInput); // throws on invalid
```

### JSON Schema (language-agnostic)

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["id", "email"],
  "properties": {
    "id": { "type": "string", "format": "uuid" },
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 0 }
  },
  "additionalProperties": false
}
```

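To make the keywords in that schema concrete, here is a minimal hand-rolled sketch of what a validator does for this small subset (`type`, `required`, `properties`, `minimum`, `additionalProperties: false`). `USER_SCHEMA` and `validate` are hypothetical names, and `format` is deliberately not checked. A real project would use an established JSON Schema library rather than code like this.

```python
from typing import Any

# The schema from the block above, as a Python dict.
USER_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["id", "email"],
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "email": {"type": "string", "format": "email"},
        "age": {"type": "integer", "minimum": 0},
    },
    "additionalProperties": False,
}

# JSON Schema type name -> Python type produced by json.loads (subset only).
_PY_TYPES = {"string": str, "integer": int}


def validate(instance: Any, schema: dict[str, Any]) -> list[str]:
    """Return error messages; an empty list means the instance is valid."""
    if not isinstance(instance, dict):
        return ["expected an object"]
    errors: list[str] = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required property: {name}")
    for name, value in instance.items():
        if name not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected property: {name}")
            continue
        rule = props[name]
        expected = _PY_TYPES[rule["type"]]
        # bool is a subclass of int in Python, but JSON true/false is not an integer.
        if not isinstance(value, expected) or isinstance(value, bool):
            errors.append(f"{name}: expected {rule['type']}")
        elif "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{name}: below minimum {rule['minimum']}")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every violation at once, which is usually what a form or API error response needs.
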
### Avro schema (Kafka)

```json
{
  "type": "record",
  "name": "UserCreated",
  "namespace": "com.example.events",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "age", "type": ["null", "int"], "default": null }
  ]
}
```

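The `"default": null` on `age` is what lets a reader on this schema consume records written before `age` existed. The sketch below imitates that resolution step on already-decoded dicts. Real Avro resolves the writer's schema against the reader's schema during binary decoding, so `resolve` and `USER_CREATED` are hypothetical names used only to show the rule.

```python
from typing import Any

# Reader schema: the UserCreated record above, as a Python dict.
USER_CREATED: dict[str, Any] = {
    "type": "record",
    "name": "UserCreated",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": None},
    ],
}


def resolve(record: dict[str, Any], reader_schema: dict[str, Any]) -> dict[str, Any]:
    """Project a decoded record onto the reader schema.

    Missing fields take the reader's default, fields unknown to the reader
    are dropped, and a missing field without a default is an error.
    """
    out: dict[str, Any] = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return out
```

An old record without `age` resolves with `age` set to `None`, while a record missing `email` fails, because `email` has no default to fall back on.
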
### Protobuf (gRPC)

```proto
syntax = "proto3";
package user.v1;

message User {
  string id = 1;
  string email = 2;
  string name = 3;
  optional int32 age = 4;
  enum Status {
    ACTIVE = 0;
    SUSPENDED = 1;
    DELETED = 2;
  }
  Status status = 5;
}
```

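Protobuf identifies fields on the wire by their number (`= 1`, `= 2`, ...), not by name, which is why this note lists reusing field IDs as an anti-pattern. A minimal sketch of a lint-style check follows. It assumes each message version has already been extracted into a `{number: (name, type)}` dict, and `Fields`, `USER_V1`, and `reused_numbers` are hypothetical names. The check is conservative: it flags every type change on an existing number, including ones that happen to be wire-compatible.

```python
# Field number -> (name, type) for one version of the `User` message.
Fields = dict[int, tuple[str, str]]

USER_V1: Fields = {
    1: ("id", "string"),
    2: ("email", "string"),
    3: ("name", "string"),
    4: ("age", "int32"),
    5: ("status", "Status"),
}


def reused_numbers(old: Fields, new: Fields) -> list[int]:
    """Field numbers present in both versions whose type has changed."""
    return sorted(
        number
        for number, (_name, type_) in new.items()
        if number in old and old[number][1] != type_
    )
```

A rename that keeps the number and type passes this check, since the wire encoding does not change. In the `.proto` file itself, the `reserved` statement can mark retired numbers so they are not reused later.
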
### Pydantic v2 (Python)

```python
from typing import Literal
from uuid import UUID

# EmailStr requires the optional email-validator package.
from pydantic import BaseModel, EmailStr, Field


class User(BaseModel):
    id: UUID
    email: EmailStr
    name: str = Field(min_length=1, max_length=100)
    age: int | None = Field(default=None, ge=0, le=150)
    status: Literal["active", "suspended", "deleted"]
```

### Migration (Alembic / Drizzle)

```python
# Alembic migration: adding a nullable column is backward compatible
import sqlalchemy as sa
from alembic import op


def upgrade():
    op.add_column("users",
        sa.Column("phone", sa.String(20), nullable=True))  # nullable = safe


def downgrade():
    op.drop_column("users", "phone")
```

### Iceberg schema evolution (lakehouse)

```sql
ALTER TABLE catalog.db.users ADD COLUMN phone STRING;
-- Safe in Iceberg: columns are tracked by ID, so existing data files are not rewritten.
ALTER TABLE catalog.db.users RENAME COLUMN nm TO name;
```

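The rename is safe because Iceberg identifies columns by a stable field ID rather than by name. The toy model below illustrates that idea only. It is not the Iceberg API, and `data_file_row`, `schema_v1`, `rename_column`, and `read_row` are hypothetical names.

```python
# Toy model: stored values are keyed by column ID, not by column name.
data_file_row: dict[int, object] = {1: "u1", 2: "Ada"}

# The table schema maps current column names to those stable IDs.
schema_v1: dict[str, int] = {"id": 1, "nm": 2}


def rename_column(schema: dict[str, int], old: str, new: str) -> dict[str, int]:
    """Return a new schema where `old` is called `new` but keeps its ID."""
    if old not in schema:
        raise KeyError(old)
    if new in schema:
        raise ValueError(f"column {new!r} already exists")
    return {(new if name == old else name): col_id for name, col_id in schema.items()}


def read_row(row: dict[int, object], schema: dict[str, int]) -> dict[str, object]:
    """Resolve a stored row through the schema's name -> ID mapping."""
    # IDs absent from the stored row (columns added later) read as None.
    return {name: row.get(col_id) for name, col_id in schema.items()}
```

After `rename_column(schema_v1, "nm", "name")`, the same stored row reads back under the new name without touching the data, and a column added later with a fresh ID reads as `None` for old rows.
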
## Decision criteria

| Situation | Approach |
|---|---|
| OLTP RDBMS | Strict SQL DDL + migrations |
| Event streaming | Avro + Schema Registry |
| Microservice gRPC | Protobuf |
| Frontend form + backend | Zod (shared) |
| Data lake | Iceberg / Delta with schema evolution |
| Document store | JSON Schema validation in app |

**Default**: strict schema-first. Do not make schemaless the default (the debt accumulates).

## 🔗 Graph

- Parent: [[Database Design]] · [[Data Modeling]]
- Variants: [[JSON Schema]] · [[Avro]] · [[Protobuf]]
- Applications: [[API Design]] · [[Event-Driven Architecture]]
- Adjacent: [[Schema Migration]] · [[TypeScript 타입 시스템 (TypeScript Type System)|Type Systems]]

## 🤖 LLM usage

**When**: schema drafting from natural-language requirements, migration generation, schema diff explanation, cross-format conversion (Postgres ↔ Avro ↔ Protobuf).

**When not**: never let an LLM run production migrations on its own. Review and a dry run are required.

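Cross-format conversion is largely a type-mapping exercise, so a small deterministic converter can serve as a baseline when reviewing LLM-generated output. The sketch below covers the Postgres to Avro direction only. The `PG_TO_AVRO` table is an assumed mapping for a handful of types, and `pg_to_avro` is a hypothetical helper, not part of any library.

```python
import json
from typing import Any

# Assumed mapping for a few Postgres types; extend it for real schemas.
PG_TO_AVRO: dict[str, Any] = {
    "text": "string",
    "uuid": {"type": "string", "logicalType": "uuid"},
    "integer": "int",
    "bigint": "long",
    "boolean": "boolean",
    "timestamptz": {"type": "long", "logicalType": "timestamp-micros"},
}


def pg_to_avro(record_name: str, columns: list[tuple[str, str, bool]]) -> dict[str, Any]:
    """Build an Avro record schema from (name, pg_type, nullable) tuples."""
    fields: list[dict[str, Any]] = []
    for name, pg_type, nullable in columns:
        avro_type = PG_TO_AVRO[pg_type.lower()]  # KeyError on unmapped types
        if nullable:
            # Nullable column: union with "null" first, so the default can be null.
            fields.append({"name": name, "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": name, "type": avro_type})
    return {"type": "record", "name": record_name, "fields": fields}


schema = pg_to_avro(
    "User",
    [("id", "UUID", False), ("email", "TEXT", False), ("phone", "TEXT", True)],
)
print(json.dumps(schema, indent=2))
```

Failing loudly on an unmapped type is a deliberate choice here: a silent fallback to `string` would hide exactly the cases that need human review.
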
## ❌ Anti-patterns

- **Schema-on-read everything**: the cost is pushed onto every consumer, and chaos follows.
- **Breaking changes without versioning**: consumer outages.
- **Storing JSON blobs in a JSON column without structure**: a query nightmare.
- **No NOT NULL / no FK / no CHECK**: the database degrades into dumb storage.
- **Reusing field IDs in Protobuf**: wire incompatibility.
- **Adding a required field**: violates backward compatibility.

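Versioning a breaking change can be as simple as the SemVer rule from the evolution principles. The sketch below assumes `MAJOR.MINOR.PATCH` version strings and an assumed policy of a major bump for breaking changes and a minor bump for everything else. `next_version` is a hypothetical helper.

```python
def next_version(current: str, breaking: bool) -> str:
    """Bump a MAJOR.MINOR.PATCH schema version.

    Assumed policy: breaking change -> major bump, anything else -> minor bump.
    """
    major, minor, _patch = (int(part) for part in current.split("."))
    if breaking:
        return f"{major + 1}.0.0"
    return f"{major}.{minor + 1}.0"
```

For example, `next_version("1.4.2", breaking=True)` gives `"2.0.0"`, which signals to consumers that they must opt in before upgrading.
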
## 🧪 Verification / duplicates

- Verified (Codd 1970, Kleppmann "Designing Data-Intensive Applications", Confluent Schema Registry docs, JSON Schema 2020-12, Iceberg spec).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: schema layers + evolution + 2026 tooling |