---
id: wiki-2026-0508-data-schema
title: Data Schema
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Schema Design, Data Modeling, Schema Definition]
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: [data, schema, database, modeling, validation]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: SQL/JSON/TS
  framework: Postgres / Avro / Zod / JSON Schema
---

# Data Schema

## One-Line Summary

> **"The core of a schema: structure + constraints + evolution + a contract between producers and consumers."** Schemas began with Codd's 1970 relational model, met the schemaless / NoSQL backlash of the 2000s, and returned to type safety in the 2020s once the limits of schema-on-read were recognized (Zod, TypeScript-first ORMs, dbt contracts). As of 2026, schema-as-code with version-aware evolution is the standard.

## Core Concepts

### Schema layers

- **Conceptual**: the ERD, describing business entities.
- **Logical**: normalized tables (BCNF/3NF).
- **Physical**: indexes, partitions, storage.
- **API/Wire**: JSON Schema, Avro, Protobuf, GraphQL.
- **Validation**: Zod, Pydantic, Joi (runtime).

### Evolution principles

- **Backward compatible**: only add nullable fields or fields with a default, so readers on the new schema can still read data written with the old one.
- **Forward compatible**: readers ignore unknown fields, so readers on the old schema can still read data written with the new one.
- **Full compatible**: both of the above.
- **SemVer for schemas**: a breaking change means a major version bump.

### Applications

1. Database schema (Postgres, MySQL, BigQuery).
2. Event streaming (Kafka + Schema Registry).
3. API contracts (OpenAPI, GraphQL, tRPC).
4. Data lake / lakehouse (Iceberg, Delta Lake schema).
5. Form validation (frontend + backend shared via Zod).
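
### Compatibility check (sketch)

The evolution principles in this section can be illustrated with a small check. This is a minimal sketch, not how any particular registry implements it. It assumes the common convention (Kleppmann, Confluent) in which backward means new readers handle old data and forward means old readers handle new data. The names `FieldSpec`, `can_read`, and `compatibility` are hypothetical, and the model only tracks field presence and defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field of a schema version (hypothetical, simplified model)."""
    name: str
    has_default: bool = False  # True for nullable/optional fields or fields with a default


def can_read(reader: list[FieldSpec], writer: list[FieldSpec]) -> bool:
    """Return True if code on the `reader` schema can decode data written with `writer`.

    Simplified rule: each reader field must be present in the written data or
    have a default. Fields the reader does not know are ignored.
    """
    written = {f.name for f in writer}
    return all(f.name in written or f.has_default for f in reader)


def compatibility(old: list[FieldSpec], new: list[FieldSpec]) -> str:
    """Classify the change from `old` to `new` as full, backward, forward, or none."""
    backward = can_read(reader=new, writer=old)  # new code reads old data
    forward = can_read(reader=old, writer=new)   # old code reads new data
    if backward and forward:
        return "full"
    if backward:
        return "backward"
    if forward:
        return "forward"
    return "none"


v1 = [FieldSpec("id"), FieldSpec("email")]
print(compatibility(v1, v1 + [FieldSpec("phone", has_default=True)]))  # full
print(compatibility(v1, v1 + [FieldSpec("phone")]))                    # forward
print(compatibility(v1, [FieldSpec("id")]))                            # backward
```

Type changes and renames are deliberately left out of this sketch; in practice a tool such as a schema registry would run the real compatibility checks.
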

## 💻 Patterns

### Postgres schema with constraints

```sql
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- built in since Postgres 13
  email TEXT NOT NULL UNIQUE CHECK (email ~* '^.+@.+\..+$'),
  name TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  status TEXT NOT NULL CHECK (status IN ('active','suspended','deleted'))
);

-- Partial index: only rows with status = 'active' are indexed.
CREATE INDEX users_status_idx ON users(status) WHERE status = 'active';
```

### Zod (TypeScript): runtime + static type

```ts
import { z } from "zod";

export const User = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  name: z.string().min(1).max(100),
  age: z.number().int().min(0).max(150).optional(),
  status: z.enum(["active", "suspended", "deleted"]),
});

// Static type derived from the runtime schema.
export type User = z.infer<typeof User>;

// jsonInput: any untrusted value (e.g. a parsed request body), defined elsewhere.
const parsed = User.parse(jsonInput); // throws on invalid
```

### JSON Schema (language-agnostic)

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["id", "email"],
  "properties": {
    "id": { "type": "string", "format": "uuid" },
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 0 }
  },
  "additionalProperties": false
}
```

### Avro schema (Kafka)

```json
{
  "type": "record",
  "name": "UserCreated",
  "namespace": "com.example.events",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "age", "type": ["null", "int"], "default": null }
  ]
}
```

### Protobuf (gRPC)

```proto
syntax = "proto3";

package user.v1;

message User {
  string id = 1;
  string email = 2;
  string name = 3;
  optional int32 age = 4;

  // In proto3 the zero value is the default, so an unset status reads as ACTIVE here.
  // Many style guides reserve 0 for an UNSPECIFIED value instead.
  enum Status {
    ACTIVE = 0;
    SUSPENDED = 1;
    DELETED = 2;
  }
  Status status = 5;
}
```

### Pydantic v2 (Python)

```python
from typing import Literal
from uuid import UUID

# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from pydantic import BaseModel, EmailStr, Field


class User(BaseModel):
    id: UUID
    email: EmailStr
    name: str = Field(min_length=1, max_length=100)
    age: int | None = Field(default=None, ge=0, le=150)
    status: Literal["active", "suspended", "deleted"]
```

### Migration (Alembic / Drizzle)

```python
# Alembic upgrade: backward compatible
from alembic import op
import sqlalchemy as sa


def upgrade():
    op.add_column("users", sa.Column("phone", sa.String(20), nullable=True))  # nullable = safe
```

### Iceberg schema evolution (lakehouse)

```sql
ALTER TABLE catalog.db.users ADD COLUMN phone STRING;
ALTER TABLE catalog.db.users RENAME COLUMN nm TO name; -- safe in Iceberg: columns are tracked by ID, not by name
```

## Decision Criteria

| Situation | Approach |
|---|---|
| OLTP RDBMS | Strict SQL DDL + migrations |
| Event streaming | Avro + Schema Registry |
| Microservice gRPC | Protobuf |
| Frontend form + backend | Zod (shared) |
| Data lake | Iceberg / Delta with schema evolution |
| Document store | JSON Schema validation in app |

**Default**: strict schema-first. Do not make schemaless the default (debt accumulates).

## 🔗 Graph

- Parent: [[Database Design]] · [[Data Modeling]]
- Variants: [[JSON Schema]] · [[Avro]] · [[Protobuf]]
- Applications: [[API Design]] · [[Event-Driven Architecture]]
- Adjacent: [[Schema Migration]] · [[TypeScript 타입 시스템 (TypeScript Type System)|Type Systems]]

## 🤖 LLM Usage

**When**: drafting a schema from natural-language requirements, generating migrations, explaining schema diffs, and converting across formats (Postgres ↔ Avro ↔ Protobuf).

**When not**: never let an LLM run production migrations on its own; review and a dry run are mandatory.

## ❌ Anti-patterns

- **Schema-on-read everything**: the cost falls on the consumer, and the result is chaos.
- **Breaking changes without versioning**: consumer outages.
- **Storing JSON blobs in a JSON column without structure**: a query nightmare.
- **No NOT NULL / no FK / no CHECK**: the DB turns into dumb storage.
- **Reusing field IDs in protobuf**: wire incompatibility.
- **Adding a required field**: violates backward compatibility.

## 🧪 Verification / Duplicates

- Verified (Codd 1970, Kleppmann "Designing Data-Intensive Applications", Confluent Schema Registry docs, JSON Schema 2020-12, Iceberg spec).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: schema layers + evolution + 2026 tooling |