2nd/10_Wiki/Topics/Data-Schema.md

---
id: wiki-2026-0508-data-schema
title: Data Schema
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Schema Design, Data Modeling, Schema Definition]
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: [data, schema, database, modeling, validation]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: SQL/JSON/TS
  framework: Postgres / Avro / Zod / JSON Schema
---

# Data Schema

## One-line summary
> **"The core of a schema: structure + constraints + evolution + a contract between producers and consumers."** It began with Codd's relational model in 1970, went through the schemaless / NoSQL backlash of the 2000s, and swung back toward type safety in the 2020s (Zod, TypeScript-first ORMs, dbt contracts) once the limits of schema-on-read were recognized. As of 2026, schema-as-code with version-aware evolution is the standard.

## Core concepts

### Schema layers
- **Conceptual**: the ERD, covering business entities.
- **Logical**: normalized tables (3NF/BCNF).
- **Physical**: indexes, partitions, storage.
- **API/Wire**: JSON Schema, Avro, Protobuf, GraphQL.
- **Validation**: Zod, Pydantic, Joi (runtime).

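### Evolution principles
- **Backward compatible**: readers on the new schema can still read data written with the old schema. Only add fields that are nullable or carry a default.
- **Forward compatible**: readers on the old schema can read data written with the new schema, typically by ignoring unknown fields.
- **Full compatible**: both directions hold.
- **SemVer for schemas**: a breaking change requires a major version bump.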
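
The compatibility directions above can be sketched as a small checker. This is a minimal illustration, assuming a simplified Avro-like rule (a reader field must exist in the written data or carry a default; extra written fields are ignored). `Field`, `can_read`, and `compatibility` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    has_default: bool = False  # True for nullable or defaulted fields


def can_read(reader: list[Field], writer: list[Field]) -> bool:
    """True if code on `reader` can decode data written with `writer`.

    Simplified rule: every reader field is either present in the written
    data or has a default. Extra written fields are ignored.
    """
    written = {f.name for f in writer}
    return all(f.name in written or f.has_default for f in reader)


def compatibility(old: list[Field], new: list[Field]) -> str:
    backward = can_read(reader=new, writer=old)  # new code reads old data
    forward = can_read(reader=old, writer=new)   # old code reads new data
    if backward and forward:
        return "full"
    if backward:
        return "backward"
    if forward:
        return "forward"
    return "none"


v1 = [Field("id"), Field("email")]
v2_optional = v1 + [Field("phone", has_default=True)]  # additive, defaulted
v2_required = v1 + [Field("phone")]                    # additive, no default
v2_dropped = [Field("id")]                             # removed a required field

print(compatibility(v1, v2_optional))  # full
print(compatibility(v1, v2_required))  # forward
print(compatibility(v1, v2_dropped))   # backward
```

Under a SemVer policy, anything other than `"full"` would be a candidate for a major bump, though the exact threshold depends on which direction your consumers rely on.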

### Applications
1. Database schema (Postgres, MySQL, BigQuery).
2. Event streaming (Kafka + Schema Registry).
3. API contracts (OpenAPI, GraphQL, tRPC).
4. Data lake / lakehouse (Iceberg, Delta Lake schema).
5. Form validation (frontend + backend shared via Zod).

## 💻 Patterns

### Postgres schema with constraints
```sql
CREATE TABLE users (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email       TEXT NOT NULL UNIQUE CHECK (email ~* '^.+@.+\..+$'),
  name        TEXT NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  status      TEXT NOT NULL CHECK (status IN ('active','suspended','deleted'))
);

-- Partial index: only rows with status = 'active' are indexed, which keeps it small.
CREATE INDEX users_status_idx ON users(status) WHERE status = 'active';
```
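
To see constraints like these reject bad rows, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for Postgres (an assumption made only so the example runs without a server). The table is a trimmed version of `users`; `try_insert` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id     TEXT PRIMARY KEY,
        email  TEXT NOT NULL UNIQUE,
        status TEXT NOT NULL CHECK (status IN ('active','suspended','deleted'))
    )
""")


def try_insert(row: tuple) -> str:
    """Return 'ok', or a 'rejected: ...' message if a constraint fails."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert(("u1", "a@example.com", "active")),
    try_insert(("u2", "a@example.com", "active")),    # duplicate email
    try_insert(("u3", "b@example.com", "archived")),  # status outside CHECK list
    try_insert(("u4", None, "active")),               # NULL email
]
for line in results:
    print(line)
```

Only the first insert succeeds. The database itself enforces the rules, so no application code path can write an invalid row.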

### Zod (TypeScript) — runtime + static type
```ts
import { z } from "zod";

export const User = z.object({
  id:     z.string().uuid(),
  email:  z.string().email(),
  name:   z.string().min(1).max(100),
  age:    z.number().int().min(0).max(150).optional(),
  status: z.enum(["active", "suspended", "deleted"]),
});
export type User = z.infer<typeof User>;

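// Example payload; in practice this arrives as untrusted JSON.
const jsonInput: unknown = JSON.parse(
  '{"id":"123e4567-e89b-12d3-a456-426614174000","email":"a@example.com","name":"Ada","status":"active"}'
);

const parsed = User.parse(jsonInput);      // throws ZodError on invalid input
const result = User.safeParse(jsonInput);  // returns a result object instead of throwing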
```

### JSON Schema (language-agnostic)
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["id", "email"],
  "properties": {
    "id":     { "type": "string", "format": "uuid" },
    "email":  { "type": "string", "format": "email" },
    "age":    { "type": "integer", "minimum": 0 }
  },
  "additionalProperties": false
}
```

### Avro schema (Kafka)
```json
{
  "type": "record",
  "name": "UserCreated",
  "namespace": "com.example.events",
  "fields": [
    { "name": "id",    "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "age",   "type": ["null", "int"], "default": null }
  ]
}
```
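
The `"default": null` on `age` is what lets a reader handle events written before that field existed. Below is a rough sketch of that reader-side resolution in plain Python. It is not the Avro library's API; `READER_SCHEMA`, `MISSING`, and `resolve` are hypothetical names, and real Avro resolves against the writer's schema rather than a decoded dict.

```python
from typing import Any

MISSING = object()  # sentinel: the field has no default, so it is required

# Hypothetical reader schema: field name -> default value.
READER_SCHEMA: dict[str, Any] = {"id": MISSING, "email": MISSING, "age": None}


def resolve(record: dict[str, Any], reader_schema: dict[str, Any]) -> dict[str, Any]:
    """Project a written record onto the reader's schema."""
    out = {}
    for name, default in reader_schema.items():
        if name in record:
            out[name] = record[name]
        elif default is not MISSING:
            out[name] = default  # field absent in old data: use the default
        else:
            raise ValueError(f"missing required field: {name}")
    return out  # fields the reader does not know about are dropped


old_event = {"id": "u1", "email": "a@example.com"}  # written before `age` existed
new_event = {"id": "u2", "email": "b@example.com", "age": 30, "plan": "pro"}

print(resolve(old_event, READER_SCHEMA))
print(resolve(new_event, READER_SCHEMA))
```

The first call fills `age` with `None`; the second drops the unknown `plan` field.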

### Protobuf (gRPC)
```proto
syntax = "proto3";
package user.v1;

message User {
  // proto3 enums default to 0, so reserve 0 for "unspecified".
  enum Status {
    STATUS_UNSPECIFIED = 0;
    ACTIVE    = 1;
    SUSPENDED = 2;
    DELETED   = 3;
  }

  string id    = 1;
  string email = 2;
  string name  = 3;
  optional int32 age = 4;
  Status status = 5;
}
```

### Pydantic v2 (Python)
```python
# EmailStr requires the optional email-validator package (pip install "pydantic[email]").
from pydantic import BaseModel, EmailStr, Field
from typing import Literal
from uuid import UUID

class User(BaseModel):
    id:     UUID
    email:  EmailStr
    name:   str = Field(min_length=1, max_length=100)
    age:    int | None = Field(default=None, ge=0, le=150)
    status: Literal["active", "suspended", "deleted"]
```

### Migration (Alembic / Drizzle)
```python
from alembic import op
import sqlalchemy as sa

# Backward-compatible upgrade: the new column is nullable, so existing rows stay valid.
def upgrade():
    op.add_column("users",
        sa.Column("phone", sa.String(20), nullable=True))
```
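
Why a nullable column counts as safe can be checked end to end. The sketch below uses Python's built-in `sqlite3` in place of Postgres and Alembic (an assumption made so it runs anywhere): a writer that predates the migration keeps working after it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL)")

# Statement used by a writer deployed before the migration.
OLD_INSERT = "INSERT INTO users (id, email) VALUES (?, ?)"
conn.execute(OLD_INSERT, ("u1", "a@example.com"))

# The migration: additive and nullable, so no existing row or query breaks.
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")

conn.execute(OLD_INSERT, ("u2", "b@example.com"))  # old writer still works
conn.execute(
    "INSERT INTO users (id, email, phone) VALUES (?, ?, ?)",
    ("u3", "c@example.com", "+1-555-0100"),
)

rows = conn.execute("SELECT id, phone FROM users ORDER BY id").fetchall()
print(rows)  # [('u1', None), ('u2', None), ('u3', '+1-555-0100')]
```

Old and new writers can coexist during a rolling deploy, which is the practical payoff of keeping the change additive.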

### Iceberg schema evolution (lakehouse)
```sql
ALTER TABLE catalog.db.users ADD COLUMN phone STRING;
ALTER TABLE catalog.db.users RENAME COLUMN nm TO name;  -- safe: Iceberg tracks columns by ID, not by name
```
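
The rename is safe because Iceberg resolves columns through stable field IDs rather than names. A toy model of that idea follows, with a hypothetical in-memory `data_file` and `read` helper; actual Iceberg stores the IDs in table metadata and data file schemas.

```python
# Rows as stored on disk: values keyed by field ID, not by column name.
data_file = [{1: "u1", 2: "Ada"}, {1: "u2", 2: "Grace"}]

schema_v1 = {1: "id", 2: "nm"}
schema_v2 = {1: "id", 2: "name", 3: "phone"}  # field 2 renamed, field 3 added


def read(rows: list[dict], schema: dict[int, str]) -> list[dict]:
    """Resolve each stored row through a schema; absent IDs read as None."""
    return [{name: row.get(fid) for fid, name in schema.items()} for row in rows]


print(read(data_file, schema_v1))
print(read(data_file, schema_v2))
```

The data file is never rewritten: the same bytes read as `nm` under the old schema and as `name` under the new one, with `phone` coming back as `None` for rows written before the column existed.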

## Decision criteria
| Situation | Approach |
|---|---|
| OLTP RDBMS | Strict SQL DDL + migrations |
| Event streaming | Avro + Schema Registry |
| Microservice gRPC | Protobuf |
| Frontend form + backend | Zod (shared) |
| Data lake | Iceberg / Delta with schema evolution |
| Document store | JSON Schema validation in app |

**Default**: strict schema-first. Do not default to schemaless, because the debt accumulates.

## 🔗 Graph
- Parent: [[Database Design]] · [[Data Modeling]]
- Variants: [[JSON Schema]] · [[Avro]] · [[Protobuf]]
- Applications: [[API Design]] · [[Event-Driven Architecture]]
- Adjacent: [[Schema Migration]] · [[TypeScript 타입 시스템 (TypeScript Type System)|Type Systems]]

## 🤖 LLM usage
**When to use**: drafting schemas from natural-language requirements, generating migrations, explaining schema diffs, and converting between formats (Postgres ↔ Avro ↔ Protobuf).
**When not to use**: never let an LLM run production migrations on its own. Review and a dry run are mandatory.

## ❌ Anti-patterns
- **Schema-on-read everything**: the cost shifts to every consumer, and the result is chaos.
- **Breaking changes without versioning**: consumer outages.
- **Storing JSON blobs in a JSON column without structure**: a query nightmare.
- **No NOT NULL / no FK / no CHECK**: reduces the database to dumb storage.
- **Reusing field IDs in protobuf**: wire incompatibility.
- **Adding a required field**: breaks backward compatibility, because data written before the change lacks it.
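
The protobuf field-ID rule above lends itself to an automated check. This is a minimal sketch, assuming each message version has already been reduced to a `{field_number: (name, type)}` mapping; `reused_field_numbers` is a hypothetical helper, not part of the protobuf tooling. It is deliberately conservative: a pure rename is wire-compatible but is flagged here too.

```python
FieldMap = dict[int, tuple[str, str]]  # field number -> (name, type)


def reused_field_numbers(old: FieldMap, new: FieldMap) -> list[int]:
    """Field numbers present in both versions with a different name or type."""
    return sorted(n for n in old.keys() & new.keys() if old[n] != new[n])


user_v1: FieldMap = {
    1: ("id", "string"),
    2: ("email", "string"),
    3: ("name", "string"),
    4: ("age", "int32"),
    5: ("status", "Status"),
}

# Hypothetical bad edit: `age` removed and number 4 handed to a new field.
user_v2_bad: FieldMap = {**user_v1, 4: ("phone", "string")}

# Safer edit: number 4 is retired (`reserved 4;` in the .proto) and phone gets 6.
user_v2_ok: FieldMap = {n: f for n, f in user_v1.items() if n != 4}
user_v2_ok[6] = ("phone", "string")

print(reused_field_numbers(user_v1, user_v2_bad))  # [4]
print(reused_field_numbers(user_v1, user_v2_ok))   # []
```

A check like this could run in CI against the previous released schema, so the problem surfaces at review time rather than as corrupted reads in production.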

## 🧪 Verification / Duplicates
- Verified (Codd 1970, Kleppmann "Designing Data-Intensive Applications", Confluent Schema Registry docs, JSON Schema 2020-12, Iceberg spec).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — schema layers + evolution + 2026 tooling |