2nd/10_Wiki/Topics/Data-Schema.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
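Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link name normalization: ~2,400 fixes (broken links repaired, redirects pointed directly at targets, concept mapping)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections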

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


| Field | Value |
| --- | --- |
| id | wiki-2026-0508-data-schema |
| title | Data Schema |
| category | 10_Wiki/Topics |
| status | verified |
| canonical_id | self |
| aliases | Schema Design, Data Modeling, Schema Definition |
| duplicate_of | none |
| source_trust_level | A |
| confidence_score | 0.93 |
| verification_status | applied |
| tags | data, schema, database, modeling, validation |
| raw_sources | |
| last_reinforced | 2026-05-10 |
| github_commit | pending |
| tech_stack | language: SQL/JSON/TS; framework: Postgres / Avro / Zod / JSON Schema |

Data Schema

In one line

"The core of a schema: structure + constraints + evolution + a contract between producers and consumers." The idea starts with Codd's relational model (1970), passes through the schemaless / NoSQL backlash of the 2000s, and, once the limits of schema-on-read became clear in the 2020s, returns to type safety (Zod, TypeScript-first ORMs, dbt contracts). As of 2026, schema-as-code with version-aware evolution is the standard.

Core concepts

Schema layers

  • Conceptual: ERD (business entities).
  • Logical: normalized tables (BCNF/3NF).
  • Physical: indexes, partitions, storage.
  • API/Wire: JSON Schema, Avro, Protobuf, GraphQL.
  • Validation: Zod, Pydantic, Joi (runtime).

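Evolution principles

  • Backward compatible: only add fields that are nullable or have a default, so a reader on the new schema can still read data written with the old one.
  • Forward compatible: readers ignore unknown fields, so a reader on the old schema can still read data written with the new one.
  • Fully compatible: both of the above.
  • SemVer for schemas: a breaking change means a major bump.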
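The compatibility rules above can be sketched as a small check over Avro-style field lists. This is a simplified illustration, not the algorithm used by any schema registry: the helper names `can_read` and `compatibility` are hypothetical, and any type change is treated as incompatible (real Avro resolution allows some type promotions).

```python
from typing import Any

# Avro-style field: {"name": ..., "type": ..., optional "default": ...}
Field = dict[str, Any]


def can_read(reader: list[Field], writer: list[Field]) -> bool:
    """True if a reader using `reader` can decode data written with `writer`."""
    written = {f["name"]: f for f in writer}
    for field in reader:
        source = written.get(field["name"])
        if source is None:
            # The data lacks this field, so the reader needs a default to fill it in.
            if "default" not in field:
                return False
        elif source["type"] != field["type"]:
            # Simplification (assumption): no type promotion is allowed.
            return False
    # Fields that exist only in the writer are unknown to the reader and ignored.
    return True


def compatibility(old: list[Field], new: list[Field]) -> str:
    """Classify the change from `old` to `new` as full, backward, forward, or none."""
    backward = can_read(reader=new, writer=old)  # new reader, old data
    forward = can_read(reader=old, writer=new)   # old reader, new data
    if backward and forward:
        return "full"
    if backward:
        return "backward"
    if forward:
        return "forward"
    return "none"


v1 = [{"name": "id", "type": "string"}, {"name": "email", "type": "string"}]
v2_optional = v1 + [{"name": "age", "type": ["null", "int"], "default": None}]
v2_required = v1 + [{"name": "phone", "type": "string"}]
v2_dropped = [{"name": "id", "type": "string"}]

print(compatibility(v1, v2_optional))  # optional field with a default
print(compatibility(v1, v2_required))  # new field without a default
print(compatibility(v1, v2_dropped))   # removed a field that had no default
```

Adding a field with a default is compatible in both directions, which is why it is the usual safe change. Adding a field without a default breaks new readers on old data, and dropping a field without a default breaks old readers on new data.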

Applications

  1. Database schema (Postgres, MySQL, BigQuery).
  2. Event streaming (Kafka + Schema Registry).
  3. API contracts (OpenAPI, GraphQL, tRPC).
  4. Data lake / lakehouse (Iceberg, Delta Lake schema).
  5. Form validation (frontend + backend shared via Zod).

💻 Patterns

Postgres schema with constraints

CREATE TABLE users (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email       TEXT NOT NULL UNIQUE CHECK (email ~* '^.+@.+\..+$'),
  name        TEXT NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  status      TEXT NOT NULL CHECK (status IN ('active','suspended','deleted'))
);

CREATE INDEX users_status_idx ON users(status) WHERE status = 'active';

Zod (TypeScript) — runtime + static type

import { z } from "zod";

export const User = z.object({
  id:     z.string().uuid(),
  email:  z.string().email(),
  name:   z.string().min(1).max(100),
  age:    z.number().int().min(0).max(150).optional(),
  status: z.enum(["active", "suspended", "deleted"]),
});
export type User = z.infer<typeof User>;

const parsed = User.parse(jsonInput);  // throws on invalid

JSON Schema (language-agnostic)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["id", "email"],
  "properties": {
    "id":     { "type": "string", "format": "uuid" },
    "email":  { "type": "string", "format": "email" },
    "age":    { "type": "integer", "minimum": 0 }
  },
  "additionalProperties": false
}

Avro schema (Kafka)

{
  "type": "record",
  "name": "UserCreated",
  "namespace": "com.example.events",
  "fields": [
    { "name": "id",    "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "age",   "type": ["null", "int"], "default": null }
  ]
}

Protobuf (gRPC)

syntax = "proto3";
package user.v1;

message User {
  string id    = 1;
  string email = 2;
  string name  = 3;
  optional int32 age = 4;
  enum Status { ACTIVE = 0; SUSPENDED = 1; DELETED = 2; }
  Status status = 5;
}

Pydantic v2 (Python)

from pydantic import BaseModel, EmailStr, Field
from typing import Literal
from uuid import UUID

class User(BaseModel):
    id:     UUID
    email:  EmailStr
    name:   str = Field(min_length=1, max_length=100)
    age:    int | None = Field(default=None, ge=0, le=150)
    status: Literal["active", "suspended", "deleted"]

Migration (Alembic / Drizzle)

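from alembic import op
import sqlalchemy as sa

# Alembic upgrade: backward compatible
def upgrade():
    # A nullable column is safe: existing rows and older writers stay valid.
    op.add_column("users", sa.Column("phone", sa.String(20), nullable=True))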

Iceberg schema evolution (lakehouse)

ALTER TABLE catalog.db.users ADD COLUMN phone STRING;
ALTER TABLE catalog.db.users RENAME COLUMN nm TO name;  -- safe in Iceberg: columns are tracked by ID, not by name

Decision criteria

| Situation | Approach |
| --- | --- |
| OLTP RDBMS | Strict SQL DDL + migrations |
| Event streaming | Avro + Schema Registry |
| Microservice gRPC | Protobuf |
| Frontend form + backend | Zod (shared) |
| Data lake | Iceberg / Delta with schema evolution |
| Document store | JSON Schema validation in app |

Default: strict schema-first. Schemaless should not be the default (debt accumulates).

🔗 Graph

🤖 Using LLMs

When to use: drafting schemas from natural-language requirements, generating migrations, explaining schema diffs, and cross-format conversion (Postgres ↔ Avro ↔ Protobuf). When not to: never let an LLM run production migrations on its own; review and a dry run are mandatory.
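One way to picture the dry-run step is to apply a proposed migration inside a transaction, inspect the result, and roll back. The sketch below uses an in-memory SQLite database from the Python standard library as a stand-in, which is an assumption made for illustration; the function name `dry_run_migration` is hypothetical, and the same idea would need adapting to the target database and migration tool.

```python
import sqlite3


def dry_run_migration(conn: sqlite3.Connection, statements: list[str], table: str) -> list[str]:
    """Apply statements in a transaction, return the resulting column names, then roll back.

    `table` must be a trusted identifier, since it is interpolated into the PRAGMA.
    """
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    finally:
        # Always undo the changes: this is only a preview.
        conn.execute("ROLLBACK")


# isolation_level=None puts the connection in autocommit mode,
# so the explicit BEGIN / ROLLBACK above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL)")

proposed = ["ALTER TABLE users ADD COLUMN phone TEXT"]  # e.g. LLM-generated, pending review
preview = dry_run_migration(conn, proposed, "users")
current = [row[1] for row in conn.execute("PRAGMA table_info(users)")]

print(preview)  # columns as they would look after the migration
print(current)  # columns actually in the database, unchanged
```

The preview gives a reviewer something concrete to compare against the intended schema before anything is applied for real.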

Anti-patterns

  • Schema-on-read everything: the cost falls on every consumer, and chaos follows.
  • Breaking changes without versioning: consumer outages.
  • Storing unstructured JSON blobs in a JSON column: a query nightmare.
  • No NOT NULL / no FK / no CHECK: the DB degrades into dumb storage.
  • Reusing field numbers in Protobuf: wire incompatibility.
  • Adding a required field: breaks backward compatibility.
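The Protobuf field-number anti-pattern lends itself to an automated check between schema versions. The following is a minimal sketch under stated assumptions: schemas are represented as plain dictionaries rather than parsed `.proto` files, and `ProtoField` and `field_number_problems` are hypothetical names. It only detects a number whose type changed or a removed number that was not reserved; a number reused with the same type but a different meaning would slip through.

```python
from typing import NamedTuple


class ProtoField(NamedTuple):
    name: str
    type: str


# Field number -> field definition (a hand-written stand-in for a parsed message).
Schema = dict[int, ProtoField]


def field_number_problems(
    old: Schema, new: Schema, reserved: frozenset[int] = frozenset()
) -> list[str]:
    """List field numbers from `old` that `new` reuses with another type or drops unreserved."""
    problems = []
    for number, old_field in old.items():
        new_field = new.get(number)
        if new_field is None:
            if number not in reserved:
                problems.append(f"field {number} ({old_field.name}) removed but not reserved")
        elif new_field.type != old_field.type:
            problems.append(
                f"field {number} reused: {old_field.type} {old_field.name}"
                f" -> {new_field.type} {new_field.name}"
            )
    return problems


user_v1: Schema = {
    1: ProtoField("id", "string"),
    2: ProtoField("email", "string"),
    3: ProtoField("name", "string"),
    4: ProtoField("age", "int32"),
}
# Number 4 is recycled for a field of a different type.
user_reused = {**user_v1, 4: ProtoField("phone", "string")}
# Number 4 is simply dropped.
user_dropped = {n: f for n, f in user_v1.items() if n != 4}

print(field_number_problems(user_v1, user_reused))
print(field_number_problems(user_v1, user_dropped))
print(field_number_problems(user_v1, user_dropped, reserved=frozenset({4})))
```

Renames are deliberately not flagged, because the binary wire format identifies fields by number rather than by name.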

🧪 Verification / Duplicates

  • Verified (Codd 1970, Kleppmann "Designing Data-Intensive Applications", Confluent Schema Registry docs, JSON Schema 2020-12, Iceberg spec).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: schema layers + evolution + 2026 tooling |