[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -2,91 +2,188 @@
 id: wiki-2026-0508-data-schema
 title: Data Schema
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [P-Reinforce-AUTO-SCHE-001]
+aliases: [Schema Design, Data Modeling, Schema Definition]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 0.94
-tags: [auto-reinforced, schema, data-structure, organization, blueprint, database-design]
+confidence_score: 0.93
+verification_status: applied
+tags: [data, schema, database, modeling, validation]
 raw_sources: []
-last_reinforced: 2026-04-20
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
 tech_stack:
-  language: unspecified
-  framework: unspecified
+  language: SQL/JSON/TS
+  framework: Postgres / Avro / Zod / JSON Schema
 ---

-# [[Schema|Schema]]
+# Data Schema

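-## 📌 One-Line Insight (The Karpathy Summary)
-> "The skeleton of data: a blueprint that defines each item's name and format in advance so that tens of thousands of pieces of information don't pile up haphazardly, and a framework of order that lets the system instantly tell whether a piece of data belongs in a given slot."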
+## One-Line Summary
+> **"The core of a schema: structure + constraints + evolution + a contract between producers and consumers."** It started with Codd's relational model (1970), met the schemaless/NoSQL backlash of the 2000s, and in the 2020s, once the limits of schema-on-read became clear, swung back to type safety (Zod, TypeScript-first ORMs, dbt contracts). As of 2026, schema-as-code with version-aware evolution is the standard.

-## 📖 Structured Knowledge (Synthesized Content)
-A schema is a formal-language definition of the structure of data, how the data is represented, and the relationships between data items.
+## Key Concepts

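-1.  **Three Types**:
-    *   **Conceptual Schema**: the overall data structure from the user's point of view (conceptual design).
-    *   **Logical Schema**: concrete tables and relationships defined so the DBMS can understand them. (Linked to [[Relational-Database|Relational-Database]])
-    *   **Physical Schema**: determines how the data is actually laid out on the storage device.
-2.  **Why It Matters**:
-    *   A knowledge system without a schema eventually turns into a garbage dump (data swamp); a schema is the only way to guarantee data integrity and retrieval efficiency. (A prerequisite for [[Scalability|Scalability]])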
+### Schema Layers
+- **Conceptual**: ER diagrams of the business entities and their relationships.
+- **Logical**: normalized tables (3NF/BCNF).
+- **Physical**: indexes, partitions, and storage layout.
+- **API/Wire**: JSON Schema, Avro, Protobuf, GraphQL.
+- **Validation**: runtime validators such as Zod, Pydantic, and Joi.

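-## ⚠️ Contradictions & Updates
- **Conflict with past data**: Schemas used to be rigid ("hard schema") and hard to change once set; modern practice evolves by complementing them with schemaless (NoSQL) or dynamic-schema approaches that extend the structure flexibly as knowledge changes (RL Update).
- **Policy change (RL Update)**: This system's metadata (YAML) is itself a kind of knowledge schema, and the `P-Reinforce` protocol enforces a uniform structure across all knowledge files.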
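+### Evolution Principles
+- **Backward compatible**: readers using the new schema can read data written with the old schema. Only add fields that are nullable or have a default.
+- **Forward compatible**: readers using the old schema can read data written with the new schema by ignoring unknown fields.
+- **Full compatible**: both backward and forward compatible.
+- **SemVer for schemas**: a breaking change requires a major version bump.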
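The two directions are easy to mix up, so here is a minimal sketch that checks them for record-style schemas. It assumes a simplified, hypothetical schema model (field name mapped to a type and an optional default) and loosely follows Avro's resolution rules; Avro's full rules also cover type promotion, unions, and aliases, which this sketch ignores.

```python
# Hypothetical, simplified schema model: field name -> {"type": ..., "default": ...};
# "default" is optional. Type promotion, unions, and aliases are ignored.
from typing import Any

Schema = dict[str, dict[str, Any]]

def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """New readers can read old data: fields added in `new` need a default,
    and fields present in both must keep the same type."""
    for name, spec in new.items():
        if name not in old:
            if "default" not in spec:
                return False
        elif old[name]["type"] != spec["type"]:
            return False
    return True

def is_forward_compatible(old: Schema, new: Schema) -> bool:
    """Old readers can read new data: same rule with reader/writer swapped, so
    any field dropped from `new` must have a default in `old`."""
    return is_backward_compatible(new, old)

def is_fully_compatible(old: Schema, new: Schema) -> bool:
    return is_backward_compatible(old, new) and is_forward_compatible(old, new)

v1: Schema = {"id": {"type": "string"}, "email": {"type": "string"}}
v2: Schema = {**v1, "age": {"type": "int", "default": None}}   # optional field
v3: Schema = {**v1, "phone": {"type": "string"}}               # required field

print(is_fully_compatible(v1, v2))     # True
print(is_backward_compatible(v1, v3))  # False: old data has no phone
print(is_forward_compatible(v1, v3))   # True: old readers ignore phone
```

Note the asymmetry: adding a required field is forward compatible but not backward compatible, which is why additions are restricted to nullable or defaulted fields.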

-## 🔗 Knowledge Links (Graph)
- [[Relational-Database|Relational-Database]], [[Scalability|Scalability]], [[Inexact-Science|Inexact-Science]], Standard-Operating-Procedure, [[Management|Management]]
- **Modern Tech/Tools**: JSON Schema, SQL DDL, GraphQL, XML Schema.
---
+### Applications
+1. Database schema (Postgres, MySQL, BigQuery).
+2. Event streaming (Kafka + Schema Registry).
+3. API contracts (OpenAPI, GraphQL, tRPC).
+4. Data lake / lakehouse (Iceberg, Delta Lake schema).
+5. Form validation (frontend + backend shared via Zod).

-## 🤖 LLM Usage Hints (How to Use This Knowledge)
+## 💻 Patterns

-**When to use this knowledge:**
- *(TODO)*
+### Postgres schema with constraints
+```sql
+CREATE TABLE users (
+  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  email       TEXT NOT NULL UNIQUE CHECK (email ~* '^.+@.+\..+$'),
+  name        TEXT NOT NULL,
+  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
+  status      TEXT NOT NULL CHECK (status IN ('active','suspended','deleted'))
+);

-**When not to use it:**
- *(TODO)*
-
-## 🧪 Validation Status
-
- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 automatic normalization. The body needs verification.)*
-
-## 🧬 Duplicate Check
-
- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Action:** UPDATE (automatic normalization)
- **Reason:** Phase 1 normalization: filled in fields missing from the old template.
-
-## 🕓 Changelog
-
-| Date | Change | Action | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
-
-## 💻 Code Patterns
-
- **Pattern 1:** *(TODO: a structural skeleton reflecting this project's conventions)*
-
-```text
-# TODO
+CREATE INDEX users_status_idx ON users(status) WHERE status = 'active';
 ```
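To see constraints like these reject bad data at the database layer, here is a runnable sketch. It assumes SQLite via Python's standard `sqlite3` module as a local stand-in for Postgres, so the Postgres-specific pieces (`gen_random_uuid()`, `TIMESTAMPTZ`, the `~*` email regex, the partial index) are left out.

```python
# Assumption: SQLite (stdlib sqlite3) stands in for Postgres. gen_random_uuid(),
# TIMESTAMPTZ, and the ~* email regex are Postgres-only, so they are omitted.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
      id      TEXT PRIMARY KEY,
      email   TEXT NOT NULL UNIQUE,
      name    TEXT NOT NULL,
      status  TEXT NOT NULL CHECK (status IN ('active','suspended','deleted'))
    )
""")

def try_insert(row: tuple) -> str:
    """Insert (id, email, name, status); return 'ok' or the rejection reason."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO users VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert(("u1", "a@example.com", "Ann", "active")))   # ok
print(try_insert(("u2", "b@example.com", "Bob", "banned")))   # CHECK violation
print(try_insert(("u3", "a@example.com", "Cat", "active")))   # UNIQUE violation
print(try_insert(("u4", "d@example.com", None, "active")))    # NOT NULL violation
```

Every bad row fails inside the database itself, so no application code path can bypass the rule.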

-## 🤔 Decision Criteria
+### Zod (TypeScript): runtime validation + static type
+```ts
+import { z } from "zod";

-**When to choose A:**
- *(TODO)*
+export const User = z.object({
+  id:     z.string().uuid(),
+  email:  z.string().email(),
+  name:   z.string().min(1).max(100),
+  age:    z.number().int().min(0).max(150).optional(),
+  status: z.enum(["active", "suspended", "deleted"]),
+});
+export type User = z.infer<typeof User>;

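-**When to choose B:**
- *(TODO)*
+declare const jsonInput: unknown;          // e.g. an untrusted, already JSON-parsed request body
+const parsed = User.parse(jsonInput);       // returns a typed User; throws ZodError on invalid input
+const result = User.safeParse(jsonInput);   // non-throwing: { success: true, data } or { success: false, error }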
+```

-**Default:**
- *(TODO)*
+### JSON Schema (language-agnostic)
+```json
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "type": "object",
+  "required": ["id", "email"],
+  "properties": {
+    "id":     { "type": "string", "format": "uuid" },
+    "email":  { "type": "string", "format": "email" },
+    "age":    { "type": "integer", "minimum": 0 }
+  },
+  "additionalProperties": false
+}
+```
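The effect of `required` and `"additionalProperties": false` can be shown with a deliberately tiny validator. This is an illustrative sketch that understands only the keywords in the schema above (with a simplified type check); in real code you would use a maintained implementation, for Python the `jsonschema` package.

```python
# Hypothetical helper, not a JSON Schema implementation: it covers only the
# keywords used in the schema above. "format" is skipped because draft 2020-12
# treats it as an annotation unless format assertion is enabled.
from typing import Any

SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["id", "email"],
    "properties": {
        "id":    {"type": "string", "format": "uuid"},
        "email": {"type": "string", "format": "email"},
        "age":   {"type": "integer", "minimum": 0},
    },
    "additionalProperties": False,
}

def _type_ok(value: Any, expected: str) -> bool:
    """Simplified type check for the three types used above."""
    if expected == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, {"string": str, "object": dict}[expected])

def validate(instance: Any, schema: dict[str, Any]) -> list[str]:
    """Return a list of error messages (empty means valid)."""
    if not _type_ok(instance, schema["type"]):
        return [f"expected {schema['type']}"]
    errors: list[str] = []
    if schema["type"] == "object":
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required property '{key}'")
        for key, value in instance.items():
            if key not in props:
                if schema.get("additionalProperties", True) is False:
                    errors.append(f"unexpected property '{key}'")
                continue
            errors += [f"{key}: {e}" for e in validate(value, props[key])]
    if "minimum" in schema and instance < schema["minimum"]:
        errors.append(f"{instance} is less than minimum {schema['minimum']}")
    return errors

print(validate({"id": "u1", "email": "a@example.com", "nick": "z"}, SCHEMA))
```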

-## ❌ Anti-Patterns
+### Avro schema (Kafka)
+```json
+{
+  "type": "record",
+  "name": "UserCreated",
+  "namespace": "com.example.events",
+  "fields": [
+    { "name": "id",    "type": "string" },
+    { "name": "email", "type": "string" },
+    { "name": "age",   "type": ["null", "int"], "default": null }
+  ]
+}
+```
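The `"default": null` on `age` is what makes this record evolvable. Per the Avro specification, a reader that finds a field missing from the writer's data uses the reader schema's default, ignores fields only the writer has, and fails if a missing field has no default. The sketch below hand-rolls those rules on plain dicts for illustration; it is not the `avro` or `fastavro` API.

```python
# Illustration only: mimics Avro's reader-side schema resolution on plain
# dicts. It is not the avro/fastavro API and does not check field types.
from typing import Any

READER_V2: dict[str, Any] = {
    "type": "record", "name": "UserCreated",
    "fields": [
        {"name": "id",    "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age",   "type": ["null", "int"], "default": None},
    ],
}

def resolve(written: dict[str, Any], reader: dict[str, Any]) -> dict[str, Any]:
    """Project a record written with another schema version onto `reader`."""
    out: dict[str, Any] = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in written:
            out[name] = written[name]        # value supplied by the writer
        elif "default" in field:
            out[name] = field["default"]     # writer lacks it: use reader default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return out  # fields only the writer knows are ignored

old_event = {"id": "u1", "email": "a@example.com"}  # written with a v1 schema (no age)
print(resolve(old_event, READER_V2))
```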

- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
+### Protobuf (gRPC)
+```proto
+syntax = "proto3";
+package user.v1;
+
+message User {
+  string id    = 1;
+  string email = 2;
+  string name  = 3;
+  optional int32 age = 4;
+  enum Status { STATUS_UNSPECIFIED = 0; STATUS_ACTIVE = 1; STATUS_SUSPENDED = 2; STATUS_DELETED = 3; }  // 0 is the proto3 default: keep it for "unset"
+  Status status = 5;
+}
+```
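Field numbers, not names, are what travel on the wire, which is why reusing field IDs is an anti-pattern. The sketch below hand-encodes fields of the `User` message following the documented protobuf wire format (each field starts with the varint key `(field_number << 3) | wire_type`); it is for illustration only, not a replacement for generated code. If field 4 were deleted and its number reused for another `int32`, an old reader would silently decode the new value as `age`; declaring `reserved 4;` prevents that.

```python
# Sketch of protobuf wire-format field keys: each encoded field starts with a
# varint of (field_number << 3) | wire_type. Field names never reach the wire.
WIRE_VARINT, WIRE_LEN = 0, 2

def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def field_key(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)

def encode_int32_field(field_number: int, value: int) -> bytes:
    return field_key(field_number, WIRE_VARINT) + encode_varint(value)

def encode_string_field(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    return field_key(field_number, WIRE_LEN) + encode_varint(len(data)) + data

# id = 1, email = 2, age = 4 (numbers from the User message above)
user_bytes = (encode_string_field(1, "u1")
              + encode_string_field(2, "a@b")
              + encode_int32_field(4, 30))
print(user_bytes.hex())
```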
+
+### Pydantic v2 (Python)
+```python
+from pydantic import BaseModel, EmailStr, Field  # EmailStr requires: pip install 'pydantic[email]'
+from typing import Literal
+from uuid import UUID
+
+class User(BaseModel):
+    id:     UUID
+    email:  EmailStr
+    name:   str = Field(min_length=1, max_length=100)
+    age:    int | None = Field(default=None, ge=0, le=150)
+    status: Literal["active", "suspended", "deleted"]
+```
+
+### Migration (Alembic / Drizzle)
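+```python
+from alembic import op
+import sqlalchemy as sa
+
+def upgrade():  # backward compatible: nullable column only, existing rows stay valid
+    op.add_column("users", sa.Column("phone", sa.String(20), nullable=True))
+
+def downgrade():
+    op.drop_column("users", "phone")
+```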
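The rule behind that migration (nullable or defaulted additions are safe on a populated table) can be checked locally. This sketch assumes the standard-library `sqlite3` module as a stand-in for Postgres; statements and error messages differ between engines, but both reject a `NOT NULL` column without a default once the table already holds rows.

```python
# Assumption: stdlib sqlite3 stands in for Postgres; the point is the rule,
# not the dialect. The table already holds data, as a production table would.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users VALUES ('u1', 'Ann')")

def add_column(ddl: str) -> bool:
    """Run an ALTER TABLE statement; return True if the database accepted it."""
    try:
        conn.execute(ddl)
        return True
    except sqlite3.DatabaseError:  # base class of OperationalError/IntegrityError
        return False

# Safe: nullable column, existing rows read NULL
print(add_column("ALTER TABLE users ADD COLUMN phone TEXT"))
# Safe: NOT NULL with a default, existing rows get the default
print(add_column("ALTER TABLE users ADD COLUMN tier TEXT NOT NULL DEFAULT 'free'"))
# Unsafe: NOT NULL without a default cannot be satisfied by existing rows
print(add_column("ALTER TABLE users ADD COLUMN country TEXT NOT NULL"))
print(conn.execute("SELECT id, phone, tier FROM users").fetchall())
```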
+
+### Iceberg schema evolution (lakehouse)
+```sql
+ALTER TABLE catalog.db.users ADD COLUMN phone STRING;
+ALTER TABLE catalog.db.users RENAME COLUMN nm TO name;  -- safe: Iceberg tracks columns by field ID, not by name
+```
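The rename is safe because Iceberg identifies columns by field ID rather than by name, so renames and additions only change metadata and existing data files stay readable. The toy model below illustrates that idea with plain dicts; the structures are hypothetical, not Iceberg's API or metadata layout. A name-keyed format would instead lose the `nm` values after the rename.

```python
# Toy model of ID-based column tracking (hypothetical structures, not the
# Iceberg API or its metadata layout).
from typing import Any

schema: dict[int, str] = {1: "id", 2: "nm"}   # field ID -> current column name
data_file = [{1: "u1", 2: "Ann"}]             # rows written earlier, keyed by field ID

def read(rows: list[dict[int, Any]], schema: dict[int, str]) -> list[dict[str, Any]]:
    """Map stored field IDs to current names; IDs missing from a row read as None."""
    return [{name: row.get(fid) for fid, name in schema.items()} for row in rows]

schema[2] = "name"    # RENAME COLUMN nm TO name: metadata-only, no data rewrite
schema[3] = "phone"   # ADD COLUMN phone: new ID, old files read None
print(read(data_file, schema))
```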
+
+## Decision Criteria
+| Situation | Approach |
+|---|---|
+| OLTP RDBMS | Strict SQL DDL + migrations |
+| Event streaming | Avro + Schema Registry |
+| Microservice gRPC | Protobuf |
+| Frontend form + backend | Zod (shared) |
+| Data lake | Iceberg / Delta with schema evolution |
+| Document store | JSON Schema validation in app |
+
+**Default**: strict schema-first. Don't default to schemaless; it accumulates debt.
+
+## 🔗 Graph
+- Parent: [[Database Design]] · [[Data Modeling]]
+- Variants: [[Relational Schema]] · [[JSON Schema]] · [[Avro]] · [[Protobuf]]
+- Applications: [[API Design]] · [[Event-Driven Architecture]] · [[Data Lakehouse]]
+- Adjacent: [[Data Validation]] · [[Schema Migration]] · [[Type Systems]]
+
+## 🤖 Using LLMs
+**When to use**: drafting schemas from natural-language requirements, generating migrations, explaining schema diffs, and converting between formats (Postgres ↔ Avro ↔ Protobuf).
+**When not to use**: never let an LLM run production migrations on its own; review and a dry run are mandatory.
+
+## ❌ Anti-Patterns
+- **Schema-on-read everything**: every consumer bears the cost of interpreting the data, which leads to chaos.
+- **Breaking changes without versioning**: consumer outages.
+- **Storing unstructured JSON blobs in a JSON column**: queries become a nightmare.
+- **No NOT NULL / no FK / no CHECK**: the database degrades into dumb storage.
+- **Reusing field IDs in protobuf**: wire incompatibility.
+- **Adding a required field without a default**: breaks backward compatibility.
+
+## 🧪 Verification / Duplicates
+- Verified (Codd 1970, Kleppmann "Designing Data-Intensive Applications", Confluent Schema Registry docs, JSON Schema 2020-12, Iceberg spec).
+- Source trust level: A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | P-Reinforce Phase 1 normalization |
+| 2026-05-10 | Manual cleanup — schema layers + evolution + 2026 tooling |