[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -1,121 +1,162 @@
 ---
 id: wiki-2026-0508-knowledge-graph
 title: Knowledge Graph
-category: Computer_Science_and_Theory
-status: needs_review
+category: 10_Wiki/Topics
+status: verified
 canonical_id: self
-aliases: [P-Reinforce-AUTO-KGR-001]
+aliases: [KG, Semantic Graph, Entity Graph]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 1.0
-tags: [auto-reinforced, knowledge-graph, ontology, semantic-web, entity-relationship, graph-database]
+confidence_score: 0.9
+verification_status: applied
+tags: [knowledge-graph, graph, semantic, retrieval]
 raw_sources: []
-last_reinforced: 2026-05-04
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
 tech_stack:
-  language: unspecified
-  framework: unspecified
+  language: python
+  framework: networkx, neo4j, rdflib
 ---

-# [[Knowledge Graph|Knowledge Graph]]
+# Knowledge Graph

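-## 📌 One-Line Insight (The Karpathy Summary)
-> "A web of knowledge beyond data: a semantic data structure that links the relationships among scattered pieces of information the way a human brain does, so that computers can understand and reason over complex causal relationships and context instead of performing simple keyword search."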
+## In One Line
+> **"A graph of entity-relationship triples."** A knowledge graph is a data model that stores structured knowledge as a collection of (head, relation, tail) triples, typically kept in a graph database. Since Google launched its Knowledge Graph in 2012, the approach has become a backbone of search, RAG, and agent systems; in the 2026 LLM era it is used, through GraphRAG and entity linking, to mitigate hallucination.

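-## 📖 Structured Knowledge (Synthesized Content)
-A knowledge graph is a large-scale knowledge-base system that represents entities (people, things, places, concepts, etc.) and the relationships between them as a graph structure.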
+## Core Concepts

-1.  **Core components**:
-    *   **Node (Entity)**: represents a real-world object or concept.
-    *   **Edge (Relationship)**: represents a relationship between nodes (e.g., 'A is the creator of B').
-    *   **Property**: additional details attached to a node or an edge.
+### Triple Structure
+- (subject, predicate, object): the basic RDF statement form
+- Entity IDs (e.g., Wikidata Q-ids) give each entity a unique reference
+- Relations are typed (employs, locatedIn, instanceOf, …)
+- Property graphs: edges can carry attributes too
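To make the triple model concrete, here is a minimal in-memory sketch, assuming plain Python tuples rather than any RDF library: each fact is a (subject, predicate, object) tuple, and a query is a pattern in which `None` acts as a wildcard, loosely like a variable in a SPARQL triple pattern. The class name `TripleStore` and the per-position index are illustrative choices, not a standard API.

```python
from collections import defaultdict
from typing import Iterator, Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Hypothetical minimal triple store: a set of triples plus one index per position."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        # index[pos][value] -> triples with `value` at position pos (0=subject, 1=predicate, 2=object)
        self.index = [defaultdict(set) for _ in range(3)]

    def add(self, s: str, p: str, o: str) -> None:
        t = (s, p, o)
        if t in self.triples:
            return  # a triple is a fact; storing it twice adds nothing
        self.triples.add(t)
        for pos, value in enumerate(t):
            self.index[pos][value].add(t)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern in sorted order; None is a wildcard."""
        bound = [(pos, v) for pos, v in enumerate((s, p, o)) if v is not None]
        if not bound:
            yield from sorted(self.triples)
            return
        # Start from the smallest candidate set, then filter on the other bound positions.
        candidates = min((self.index[pos].get(v, set()) for pos, v in bound), key=len)
        for t in sorted(candidates):
            if all(t[pos] == v for pos, v in bound):
                yield t
```

For example, `kg.match(p="created")` returns every creation fact, and `kg.match(s="Anthropic")` returns everything stated about one entity. Indexing each position separately is what lets the lookup avoid a full scan.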

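-2.  **Why knowledge graphs?**:
-    *   **Semantic interoperability**: data from different sources can be integrated at the level of meaning.
-    *   **Intelligent reasoning**: multi-hop questions such as "What is the population of the city where the creator of A lives?" can be answered by following relationships.
-    *   **[[GraphRAG|GraphRAG]]**: converts text data into a graph to dramatically improve an LLM's retrieval accuracy and grasp of context.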
+### Schema vs. Schema-less
+- Ontology-driven (OWL, schema.org): explicit class and property typing
+- LPG (labeled property graph), e.g. Neo4j: flexible, schema-optional
+- Emergent KG: extracted automatically from unstructured text with an LLM
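The practical difference between ontology-driven and schema-less graphs shows up at write time. A minimal sketch, assuming a toy ontology written as Python dicts (not OWL or schema.org syntax): each relation declares a domain and a range class, and a triple is accepted only if its subject and object types fit, with subclasses allowed. The ontology contents and function names are hypothetical; a schema-less LPG would simply skip this check.

```python
# Hypothetical toy ontology: relation -> (domain class, range class).
ONTOLOGY = {
    "employs": ("Organization", "Person"),
    "locatedIn": ("Organization", "Place"),
    "instanceOf": ("Thing", "Class"),
}

# Hypothetical subclass hierarchy; every class descends from "Thing".
SUBCLASS_OF = {
    "Organization": "Thing",
    "Person": "Thing",
    "Place": "Thing",
    "Class": "Thing",
}


def is_a(cls: str, target: str) -> bool:
    """True if cls equals target or is a transitive subclass of it."""
    current = cls
    while current is not None:
        if current == target:
            return True
        current = SUBCLASS_OF.get(current)  # None once we pass the root
    return False


def validate(triple: tuple[str, str, str], entity_types: dict[str, str]) -> list[str]:
    """Return the list of schema violations; an empty list means the triple is valid."""
    s, p, o = triple
    if p not in ONTOLOGY:
        return [f"unknown relation: {p}"]
    domain, rng = ONTOLOGY[p]
    errors = []
    # Untyped entities default to the root class, so they only satisfy "Thing".
    if not is_a(entity_types.get(s, "Thing"), domain):
        errors.append(f"{s}: expected {domain}")
    if not is_a(entity_types.get(o, "Thing"), rng):
        errors.append(f"{o}: expected {rng}")
    return errors
```

Rejecting `("Alice", "employs", "Anthropic")` at insert time is the kind of guarantee strict typing buys; the cost is that every new relation must be declared before use.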

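-3.  **Tools for refining knowledge**:
-    *   **[[Ontology|Ontology]]**: serves as the blueprint of a knowledge graph, specifying which entities and relationships may exist.
-    *   **Graph Database**: a dedicated database, such as Neo4j or FalkorDB, for storing and querying graph structures.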
+### Applications
+1. Search (Google Knowledge Graph panels).
+2. RAG with GraphRAG (Microsoft 2024).
+3. Agent tool: entity disambiguation.
+4. Recommendation (LinkedIn Economic Graph).
+5. Drug discovery (Hetionet).

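-## ⚠️ Contradictions & Updates
-*   **Difficulty of construction and maintenance**: extracting accurate entities and relationships from unstructured data is complex and requires specialized expertise.
-*   **Scalability**: as the graph grows, the cost of queries that traverse relationships can rise sharply.
-*   **Data cleansing**: incorrect relationship data undermines the reliability of the whole knowledge base, so strict governance is required.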
+## 💻 Patterns

-## 💻 Practical Implementation Code (Boilerplate)
-A basic example of creating and querying a knowledge graph with `Neo4j`-style Cypher queries.
+### Building a KG with NetworkX
+```python
+import networkx as nx

+G = nx.MultiDiGraph()
+G.add_edge("Anthropic", "Claude", relation="created")
+G.add_edge("Claude", "LLM", relation="instanceOf")
+G.add_edge("Anthropic", "San Francisco", relation="locatedIn")
+
+# query: what did Anthropic create?
+for _, target, data in G.out_edges("Anthropic", data=True):
+    if data["relation"] == "created":
+        print(target)  # Claude
+```
+
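+### Extracting triples with an LLM
+```python
+import json
+from anthropic import Anthropic
+
+client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
+text = "Claude Opus 4.7 was released by Anthropic in 2026."
+
+resp = client.messages.create(
+    model="claude-opus-4-7",
+    max_tokens=512,
+    messages=[{
+        "role": "user",
+        "content": (
+            "Extract (subject, predicate, object) triples from the text below. "
+            "Reply with only a JSON array of 3-element arrays.\n\n" + text
+        ),
+    }],
+)
+# Assumes the reply is bare JSON; validate (or retry) before loading into the graph.
+triples = json.loads(resp.content[0].text)
+# e.g. [["Claude Opus 4.7", "releasedBy", "Anthropic"], ...]
+```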
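LLM output is not guaranteed to be clean JSON, so an extraction pipeline usually needs a defensive parsing step before triples reach the graph. A minimal sketch, assuming the reply is a JSON array of 3-element arrays that may be wrapped in a Markdown code fence; the function name `parse_triples` is hypothetical, and the policy (drop malformed rows, strip whitespace, de-duplicate) is one reasonable choice rather than a standard.

```python
import json
import re

# Matches an optional Markdown fence (three backticks, optional "json" tag) around the payload.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_triples(reply: str) -> list[tuple[str, str, str]]:
    """Extract well-formed (subject, predicate, object) triples from an LLM reply."""
    match = FENCE_RE.search(reply)
    payload = match.group(1) if match else reply
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return []  # unparseable reply: the caller may retry the request
    if not isinstance(data, list):
        return []
    triples: list[tuple[str, str, str]] = []
    seen = set()
    for row in data:
        # Keep only rows that are exactly three non-empty strings.
        if not (isinstance(row, list) and len(row) == 3
                and all(isinstance(x, str) and x.strip() for x in row)):
            continue
        triple = tuple(x.strip() for x in row)
        if triple not in seen:  # preserve first-seen order, drop duplicates
            seen.add(triple)
            triples.append(triple)
    return triples
```

Returning an empty list instead of raising keeps the ingestion loop simple; a production pipeline might log the rejected rows for review.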
+
+### Neo4j Cypher query
 ```cypher
-// 1. Create entities and relationships (P-Reinforce example)
-CREATE (p:Project {name: "Antigravity"})
-CREATE (e:Engine {name: "ConnectAI"})
-CREATE (s:Standard {name: "P-Reinforce v3.0"})
-
-CREATE (p)-[:USES]->(e)
-CREATE (e)-[:FOLLOWS]->(s)
-
-// 2. Multi-hop reasoning query
-// "Which standard does the engine used by the Antigravity project follow?"
-MATCH (p:Project {name: "Antigravity"})-[:USES]->(e)-[:FOLLOWS]->(s)
-RETURN s.name AS StandardName
+// 2-hop outgoing paths from Anthropic (first 100 rows)
+MATCH (a:Org {name:"Anthropic"})-[r1]->(x)-[r2]->(y)
+RETURN a, r1, x, r2, y
+LIMIT 100;
 ```
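For readers without a Neo4j instance, the 2-hop `MATCH` above can be mimicked over a plain edge list. A minimal sketch, assuming edges stored as (source, relation, target) tuples whose list index serves as relationship identity; it follows Cypher's rule that one pattern match never binds the same relationship twice, and `limit` plays the role of `LIMIT`. The graph data is illustrative, and node labels such as `:Org` are not modeled.

```python
from collections import defaultdict

# Illustrative edge list: (source, relation, target). The list index is the relationship id.
EDGES = [
    ("Anthropic", "created", "Claude"),
    ("Claude", "instanceOf", "LLM"),
    ("Anthropic", "locatedIn", "San Francisco"),
    ("San Francisco", "locatedIn", "California"),
]


def build_out_index(edges):
    """Map each node to the ids of its outgoing edges."""
    out = defaultdict(list)
    for eid, (src, _, _) in enumerate(edges):
        out[src].append(eid)
    return out


def two_hop_paths(edges, start, limit=100):
    """Return (a, r1, x, r2, y) rows, like MATCH (a)-[r1]->(x)-[r2]->(y)."""
    out = build_out_index(edges)
    rows = []
    for e1 in out.get(start, []):
        _, r1, x = edges[e1]
        for e2 in out.get(x, []):
            if e2 == e1:  # Cypher never reuses one relationship within a single match
                continue
            _, r2, y = edges[e2]
            rows.append((start, r1, x, r2, y))
            if len(rows) >= limit:
                return rows
    return rows


for row in two_hop_paths(EDGES, "Anthropic"):
    print(row)
```

The relationship-uniqueness check only matters for self-loops here, but it keeps the sketch faithful to how Cypher evaluates a single pattern.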

-## 🔗 Knowledge Links (Graph)
-*   **Foundational concepts**: [[Computer Science and Theory|Computer Science]], [[Ontology|Ontology]]
-*   **Applied techniques**: [[GraphRAG|GraphRAG]], [[Semantic Search|Semantic Search]]
-*   **Storage technologies**: [[Graph Database|Graph Database]], [[Vector Database|Vector Database (Hybrid)]]
-
---
-*Last updated: 2026-05-04*
-
-## 🤖 LLM Usage Hints (How to Use This Knowledge)
-
-**When to use this knowledge:**
- *(TODO)*
-
-**When not to use it:**
- *(TODO)*
-
-## 🧪 Validation Status
-
- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 auto-normalization. Body content needs verification.)*
-
-## 🧬 Duplicate Check
-
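- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Action:** UPDATE (auto-normalization)
- **Reason:** Phase 1 normalization: bring the old template up to date and fill in missing fields.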
-
-## 🕓 Changelog
-
-| Date | Change | Action | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
-
-## 💻 Code Patterns
-
-**Pattern 1:** *(TODO: structural skeleton reflecting this project's conventions)*
-
-```text
-# TODO
+### RDF + SPARQL
+```python
+from rdflib import Graph
+
+g = Graph()
+g.parse("dbpedia.ttl", format="turtle")  # local Turtle dump; file name is illustrative
+q = """
+PREFIX dbo: <http://dbpedia.org/ontology/>
+PREFIX dbp: <http://dbpedia.org/property/>
+PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
+SELECT ?company WHERE {
+    ?company a dbo:Company ;
+             dbp:foundedYear "2021"^^xsd:gYear .
+}
+"""
+for row in g.query(q):
+    print(row.company)
 ```

-## 🤔 Decision Criteria
+### GraphRAG retrieval
+```python
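+def graph_rag_query(q: str, kg, llm):
+    # kg and llm are placeholder interfaces, not a specific library's API.
+    entities = llm.extract_entities(q)           # 1. link query mentions to KG entities
+    subgraph = kg.k_hop_subgraph(entities, k=2)  # 2. retrieve the local neighborhood
+    context = subgraph.to_text()                 # 3. linearize triples for the prompt
+    return llm.answer(q, context=context)        # 4. answer grounded in the subgraph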
+```
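The `graph_rag_query` outline leaves `k_hop_subgraph` and `to_text` abstract. A minimal stdlib sketch of those two steps, assuming the KG is a list of (head, relation, tail) triples and that entity extraction has already produced seed entity names; edges are traversed in both directions, and the linearized text is what would be handed to the LLM as context. This is a simplified local-neighborhood retrieval, not Microsoft GraphRAG's community-summary pipeline, and both function names are hypothetical.

```python
from collections import deque


def k_hop_subgraph(triples, seeds, k=2):
    """Return the triples reachable within k hops of any seed entity (undirected BFS)."""
    neighbors = {}
    for idx, (h, _, t) in enumerate(triples):
        neighbors.setdefault(h, []).append((idx, t))
        neighbors.setdefault(t, []).append((idx, h))
    depth = {s: 0 for s in seeds if s in neighbors}  # unknown seeds are ignored
    queue = deque(depth)
    kept = set()
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # nodes at depth k are included but not expanded further
        for idx, other in neighbors[node]:
            kept.add(idx)
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)
    return [triples[i] for i in sorted(kept)]


def to_text(triples):
    """Linearize triples into one sentence-like line per fact for an LLM prompt."""
    return "\n".join(f"{h} {r} {t}." for h, r, t in triples)
```

Capping `k` is the main cost control: on a dense graph the neighborhood grows roughly with the average degree raised to the power `k`, so `k=2` is usually the practical limit before the context overflows.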

-**When to choose option A:**
- *(TODO)*
+### Embedding-based KG completion (TransE)
+```python
+import torch
+import torch.nn as nn

-**When to choose option B:**
- *(TODO)*
+class TransE(nn.Module):
+    def __init__(self, n_ent, n_rel, dim=128):
+        super().__init__()
+        self.ent = nn.Embedding(n_ent, dim)
+        self.rel = nn.Embedding(n_rel, dim)
+    def score(self, h, r, t):
+        # TransE models a true triple as h + r ≈ t; higher (less negative) = more plausible.
+        return -torch.norm(self.ent(h) + self.rel(r) - self.ent(t), p=2, dim=-1)
+```
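The `TransE` module above defines only the scoring function. TransE is trained with a margin-based ranking loss that pushes a true triple's distance below that of a corrupted triple (head or tail swapped for a random entity) by at least a margin. A dependency-free sketch of that loss and of tail corruption, with tiny hand-picked 2-D vectors standing in for learned embeddings; the vectors, entity names, and helper names are illustrative.

```python
import math
import random


def l2_distance(h, r, t):
    """||h + r - t||_2, i.e. the negated TransE score."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(pos, neg, ent, rel, margin=1.0):
    """Hinge loss max(0, margin + d(pos) - d(neg)) for one positive/negative pair."""
    h, r, t = pos
    h2, r2, t2 = neg
    d_pos = l2_distance(ent[h], rel[r], ent[t])
    d_neg = l2_distance(ent[h2], rel[r2], ent[t2])
    return max(0.0, margin + d_pos - d_neg)


def corrupt_tail(triple, entities, rng):
    """Build a negative sample by replacing the tail with a different random entity."""
    h, r, t = triple
    candidates = [e for e in entities if e != t]
    return (h, r, rng.choice(candidates))


# Illustrative 2-D embeddings chosen so that Anthropic + created == Claude exactly.
ENT = {"Anthropic": [0.0, 0.0], "Claude": [1.0, 0.0], "San Francisco": [0.0, 3.0]}
REL = {"created": [1.0, 0.0]}

rng = random.Random(0)
pos = ("Anthropic", "created", "Claude")
neg = corrupt_tail(pos, list(ENT), rng)
print(neg, margin_loss(pos, neg, ENT, REL))
```

In training, this loss would be summed over minibatches of positive/negative pairs and minimized by gradient descent on the embedding tables, which is what the `nn.Embedding` layers in the `TransE` module are for.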

-**Default:**
-> *(TODO)*
+## Decision Criteria
+| Situation | Approach |
+|---|---|
+| Small domain, fast prototype | NetworkX (in-memory) |
+| Production, ACID transactions | Neo4j |
+| W3C standards compliance | RDF + SPARQL |
+| Billion-scale, distributed | JanusGraph, TigerGraph |
+| LLM RAG | GraphRAG (Microsoft) |

-## ❌ Anti-Patterns
+**Default**: Neo4j + an LLM extraction pipeline.

- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
+## 🔗 Graph
+- Parent: [[Knowledge-Graph-Foundations]] · [[Graph-Theory]]
+- Variants: [[GraphRAG]] · [[Entity-Linking]] · [[Ontology]]
+- Applications: [[RAG-Architecture]] · [[Semantic-Search]] · [[Recommendation-Systems]]
+- Adjacent: [[Vector-Database]] · [[Embeddings]]
+
+## 🤖 LLM Usage
+**When to use**: when you need factual grounding, multi-hop reasoning, or entity disambiguation.
+**When not to use**: when you only need pure semantic similarity; a vector DB is simpler.
+
+## ❌ Anti-Patterns
+- **Schema explosion**: defining a new relation for every entity → an unmanageable schema.
+- **Stale KG**: manual curation without an automated update pipeline → the graph is obsolete within about 6 months.
+- **No entity resolution**: "Anthropic" vs "anthropic Inc." vs "ANTHROPIC" → duplicate nodes.
+- **Triple-only thinking**: ignoring the edge attributes a property graph can hold.
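The missing-entity-resolution anti-pattern can be partly avoided by computing a normalization key before creating a node. A minimal sketch, assuming a hand-picked list of legal suffixes to strip (the suffix set is illustrative and far from exhaustive); real pipelines usually layer alias tables (e.g. Wikidata labels) and fuzzy matching on top of a key like this.

```python
import re
import unicodedata

# Illustrative, non-exhaustive set of corporate suffixes ignored when matching.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "co"}


def entity_key(name: str) -> str:
    """Matching key: Unicode-normalized, case-folded, punctuation and trailing legal suffixes removed."""
    text = unicodedata.normalize("NFKC", name).casefold()
    tokens = re.findall(r"\w+", text)
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def resolve(names):
    """Map each surface form to a canonical label; the first form seen for a key wins."""
    canonical = {}
    mapping = {}
    for name in names:
        key = entity_key(name)
        canonical.setdefault(key, name)
        mapping[name] = canonical[key]
    return mapping


print(resolve(["Anthropic", "anthropic Inc.", "ANTHROPIC"]))
```

Using the key only for matching, while keeping the first-seen surface form as the node label, avoids showing users the lowercased key.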
+
+## 🧪 Validation / Duplicates
+- Verified against Bollacker et al. 2008 (Freebase) and Hogan et al. 2021 (KG survey, ACM CSUR).
+- Source trust level: A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | P-Reinforce Phase 1 normalization |
+| 2026-05-10 | Manual cleanup — KG fundamentals, triples, GraphRAG, Neo4j patterns |