---
id: wiki-2026-0508-knowledge-graph
title: Knowledge Graph
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [KG, Semantic Graph, Entity Graph]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [knowledge-graph, graph, semantic, retrieval]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: networkx, neo4j, rdflib
---

# Knowledge Graph

## One-liner
> **"A graph of entity-relationship triples."** A knowledge graph is a graph-database paradigm that stores structured knowledge as a collection of (head, relation, tail) triples. Since Google introduced its Knowledge Graph in 2012, KGs have become a backbone of search, RAG, and agents. In the 2026 LLM era they are used for hallucination mitigation through GraphRAG and entity linking.

## Core concepts

### Triple structure
- (subject, predicate, object): the RDF standard form.
- Entity IDs (e.g. Wikidata Q-ids) give each entity a unique reference.
- Relations are typed (employs, locatedIn, instanceOf, …).
- Property graph: edges can carry attributes too.
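A minimal stdlib sketch of both views: each fact is a `(subject, predicate, object)` triple, and an optional `attrs` dict stands in for property-graph edge attributes. The `Triple` class, the `match` helper (with `None` as a wildcard), and the toy data, including the made-up employee `Alice`, are illustrative assumptions rather than any library's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    # Property-graph style edge attributes (hypothetical example keys).
    # Excluded from ==/hash so one (s, p, o) fact stays one triple.
    attrs: dict = field(default_factory=dict, compare=False, hash=False)


def match(store, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


# Toy data; "Alice" is a made-up employee.
store = [
    Triple("Anthropic", "created", "Claude"),
    Triple("Claude", "instanceOf", "LLM"),
    Triple("Anthropic", "employs", "Alice", {"role": "researcher"}),
]

print([t.obj for t in match(store, s="Anthropic")])  # ['Claude', 'Alice']
print(match(store, p="employs")[0].attrs)           # {'role': 'researcher'}
```

Excluding `attrs` from equality keeps one fact per `(s, p, o)`, which mirrors how an RDF graph is a set of triples; a property graph such as Neo4j would instead allow several parallel edges that differ only in their attributes.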

### Schema vs. schema-less
- Ontology-driven: OWL, schema.org → strict typing.
- LPG (labeled property graph): flexible schema, as in Neo4j.
- Emergent KG: extracted automatically from unstructured text with an LLM.
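The contrast between ontology-driven and schema-less KGs can be sketched as a validation step on insert. This assumes a toy schema that assigns each relation a subject type and an object type; it is closer in spirit to a SHACL shape than to RDFS, whose `rdfs:domain`/`rdfs:range` infer types rather than reject triples. `SCHEMA`, `TYPES`, and `validate` are hypothetical names.

```python
# Toy schema: relation -> (allowed subject type, allowed object type).
SCHEMA = {
    "employs": ("Org", "Person"),
    "locatedIn": ("Org", "City"),
    "created": ("Org", "Product"),
}

# Entity -> type (in a real KG these would themselves be instanceOf triples).
TYPES = {
    "Anthropic": "Org",
    "Claude": "Product",
    "San Francisco": "City",
}


def validate(s: str, p: str, o: str, strict: bool = True) -> bool:
    """Strict (ontology-driven) mode rejects unknown relations and type
    mismatches; schema-less mode (strict=False) accepts any triple."""
    if not strict:
        return True
    if p not in SCHEMA:
        return False
    subject_type, object_type = SCHEMA[p]
    return TYPES.get(s) == subject_type and TYPES.get(o) == object_type


print(validate("Anthropic", "created", "Claude"))           # True
print(validate("Claude", "locatedIn", "San Francisco"))     # False: Claude is not an Org
print(validate("Anthropic", "fooBar", "Claude", strict=False))  # True
```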

### Applications
1. Search ranking (Google KG panels).
2. RAG with GraphRAG (Microsoft 2024).
3. Agent tool: entity disambiguation.
4. Recommendation (LinkedIn Economic Graph).
5. Drug discovery (Hetionet).

## 💻 Patterns

### Building a KG with NetworkX
```python
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("Anthropic", "Claude", relation="created")
G.add_edge("Claude", "LLM", relation="instanceOf")
G.add_edge("Anthropic", "San Francisco", relation="locatedIn")

# query: what did Anthropic create?
for _, target, data in G.out_edges("Anthropic", data=True):
    if data["relation"] == "created":
        print(target)  # Claude
```

### Extracting triples with an LLM
```python
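import json
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
text = "Claude Opus 4.7 was released by Anthropic in 2026."

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": (
            "Extract (subject, predicate, object) triples from the text below. "
            "Reply with only a JSON array of [subject, predicate, object] arrays.\n\n"
            f"Text: {text}"
        ),
    }],
)
# The reply is in the first content block; json.loads raises if it is not pure JSON.
triples = json.loads(resp.content[0].text)
# e.g. [["Claude Opus 4.7", "releasedBy", "Anthropic"], ...]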
```
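LLM output is not guaranteed to be valid JSON or to follow the requested shape, so extracted triples usually need a cleanup pass before loading. A minimal stdlib sketch, assuming the model replies with a JSON array that may be wrapped in a Markdown code fence; `clean_triples` is a hypothetical helper, not part of the Anthropic SDK.

```python
import json
import re


def clean_triples(raw: str) -> list[tuple[str, str, str]]:
    """Parse an LLM reply into deduplicated (s, p, o) string triples."""
    # Strip an optional Markdown code fence (with or without a json tag).
    raw = re.sub(r"^`{3}(?:json)?\s*|\s*`{3}$", "", raw.strip())
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # unparseable reply: drop it (or retry the request)
    seen, out = set(), []
    for item in data if isinstance(data, list) else []:
        # Keep only 3-element lists of non-empty strings.
        if (isinstance(item, list) and len(item) == 3
                and all(isinstance(x, str) and x.strip() for x in item)):
            triple = tuple(x.strip() for x in item)
            if triple not in seen:
                seen.add(triple)
                out.append(triple)
    return out


reply = '[["Claude Opus 4.7", "releasedBy", "Anthropic"], ["oops"]]'
print(clean_triples(reply))  # [('Claude Opus 4.7', 'releasedBy', 'Anthropic')]
```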

### Neo4j Cypher query
```cypher
// find all 2-hop neighbors of Anthropic
MATCH (a:Org {name:"Anthropic"})-[r1]->(x)-[r2]->(y)
RETURN a, r1, x, r2, y
LIMIT 100;
```
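To see what the `MATCH (a)-[r1]->(x)-[r2]->(y)` pattern returns, here is a stdlib sketch over a plain list of `(head, relation, tail)` triples. It skips the `:Org` label check and is not meant to reflect how Neo4j's planner executes the query. Cypher also never reuses the same relationship twice in one path; that only matters for self-loops and is not handled here. `TRIPLES` and `two_hop` are illustrative names.

```python
from itertools import islice

TRIPLES = [
    ("Anthropic", "created", "Claude"),
    ("Claude", "instanceOf", "LLM"),
    ("Anthropic", "locatedIn", "San Francisco"),
    ("San Francisco", "locatedIn", "California"),
]


def two_hop(triples, start, limit=100):
    """Return (a, r1, x, r2, y) rows, like MATCH (a)-[r1]->(x)-[r2]->(y)."""
    out_edges = {}  # adjacency list: head -> [(relation, tail), ...]
    for h, r, t in triples:
        out_edges.setdefault(h, []).append((r, t))

    def paths():
        for r1, x in out_edges.get(start, []):
            for r2, y in out_edges.get(x, []):
                yield (start, r1, x, r2, y)

    return list(islice(paths(), limit))  # the LIMIT clause


for row in two_hop(TRIPLES, "Anthropic"):
    print(row)
```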

### RDF + SPARQL
```python
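from rdflib import Graph

g = Graph()
g.parse("dbpedia.ttl", format="turtle")  # a local Turtle dump of DBpedia data
q = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?company WHERE {
    ?company a dbo:Company ;
             dbp:foundedYear "2021"^^xsd:gYear .  # datatype must match the data exactly
}
"""
for row in g.query(q):
    print(row.company)  # result rows expose SELECT variables as attributes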
```
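Conceptually, a SPARQL `WHERE` clause like the one above is a basic graph pattern: a list of triple patterns whose `?variables` must bind to the same value everywhere they appear. A minimal stdlib sketch of that join by backtracking, assuming plain string terms (no datatypes, prefix expansion, or `FILTER`); `eval_bgp` and the fictional companies `ex:Acme` and `ex:Globex` are illustrative, not rdflib's implementation.

```python
def eval_bgp(triples, patterns, binding=None):
    """Return every variable binding (a dict) that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Variable: bind it, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:  # constant term must match exactly
                ok = False
                break
        if ok:
            results.extend(eval_bgp(triples, rest, new))  # join with remaining patterns
    return results


# Fictional data using prefixed names as plain strings.
DATA = [
    ("ex:Acme", "rdf:type", "dbo:Company"),
    ("ex:Acme", "dbp:foundedYear", "2021"),
    ("ex:Globex", "rdf:type", "dbo:Company"),
    ("ex:Globex", "dbp:foundedYear", "1999"),
]
query = [("?company", "rdf:type", "dbo:Company"),
         ("?company", "dbp:foundedYear", "2021")]
print(eval_bgp(DATA, query))  # [{'?company': 'ex:Acme'}]
```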

### GraphRAG retrieval
```python
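import networkx as nx

# Simplified local-subgraph retrieval; Microsoft's GraphRAG additionally builds community summaries.
def graph_rag_query(q: str, kg: nx.MultiDiGraph, extract_entities, answer, k: int = 2) -> str:
    # extract_entities / answer: caller-supplied LLM wrappers (hypothetical names)
    seeds = [e for e in extract_entities(q) if e in kg]  # link question entities to KG nodes
    # k-hop neighbourhood around every seed, ignoring edge direction
    nodes = set().union(*(nx.ego_graph(kg, s, radius=k, undirected=True) for s in seeds))
    context = "\n".join(f"{h} {d['relation']} {t}" for h, t, d in kg.subgraph(nodes).edges(data=True))
    return answer(q, context)  # LLM answers grounded on the linearized subgraph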
```

### Embedding-based KG completion (TransE)
```python
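import torch
import torch.nn as nn

class TransE(nn.Module):
    """TransE: a relation is a translation, so h + r ≈ t for true triples."""
    def __init__(self, n_ent, n_rel, dim=128):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)  # one vector per entity
        self.rel = nn.Embedding(n_rel, dim)  # one translation vector per relation

    def score(self, h, r, t):
        # h, r, t: LongTensors of indices; a higher score means a more plausible triple
        return -torch.norm(self.ent(h) + self.rel(r) - self.ent(t), p=2, dim=-1)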
```
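The `score` method only ranks triples; training needs an objective. The original TransE paper (Bordes et al., 2013) uses a margin-based ranking loss against corrupted triples: each true triple $(h, r, t) \in S$ is paired with triples whose head or tail is swapped for a random entity from $E$, and $\gamma > 0$ is the margin. Bold letters are the embeddings that `self.ent` and `self.rel` look up, and $d$ is the negated `score`. In practice the inner sum is approximated by sampling a few corrupted triples per minibatch.

```latex
d(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_2,
\qquad \operatorname{score}(h, r, t) = -\,d(h, r, t)

S'_{(h,r,t)} = \{ (h', r, t) : h' \in E \} \cup \{ (h, r, t') : t' \in E \}

\mathcal{L} = \sum_{(h,r,t) \in S} \; \sum_{(h',r,t') \in S'_{(h,r,t)}}
  \big[\, \gamma + d(h, r, t) - d(h', r, t') \,\big]_+ ,
\qquad [x]_+ = \max(0, x)
```

With the class above, the bracketed term for a batch is `torch.relu(gamma - model.score(h, r, t) + model.score(h_neg, r, t_neg))`, since $d = -\operatorname{score}$.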

## Decision criteria
| Situation | Approach |
|---|---|
| Small domain, fast prototype | NetworkX (in-memory) |
| Production, ACID transactions | Neo4j |
| W3C standards (interoperability) | RDF + SPARQL |
| Billion-scale, distributed | JanusGraph, TigerGraph |
| LLM RAG | GraphRAG (Microsoft) |

**Default**: Neo4j + an LLM extraction pipeline.

## 🔗 Graph
- Parent: [[Knowledge-Graph-Foundations]] · [[Graph-Theory]]
- Variants: [[GraphRAG]] · [[Entity-Linking]] · [[Ontology]]
- Applications: [[RAG-Architecture]] · [[Semantic-Search]] · [[Recommendation-Systems]]
- Adjacent: [[Vector-Database]] · [[Embeddings]]

## 🤖 Using KGs with LLMs
**When**: you need factual grounding, multi-hop reasoning, or entity disambiguation.
**When not**: you only need pure semantic similarity; a vector DB is simpler.
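Multi-hop reasoning over a KG often reduces to finding a relation path between two entities and passing that path to the LLM as evidence. A minimal stdlib sketch using breadth-first search over directed triples, so it returns a shortest path within `max_hops`; `relation_path` and the toy triples are hypothetical.

```python
from collections import deque

TRIPLES = [
    ("Anthropic", "created", "Claude"),
    ("Claude", "instanceOf", "LLM"),
    ("Anthropic", "locatedIn", "San Francisco"),
]


def relation_path(triples, source, target, max_hops=3):
    """Shortest directed path as [(head, relation, tail), ...], or None."""
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((r, t))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if len(path) == max_hops:
            continue  # do not expand beyond the hop budget
        for r, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, r, nxt)]))
    return None


path = relation_path(TRIPLES, "Anthropic", "LLM")
evidence = "; ".join(f"{h} {r} {t}" for h, r, t in path)
print(evidence)  # Anthropic created Claude; Claude instanceOf LLM
```

The `evidence` string can then be prepended to the prompt as grounding context.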

## ❌ Anti-patterns
- **Schema explosion**: defining a new relation for every entity → unmanageable.
- **Stale KG**: manual curation without an automated update pipeline → obsolete within about 6 months.
- **No entity resolution**: "Anthropic" vs. "anthropic Inc." vs. "ANTHROPIC" → duplicate nodes.
- **Triple-only thinking**: ignoring the edge attributes of a property graph.
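A minimal sketch of the entity-resolution fix: map surface forms to a canonical key before creating nodes. It assumes simple rules (case-folding, punctuation stripping, dropping a few legal suffixes); `SUFFIXES`, `canonical_key`, and `resolve` are hypothetical names. These rules only catch spelling variants; aliases and abbreviations need an alias table or linking to external IDs such as Wikidata.

```python
import re

# Hypothetical list of legal-form suffixes to ignore when matching names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "pbc"}


def canonical_key(name: str) -> str:
    """Case-fold, replace punctuation with spaces, strip trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.casefold()).split()
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def resolve(triples):
    """Rewrite entities to the first surface form seen for their key, then dedupe."""
    canon = {}

    def norm(name):
        return canon.setdefault(canonical_key(name), name)

    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys((norm(h), r, norm(t)) for h, r, t in triples))


raw = [
    ("Anthropic", "created", "Claude"),
    ("anthropic Inc.", "locatedIn", "San Francisco"),
    ("ANTHROPIC", "created", "Claude"),
]
print(resolve(raw))
# [('Anthropic', 'created', 'Claude'), ('Anthropic', 'locatedIn', 'San Francisco')]
```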

## 🧪 Verification / duplicates
- Verified against Bollacker et al. 2008 (Freebase) and Hogan et al. 2021 (KG survey, ACM CSUR).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — KG fundamentals, triples, GraphRAG, Neo4j patterns |