2nd/10_Wiki/Topics/Computer_Science_and_Theory/Knowledge Graph.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
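Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture and unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed links directly at redirect targets, and mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections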

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00

id: wiki-2026-0508-knowledge-graph
title: Knowledge Graph
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: KG, Semantic Graph, Entity Graph
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: knowledge-graph, graph, semantic, retrieval
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: python; framework: networkx, neo4j, rdflib

Knowledge Graph

In one line

"A graph of entity-relationship triples." A knowledge graph is a graph-database paradigm that stores structured knowledge as a collection of (head, relation, tail) triples. Since Google introduced its Knowledge Graph in 2012, the approach has become a backbone for search, RAG, and agents; in the 2026 LLM era it is used for hallucination mitigation through GraphRAG and entity linking.

Core concepts

Triple structure

  • (subject, predicate, object): the RDF standard form
  • entity ID (e.g. Wikidata Q-id) → unique reference
  • relations are typed (employs, locatedIn, instanceOf, …)
  • property graph: edges also carry attributes
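
As a minimal sketch of the structure above, a triple can be modelled as an immutable record, and a property-graph edge as a triple plus an attribute dictionary. The names `Triple`, `Edge`, and `objects_of`, and the `source` attribute, are illustrative assumptions, not part of RDF or any library.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Triple(NamedTuple):
    """One (subject, predicate, object) statement."""
    subject: str
    predicate: str
    object: str


@dataclass
class Edge:
    """A property-graph edge: a triple plus arbitrary key/value attributes."""
    triple: Triple
    attributes: dict = field(default_factory=dict)


# Plain triples: the statement itself carries no extra data.
triples = {
    Triple("Anthropic", "created", "Claude"),
    Triple("Claude", "instanceOf", "LLM"),
}

# Property-graph style: the same kind of statement with attributes on the edge.
# The "source" key is a made-up example attribute.
edge = Edge(Triple("Anthropic", "locatedIn", "San Francisco"), {"source": "manual"})


def objects_of(subject: str, predicate: str) -> set:
    """Return every object o such that (subject, predicate, o) is in `triples`."""
    return {t.object for t in triples if t.subject == subject and t.predicate == predicate}
```

Storing triples in a set makes duplicate statements collapse automatically, which is one reason the tuple form is convenient for small in-memory graphs.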

Schema vs Schema-less

  • ontology-driven: OWL, schema.org → strict typing
  • LPG (labeled property graph): flexible, e.g. Neo4j
  • emergent KG: extracted automatically from unstructured text by an LLM
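
The difference between strict typing and a schema-less store can be sketched with a tiny hand-written schema of domain and range constraints. Everything here (`SCHEMA`, `ENTITY_TYPES`, `validate`, and the `sponsors` relation in the usage note) is a hypothetical illustration; a real ontology would be expressed in OWL or a similar language rather than a Python dict.

```python
# Hypothetical mini-schema: relation -> (allowed subject type, allowed object type).
SCHEMA = {
    "employs": ("Org", "Person"),
    "locatedIn": ("Org", "Place"),
}

# Hypothetical entity typing, standing in for instanceOf statements.
ENTITY_TYPES = {
    "Anthropic": "Org",
    "San Francisco": "Place",
}


def validate(subject: str, relation: str, obj: str, strict: bool = True) -> bool:
    """Check a triple against SCHEMA.

    strict=True mimics ontology-driven typing: unknown relations are rejected.
    strict=False mimics a schema-less store: unknown relations are accepted.
    """
    if relation not in SCHEMA:
        return not strict
    domain, range_ = SCHEMA[relation]
    return ENTITY_TYPES.get(subject) == domain and ENTITY_TYPES.get(obj) == range_
```

With `strict=True`, a call such as `validate("Anthropic", "sponsors", "X")` is rejected because `sponsors` is not declared; with `strict=False` the same triple is accepted, which is the flexibility (and the risk) of the schema-less side.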

Applications

  1. Search ranking (Google KG panels).
  2. RAG with GraphRAG (Microsoft 2024).
  3. Agent tool: entity disambiguation.
  4. Recommendation (LinkedIn Economic Graph).
  5. Drug discovery (Hetionet).

💻 Patterns

Building a KG with NetworkX

import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("Anthropic", "Claude", relation="created")
G.add_edge("Claude", "LLM", relation="instanceOf")
G.add_edge("Anthropic", "San Francisco", relation="locatedIn")

# query: what did Anthropic create?
for _, target, data in G.out_edges("Anthropic", data=True):
    if data["relation"] == "created":
        print(target)  # Claude

Extracting triples with an LLM

from anthropic import Anthropic

client = Anthropic()
text = "Claude Opus 4.7 was released by Anthropic in 2026."

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": f"Extract (subject, predicate, object) triples as JSON from: {text}"
    }]
)
# resp.content[0].text should then hold JSON such as:
# [["Claude Opus 4.7","releasedBy","Anthropic"], ...]
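
Model output is free text, so it needs to be parsed and validated before it is loaded into a graph. The following is a minimal sketch, assuming the reply contains a JSON array of three-element string arrays as in the comment above; `parse_triples` is a hypothetical helper, not part of the Anthropic SDK.

```python
import json


def parse_triples(reply_text: str) -> list:
    """Parse a JSON array of [subject, predicate, object] arrays from model output.

    Malformed entries are skipped rather than raising, since model output
    is not guaranteed to follow the requested format.
    """
    # Tolerate prose around the JSON by slicing from the first "[" to the last "]".
    start, end = reply_text.find("["), reply_text.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(reply_text[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    triples = []
    for item in data:
        if (isinstance(item, list) and len(item) == 3
                and all(isinstance(x, str) and x.strip() for x in item)):
            triples.append(tuple(x.strip() for x in item))
    return triples


reply = 'Here are the triples: [["Claude Opus 4.7", "releasedBy", "Anthropic"], ["bad"]]'
print(parse_triples(reply))
```

Each returned tuple `(s, p, o)` can then be added to the NetworkX graph with `G.add_edge(s, o, relation=p)`.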

Neo4j Cypher query

// find all 2-hop neighbors of Anthropic
MATCH (a:Org {name:"Anthropic"})-[r1]->(x)-[r2]->(y)
RETURN a, r1, x, r2, y
LIMIT 100;

RDF + SPARQL

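from rdflib import Graph

g = Graph()
g.parse("dbpedia.ttl", format="turtle")  # local Turtle file

# Declare xsd explicitly so the query does not rely on the graph's prefix bindings.
q = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?company WHERE {
    ?company a <http://dbpedia.org/ontology/Company> ;
             <http://dbpedia.org/property/foundedYear> "2021"^^xsd:gYear .
}
"""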
for row in g.query(q):
    print(row.company)

GraphRAG retrieval

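def graph_rag_query(q: str, kg, llm):
    # kg and llm are placeholder objects; their methods are assumed, not a real library API.
    entities = llm.extract_entities(q)           # entity mentions found in the question
    subgraph = kg.k_hop_subgraph(entities, k=2)  # facts within 2 hops of those entities
    context = subgraph.to_text()                 # serialize the facts for the prompt
    return llm.answer(q, context=context)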
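
The `k_hop_subgraph` and `to_text` steps in the function above are not defined in this note. One possible implementation, sketched over a plain list of triples with a breadth-first search that follows edges in both directions, might look like this. The function names mirror the placeholders above and the sample `KG` is made up for illustration.

```python
from collections import deque


def k_hop_subgraph(triples, seeds, k=2):
    """Return the triples whose endpoints are both within k hops of a seed.

    Edges are traversed in both directions, so incoming facts count too.
    """
    # Build an undirected adjacency map from the directed triples.
    neighbors = {}
    for s, _, o in triples:
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)

    # Breadth-first search, recording each reached node's hop distance.
    dist = {seed: 0 for seed in seeds if seed in neighbors}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    return [(s, p, o) for s, p, o in triples if s in dist and o in dist]


def to_text(triples):
    """Serialize triples as one 'subject predicate object.' line each."""
    return "\n".join(f"{s} {p} {o}." for s, p, o in triples)


KG = [
    ("Anthropic", "created", "Claude"),
    ("Claude", "instanceOf", "LLM"),
    ("LLM", "subclassOf", "Model"),
    ("Anthropic", "locatedIn", "San Francisco"),
]
print(to_text(k_hop_subgraph(KG, ["Anthropic"], k=1)))
```

Keeping `k` small matters in practice: the number of reachable facts tends to grow quickly with each hop, and everything returned here ends up in the prompt.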

Embedding-based KG completion (TransE)

import torch
import torch.nn as nn

class TransE(nn.Module):
    def __init__(self, n_ent, n_rel, dim=128):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)
        self.rel = nn.Embedding(n_rel, dim)
    def score(self, h, r, t):
        return -torch.norm(self.ent(h) + self.rel(r) - self.ent(t), p=2, dim=-1)
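
The score above is the negative L2 distance between h + r and t, so a true triple should score close to zero. TransE is usually trained with a margin-based ranking loss that pushes each true triple to outscore a corrupted one. The sketch below shows that objective in pure Python on hand-picked 2-D vectors instead of learned embeddings; the vectors and the margin value are illustrative assumptions.

```python
import math


def score(h, r, t):
    """TransE score: negative L2 distance between h + r and t (higher is better)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss: zero once the true triple beats the corrupted one by `margin`."""
    return max(0.0, margin - pos_score + neg_score)


# Toy 2-D embeddings, chosen by hand for illustration.
anthropic = [0.0, 0.0]
created = [1.0, 0.0]
claude = [1.0, 0.0]          # anthropic + created lands exactly here
san_francisco = [4.0, 4.0]   # a corrupted tail for the same head and relation

pos = score(anthropic, created, claude)
neg = score(anthropic, created, san_francisco)
print(pos, neg, margin_loss(pos, neg))
```

Here the true triple already beats the corrupted one by more than the margin, so the loss is zero and no gradient would flow; swapping the two scores gives a positive loss.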

Decision criteria

| Situation | Approach |
|---|---|
| Small domain, fast prototype | NetworkX in-memory |
| Production, ACID | Neo4j |
| W3C standards | RDF + SPARQL |
| Billion-scale, distributed | JanusGraph, TigerGraph |
| LLM RAG | GraphRAG (Microsoft) |

Default: Neo4j + LLM extraction pipeline.

🔗 Graph

🤖 LLM usage

When to use: when factual grounding, multi-hop reasoning, or entity disambiguation is needed. When not to use: when only pure semantic similarity is needed; a vector DB is simpler.

Anti-patterns

  • Schema explosion: defining a new relation for every entity → unmanageable.
  • Stale KG: manual curation with no automated update pipeline → obsolete after 6 months.
  • No entity resolution: "Anthropic" vs "anthropic Inc." vs "ANTHROPIC" → duplicate nodes.
  • Triple-only thinking: ignoring the edge attributes of a property graph.
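
For the entity-resolution item above, the simplest mitigation is to normalize surface forms to a canonical key before creating nodes. This is a minimal sketch only: the suffix list and the helper names `canonical_key` and `resolve` are assumptions, and a production pipeline would more likely rely on alias tables or external identifiers such as Wikidata Q-ids.

```python
import re

# Illustrative list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def canonical_key(name: str) -> str:
    """Map a surface form to a normalized key.

    Case-folds, keeps only ASCII letters and digits, and drops trailing
    legal-form suffixes. Non-ASCII names would need a different tokenizer.
    """
    tokens = re.findall(r"[a-z0-9]+", name.casefold())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def resolve(names):
    """Group surface forms that share a canonical key."""
    groups = {}
    for name in names:
        groups.setdefault(canonical_key(name), []).append(name)
    return groups


print(resolve(["Anthropic", "anthropic Inc.", "ANTHROPIC"]))
```

All three spellings from the bullet above collapse into a single group, so they would map to one node instead of three.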

🧪 Verification / Duplicates

  • Verified (Bollacker 2008 Freebase, Hogan 2021 KG survey ACM CSUR).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: KG fundamentals, triples, GraphRAG, Neo4j patterns |