| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-knowledge-graph | Knowledge Graph | 10_Wiki/Topics | verified | self | | none | A | 0.9 | applied | | | 2026-05-10 | pending | |
Knowledge Graph
In one line
"A graph of entity-relationship triples." A Knowledge Graph is a graph-based data model that stores structured knowledge as a collection of (head, relation, tail) triples. Since Google introduced its Knowledge Graph in 2012, the approach has become a backbone for search, RAG, and agents; in the 2026 LLM era it is used together with GraphRAG and entity linking to mitigate hallucination.
Core concepts
Triple structure
- (subject, predicate, object): the RDF standard form
- entity ID (e.g. Wikidata Q-id) → unique reference
- relations are typed (employs, locatedIn, instanceOf, …)
- property graph: edges can carry attributes as well
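The triple structure above can be sketched in plain Python. This is a minimal illustration, not a standard API: the `Triple` and `Edge` names, the `ent:N` IDs (placeholders, not real Wikidata Q-ids), and the `source` attribute are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Triple(NamedTuple):
    """A bare RDF-style statement: (subject, predicate, object)."""
    subject: str
    predicate: str
    object: str


@dataclass
class Edge:
    """A property-graph edge: a triple plus arbitrary key/value attributes."""
    triple: Triple
    attributes: dict = field(default_factory=dict)


# Placeholder entity IDs (not real Wikidata Q-ids) mapped to display labels.
labels = {"ent:1": "Anthropic", "ent:2": "Claude", "ent:3": "LLM"}

triples = [
    Triple("ent:1", "created", "ent:2"),
    Triple("ent:2", "instanceOf", "ent:3"),
]

# The same fact as a property-graph edge, with a hypothetical attribute.
edge = Edge(triples[0], {"source": "example"})


def objects_of(subject: str, predicate: str) -> list[str]:
    """Return labels of all objects linked from `subject` via `predicate`."""
    return [labels[t.object] for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_of("ent:1", "created"))  # ['Claude']
```

Keeping IDs separate from labels means two entities with the same display name remain distinct nodes.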
Schema vs Schema-less
- ontology-driven: OWL, schema.org → strict typing
- LPG (labeled property graph): flexible schema, e.g. Neo4j
- emergent KG: extracted automatically from unstructured text by an LLM
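To make the strict-typing end of that spectrum concrete, here is a minimal sketch of an ontology-style check in which each relation declares a domain and a range type. The `SCHEMA` and `ENTITY_TYPES` tables and the `validate` helper are hypothetical; real ontologies (OWL, schema.org) are far richer.

```python
# Hypothetical mini-ontology: each relation declares (domain type, range type).
SCHEMA = {
    "created": ("Org", "Product"),
    "locatedIn": ("Org", "Place"),
}

# Hypothetical entity typing.
ENTITY_TYPES = {
    "Anthropic": "Org",
    "Claude": "Product",
    "San Francisco": "Place",
}


def validate(subject: str, relation: str, obj: str) -> list[str]:
    """Return schema violations; an empty list means the triple is valid."""
    if relation not in SCHEMA:
        return [f"unknown relation: {relation}"]
    domain, range_ = SCHEMA[relation]
    errors = []
    if ENTITY_TYPES.get(subject) != domain:
        errors.append(f"subject {subject!r} is not of type {domain}")
    if ENTITY_TYPES.get(obj) != range_:
        errors.append(f"object {obj!r} is not of type {range_}")
    return errors


print(validate("Anthropic", "created", "Claude"))        # []
print(validate("Claude", "locatedIn", "San Francisco"))  # ["subject 'Claude' is not of type Org"]
```

A schema-less store would accept both triples; the strict check trades flexibility for catching bad extractions early.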
Applications
- Search ranking (Google KG panels).
- RAG with GraphRAG (Microsoft 2024).
- Agent tool: entity disambiguation.
- Recommendation (LinkedIn Economic Graph).
- Drug discovery (Hetionet).
💻 Patterns
Building a KG with NetworkX
import networkx as nx

# MultiDiGraph allows several typed, directed edges between the same pair of nodes.
G = nx.MultiDiGraph()
G.add_edge("Anthropic", "Claude", relation="created")
G.add_edge("Claude", "LLM", relation="instanceOf")
G.add_edge("Anthropic", "San Francisco", relation="locatedIn")

# query: what did Anthropic create?
for _, target, data in G.out_edges("Anthropic", data=True):
    if data["relation"] == "created":
        print(target)  # Claude
Extracting triples with an LLM
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
text = "Claude Opus 4.7 was released by Anthropic in 2026."
resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": f"Extract (subject, predicate, object) triples as JSON from: {text}"
    }]
)
print(resp.content[0].text)
# e.g. [["Claude Opus 4.7","releasedBy","Anthropic"], ...] (model output may vary)
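Model output is free text, so it is worth validating before anything is written into the graph. The following is a minimal sketch of a defensive parser, assuming the model was asked for a JSON array of three-item string arrays; `parse_triples` is a hypothetical helper and the `reply` string is an illustrative stand-in, not real API output.

```python
import json


def parse_triples(raw: str) -> list[tuple[str, ...]]:
    """Parse a JSON array of 3-item string arrays; skip anything malformed."""
    # Models often wrap JSON in prose, so take the outermost [...] span.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    triples = []
    for item in data:
        # Keep only well-formed [subject, predicate, object] entries.
        if (isinstance(item, list) and len(item) == 3
                and all(isinstance(x, str) and x.strip() for x in item)):
            triples.append(tuple(x.strip() for x in item))
    return triples


# Illustrative model reply (made up for this example).
reply = 'Here are the triples:\n[["Claude Opus 4.7", "releasedBy", "Anthropic"], ["bad"]]'
print(parse_triples(reply))  # [('Claude Opus 4.7', 'releasedBy', 'Anthropic')]
```

Dropping malformed items silently is one design choice; a production pipeline might log them or retry the extraction instead.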
Neo4j Cypher query
// find outgoing 2-hop paths starting at Anthropic
MATCH (a:Org {name:"Anthropic"})-[r1]->(x)-[r2]->(y)
RETURN a, r1, x, r2, y
LIMIT 100;
RDF + SPARQL
from rdflib import Graph

g = Graph()
g.parse("dbpedia.ttl", format="turtle")  # local Turtle dump
# Declare xsd explicitly so the typed literal resolves regardless of graph bindings.
q = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?company WHERE {
    ?company a <http://dbpedia.org/ontology/Company> ;
             <http://dbpedia.org/property/foundedYear> "2021"^^xsd:gYear .
}
"""
for row in g.query(q):
    print(row.company)
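GraphRAG retrieval
def graph_rag_query(q: str, kg, llm):
    # kg and llm are application-specific wrappers; their methods are placeholders.
    entities = llm.extract_entities(q)           # 1. link the question to KG entities
    subgraph = kg.k_hop_subgraph(entities, k=2)  # 2. pull the 2-hop neighborhood
    context = subgraph.to_text()                 # 3. serialize the triples as text
    return llm.answer(q, context=context)        # 4. answer grounded in the subgraph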
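The `k_hop_subgraph` and `to_text` steps in that retrieval function are left abstract. One possible way to fill them in, as a sketch only, is a breadth-first expansion over outgoing edges followed by a one-fact-per-line serialization. The toy `KG` list and both function bodies below are assumptions for illustration, not Microsoft GraphRAG's actual implementation.

```python
from collections import deque

Triple = tuple[str, str, str]

# Toy KG for illustration.
KG: list[Triple] = [
    ("Anthropic", "created", "Claude"),
    ("Claude", "instanceOf", "LLM"),
    ("Anthropic", "locatedIn", "San Francisco"),
    ("LLM", "subclassOf", "Model"),
]


def k_hop_subgraph(kg: list[Triple], seeds: list[str], k: int) -> list[Triple]:
    """Collect triples reachable within k outgoing hops of any seed entity."""
    out_edges: dict[str, list[Triple]] = {}
    for t in kg:
        out_edges.setdefault(t[0], []).append(t)

    seen = set(seeds)
    queue = deque((s, 0) for s in dict.fromkeys(seeds))  # dedupe, keep order
    result: list[Triple] = []
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # do not expand past the hop limit
        for t in out_edges.get(node, []):
            result.append(t)
            if t[2] not in seen:
                seen.add(t[2])
                queue.append((t[2], depth + 1))
    return result


def to_text(triples: list[Triple]) -> str:
    """Serialize triples as one plain-text fact per line for an LLM prompt."""
    return "\n".join(f"{s} {p} {o}" for s, p, o in triples)


print(to_text(k_hop_subgraph(KG, ["Anthropic"], k=2)))
# Anthropic created Claude
# Anthropic locatedIn San Francisco
# Claude instanceOf LLM
```

With `k=2` the triple `(LLM, subclassOf, Model)` is excluded because it sits three hops from the seed, which keeps the prompt context bounded.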
Embedding-based KG completion (TransE)
import torch
import torch.nn as nn


class TransE(nn.Module):
    def __init__(self, n_ent, n_rel, dim=128):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)
        self.rel = nn.Embedding(n_rel, dim)

    def score(self, h, r, t):
        # Higher (less negative) score means h + r lies closer to t.
        return -torch.norm(self.ent(h) + self.rel(r) - self.ent(t), p=2, dim=-1)
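TransE is typically trained with a margin-based ranking loss that pushes a true triple closer than a corrupted one. The sketch below works through that loss in plain Python (no PyTorch, so it stays self-contained); the 2-D vectors are hand-picked and the margin of 1.0 is an assumed value.

```python
import math


def distance(h, r, t):
    """L2 distance ||h + r - t||; the TransE score is the negative of this."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(pos, neg, margin=1.0):
    """max(0, margin + d(pos) - d(neg)): zero once pos is closer by `margin`."""
    return max(0.0, margin + distance(*pos) - distance(*neg))


# Hand-picked 2-D embeddings, for illustration only.
anthropic = [0.0, 0.0]
claude = [1.0, 0.0]
san_francisco = [0.0, 3.0]
created = [1.0, 0.0]

true_triple = (anthropic, created, claude)              # h + r == t
corrupted_triple = (anthropic, created, san_francisco)  # tail replaced

print(distance(*true_triple))                      # 0.0
print(margin_loss(true_triple, corrupted_triple))  # 0.0
```

The loss is zero here because the corrupted triple is already farther away than the margin; swapping the two arguments gives a positive loss, which is the signal that drives the gradient update.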
Decision criteria
| Situation | Approach |
|---|---|
| Small domain, fast prototype | NetworkX in-memory |
| production, ACID | Neo4j |
| W3C standards | RDF + SPARQL |
| billion-scale, distributed | JanusGraph, TigerGraph |
| LLM RAG | GraphRAG (Microsoft) |
Default: Neo4j + LLM extraction pipeline.
🔗 Graph
- Parent: Knowledge Graph · Graph_Theory
- Variants: GraphRAG · Ontology
- Applications: Semantic Search · Recommendation-Systems
- Adjacent: Embeddings
🤖 LLM usage
When to use: when factual grounding, multi-hop reasoning, or entity disambiguation is needed. When not to use: when only pure semantic similarity is needed; a vector DB is simpler in that case.
❌ Anti-patterns
- Schema explosion: defining a new relation for every entity → unmanageable.
- Stale KG: manual curation with no automated update pipeline → obsolete within about 6 months.
- No entity resolution: "Anthropic" vs "anthropic Inc." vs "ANTHROPIC" → duplicate nodes.
- Triple-only thinking: ignoring the edge attributes a property graph provides.
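The entity-resolution anti-pattern above can be partly addressed by normalizing names before creating nodes. This is a minimal sketch under stated assumptions: the `_SUFFIXES` set and both helpers are hypothetical, and it handles only trivial surface variants (case, punctuation, legal suffixes), not true aliases.

```python
import re

# Hypothetical list of legal suffixes to strip; extend per domain.
_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def canonical_key(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.casefold()).split()
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def merge_nodes(names: list[str]) -> dict[str, list[str]]:
    """Group surface forms that share a canonical key."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(canonical_key(name), []).append(name)
    return groups


print(merge_nodes(["Anthropic", "anthropic Inc.", "ANTHROPIC", "Google"]))
# {'anthropic': ['Anthropic', 'anthropic Inc.', 'ANTHROPIC'], 'google': ['Google']}
```

String normalization alone will not merge genuinely different names for the same entity; linking to stable IDs (such as Wikidata Q-ids) covers that case.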
🧪 Verification / duplicates
- Verified (Bollacker 2008 Freebase, Hogan 2021 KG survey ACM CSUR).
- Trust level: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — KG fundamentals, triples, GraphRAG, Neo4j patterns |
🛠️ Applied cases (Applied in summary)
🔎 Codebase evidence (auto-extracted from the E:\Wiki repo)
Actual implementation/usage locations:
- connectai/src/features/projectChronicle/guardPrompt.ts:57: [Omitted long matching line]
Related commits:
- connectai d843364 feat: add premium matrix styling to knowledge graph, glowing nodes, and directional particle flow across synapses
- connectai 279e671 feat: parse real workspace files for knowledge graph topology and add organic organic movement
Auto-generated by code_grounding.mjs · refreshed on re-run