| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-ontology | Ontology | 10_Wiki/Topics | verified | self |  | none | A | 0.9 | applied |  |  | 2026-05-10 | pending |  |
Ontology
One-liner
"An explicit specification of a conceptualization" (Gruber 1993): a formal definition of the entities, classes, and relations in a domain. The idea goes back to Aristotle's Categories and was implemented at web scale through Tim Berners-Lee's Semantic Web (RDF/OWL). Uses as of 2026: knowledge graphs (Wikidata, Google KG), biomedicine (Gene Ontology, SNOMED CT), enterprise data fabric, and grounding for LLM retrieval-augmented generation.
Key concepts
Core components
- Class (Concept): an entity type (e.g., Person, Drug).
- Individual (Instance): a concrete entity (e.g., :alice).
- Property: a binary relation between two entities, or between an entity and a literal.
  - ObjectProperty: entity → entity (e.g., :hasParent).
  - DatatypeProperty: entity → literal (e.g., :hasAge xsd:int).
- Axiom: a logical statement (subClassOf, equivalentClass, disjointWith).
- Hierarchy: taxonomy (is-a) plus partonomy (part-of).
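The following stdlib-only sketch shows the split between the two property kinds. Individuals are plain IRI strings and literals are tagged values. The checker rejects a literal on an ObjectProperty and an individual on a DatatypeProperty. `Literal`, `PROPERTY_KIND` and `check` are hypothetical names for illustration. In OWL 2 DL this separation is part of the language itself, because object and data properties form disjoint sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A typed literal value, e.g. Literal(34, "xsd:int")."""
    value: object
    datatype: str


# Hypothetical property declarations mirroring the examples above.
PROPERTY_KIND = {
    ":hasParent": "object",   # ObjectProperty: entity -> entity
    ":hasAge": "datatype",    # DatatypeProperty: entity -> literal
}


def check(subject: str, prop: str, value) -> str:
    """Return "ok" or a message naming the property-kind rule the statement breaks."""
    kind = PROPERTY_KIND.get(prop)
    if kind is None:
        return f"{prop} is not declared"
    if kind == "object" and isinstance(value, Literal):
        return f"{prop} is an ObjectProperty but {subject} got a literal"
    if kind == "datatype" and not isinstance(value, Literal):
        return f"{prop} is a DatatypeProperty but {subject} got an individual"
    return "ok"


print(check(":alice", ":hasParent", ":carol"))                       # ok
print(check(":alice", ":hasAge", Literal(34, "xsd:int")))             # ok
print(check(":alice", ":hasParent", Literal("Carol", "xsd:string")))  # :hasParent is an ObjectProperty but :alice got a literal
```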
Stack
- RDF: the triple (subject, predicate, object), a graph data model.
- RDFS: a lightweight schema vocabulary (subClassOf, domain, range).
- OWL 2: based on description logic (OWL 2 DL corresponds to SROIQ(D)); supports reasoning.
- SPARQL: the query language for RDF (roughly, SQL for RDF).
- SHACL: shape-based validation.
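The RDF triple model and the core of a SPARQL SELECT can be sketched in plain Python. A graph is a set of (subject, predicate, object) triples. A basic graph pattern is a list of triple patterns whose `?variables` must bind consistently across all patterns. This sketch assumes short string identifiers in place of full IRIs, and `match` is a hypothetical helper. Real engines add indexes, property paths, FILTER, OPTIONAL and more.

```python
Triple = tuple[str, str, str]

GRAPH: set[Triple] = {
    (":alice", "rdf:type", ":Person"),
    (":alice", ":hasPrescribed", ":amoxicillin"),
    (":bob", "rdf:type", ":Person"),
    (":bob", ":hasPrescribed", ":ibuprofen"),
    (":amoxicillin", "rdf:type", ":Antibiotic"),
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(patterns: list[Triple], graph: set[Triple]) -> list[dict[str, str]]:
    """Evaluate a basic graph pattern: return every consistent variable binding."""
    solutions: list[dict[str, str]] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if is_var(term):
                        # A variable must agree with any value bound by an earlier pattern.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        # A constant must match the triple exactly.
                        ok = False
                        break
                if ok:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions


# Who was prescribed something that is directly typed :Antibiotic?
query = [("?person", ":hasPrescribed", "?drug"), ("?drug", "rdf:type", ":Antibiotic")]
for row in match(query, GRAPH):
    print(row["?person"], row["?drug"])  # :alice :amoxicillin
```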
Applications
- Wikidata, DBpedia (general knowledge graph).
- Gene Ontology, SNOMED CT, UMLS (biomedical).
- schema.org (web markup, Google rich results).
- Enterprise: data catalogs (Collibra, Atlan).
- LLM grounding (GraphRAG, knowledge-graph augmented retrieval).
💻 Patterns
Turtle (RDF/OWL syntax)
```turtle
@prefix : <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Person a owl:Class .
:Drug a owl:Class .
:Antibiotic a owl:Class ;
    rdfs:subClassOf :Drug .

:hasPrescribed a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range :Drug .

:alice a :Person ;
    :hasPrescribed :amoxicillin .
:amoxicillin a :Antibiotic .
```
rdflib (Python) — load + query
```python
from rdflib import Graph

g = Graph()
g.parse("ontology.ttl", format="turtle")  # e.g. the Turtle example above saved to a file

# SPARQL: who was prescribed an antibiotic?
q = """
PREFIX : <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?drug WHERE {
    ?person :hasPrescribed ?drug .
    ?drug a/rdfs:subClassOf* :Antibiotic .
}
"""
for row in g.query(q):
    print(row.person, row.drug)
```
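The property path `a/rdfs:subClassOf*` in the SPARQL query above means one `rdf:type` hop followed by zero or more `rdfs:subClassOf` hops. This stdlib-only sketch evaluates that path at query time over plain string triples. The class `:Penicillin` and the individual `:paracetamol` are hypothetical additions, not part of the Turtle example. They are there so that both the multi-hop case and a non-matching case show up.

```python
from collections import deque

TRIPLES = {
    (":alice", ":hasPrescribed", ":amoxicillin"),
    (":bob", ":hasPrescribed", ":paracetamol"),
    (":amoxicillin", "rdf:type", ":Penicillin"),  # hypothetical extra level
    (":paracetamol", "rdf:type", ":Drug"),
    (":Penicillin", "rdfs:subClassOf", ":Antibiotic"),
    (":Antibiotic", "rdfs:subClassOf", ":Drug"),
}


def objects(subject, predicate):
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def type_then_subclass_star(node):
    """Classes reachable via rdf:type followed by rdfs:subClassOf* (zero or more hops)."""
    start = objects(node, "rdf:type")
    seen, queue = set(start), deque(start)
    while queue:  # breadth-first walk up the class hierarchy
        for parent in objects(queue.popleft(), "rdfs:subClassOf"):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Equivalent of: ?person :hasPrescribed ?drug . ?drug a/rdfs:subClassOf* :Antibiotic .
answers = sorted(
    (person, drug)
    for person, pred, drug in TRIPLES
    if pred == ":hasPrescribed" and ":Antibiotic" in type_then_subclass_star(drug)
)
print(answers)  # [(':alice', ':amoxicillin')]
```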
owlready2 (OWL with reasoning)
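```python
from owlready2 import *  # Thing, ObjectProperty, get_ontology, sync_reasoner_pellet, ...

onto = get_ontology("http://example.org/onto.owl")

with onto:
    class Person(Thing): pass
    class Drug(Thing): pass
    class Antibiotic(Drug): pass

    class hasPrescribed(ObjectProperty):
        domain = [Person]
        range = [Drug]

alice = Person("alice")
amox = Antibiotic("amoxicillin")
alice.hasPrescribed.append(amox)

# Pellet ships with owlready2 but needs Java. It checks consistency and infers
# facts, e.g. that amoxicillin is a Drug because Antibiotic is a subclass of Drug.
sync_reasoner_pellet()
```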
SHACL validation
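```turtle
@prefix : <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:PersonShape a sh:NodeShape ;
    sh:targetClass :Person ;
    sh:property [
        sh:path :hasAge ;
        sh:datatype xsd:integer ;
        sh:minInclusive 0 ;
        sh:maxInclusive 150 ;
    ] .
```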
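Mirroring `:PersonShape` in plain Python makes its semantics explicit. For each `:Person` focus node, every `:hasAge` value must be an integer in [0, 150]. This hand-written sketch assumes a dict-based data layout and is for illustration only. A real SHACL engine (for example pySHACL) reads the shape graph itself and produces a full validation report.

```python
from typing import Any

MIN_AGE, MAX_AGE = 0, 150  # sh:minInclusive / sh:maxInclusive from :PersonShape


def validate_person(node: str, ages: list[Any]) -> list[str]:
    """Return violations for one :Person focus node; an empty list means it conforms."""
    problems = []
    for value in ages:
        # sh:datatype xsd:integer; bool is excluded because it subclasses int in Python.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{node}: hasAge {value!r} is not an integer")
        elif not MIN_AGE <= value <= MAX_AGE:
            problems.append(f"{node}: hasAge {value} outside [{MIN_AGE}, {MAX_AGE}]")
    return problems


# The shape sets no sh:minCount, so a Person without :hasAge (:dave) still conforms.
data = {":alice": [30], ":bob": [-1], ":carol": ["forty"], ":dave": []}
for person, ages in data.items():
    for problem in validate_person(person, ages):
        print(problem)
```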
LLM + ontology RAG (GraphRAG-style)
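```python
def graph_rag(question, llm, kg):
    # `llm` and `kg` are placeholder interfaces: llm.extract_entities / llm.generate
    # stand for your LLM wrapper, kg.query for a SPARQL endpoint (e.g. rdflib Graph.query).
    # 1. Extract entities from the question (assumed already linked to IRIs).
    entities = llm.extract_entities(question)
    # 2. SPARQL: fetch each entity's 1-hop neighborhood (outgoing edges only).
    #    Only interpolate validated IRIs into the query string.
    facts = []
    for e in entities:
        facts.extend(kg.query(f"""
            SELECT ?p ?o WHERE {{ <{e}> ?p ?o }} LIMIT 50
        """))
    # 3. Answer with the retrieved facts as grounding context.
    return llm.generate(question, context=facts)
```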
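Steps 2 and 3 of the pipeline above can be exercised without an LLM or a SPARQL endpoint. This stdlib sketch takes entity IRIs that are already resolved and collects each one's outgoing (predicate, object) pairs from an in-memory triple list, capped per entity like `LIMIT 50`. It then serializes them one fact per line for the prompt context. The `neighborhood` and `format_context` helpers and the line format are assumptions for illustration and are not part of any GraphRAG library.

```python
def neighborhood(entity, triples, limit=50):
    """Outgoing (predicate, object) pairs, like SELECT ?p ?o WHERE { <e> ?p ?o } LIMIT n."""
    return [(p, o) for s, p, o in triples if s == entity][:limit]


def format_context(entities, triples, limit=50):
    """Serialize each entity's facts as 'subject predicate object' lines for the prompt."""
    lines = []
    for e in entities:
        for p, o in neighborhood(e, triples, limit):
            lines.append(f"{e} {p} {o}")
    return "\n".join(lines)


KG = [
    (":alice", ":hasPrescribed", ":amoxicillin"),
    (":alice", "rdf:type", ":Person"),
    (":amoxicillin", "rdf:type", ":Antibiotic"),
]
print(format_context([":alice"], KG))
# :alice :hasPrescribed :amoxicillin
# :alice rdf:type :Person
```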
Ontology alignment (string + embedding)
```python
from sentence_transformers import SentenceTransformer, util

def align_classes(onto_a_labels, onto_b_labels, threshold=0.85):
    model = SentenceTransformer("all-mpnet-base-v2")
    emb_a = model.encode(onto_a_labels, convert_to_tensor=True)
    emb_b = model.encode(onto_b_labels, convert_to_tensor=True)
    sim = util.cos_sim(emb_a, emb_b)  # shape: (len(a), len(b))
    matches = []
    for i, row in enumerate(sim):
        j = row.argmax().item()  # best candidate in ontology B for label i
        score = row[j].item()
        if score > threshold:
            matches.append((onto_a_labels[i], onto_b_labels[j], score))
    return matches
```
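A purely lexical pass is a cheap complement to, or pre-filter for, the embedding matcher above. This is a minimal stdlib sketch that assumes labels in camelCase or snake_case. It normalizes each label into lowercase tokens, scores pairs with `difflib.SequenceMatcher.ratio()`, and keeps the best match above a threshold. The normalization rules and the 0.85 default are illustrative choices, not tuned values.

```python
import re
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """'hasActiveIngredient' or 'has_active_ingredient' -> 'has active ingredient'."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)  # split camelCase boundaries
    return " ".join(t for t in re.split(r"[\s_\-]+", spaced.lower()) if t)


def lexical_align(labels_a, labels_b, threshold=0.85):
    """Best lexical match in labels_b for each label in labels_a, kept if ratio >= threshold."""
    if not labels_b:
        return []
    matches = []
    for a in labels_a:
        na = normalize(a)
        score, best = max(
            (SequenceMatcher(None, na, normalize(b)).ratio(), b) for b in labels_b
        )
        if score >= threshold:
            matches.append((a, best, round(score, 3)))
    return matches


print(lexical_align(
    ["hasActiveIngredient", "Medication", "Person"],
    ["has_active_ingredient", "Medications", "Human"],
))
# [('hasActiveIngredient', 'has_active_ingredient', 1.0), ('Medication', 'Medications', 0.952)]
```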
Decision criteria
| Situation | Approach |
|---|---|
| simple tagging / faceting | flat taxonomy |
| domain modeling, no reasoning | RDFS |
| reasoning required (subsumption, equivalence) | OWL 2 + reasoner |
| validation rules | SHACL |
| massive scale, low schema | property graph (Neo4j) |
| LLM grounding | knowledge graph + GraphRAG |
Defaults: enterprise → SKOS + RDFS; reasoning-critical → an OWL 2 EL or QL profile.
🔗 Graph
- Parent: Knowledge Graph · Semantic-Web · Knowledge Representation
- Variant: OWL
- Application: GraphRAG
🤖 Using with LLMs
When to use: grounding to reduce hallucination, enterprise data fabric, and named-entity resolution against canonical IDs. When not to: small, unstructured tasks, where the overhead is high and the cost of ontology engineering exceeds its value.
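For the named-entity resolution use case above, the simplest grounding step maps surface strings, such as entities an LLM extracted from a question, to canonical IRIs. It does this through a normalized label/alias index, in the spirit of rdfs:label and skos:altLabel. This is a stdlib sketch. The IRIs and alias lists are illustrative examples, and `norm` and `resolve` are hypothetical helpers.

```python
import unicodedata
from typing import Optional

# Canonical IRI -> preferred label followed by aliases (illustrative examples).
LABELS = {
    "http://example.org/amoxicillin": ["Amoxicillin", "amoxycillin", "Amoxil"],
    "http://example.org/Antibiotic": ["Antibiotic", "antibacterial drug"],
}


def norm(text: str) -> str:
    """Unicode-normalize, case-fold and collapse whitespace so surface forms compare equal."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


# Built once: normalized alias -> set of IRIs (a set keeps ambiguous aliases visible).
INDEX: dict[str, set[str]] = {}
for iri, names in LABELS.items():
    for name in names:
        INDEX.setdefault(norm(name), set()).add(iri)


def resolve(mention: str) -> Optional[str]:
    """Return the single canonical IRI for a mention, or None if unknown or ambiguous."""
    candidates = INDEX.get(norm(mention), set())
    return next(iter(candidates)) if len(candidates) == 1 else None


print(resolve("  AMOXIL "))   # http://example.org/amoxicillin
print(resolve("penicillin"))  # None
```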
❌ Anti-patterns
- Using OWL Full: reasoning is undecidable. Use OWL 2 DL or one of the OWL 2 profiles (EL, QL, RL).
- Misusing subClassOf as instanceOf: a class hierarchy (rdfs:subClassOf) is not instance membership (rdf:type).
- No URI versioning: schema evolution breaks consumers. Use owl:versionIRI.
- Free-text labels only, no canonical URI: reliable alignment becomes impossible.
- Running full reasoning on every query: expensive. Materialize the inferences once, then cache them.
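The remedy for the last anti-pattern is to materialize once and then answer from the cache. The sketch below does this with forward chaining over two RDFS entailment rules: subClassOf transitivity (rdfs11) and type propagation along subClassOf (rdfs9). The rules run to a fixpoint once at load time. After that, "is x an instance of C?" is a set lookup instead of a reasoner call. This is a toy illustration over string triples and is not a substitute for a real reasoner. The class `:Penicillin` is a hypothetical example.

```python
SUB, TYPE = "rdfs:subClassOf", "rdf:type"


def materialize(triples):
    """Apply rdfs11 and rdfs9 repeatedly until no new triple appears (fixpoint)."""
    closed = set(triples)
    while True:
        new = set()
        subclass = [(s, o) for s, p, o in closed if p == SUB]
        for a, b in subclass:
            # rdfs11: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUB, c))
            # rdfs9: (x type A) and (A subClassOf B) => (x type B)
            for x, p, cls in closed:
                if p == TYPE and cls == a:
                    new.add((x, TYPE, b))
        new -= closed
        if not new:
            return closed
        closed |= new


BASE = {
    (":Penicillin", SUB, ":Antibiotic"),
    (":Antibiotic", SUB, ":Drug"),
    (":amoxicillin", TYPE, ":Penicillin"),
}
CACHE = materialize(BASE)  # done once, e.g. at load time


def is_instance(x, cls):
    return (x, TYPE, cls) in CACHE  # constant-time lookup per query


print(is_instance(":amoxicillin", ":Drug"))  # True
```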
🧪 Verification / Duplicates
- Verified (Gruber 1993; W3C OWL 2 specification; Hitzler et al., Foundations of Semantic Web Technologies; Microsoft GraphRAG 2024).
- Trust level: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — Ontology FULL with RDF/OWL/SHACL/GraphRAG patterns |
🛠️ Applied cases (Applied in summary)
🔎 Codebase evidence (auto-extracted from the E:\Wiki repo)
Actual implementation/usage locations:
connectai/src/features/secondBrainTrace.ts:223— [Omitted long matching line]
Auto-generated by code_grounding.mjs · refreshed on re-run