[Wiki] Mass wikification of Datacollector knowledge (P-Reinforce v3.0)

2026-05-05 16:39:56 +09:00
parent dd01e01bea
commit d5da9aeb08
65 changed files with 2619 additions and 94 deletions
@@ -1,64 +1,102 @@
 import os
+import re
+from datetime import datetime
+import uuid

-topics_dir = "/Volumes/Data/project/Antigravity/Wiki/10_Wiki/Topics"
+SOURCE_DIR = "/Volumes/Data/project/Antigravity/Datacollector_MAC/out_wiki"
+TARGET_DIR = "/Volumes/Data/project/Antigravity/Wiki/10_Wiki/Topics"

-# Dictionary of {filename: content}
-# (Content truncated for brevity in this scratch script, will be filled with the synthesized text)
-wikis = {
-    "AGI (Artificial General Intelligence).md": """---
-category: Unified
-tags: [auto-consolidated, technical-documentation]
-title: AGI (Artificial General Intelligence)
-last_updated: 2026-05-05
+def normalize_name(name):
+    # Remove the trailing .md extension, content in parentheses, and special chars
+    name = re.sub(r'\(.*?\)', '', name)
+    name = re.sub(r'\.md$', '', name).strip()
+    name = re.sub(r'[^\w\s]', '', name)  # \w is Unicode-aware, so non-ASCII titles do not collapse to ""
+    return name.lower().replace(" ", "_")
+
+def get_p_reinforce_header(title, tags=None):
+    if tags is None:
+        tags = ["automated", "datacollector", "brain_sync"]
+    
+    tag_str = "[" + ", ".join(tags) + "]"
+    date_str = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S.000Z")  # the "Z" suffix means UTC, so do not format local time
+    mission_id = f"mission_{uuid.uuid4().hex[:12]}"
+    
+    header = f"""---
+id: {mission_id}
+date: {date_str}
+type: knowledge_artifact
+standard: P-Reinforce v3.0
+tags: {tag_str}
 ---

-# AGI (Artificial General Intelligence)
+"""
+    return header

-## 📌 Brief Summary
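-Artificial general intelligence (AGI) refers to AI that can perform any intellectual task a human can, and it is the ultimate goal of AI research. It is not confined to a specific domain: it includes the ability to learn and solve problems in new environments, reason from common sense, and act autonomously.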
+def process_wikification():
+    files = [f for f in os.listdir(SOURCE_DIR) if f.endswith(".md")]
+    groups = {}

-## 📖 Core Content
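-* **Neuro-Symbolic Integration:** An attempt to realize AGI by combining the learning ability of neural networks with the reasoning ability of symbolic logic.
-* **Self-improvement and continual learning:** The ability to optimize its own algorithms and continually update its knowledge is essential.
-* **Explainability and safety:** A governance framework must accompany it to ensure that highly capable intelligence stays aligned with human values.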
+    # 1. Grouping
+    for f in files:
+        norm = normalize_name(f)
+        if norm not in groups:
+            groups[norm] = []
+        groups[norm].append(f)

-## ⚖️ Trade-offs & Caveats
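-* **Intelligence vs. control:** The higher the intelligence, the greater the risk that it escapes human control (the Alignment Problem).
-* **Compute resources and efficiency:** The enormous hardware cost and power consumption needed to reach AGI-level intelligence act as environmental and economic constraints.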
+    print(f"Found {len(files)} files, grouped into {len(groups)} themes.")

-## 🔗 Knowledge Connections
-* [[Neuro-Symbolic AI]]
-* [[LLM Alignment]]
+    for norm, filenames in groups.items():
+        # 2. Pick the richest content
+        best_file = max(filenames, key=lambda x: os.path.getsize(os.path.join(SOURCE_DIR, x)))
+        best_path = os.path.join(SOURCE_DIR, best_file)
+        
+        with open(best_path, 'r', encoding='utf-8') as f:
+            content = f.read()

----
-*Last updated: 2026-05-05*""",
-    "Global Workspace Theory (GWT).md": """---
-category: Cognitive Modeling
-tags: [neuroscience, consciousness]
-title: Global Workspace Theory (GWT)
-last_updated: 2026-05-05
----
+        # Clean title (remove [[ ]] if exists)
+        title = best_file.replace(".md", "")
+        clean_title = re.sub(r'\[\[(.*?)\]\]', r'\1', title)
+        
+        # 3. Check for existing file in target (recursive search)
+        existing_path = None
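+        for root, dirs, target_files in os.walk(TARGET_DIR):
+            for tf in target_files:
+                # Keep the first match; a bare inner break lets later directories override it
+                if existing_path is None and (tf.lower() == best_file.lower() or normalize_name(tf) == norm):
+                    existing_path = os.path.join(root, tf)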
+        
+        # 4. Merge or Create
+        final_content = content
+        if existing_path:
+            with open(existing_path, 'r', encoding='utf-8') as f:
+                existing_content = f.read()
+            
+            # Simple merge: the longer text wins and any existing frontmatter is kept (rudimentary)
+            if len(content) > len(existing_content):
+                print(f"Merging and prioritizing NEW content for: {clean_title}")
+                # Keep existing frontmatter if any, or prepend new
+                if existing_content.startswith("---"):
+                    parts = existing_content.split("---", 2)
+                    if len(parts) >= 3:
+                        final_content = "---" + parts[1] + "---" + "\n\n" + content
+                else:
+                    final_content = get_p_reinforce_header(clean_title) + content
+            else:
+                print(f"Skipping update for {clean_title}, existing content is richer.")
+                final_content = existing_content
+        else:
+            final_content = get_p_reinforce_header(clean_title) + content

-# Global Workspace Theory (GWT)
+        # 5. Write to target
+        # Reuse the matched file's location so a match found in a subdirectory is updated in place
+        target_path = existing_path or os.path.join(TARGET_DIR, best_file)
+        with open(target_path, 'w', encoding='utf-8') as f:
+            f.write(final_content)
+        
+        # 6. Cleanup SOURCE_DIR
+        for f in filenames:
+            os.remove(os.path.join(SOURCE_DIR, f))
+            print(f"Deleted source: {f}")

-## 📌 Brief Summary
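-Global Workspace Theory (GWT) is a cognitive architecture that likens human consciousness to a 'theater stage' and explains that consciousness arises when numerous unconscious processes share specific information globally.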
-
-## 📖 Core Content
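-* **Global Broadcasting:** Once a piece of information enters the global workspace, many other brain modules can access it and process it in parallel.
-* **Conscious Attention:** Even for unconscious stimuli, conscious attention must come first for long-term learning and implicit rule inference to be possible.
-* **GNW model:** The theory was extended neurophysiologically to the claim that long-range neurons in the frontal and parietal lobes form this global space.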
-
-## ⚖️ Trade-offs & Caveats
-* **Bottleneck:** The global space can handle only a limited amount of information at a time, so complex multitasking causes cognitive overload.
-
----
-*Last updated: 2026-05-05*""",
-    # ... more to be added in chunks or follow-up
-}
-
-for name, content in wikis.items():
-    path = os.path.join(topics_dir, name)
-    with open(path, 'w', encoding='utf-8') as f:
-        f.write(content)
-    print(f"Created: {path}")
+if __name__ == "__main__":
+    process_wikification()
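
Note on the grouping and merge steps: the sketch below restates steps 1 and 4 of the script as pure functions so they can be exercised without touching SOURCE_DIR or TARGET_DIR. It is a minimal illustration, not the committed code. `group_by_theme` and `merge_with_frontmatter` are hypothetical helper names, the sample filenames and header strings are made up, and the fallback to a default header when the existing frontmatter is unterminated is an assumption about the desired behavior.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Reduce a filename to a grouping key (Unicode-aware variant)."""
    name = re.sub(r"\(.*?\)", "", name)        # drop parenthesised qualifiers
    name = re.sub(r"\.md$", "", name).strip()  # drop only a trailing .md
    name = re.sub(r"[^\w\s]", "", name)        # keep letters and digits of any script
    return re.sub(r"\s+", "_", name.lower())


def group_by_theme(filenames):
    """Map each normalized key to the filenames that share it (hypothetical helper)."""
    groups = defaultdict(list)
    for filename in filenames:
        groups[normalize_name(filename)].append(filename)
    return dict(groups)


def merge_with_frontmatter(existing: str, new: str, default_header: str) -> str:
    """Keep the existing frontmatter block and replace the body with `new`.

    Assumption: if `existing` has no complete frontmatter block,
    `default_header` is prepended instead.
    """
    if existing.startswith("---"):
        parts = existing.split("---", 2)
        if len(parts) >= 3:
            return "---" + parts[1] + "---\n\n" + new
    return default_header + new


# Made-up sample inputs for illustration only.
names = ["AGI (Artificial General Intelligence).md", "AGI.md", "범용 인공지능.md"]
groups = group_by_theme(names)
merged = merge_with_frontmatter(
    "---\ntags: [a]\n---\nold body", "new body", "---\nid: x\n---\n\n"
)
print(groups)
print(merged)
```

With these inputs the two AGI files land in one group while the Korean-titled file keeps its own key, which is the case the ASCII-only character class would merge into an empty key. Because step 6 deletes every source file in a group, checking the grouping this way before running the cleanup is a cheap safeguard.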