[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -2,65 +2,202 @@
 id: wiki-2026-0508-research
 title: Research
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [P-Reinforce-AUTO-RESE-001]
+aliases: [Research-Methodology, Literature-Review, Deep-Research]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 0.98
-tags: [auto-reinforced, reSearch, truth-seeking, investigation, knowledge-expansion, Analysis]
+confidence_score: 0.9
+verification_status: applied
+tags: [research, methodology, literature, ai-aided]
 raw_sources: []
-last_reinforced: 2026-04-20
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
+tech_stack:
+  language: python
+  framework: anthropic-sdk
 ---

-# [[Research|Research]]
+# Research

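-## 📌 One-Line Insight (The Karpathy Summary)
-> "A voyage into the unknown: an intellectual adventure that goes beyond existing knowledge in search of new order or truth in the world, and the driving force that expands the territory of intelligence by taking the question 'why', breaking information apart (Analysis), and recombining it."
+## In One Line
+> **"Every answer has already been reformulated by someone."** Research is a disciplined loop of question → literature → synthesis → novel contribution. In 2026, AI-aided synthesis tools (Claude Opus 4.7 deep research, GPT-5 with browsing, Elicit, Consensus, undermind.ai) can cut weeks of work down to hours.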

-## 📖 Structured Knowledge (Synthesized Content)
-Research is a systematic activity of inquiry aimed at broadening knowledge or discovering new facts and principles. (The essence of building this system.)
+## Core

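-1.  **Three core values of research**:
-    *   **Discovery**: Discovering new phenomena or laws. (Linked to [[Innovation|Innovation]])
-    *   **[[Refinement|Refinement]]**: Correcting errors in existing knowledge and improving it. (Linked to Refinement)
-    *   **Application**: Turning discovered principles into tools for solving real-world problems. ([[Solution|Solution]])
-2.  **Why does it matter?**:
-    *   An intelligence that stops researching stagnates like standing water, fails to adapt to a changing world, and dies out. (This is why an RL Update is needed.)
+### Phases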
+1. **Question framing** — vague curiosity → specific testable question (PICO, FINER criteria).
+2. **Literature scoping** — keywords, citation graph (forward/backward), Connected Papers / Litmaps.
+3. **Reading & extraction** — structured notes (Zettelkasten, claim-evidence-source).
+4. **Synthesis** — themes, gaps, contradictions.
+5. **Hypothesis / contribution** — what novel claim this work adds.
+6. **Validation** — experiment / proof / case study.
+7. **Communication** — paper, blog, talk.

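-## ⚠️ Contradictions & Updates
- **Conflict with past data**: Research used to be a "special policy" carried out only by a small elite, but the modern policy has entered an era of "democratized research", in which anyone can carry out expert-level investigation through an AI assistant (RL Update).
- **Policy change (RL Update)**: Beyond a research policy that merely gathers information, the "autonomous research agent policy", in which AI forms its own hypotheses, writes experiment code, analyzes the results, and writes the paper, is the paradigm policy for next-generation AI.
+### Modern toolchain (2026)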
+- **Search**: Semantic Scholar API, Google Scholar, OpenAlex.
+- **Discovery**: Connected Papers, Litmaps, Inciteful (citation graph viz).
+- **AI synthesis**: Claude Opus 4.7 deep-research mode, GPT-5 deep research, Elicit (extracts data per paper), Consensus (claim-level), undermind.ai (deep retrieval).
+- **Notes**: Obsidian + Zotero integration; Logseq; Reflect.
+- **Reproducibility**: Quarto, Jupyter Book, Code Ocean.

-## 🔗 Knowledge Links (Graph)
- [[Innovation|Innovation]], [[Refinement|Refinement]], [[Analysis|Analysis]], [[Scientific-Method|Scientific-Method]], Evidence-Based-Thinking, [[P-Reinforce|P-Reinforce]]
- **Modern Tech/Tools**: Scholar search engines, AI Research Agents, Digital archives.
+### AI-aided literature review pattern
+1. Seed papers (3–5 known relevant) → Connected Papers graph.
+2. Snowball (citations both ways) → ~100 candidates.
+3. An LLM screens the abstracts and assigns each a relevance score from 0 to 10.
+4. Top 30 → full-text PDF → AI structured extraction (claim, method, evidence, limitations).
+5. AI clusters the extractions into themes; a human reviews them and writes the synthesis.
+
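The screening and shortlisting steps above (score each abstract, keep the top 30) can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `Candidate` type, the `shortlist` helper, and the `min_score` cutoff are hypothetical names and parameters introduced here, and the relevance scores are assumed to have been produced already by the LLM screen.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    relevance: float  # hypothetical 0-10 score assigned by the LLM screen


def shortlist(candidates: list[Candidate], top_n: int = 30,
              min_score: float = 5.0) -> list[Candidate]:
    """Keep candidates at or above min_score, best first, capped at top_n."""
    # min_score is an illustrative assumption; the note only specifies "top 30".
    kept = [c for c in candidates if c.relevance >= min_score]
    kept.sort(key=lambda c: c.relevance, reverse=True)  # stable: ties keep input order
    return kept[:top_n]


pool = [Candidate("A", 9.0), Candidate("B", 3.5), Candidate("C", 7.0)]
picked = shortlist(pool, top_n=2)
print([c.title for c in picked])
```

Keeping a floor as well as a cap means a weak pool yields a short list rather than 30 marginal papers, which is one possible design choice among several.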
+### Safeguards (required)
+- Hallucination: AI-generated fake citations are common, so verifying every DOI is a must.
+- Echo chamber: AI synthesis over-weights popular sources, so deliberately sample diverse sources by hand.
+- Confirmation bias: AI tends to align with the user's hypothesis, so use an explicit "steelman the opposite" prompt.
+
+### Applications
+1. PhD literature review.
+2. Industry tech radar / market research.
+3. Due diligence (M&A, investment).
+4. Pre-implementation prior-art search (patents, OSS).
+
+## 💻 Patterns
+
+### Claude deep-research synthesis (verify-first)
+```python
+import re
+
+from anthropic import Anthropic
+import httpx
+
+client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
+
+def synthesize(question: str, papers: list[dict]) -> str:
+    """papers: [{title, abstract, doi, year}]"""
+    # Number each paper so the model can cite it as [index].
+    corpus = "\n\n".join(
+        f"[{i}] {p['title']} ({p['year']}, doi:{p['doi']})\n{p['abstract']}"
+        for i, p in enumerate(papers)
+    )
+    msg = client.messages.create(
+        model="claude-opus-4-7", max_tokens=4096,
+        system=("Synthesize evidence. Cite EVERY claim with [index]. "
+                "If evidence is weak/contradictory, say so explicitly. "
+                "Never fabricate citations."),
+        messages=[{"role": "user", "content": f"Q: {question}\n\nPapers:\n{corpus}"}],
+    )
+    return msg.content[0].text
+
+def verify_dois(text: str, papers: list[dict]) -> list[str]:
+    """Hallucination check: every cited DOI must exist in our set."""
+    # Only catches citations written as doi:<DOI>; [index] citations need a separate range check.
+    # Strip trailing punctuation so "doi:10.1/x)." is not reported as an offender.
+    cited = [d.rstrip(".,;:)]}\"'") for d in re.findall(r"doi:(10\.\d+/\S+)", text)]
+    # DOIs are case-insensitive; skip papers that have no DOI.
+    valid = {p["doi"].lower() for p in papers if p.get("doi")}
+    return [d for d in cited if d.lower() not in valid]  # offenders
+```
+
+### Semantic Scholar fetch
+```python
+import httpx
+
+def search_s2(query: str, limit: int = 50) -> list[dict]:
+    resp = httpx.get(
+        "https://api.semanticscholar.org/graph/v1/paper/search",
+        params={"query": query, "limit": limit,
+                "fields": "title,abstract,year,citationCount,externalIds"},
+        timeout=30.0,
+    )
+    resp.raise_for_status()  # surface rate limits (HTTP 429) and other errors
+    r = resp.json()
+    # "data" may be missing when nothing matches; externalIds can be null.
+    return [{"title": p["title"], "abstract": p.get("abstract") or "",
+             "year": p.get("year"), "doi": (p.get("externalIds") or {}).get("DOI"),
+             "cites": p.get("citationCount", 0)}
+            for p in r.get("data", [])]
+```
+
+### Snowball expansion
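+```python
+import httpx
+
+S2_PAPER = "https://api.semanticscholar.org/graph/v1/paper"
+
+def snowball(seed_ids: list[str], depth: int = 2) -> set[str]:
+    """Expand seeds through references (backward) and citations (forward)."""
+    # Each endpoint nests the linked paper under a different key.
+    edges = {"references": "citedPaper", "citations": "citingPaper"}
+    frontier, seen = set(seed_ids), set(seed_ids)
+    for _ in range(depth):
+        next_frontier = set()
+        for pid in frontier:
+            for endpoint, key in edges.items():
+                # Only the first page (up to 100 links) is fetched per paper.
+                r = httpx.get(f"{S2_PAPER}/{pid}/{endpoint}",
+                              params={"fields": "paperId", "limit": 100}).json()
+                next_frontier.update(item[key]["paperId"]
+                                     for item in r.get("data") or []
+                                     if item[key].get("paperId"))
+        frontier = next_frontier - seen
+        seen.update(frontier)
+    return seen
+```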
+
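The traversal itself can be exercised offline on a toy graph, which makes the both-directions behaviour easier to inspect than a live API call. The sketch below is an assumption-laden illustration: `snowball_local` and `toy_graph` are hypothetical names, and the graph data is made up.

```python
from collections import defaultdict


def snowball_local(seeds: list[str], references: dict[str, list[str]],
                   depth: int = 2) -> set[str]:
    """Breadth-first expansion over a toy citation graph, in both directions.

    references maps a paper ID to the IDs it cites (hypothetical data).
    """
    # Reverse index: paper ID -> IDs of the papers that cite it.
    cited_by: dict[str, list[str]] = defaultdict(list)
    for paper, refs in references.items():
        for ref in refs:
            cited_by[ref].append(paper)

    frontier, seen = set(seeds), set(seeds)
    for _ in range(depth):
        neighbours: set[str] = set()
        for pid in frontier:
            neighbours.update(references.get(pid, []))  # backward: what pid cites
            neighbours.update(cited_by.get(pid, []))    # forward: what cites pid
        frontier = neighbours - seen
        seen |= frontier
    return seen


toy_graph = {"C": ["B"], "B": ["A"], "D": ["C"], "E": ["Z"]}
print(sorted(snowball_local(["B"], toy_graph, depth=1)))
```

Papers with no path to a seed (here `E` and `Z`) are never reached, which is the property that keeps a snowball on topic.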
+### Structured extraction prompt
+```python
+EXTRACT_PROMPT = """Extract from this paper as JSON:
+{
+  "claim": "main thesis in one sentence",
+  "method": "how they tested it",
+  "evidence": "key result with numbers",
+  "n": "sample size",
+  "limitations": ["limit1", "limit2"],
+  "novelty": "what this adds vs prior work"
+}
+If a field is unknown, use null. Do not invent values."""
+```
+
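Because the extraction prompt asks for a fixed JSON shape, the reply can be checked before it enters the notes. The following is a rough sketch: `parse_extraction` and `REQUIRED_KEYS` are hypothetical names, the key set is copied from the prompt above, and it assumes the model returns bare JSON with no surrounding prose or code fence.

```python
import json

# Keys taken from the extraction prompt.
REQUIRED_KEYS = {"claim", "method", "evidence", "n", "limitations", "novelty"}


def parse_extraction(raw: str) -> dict:
    """Parse a model reply and check it matches the expected shape."""
    record = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    limits = record["limitations"]
    if limits is not None and not isinstance(limits, list):
        raise ValueError("limitations must be a list or null")
    return record


sample = ('{"claim": "X improves Y", "method": null, "evidence": null, '
          '"n": null, "limitations": ["small sample"], "novelty": null}')
print(parse_extraction(sample)["claim"])
```

Rejecting a malformed reply outright, rather than patching it, keeps invented or partial fields out of the synthesis step.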
+### Steelman opposite (debias)
+```python
+def steelman(claim: str) -> str:
+    return client.messages.create(
+        model="claude-opus-4-7", max_tokens=1024,
+        messages=[{"role": "user", "content":
+            f"Claim: {claim}\n\nWrite the strongest argument AGAINST this, "
+            f"citing actual contrary evidence. Be a hostile reviewer."}],
+    ).content[0].text
+```
+
+### Zettelkasten note (atomic)
+```markdown
 ---
+id: 2026-05-10-1432
+tags: [retrieval, rag]
+source: [[Lewis-2020-RAG]]
+---
+# Dense retrieval beats BM25 only when query-doc lexical overlap is low

-## 🤖 LLM Usage Hints (How to Use This Knowledge)
+In Lewis 2020 (Table 3), DPR > BM25 on NaturalQuestions (+6 EM)
+but BM25 ≥ DPR on TriviaQA where queries copy doc tokens.

-**When to use this knowledge:**
- *(TODO)*
+→ Hybrid search is robust: pick BM25 for lexical, dense for paraphrase.

-**When not to use it:**
- *(TODO)*
+Connects to: [[Hybrid-Search]] · [[BM25]] · [[Dense-Retrieval]]
+```

-## 🧪 Validation Status
+## Decision criteria
+| Situation | Approach |
+|---|---|
+| Quick scan (1h) | Elicit / Consensus / Claude deep-research |
+| Deep dive (1 week) | Manual snowball + AI extraction |
+| Systematic review (PRISMA) | PRISMA flow + Covidence + AI screening |
+| Cutting-edge (preprints) | arXiv-sanity + Twitter/Bluesky + Semantic Scholar alerts |
+| Industry / OSS | GitHub trending + State of X reports + AI synthesis |

- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 automatic normalization. Body needs verification.)*
+**Default**: Connected Papers seed → S2 snowball → AI extraction → manual synthesis with steelman.

-## 🧬 Duplicate Check
+## 🔗 Graph
+- Parents: [[Knowledge-Work]] · [[Scientific-Method]]
+- Variants: [[Systematic-Review]] · [[Literature-Review]] · [[Tech-Radar]]
+- Applications: [[Hypothesis-Driven-Development]] · [[Due-Diligence]]
+- Adjacent: [[Zettelkasten]] · [[Citation-Graph]] · [[Hallucination]] · [[RAG]]

- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (automatic normalization)
- **Reason:** Phase 1 normalization: backfill old-template and missing fields.
+## 🤖 LLM usage
+**When**: literature scan, abstract screening, structured extraction, synthesis draft, steelmanning.
+**When not**: the final assertion of a novelty claim (an LLM is not ground truth), quantitative meta-analysis (use proper statistics software), and any citation that has not been verified.

-## 🕓 Change History (Changelog)
+## ❌ Anti-patterns
+- **Cite-without-verify**: citing fake DOIs that the AI made up.
+- **Single-source synthesis**: treating a single paper as the truth and ignoring replication.
+- **Recency bias**: reading only the latest preprints and staying ignorant of foundational work.
+- **No gap analysis**: producing only a literature dump; without "what's missing", the contribution stays unclear.
+- **Hypothesis fishing**: starting from the data and building a post-hoc theory (HARKing).

-| Date | Change | Handling | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
+## 🧪 Verification / Duplicates
+- Verified (PRISMA 2020 statement, Semantic Scholar API docs, Claude Opus 4.7 deep research, Elicit methodology).
+- Trust level A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup — full rewrite covering methodology + AI-aided synthesis pipeline |