Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
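Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: ~2,400 broken-link fixes, direct redirect links, and concept mappings
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections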

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00

---
id: wiki-2026-0508-research
title: Research
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Research-Methodology, Literature-Review, Deep-Research]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [research, methodology, literature, ai-aided]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: anthropic-sdk
---
# Research
## One-liner
> **"Every answer has already been reformulated by someone."** Research is a disciplined loop of question → literature → synthesis → novel contribution. In 2026, AI-aided synthesis (Claude Opus 4.7 deep research, GPT-5 with browsing, Elicit, Consensus, undermind.ai) can shorten weeks of work to hours.
## Core
### Phases
1. **Question framing** — vague curiosity → specific testable question (PICO, FINER criteria).
2. **Literature scoping** — keywords, citation graph (forward/backward), Connected Papers / Litmaps.
3. **Reading & extraction** — structured notes (Zettelkasten, claim-evidence-source).
4. **Synthesis** — themes, gaps, contradictions.
5. **Hypothesis / contribution** — what novel claim this work adds.
6. **Validation** — experiment / proof / case study.
7. **Communication** — paper, blog, talk.
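
Phase 1 can be made concrete with a small record type. The sketch below assumes the usual expansion of PICO (population, intervention, comparison, outcome); the class name, method names, and question template are illustrative and not part of any library.

```python
from dataclasses import dataclass


@dataclass
class PICO:
    """One testable research question, split into its four PICO parts."""
    population: str
    intervention: str
    comparison: str
    outcome: str

    def question(self) -> str:
        # Hypothetical template; adapt the wording to the field.
        return (f"In {self.population}, does {self.intervention} "
                f"compared with {self.comparison} change {self.outcome}?")

    def keywords(self) -> list[str]:
        # Starting terms for the literature-scoping phase.
        return [self.population, self.intervention, self.comparison, self.outcome]
```

Writing the question this way forces each part to be named before any search starts, and `keywords()` gives the scoping phase a first set of search terms.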
### Modern toolchain (2026)
- **Search**: Semantic Scholar API, Google Scholar, OpenAlex.
- **Discovery**: Connected Papers, Litmaps, Inciteful (citation graph viz).
- **AI synthesis**: Claude Opus 4.7 deep-research mode, GPT-5 deep research, Elicit (extracts data per paper), Consensus (claim-level), undermind.ai (deep retrieval).
- **Notes**: Obsidian + Zotero integration; Logseq; Reflect.
- **Reproducibility**: Quarto, Jupyter Book, Code Ocean.
### AI-aided literature review pattern
1. Seed papers (3-5 known relevant) → Connected Papers graph.
2. Snowball (citations both ways) → ~100 candidates.
3. LLM screen abstracts: relevance score 0-10.
4. Top 30 → full-text PDF → AI structured extraction (claim, method, evidence, limitations).
5. AI cluster into themes; human reviews + writes synthesis.
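
Steps 3 and 4 above (score each abstract, keep the top 30) might look like the sketch below. `score_abstract` is a hypothetical callable standing in for whatever LLM call returns a 0-10 relevance score, the `min_score` cutoff of 5 is an assumption, and `keyword_scorer` is only a toy stand-in so the example runs offline.

```python
from typing import Callable


def screen(papers: list[dict], score_abstract: Callable[[str, str], float],
           question: str, top_k: int = 30, min_score: float = 5.0) -> list[dict]:
    """Score each abstract 0-10 and keep the top_k papers at or above min_score."""
    scored = []
    for p in papers:
        abstract = p.get("abstract") or ""
        if not abstract:
            continue  # nothing to screen on; such papers need a manual look
        # Clamp to the 0-10 range in case the scorer drifts outside it.
        score = max(0.0, min(10.0, float(score_abstract(question, abstract))))
        scored.append({**p, "relevance": score})
    scored.sort(key=lambda p: p["relevance"], reverse=True)
    return [p for p in scored if p["relevance"] >= min_score][:top_k]


def keyword_scorer(question: str, abstract: str) -> float:
    """Toy stand-in for an LLM scorer: share of question words found in the abstract."""
    words = {w.lower() for w in question.split()}
    hits = sum(1 for w in words if w in abstract.lower())
    return 10.0 * hits / max(len(words), 1)
```

Passing the scorer in as a parameter keeps the screening logic testable without network access; in practice the callable would wrap a model call that is prompted to return a single number.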
### Safeguards (required)
- Hallucination: AI-generated fake citations are common, so verifying every DOI is a must.
- Echo chamber: AI synthesis over-weights popular sources, so deliberately sample diverse sources by hand.
- Confirmation bias: AI tends to align with the user's hypothesis, so add an explicit "steelman the opposite" prompt.
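
One way to approach the echo-chamber safeguard is to sample across publication periods instead of taking the most-cited papers. The sketch below is an assumption about what "diverse" could mean here (spread over 5-year buckets); `diverse_sample` and its parameters are illustrative names, and other strata such as venue or method may fit better.

```python
import random
from collections import defaultdict


def diverse_sample(papers: list[dict], k: int, bucket_years: int = 5,
                   seed: int = 0) -> list[dict]:
    """Pick up to k papers round-robin across year buckets so no period dominates."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    buckets = defaultdict(list)
    for p in papers:
        year = p.get("year")
        buckets[None if year is None else year // bucket_years].append(p)
    for group in buckets.values():
        rng.shuffle(group)  # avoid inheriting a popularity-sorted input order
    # Oldest bucket first; papers with no year go last.
    order = sorted(buckets, key=lambda b: (b is None, b or 0))
    picked = []
    while len(picked) < k and any(buckets[b] for b in order):
        for b in order:
            if buckets[b] and len(picked) < k:
                picked.append(buckets[b].pop())
    return picked
```

Round-robin picking means a heavily populated recent bucket contributes one paper per pass, the same as a sparse older bucket.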
### Applications
1. PhD literature review.
2. Industry tech radar / market research.
3. Due diligence (M&A, investment).
4. Pre-implementation prior-art search (patents, OSS).
## 💻 Patterns
### Claude deep-research synthesis (verify-first)
```python
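import re

import httpx  # used by the Semantic Scholar helpers in the sections that follow
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def synthesize(question: str, papers: list[dict]) -> str:
    """papers: [{title, abstract, doi, year}]"""
    corpus = "\n\n".join(
        f"[{i}] {p['title']} ({p['year']}, doi:{p['doi']})\n{p['abstract']}"
        for i, p in enumerate(papers)
    )
    msg = client.messages.create(
        model="claude-opus-4-7", max_tokens=4096,
        system=("Synthesize evidence. Cite EVERY claim with [index]. "
                "If evidence is weak/contradictory, say so explicitly. "
                "Never fabricate citations."),
        messages=[{"role": "user", "content": f"Q: {question}\n\nPapers:\n{corpus}"}],
    )
    return msg.content[0].text


def verify_dois(text: str, papers: list[dict]) -> list[str]:
    """Hallucination check: every cited DOI must exist in our set."""
    # Trim trailing punctuation that \S+ picks up from prose, e.g. "doi:10.1/x)."
    cited = [d.rstrip(".,;)]") for d in re.findall(r"doi:(10\.\d+/\S+)", text)]
    valid = {p["doi"] for p in papers if p.get("doi")}
    return [d for d in cited if d not in valid]  # offenders


def verify_indices(text: str, papers: list[dict]) -> list[int]:
    """The prompt asks for [index] citations, so flag indices outside the corpus too."""
    cited = {int(i) for i in re.findall(r"\[(\d+)\]", text)}
    return sorted(i for i in cited if i >= len(papers))  # offenders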
```
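
DOIs show up in several spellings (bare, `doi:` prefixed, or as a `https://doi.org/` URL) and are case-insensitive, so a membership check like the one above is more reliable if both sides are normalized first. The helper below is a minimal sketch; `normalize_doi` is a hypothetical name, the pattern is a loose syntax check rather than proof that the DOI resolves, and the DOIs in the usage are made up.

```python
import re
from typing import Optional

_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Return a lowercase bare DOI, or None if the input does not look like one."""
    if not raw:
        return None
    doi = raw.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Drop punctuation that trails a DOI quoted inside prose.
    # Caveat: a DOI that genuinely ends in one of these characters is truncated.
    doi = doi.strip().rstrip(".,;)]").lower()
    return doi if re.fullmatch(r"10\.\d{4,9}/\S+", doi) else None
```

For example, `normalize_doi("https://doi.org/10.1000/ABC.123")` and `normalize_doi("doi:10.1000/abc.123).")` both reduce to the same key, so the two spellings compare equal.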
### Semantic Scholar fetch
```python
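def search_s2(query: str, limit: int = 50) -> list[dict]:
    """Keyword search on Semantic Scholar, flattened to the fields used downstream."""
    resp = httpx.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit,
                "fields": "title,abstract,year,citationCount,externalIds"},
        timeout=30.0,
    )
    resp.raise_for_status()  # surface rate-limit errors instead of a KeyError later
    return [{"title": p["title"], "abstract": p.get("abstract") or "",
             "year": p.get("year"),
             # externalIds can be null, so guard before reading the DOI
             "doi": (p.get("externalIds") or {}).get("DOI"),
             "cites": p.get("citationCount") or 0}
            for p in resp.json().get("data", [])]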
```
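
Running several queries usually returns overlapping results, so a deduplication pass helps before screening. The sketch below assumes paper dicts with the `title` and `doi` keys used above; `dedupe` is an illustrative helper, not part of the Semantic Scholar API.

```python
import re


def dedupe(papers: list[dict]) -> list[dict]:
    """Keep the first occurrence of each paper, keyed by DOI or else by title."""
    seen: set[str] = set()
    unique = []
    for p in papers:
        doi = (p.get("doi") or "").lower()
        # Fall back to a punctuation-insensitive title when no DOI is present.
        title_key = re.sub(r"[^a-z0-9]+", " ", (p.get("title") or "").lower()).strip()
        key = f"doi:{doi}" if doi else f"title:{title_key}"
        if key in seen:
            continue
        seen.add(key)
        unique.append(p)
    return unique
```

One known gap: a record with a DOI and a second record for the same paper without one get different keys and both survive, so a title-based second pass may still be worthwhile.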
### Snowball expansion
```python
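S2_PAPER = "https://api.semanticscholar.org/graph/v1/paper"


def snowball(seed_ids: list[str], depth: int = 2) -> set[str]:
    """Expand seeds along references (backward) and citations (forward)."""
    frontier, seen = set(seed_ids), set(seed_ids)
    # Each endpoint nests the linked paper under a different key.
    edges = (("references", "citedPaper"), ("citations", "citingPaper"))
    for _ in range(depth):
        next_frontier = set()
        for pid in frontier:
            for endpoint, key in edges:
                # Only the first 100 links per paper are followed.
                r = httpx.get(f"{S2_PAPER}/{pid}/{endpoint}",
                              params={"fields": "paperId", "limit": 100},
                              timeout=30.0).json()
                next_frontier.update(
                    link[key]["paperId"]
                    for link in r.get("data") or []
                    if (link.get(key) or {}).get("paperId"))
        frontier = next_frontier - seen
        seen.update(frontier)
    return seen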
```
### Structured extraction prompt
```python
EXTRACT_PROMPT = """Extract from this paper as JSON:
{
"claim": "main thesis in one sentence",
"method": "how they tested it",
"evidence": "key result with numbers",
"n": "sample size",
"limitations": ["limit1", "limit2"],
"novelty": "what this adds vs prior work"
}
If field unknown, use null. Don't invent."""
```
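
The reply to this prompt still has to be parsed defensively, since models often wrap JSON in a sentence or a Markdown fence. The sketch below assumes the six field names from the prompt above; `parse_extraction` and `REQUIRED_FIELDS` are illustrative names.

```python
import json
import re

REQUIRED_FIELDS = ("claim", "method", "evidence", "n", "limitations", "novelty")


def parse_extraction(raw: str) -> dict:
    """Parse a model reply into a dict holding exactly the expected fields."""
    # Take the span from the first "{" to the last "}" to skip surrounding prose.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))  # raises json.JSONDecodeError if malformed
    # Missing fields become None instead of being guessed; extra keys are dropped.
    return {field: data.get(field) for field in REQUIRED_FIELDS}
```

Keeping missing fields as `None` mirrors the prompt's "use null" rule, so downstream code can tell "not reported" apart from an empty value.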
### Steelman opposite (debias)
```python
def steelman(claim: str) -> str:
    """Ask the model for the strongest counter-argument to `claim`."""
    # Any citations in the reply are unverified and need the same DOI checks.
    return client.messages.create(
        model="claude-opus-4-7", max_tokens=1024,
        messages=[{"role": "user", "content":
                   f"Claim: {claim}\n\nWrite the strongest argument AGAINST this, "
                   f"citing actual contrary evidence. Be a hostile reviewer."}],
    ).content[0].text
```
### Zettelkasten note (atomic)
```markdown
---
id: 2026-05-10-1432
tags: [retrieval, rag]
source: [[Karpukhin-2020-DPR]]
---
# Dense retrieval beats BM25 only when query-doc lexical overlap is low
In Karpukhin 2020 (top-k retrieval accuracy), DPR > BM25 on NaturalQuestions and TriviaQA,
but BM25 > DPR on SQuAD, where questions reuse passage tokens.
→ Hybrid search is robust: BM25 covers lexical matches, dense covers paraphrase.
Connects to: [[Hybrid-Search]] · [[BM25]] · [[Dense-Retrieval]]
```
## Decision criteria
| Situation | Approach |
|---|---|
| Quick scan (1 hour) | Elicit / Consensus / Claude deep-research |
| Deep dive (1 week) | Manual snowball + AI extraction |
| Systematic review (PRISMA) | PRISMA flow + Covidence + AI screening |
| Cutting-edge (preprints) | arXiv-sanity + Twitter/Bluesky + Semantic Scholar alerts |
| Industry / OSS | GitHub trending + State of X reports + AI synthesis |
**Default**: Connected Papers seed → S2 snowball → AI extract → manual synthesis with steelman.
## 🔗 Graph
- Parent: [[Scientific-Method]]
- Variants: [[Literature-Review]] · [[Tech-Radar]]
- Adjacent: [[Hallucination]] · [[RAG]]
## 🤖 LLM usage
**When**: literature scan, abstract screening, structured extraction, synthesis draft, steelmanning.
**When not**: the final assertion of a novelty claim (an LLM is not ground truth), quantitative meta-analysis (use proper stats software), or any citation used without verification.
## ❌ Anti-patterns
- **Cite-without-verify**: passing along fake DOIs that the AI made up.
- **Single-source synthesis**: treating one paper as truth and ignoring replication.
- **Recency bias**: reading only the latest preprints and staying ignorant of foundational work.
- **No gap analysis**: a literature dump with no "what's missing", which leaves the contribution unclear.
- **Hypothesis fishing**: starting from the data and fitting a post-hoc theory (HARKing).
## 🧪 Verification / duplicates
- Verified (PRISMA 2020 statement, Semantic Scholar API docs, Claude Opus 4.7 deep research, Elicit methodology).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full rewrite covering methodology + AI-aided synthesis pipeline |