---
id: wiki-2026-0508-research
title: Research
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Research-Methodology, Literature-Review, Deep-Research]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [research, methodology, literature, ai-aided]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: anthropic-sdk
---

# Research

## One-liner

> **"Every answer has already been reformulated by someone."** Research is a disciplined loop of question → literature → synthesis → novel contribution. As of 2026, AI-aided synthesis (Claude Opus 4.7 deep research, GPT-5 with browsing, Elicit, Consensus, undermind.ai) cuts weeks of work down to hours.

## Core

### Phases

1. **Question framing**: turn vague curiosity into a specific, testable question (PICO, FINER criteria).
2. **Literature scoping**: keywords, citation graph (forward/backward), Connected Papers / Litmaps.
3. **Reading & extraction**: structured notes (Zettelkasten, claim-evidence-source).
4. **Synthesis**: themes, gaps, contradictions.
5. **Hypothesis / contribution**: the novel claim this work adds.
6. **Validation**: experiment / proof / case study.
7. **Communication**: paper, blog, talk.

### Modern toolchain (2026)

- **Search**: Semantic Scholar API, Google Scholar, OpenAlex.
- **Discovery**: Connected Papers, Litmaps, Inciteful (citation graph visualization).
- **AI synthesis**: Claude Opus 4.7 deep-research mode, GPT-5 deep research, Elicit (extracts data per paper), Consensus (claim-level), undermind.ai (deep retrieval).
- **Notes**: Obsidian + Zotero integration; Logseq; Reflect.
- **Reproducibility**: Quarto, Jupyter Book, Code Ocean.

### AI-aided literature review pattern

1. Seed papers (3–5 known relevant) → Connected Papers graph.
2. Snowball (citations both ways) → ~100 candidates.
3. LLM screens abstracts: relevance score 0–10.
4. Top 30 → full-text PDF → AI structured extraction (claim, method, evidence, limitations).
5. AI clusters into themes; a human reviews and writes the synthesis.

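Steps 3 and 4 of the pattern above (score each abstract 0 to 10, keep the top 30) can be sketched as a small ranking helper. This is a minimal sketch, not a prescribed implementation: `screen`, `score_fn`, `top_k`, and `min_score` are hypothetical names, and `score_fn` stands in for whatever produces the relevance score (an LLM call in practice, a plain function here).

```python
from typing import Callable


def screen(papers: list[dict], score_fn: Callable[[dict], float],
           top_k: int = 30, min_score: float = 0.0) -> list[dict]:
    """Rank candidates by a 0-10 relevance score and keep the top_k."""
    scored = []
    for p in papers:
        # Clamp to the 0-10 range in case the scorer returns something odd.
        s = max(0.0, min(10.0, float(score_fn(p))))
        if s >= min_score:
            scored.append({**p, "relevance": s})
    # sorted() is stable, so ties keep their input order.
    return sorted(scored, key=lambda p: p["relevance"], reverse=True)[:top_k]
```

Keeping the scorer injectable means the ranking logic can be checked offline with a toy function before any model calls are spent on it.
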
### Safeguards (required)

- Hallucination: AI-fabricated citations are common, so every DOI must be verified.
- Echo chamber: AI synthesis over-weights popular sources, so deliberately sample diverse sources by hand.
- Confirmation bias: AI tends to align with the user's hypothesis, so use an explicit "steelman the opposite" prompt.

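DOI verification is easier when every DOI is first reduced to one canonical form, since DOIs are case-insensitive and often arrive wrapped in a URL or followed by punctuation. The helper below is a minimal sketch under stated assumptions: `normalize_doi` is a hypothetical name, the shape check (`10.` plus a 4 to 9 digit registrant code) is a common heuristic rather than a full validator, and stripping trailing `)` or `]` would damage the rare DOI that legitimately ends with one.

```python
import re
from typing import Optional

# Prefixes that commonly wrap a DOI in text or metadata.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)\s*", re.IGNORECASE)


def normalize_doi(raw: str) -> Optional[str]:
    """Return a lowercase bare DOI such as '10.1000/xyz', or None if not DOI-shaped."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    doi = doi.rstrip(".,;)]").lower()  # drop sentence punctuation, fold case
    return doi if re.fullmatch(r"10\.\d{4,9}/\S+", doi) else None
```

This only checks shape. Confirming that a DOI actually resolves still needs a lookup against a resolver or an index, which is not shown here.
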
### Applications

1. PhD literature review.
2. Industry tech radar / market research.
3. Due diligence (M&A, investment).
4. Pre-implementation prior-art search (patents, OSS).

## 💻 Patterns

### Claude deep-research synthesis (verify-first)

```python
import re

import httpx
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def synthesize(question: str, papers: list[dict]) -> str:
    """papers: [{title, abstract, doi, year}]"""
    # Number each paper so the model can cite it as [index].
    corpus = "\n\n".join(
        f"[{i}] {p['title']} ({p['year']}, doi:{p['doi']})\n{p['abstract']}"
        for i, p in enumerate(papers)
    )
    msg = client.messages.create(
        model="claude-opus-4-7", max_tokens=4096,
        system=("Synthesize evidence. Cite EVERY claim with [index]. "
                "If evidence is weak/contradictory, say so explicitly. "
                "Never fabricate citations."),
        messages=[{"role": "user", "content": f"Q: {question}\n\nPapers:\n{corpus}"}],
    )
    return msg.content[0].text

def verify_dois(text: str, papers: list[dict]) -> list[str]:
    """Hallucination check: every cited DOI must exist in our set."""
    # Strip trailing punctuation that \S+ would otherwise swallow, e.g. "...)".
    cited = [d.rstrip(".,;)]") for d in re.findall(r"doi:\s*(10\.\d+/\S+)", text, re.I)]
    # DOIs are case-insensitive; skip papers that have no DOI.
    valid = {p["doi"].lower() for p in papers if p.get("doi")}
    return [d for d in cited if d.lower() not in valid]  # offenders
```

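Because the system prompt asks for `[index]` citations, a DOI check alone can miss an invented index. A complementary check, sketched here with the hypothetical name `check_citations`, compares the bracketed indices in the output against the number of papers that were supplied. It assumes citations look like `[3]` or `[1, 4]` and ignores any other bracket content.

```python
import re


def check_citations(text: str, n_papers: int) -> dict:
    """Report out-of-range [index] citations and papers that were never cited."""
    cited: set[int] = set()
    # Matches [3] as well as grouped citations such as [1, 4].
    for group in re.findall(r"\[(\d+(?:\s*,\s*\d+)*)\]", text):
        cited.update(int(tok) for tok in group.split(","))
    valid = set(range(n_papers))  # indices come from enumerate(), so 0-based
    return {
        "out_of_range": sorted(cited - valid),
        "uncited": sorted(valid - cited),
    }
```

A non-empty `out_of_range` list points at a citation that cannot map to any supplied paper, while `uncited` shows which inputs the synthesis never used.
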
### Semantic Scholar fetch

```python
import httpx

def search_s2(query: str, limit: int = 50) -> list[dict]:
    resp = httpx.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit,
                "fields": "title,abstract,year,citationCount,externalIds"},
        timeout=30.0,
    )
    resp.raise_for_status()  # surface HTTP errors (e.g. rate limits) instead of a KeyError
    return [{"title": p["title"], "abstract": p.get("abstract") or "",
             "year": p.get("year"),
             # externalIds may be null, so guard before reading DOI
             "doi": (p.get("externalIds") or {}).get("DOI"),
             "cites": p.get("citationCount", 0)}
            for p in resp.json().get("data", [])]
```

### Snowball expansion

```python
import httpx

def snowball(seed_ids: list[str], depth: int = 2) -> set[str]:
    """Expand seeds through their reference lists (backward citations only)."""
    frontier, seen = set(seed_ids), set(seed_ids)
    for _ in range(depth):
        next_frontier = set()
        for pid in frontier:
            r = httpx.get(f"https://api.semanticscholar.org/graph/v1/paper/{pid}/references",
                          params={"fields": "paperId", "limit": 100}).json()
            next_frontier.update(ref["citedPaper"]["paperId"]
                                 for ref in r.get("data", [])
                                 if ref["citedPaper"].get("paperId"))
        frontier = next_frontier - seen  # only expand papers not visited yet
        seen.update(frontier)
    return seen
```

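The review pattern calls for snowballing "both ways" and stopping at roughly 100 candidates, while the function above follows reference lists only. One way to cover both directions is to separate the traversal from the data source. The sketch below is an illustration under assumptions: `snowball_graph`, `neighbors`, and `max_nodes` are hypothetical names, and `neighbors` is expected to return both the papers a given paper cites and the papers that cite it.

```python
from typing import Callable, Iterable


def snowball_graph(seeds: Iterable[str],
                   neighbors: Callable[[str], Iterable[str]],
                   depth: int = 2, max_nodes: int = 100) -> set[str]:
    """Breadth-first expansion over whatever `neighbors` returns, capped at max_nodes."""
    seen = set(seeds)
    frontier = set(seen)
    for _ in range(depth):
        next_frontier: set[str] = set()
        for pid in sorted(frontier):  # sorted so truncation is deterministic
            for nid in neighbors(pid):
                if nid not in seen and len(seen) + len(next_frontier) < max_nodes:
                    next_frontier.add(nid)
        if not next_frontier:
            break  # nothing new to expand
        seen |= next_frontier
        frontier = next_frontier
    return seen
```

With Semantic Scholar, `neighbors` could merge the `/references` results with those of the corresponding citations endpoint; the exact response fields should be checked against the API documentation. Passing a dictionary-backed function instead makes the traversal testable without network access.
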
### Structured extraction prompt

```python
EXTRACT_PROMPT = """Extract from this paper as JSON:
{
  "claim": "main thesis in one sentence",
  "method": "how they tested it",
  "evidence": "key result with numbers",
  "n": "sample size",
  "limitations": ["limit1", "limit2"],
  "novelty": "what this adds vs prior work"
}
If field unknown, use null. Don't invent."""
```

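A model reply to the prompt above is still free text, so it helps to parse it into a fixed shape before storing it. The following is a minimal sketch with assumed names (`parse_extraction`, `REQUIRED_FIELDS`): it takes the outermost `{...}` span from the reply, fills any missing field with `None`, and coerces `limitations` to a list.

```python
import json

# Field names taken from the extraction prompt.
REQUIRED_FIELDS = ("claim", "method", "evidence", "n", "limitations", "novelty")


def parse_extraction(raw: str) -> dict:
    """Parse a model reply into the extraction schema; missing fields become None."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start:end + 1])  # raises ValueError on malformed JSON
    record = {key: data.get(key) for key in REQUIRED_FIELDS}  # drops extra keys
    if record["limitations"] is None:
        record["limitations"] = []
    elif isinstance(record["limitations"], str):
        record["limitations"] = [record["limitations"]]
    return record
```

Slicing between the first `{` and the last `}` tolerates replies that wrap the JSON in a sentence or a code fence, at the cost of failing when the surrounding prose itself contains braces.
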
### Steelman opposite (debias)

```python
def steelman(claim: str) -> str:
    """Ask for the strongest counter-argument; reuses `client` from the synthesis block."""
    return client.messages.create(
        model="claude-opus-4-7", max_tokens=1024,
        messages=[{"role": "user", "content":
                   f"Claim: {claim}\n\nWrite the strongest argument AGAINST this, "
                   "citing actual contrary evidence. Be a hostile reviewer."}],
    ).content[0].text
```

### Zettelkasten note (atomic)

```markdown
---
id: 2026-05-10-1432
tags: [retrieval, rag]
source: [[Lewis-2020-RAG]]
---
# Dense retrieval beats BM25 only when query-doc lexical overlap is low

In Lewis 2020 (Table 3), DPR > BM25 on NaturalQuestions (+6 EM)
but BM25 ≥ DPR on TriviaQA where queries copy doc tokens.

→ Hybrid search is robust: pick BM25 for lexical, dense for paraphrase.

Connects to: [[Hybrid Search]] · [[BM25]] · [[Dense-Retrieval]]
```

## Decision criteria

| Situation | Approach |
|---|---|
| Quick scan (1 h) | Elicit / Consensus / Claude deep-research |
| Deep dive (1 week) | Manual snowball + AI extraction |
| Systematic review (PRISMA) | PRISMA flow + Covidence + AI screening |
| Cutting-edge (preprints) | arXiv-sanity + Twitter/Bluesky + Semantic Scholar alerts |
| Industry / OSS | GitHub trending + State of X reports + AI synthesis |

**Default**: Connected Papers seed → S2 snowball → AI extract → manual synthesis with steelman.

## 🔗 Graph

- Parent: [[Scientific Method]]
- Variants: [[Literature-Review]] · [[Tech-Radar]]
- Adjacent: [[Hallucination]] · [[RAG]]

## 🤖 LLM usage

**When to use**: literature scan, abstract screening, structured extraction, synthesis draft, steelmanning.
**When not to use**: the final assertion of a novelty claim (an LLM is not ground truth), quantitative meta-analysis (use proper stats software), or any citation that has not been verified.

## ❌ Anti-patterns

- **Cite-without-verify**: passing along fake DOIs that the AI made up.
- **Single-source synthesis**: treating a single paper as truth and ignoring replication.
- **Recency bias**: reading only the latest preprints and staying ignorant of foundational work.
- **No gap analysis**: a literature dump with no "what's missing", which leaves the contribution unclear.
- **Hypothesis fishing**: starting from the data and building a post-hoc theory (HARKing).

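One mechanical guard against recency bias is to sample candidates per publication era rather than taking the newest or most-cited papers outright. The sketch below is one possible approach, not a method prescribed by this note: `sample_by_era`, `per_bucket`, and `bucket_years` are hypothetical names, and papers without a `year` are skipped.

```python
import random
from collections import defaultdict


def sample_by_era(papers: list[dict], per_bucket: int = 5,
                  bucket_years: int = 5, seed: int = 0) -> list[dict]:
    """Draw up to per_bucket papers from each publication-year bucket."""
    buckets: dict[int, list[dict]] = defaultdict(list)
    for p in papers:
        if p.get("year") is not None:
            # e.g. with bucket_years=5, 2024 falls into the bucket starting at 2020
            buckets[p["year"] // bucket_years * bucket_years].append(p)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    picked = []
    for start in sorted(buckets):  # oldest era first
        group = buckets[start]
        picked.extend(rng.sample(group, min(per_bucket, len(group))))
    return picked
```

The same bucketing idea could be applied to venue or citation count to counter the echo-chamber effect. Which dimension matters most depends on the question.
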
## 🧪 Verification / duplicates

- Verified (PRISMA 2020 statement, Semantic Scholar API docs, Claude Opus 4.7 deep research, Elicit methodology).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full rewrite covering methodology + AI-aided synthesis pipeline |