---
id: wiki-2026-0508-research
title: Research
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Research-Methodology, Literature-Review, Deep-Research]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [research, methodology, literature, ai-aided]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: anthropic-sdk
---
# Research
## In one line
> **"Every answer has already been reformulated by someone."** Research is a disciplined loop of question → literature → synthesis → novel contribution. In 2026, AI-aided synthesis (Claude Opus 4.7 deep research, GPT-5 with browsing, Elicit, Consensus, undermind.ai) shortens weeks of work to hours.
## Core
### Phases
1. **Question framing** — vague curiosity → specific testable question (PICO, FINER criteria).
2. **Literature scoping** — keywords, citation graph (forward/backward), Connected Papers / Litmaps.
3. **Reading & extraction** — structured notes (Zettelkasten, claim-evidence-source).
4. **Synthesis** — themes, gaps, contradictions.
5. **Hypothesis / contribution** — what novel claim this work adds.
6. **Validation** — experiment / proof / case study.
7. **Communication** — paper, blog, talk.
### Modern toolchain (2026)
- **Search**: Semantic Scholar API, Google Scholar, OpenAlex.
- **Discovery**: Connected Papers, Litmaps, Inciteful (citation graph viz).
- **AI synthesis**: Claude Opus 4.7 deep-research mode, GPT-5 deep research, Elicit (extracts data per paper), Consensus (claim-level), undermind.ai (deep retrieval).
- **Notes**: Obsidian + Zotero integration; Logseq; Reflect.
- **Reproducibility**: Quarto, Jupyter Book, Code Ocean.
### AI-aided literature review pattern
1. Seed papers (3-5 known relevant) → Connected Papers graph.
2. Snowball (citations both ways) → ~100 candidates.
3. LLM screen abstracts: relevance score 0-10.
4. Top 30 → full-text PDF → AI structured extraction (claim, method, evidence, limitations).
5. AI cluster into themes; human reviews + writes synthesis.
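
Steps 3 and 4 above (score each abstract, keep the top 30) can be sketched as two small helpers. This is a minimal sketch, not a fixed method: `parse_score` and `top_n` are hypothetical names, and it assumes the model's reply contains the score as a standalone integer from 0 to 10.

```python
import re
from typing import Optional


def parse_score(reply: str) -> Optional[int]:
    """Return the first standalone integer 0-10 in a model reply, else None."""
    m = re.search(r"\b(10|[0-9])\b", reply)
    return int(m.group(1)) if m else None


def top_n(candidates: list[dict], scores: list[Optional[int]], n: int = 30) -> list[dict]:
    """Keep the n highest-scoring candidates; unscored candidates are dropped."""
    scored = [(s, i) for i, s in enumerate(scores) if s is not None]
    scored.sort(key=lambda t: (-t[0], t[1]))  # score descending, input order on ties
    return [candidates[i] for _, i in scored[:n]]
```

Dropping unscored candidates is a design choice; a stricter pipeline might route them to a human instead of discarding them silently.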
### Safeguards (required)
- Hallucination: fabricated citations from AI are common, so verifying every DOI is a must.
- Echo chamber: AI synthesis over-weights popular sources, so deliberately sample diverse sources by hand.
- Confirmation bias: AI tends to align with the user's hypothesis; counter it with an explicit "steelman the opposite" prompt.
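
One way to make the diverse-sampling safeguard concrete is to stratify candidates by citation count and draw from every stratum, so that highly cited papers cannot crowd out the rest. The sketch below is an illustration under assumptions: `diverse_sample` is a hypothetical helper, it expects each paper dict to carry a `cites` count, and citation count is only a rough proxy for popularity.

```python
import random


def diverse_sample(papers: list[dict], k: int, buckets: int = 3, seed: int = 0) -> list[dict]:
    """Sample up to k papers spread across citation-count strata."""
    if not papers or k <= 0:
        return []
    rng = random.Random(seed)  # seeded so a review can be reproduced
    ranked = sorted(papers, key=lambda p: p.get("cites") or 0)
    buckets = max(1, min(buckets, len(ranked)))
    size = -(-len(ranked) // buckets)  # ceiling division
    strata = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    per_stratum = max(1, k // len(strata))
    picked = []
    for stratum in strata:
        picked.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    return picked[:k]
```

The same idea extends to other axes, such as publication year or venue, by changing the sort key.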
### Applications
1. PhD literature review.
2. Industry tech radar / market research.
3. Due diligence (M&A, investment).
4. Pre-implementation prior-art search (patents, OSS).
## 💻 Patterns
### Claude deep-research synthesis (verify-first)
```python
import re

import httpx
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def synthesize(question: str, papers: list[dict]) -> str:
    """papers: [{title, abstract, doi, year}]"""
    corpus = "\n\n".join(
        f"[{i}] {p['title']} ({p['year']}, doi:{p['doi']})\n{p['abstract']}"
        for i, p in enumerate(papers)
    )
    msg = client.messages.create(
        model="claude-opus-4-7", max_tokens=4096,
        system=("Synthesize evidence. Cite EVERY claim with [index]. "
                "If evidence is weak/contradictory, say so explicitly. "
                "Never fabricate citations."),
        messages=[{"role": "user", "content": f"Q: {question}\n\nPapers:\n{corpus}"}],
    )
    return msg.content[0].text


def verify_dois(text: str, papers: list[dict]) -> list[str]:
    """Hallucination check: every cited DOI must exist in our set."""
    # Strip trailing punctuation so "doi:10.1/x)." still matches "10.1/x".
    cited = [d.rstrip(".,;:)]}") for d in re.findall(r"doi:(10\.\d+/\S+)", text)]
    valid = {p["doi"] for p in papers}
    return [d for d in cited if d not in valid]  # offenders
```
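
Because the system prompt asks for `[index]` citations rather than DOIs, a complementary check on the indices themselves may be useful. The following is a rough sketch under assumptions: `check_index_citations` is a hypothetical helper, indices are 0-based as produced by `enumerate(papers)`, and sentences are split naively on terminal punctuation, so a citation placed after the period would be misattributed.

```python
import re


def check_index_citations(text: str, n_papers: int) -> dict:
    """Flag cited indices outside the corpus and sentences with no citation."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", text)}
    out_of_range = sorted(i for i in cited if i >= n_papers)
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    uncited = [s for s in sentences if not re.search(r"\[\d+\]", s)]
    return {"out_of_range": out_of_range, "uncited": uncited}
```

An empty result on both keys does not prove the synthesis is faithful; it only shows that every sentence points at a paper that was actually supplied.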
### Semantic Scholar fetch
```python
import httpx


def search_s2(query: str, limit: int = 50) -> list[dict]:
    resp = httpx.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit,
                "fields": "title,abstract,year,citationCount,externalIds"},
    )
    resp.raise_for_status()  # surface rate limits (HTTP 429) instead of a KeyError later
    return [{"title": p["title"], "abstract": p.get("abstract") or "",
             "year": p.get("year"),
             # externalIds can be null, so guard before reading the DOI
             "doi": (p.get("externalIds") or {}).get("DOI"),
             "cites": p.get("citationCount") or 0}
            for p in resp.json().get("data", [])]
```
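
When several queries are merged, the same paper tends to appear more than once. A small deduplication pass can be sketched as follows; `dedupe` is a hypothetical helper that assumes the `title` and `doi` keys returned by `search_s2`, and it treats DOIs as case-insensitive.

```python
import re


def dedupe(papers: list[dict]) -> list[dict]:
    """Drop repeats, keyed by lowercased DOI or, if absent, a normalized title."""
    seen, unique = set(), []
    for p in papers:
        doi = (p.get("doi") or "").strip().lower()
        title = re.sub(r"[^a-z0-9]+", " ", (p.get("title") or "").lower()).strip()
        key = ("doi", doi) if doi else ("title", title)
        if key not in seen:
            seen.add(key)
            unique.append(p)  # first occurrence wins
    return unique
```

A record with a DOI and a second record of the same paper without one are not merged by this sketch; matching those would need a title comparison across both groups.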
### Snowball expansion
```python
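import httpx

S2 = "https://api.semanticscholar.org/graph/v1/paper"


def snowball(seed_ids: list[str], depth: int = 2) -> set[str]:
    """Expand seeds along references (backward) and citations (forward)."""
    frontier, seen = set(seed_ids), set(seed_ids)
    for _ in range(depth):
        next_frontier = set()
        for pid in frontier:
            # /references wraps each hit in "citedPaper", /citations in "citingPaper"
            for edge, key in (("references", "citedPaper"), ("citations", "citingPaper")):
                r = httpx.get(f"{S2}/{pid}/{edge}",
                              params={"fields": "paperId", "limit": 100}).json()
                next_frontier.update((item.get(key) or {}).get("paperId")
                                     for item in r.get("data") or [])
        next_frontier.discard(None)  # some entries have no paperId
        frontier = next_frontier - seen
        seen.update(frontier)
    return seen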
```
### Structured extraction prompt
```python
EXTRACT_PROMPT = """Extract from this paper as JSON:
{
  "claim": "main thesis in one sentence",
  "method": "how they tested it",
  "evidence": "key result with numbers",
  "n": "sample size",
  "limitations": ["limit1", "limit2"],
  "novelty": "what this adds vs prior work"
}
If a field is unknown, use null. Don't invent."""
```
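
Model replies to a prompt like this often wrap the JSON in extra prose, so the output is worth validating before it enters the notes. The parser below is a minimal sketch: `parse_extraction` and `FIELDS` are hypothetical names, and it assumes the reply contains exactly one top-level JSON object.

```python
import json

FIELDS = ("claim", "method", "evidence", "n", "limitations", "novelty")


def parse_extraction(raw: str) -> dict:
    """Parse a model reply into a record with exactly the expected fields."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start:end + 1])  # raises ValueError on malformed JSON
    record = {k: data.get(k) for k in FIELDS}  # missing keys become None, extras are dropped
    if record["limitations"] is None:
        record["limitations"] = []
    elif not isinstance(record["limitations"], list):
        record["limitations"] = [record["limitations"]]
    return record
```

Keeping unknown fields as `None` preserves the "Don't invent" contract: downstream code can tell a missing value from an empty one.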
### Steelman opposite (debias)
```python
def steelman(claim: str) -> str:
    # Reuses the `client` created in the synthesis block above.
    return client.messages.create(
        model="claude-opus-4-7", max_tokens=1024,
        messages=[{"role": "user", "content":
                   f"Claim: {claim}\n\nWrite the strongest argument AGAINST this, "
                   f"citing actual contrary evidence. Be a hostile reviewer."}],
    ).content[0].text
```
### Zettelkasten note (atomic)
```markdown
---
id: 2026-05-10-1432
tags: [retrieval, rag]
source: [[Lewis-2020-RAG]]
---
# Dense retrieval beats BM25 only when query-doc lexical overlap is low
In Lewis 2020 (Table 3), DPR > BM25 on NaturalQuestions (+6 EM)
but BM25 ≥ DPR on TriviaQA where queries copy doc tokens.
→ Hybrid search is robust: pick BM25 for lexical, dense for paraphrase.
Connects to: [[Hybrid Search]] · [[BM25]] · [[Dense-Retrieval]]
```
## Decision criteria
| Situation | Approach |
|---|---|
| Quick scan (1 h) | Elicit / Consensus / Claude deep-research |
| Deep dive (1 week) | Manual snowball + AI extraction |
| Systematic review (PRISMA) | PRISMA flow + Covidence + AI screening |
| Cutting-edge (preprints) | arXiv-sanity + Twitter/Bluesky + Semantic Scholar alerts |
| Industry / OSS | GitHub trending + State of X reports + AI synthesis |
**Default**: Connected Papers seed → S2 snowball → AI extract → manual synthesis with steelman.
## 🔗 Graph
- Parent: [[Scientific Method]]
- Variants: [[Literature-Review]] · [[Tech-Radar]]
- Adjacent: [[Hallucination]] · [[RAG]]
## 🤖 LLM usage
**When**: literature scan, abstract screening, structured extraction, synthesis draft, steelmanning.
**When not**: as the final assertion of a novelty claim (an LLM is not ground truth), for quantitative meta-analysis (use proper stats software), or whenever citations go unverified.
## ❌ Anti-patterns
- **Cite-without-verify**: passing along fake DOIs that the AI made up.
- **Single-source synthesis**: treating one paper as the truth and ignoring replication.
- **Recency bias**: reading only the latest preprints and staying ignorant of foundational work.
- **No gap analysis**: a bare literature dump with no "what's missing", which leaves the contribution unclear.
- **Hypothesis fishing**: starting from the data and building a post-hoc theory (HARKing).
## 🧪 Verification / duplicates
- Verified (PRISMA 2020 statement, Semantic Scholar API docs, Claude Opus 4.7 deep research, Elicit methodology).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full rewrite covering methodology + AI-aided synthesis pipeline |