---
id: wiki-2026-0508-research
title: Research
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Research-Methodology, Literature-Review, Deep-Research]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [research, methodology, literature, ai-aided]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: anthropic-sdk
---

# Research

## One-line summary
> **"Every answer has already been reformulated by someone."** Research is a disciplined loop of question → literature → synthesis → novel contribution. In 2026, AI-aided synthesis (Claude Opus 4.7 deep research, GPT-5 with browsing, Elicit, Consensus, undermind.ai) can shorten weeks of work to hours.

## Core concepts

### Phases
1. **Question framing** — vague curiosity → specific testable question (PICO, FINER criteria).
2. **Literature scoping** — keywords, citation graph (forward/backward), Connected Papers / Litmaps.
3. **Reading & extraction** — structured notes (Zettelkasten, claim-evidence-source).
4. **Synthesis** — themes, gaps, contradictions.
5. **Hypothesis / contribution** — what novel claim this work adds.
6. **Validation** — experiment / proof / case study.
7. **Communication** — paper, blog, talk.

### Modern toolchain (2026)
- **Search**: Semantic Scholar API, Google Scholar, OpenAlex.
- **Discovery**: Connected Papers, Litmaps, Inciteful (citation graph viz).
- **AI synthesis**: Claude Opus 4.7 deep-research mode, GPT-5 deep research, Elicit (extracts data per paper), Consensus (claim-level), undermind.ai (deep retrieval).
- **Notes**: Obsidian + Zotero integration; Logseq; Reflect.
- **Reproducibility**: Quarto, Jupyter Book, Code Ocean.

### AI-aided literature review pattern
1. Seed papers (3–5 known relevant) → Connected Papers graph.
2. Snowball (citations both ways) → ~100 candidates.
3. LLM screen abstracts: relevance score 0–10.
4. Top 30 → full-text PDF → AI structured extraction (claim, method, evidence, limitations).
5. AI cluster into themes; human reviews + writes synthesis.
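
Steps 3 and 4 above can be sketched as a small ranking helper. This is a minimal sketch under stated assumptions, not a definitive implementation: `screen_abstracts`, the `score_fn` callback, and the `min_score` cutoff are hypothetical names introduced here. In practice `score_fn` would wrap an LLM call that returns a 0 to 10 relevance score; a toy keyword scorer stands in for it below.

```python
from typing import Callable

def screen_abstracts(
    papers: list[dict],
    score_fn: Callable[[dict], float],
    top_k: int = 30,
    min_score: float = 0.0,
) -> list[dict]:
    """Score each paper, clamp to 0-10, and return the top_k best, highest first."""
    scored = []
    for p in papers:
        score = max(0.0, min(10.0, float(score_fn(p))))  # clamp out-of-range scores
        if score >= min_score:
            scored.append({**p, "relevance": score})
    # sorted() is stable, so tied papers keep their original order
    return sorted(scored, key=lambda p: p["relevance"], reverse=True)[:top_k]


# Toy scorer standing in for the LLM (assumption): counts keyword hits.
def keyword_score(paper: dict) -> float:
    return 5.0 * sum(kw in paper["abstract"].lower() for kw in ("retrieval", "rag"))

candidates = [
    {"title": "A", "abstract": "Dense retrieval for RAG."},
    {"title": "B", "abstract": "Protein folding."},
    {"title": "C", "abstract": "Sparse retrieval baselines."},
]
top = screen_abstracts(candidates, keyword_score, top_k=2)
```

Keeping the scorer as a callback means the ranking logic can be tested offline, and the LLM-backed scorer can be swapped in without touching it.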

### Safeguards (required)
- Hallucination: AI-generated fake citations are common, so verifying every DOI is a must.
- Echo chamber: AI synthesis over-weights popular sources; counter it with deliberate, manually diverse sampling.
- Confirmation bias: AI tends to align with the user's hypothesis; use an explicit "steelman the opposite" prompt.
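
One possible way to make the sampling deliberately diverse is stratified random sampling over publication periods, so that a handful of highly cited papers cannot crowd out everything else. The sketch below is only one strategy among many; `diverse_sample`, `bucket_years`, and `per_bucket` are hypothetical names, and the `year` key is assumed to be present on each paper record.

```python
import random
from collections import defaultdict

def diverse_sample(
    papers: list[dict],
    per_bucket: int = 2,
    bucket_years: int = 5,
    seed: int = 0,
) -> list[dict]:
    """Draw up to per_bucket papers at random from each publication-year bucket."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    buckets: dict[int, list[dict]] = defaultdict(list)
    for p in papers:
        if p.get("year") is not None:  # undated papers cannot be bucketed
            buckets[p["year"] // bucket_years * bucket_years].append(p)
    sample = []
    for start in sorted(buckets):
        group = buckets[start]
        sample.extend(rng.sample(group, min(per_bucket, len(group))))
    return sample

papers = [
    {"title": "Old classic", "year": 2012},
    {"title": "Old minor", "year": 2013},
    {"title": "Recent hit", "year": 2023},
    {"title": "Recent minor", "year": 2021},
    {"title": "Undated", "year": None},
]
picked = diverse_sample(papers, per_bucket=1)
```

Random selection within a bucket ignores citation counts on purpose; ranking by citations inside each bucket would reintroduce the popularity bias this step is meant to offset.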

### Applications
1. PhD literature review.
2. Industry tech radar / market research.
3. Due diligence (M&A, investment).
4. Pre-implementation prior-art search (patents, OSS).

## 💻 Patterns

### Claude deep-research synthesis (verify-first)
```python
import re

import httpx
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def synthesize(question: str, papers: list[dict]) -> str:
    """papers: [{title, abstract, doi, year}]"""
    # Number each paper so the model can cite it as [index].
    corpus = "\n\n".join(
        f"[{i}] {p['title']} ({p['year']}, doi:{p['doi']})\n{p['abstract']}"
        for i, p in enumerate(papers)
    )
    msg = client.messages.create(
        model="claude-opus-4-7", max_tokens=4096,
        system=("Synthesize evidence. Cite EVERY claim with [index]. "
                "If evidence is weak/contradictory, say so explicitly. "
                "Never fabricate citations."),
        messages=[{"role": "user", "content": f"Q: {question}\n\nPapers:\n{corpus}"}],
    )
    return msg.content[0].text

def verify_dois(text: str, papers: list[dict]) -> list[str]:
    """Hallucination check: every cited DOI must exist in our set."""
    # \S+ also swallows trailing punctuation such as ")" or ".", so strip it.
    cited = [d.rstrip(".,;:)]}\"'") for d in re.findall(r"doi:\s*(10\.\d+/\S+)", text)]
    # DOIs are case-insensitive; skip papers that have no DOI at all.
    valid = {p["doi"].lower() for p in papers if p.get("doi")}
    return [d for d in cited if d.lower() not in valid]  # offenders
```
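
Because the system prompt asks for `[index]` citations rather than DOIs, a DOI check alone may find nothing to verify. A complementary check on the bracketed indices could look like the sketch below. `verify_indices` and `uncited_sentences` are hypothetical helpers, not part of any library, and the sentence splitter is deliberately naive (it assumes plain prose and single-number citations such as `[3]`).

```python
import re

def verify_indices(text: str, n_papers: int) -> list[int]:
    """Return cited [index] values that do not map to a supplied paper."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", text)}
    return sorted(i for i in cited if i >= n_papers)

def uncited_sentences(text: str) -> list[str]:
    """Return sentences that carry no [index] citation at all."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s and not re.search(r"\[\d+\]", s)]

draft = (
    "Dense retrieval helps on paraphrased queries [0]. "
    "BM25 stays competitive [7]. "
    "Hybrid is best."
)
bad_indices = verify_indices(draft, n_papers=3)
missing = uncited_sentences(draft)
```

Both lists should be empty before a synthesis draft is accepted; anything flagged goes back for manual review.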

### Semantic Scholar fetch
```python
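import httpx

def search_s2(query: str, limit: int = 50) -> list[dict]:
    """Keyword search on Semantic Scholar; returns normalized paper records."""
    resp = httpx.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit,  # the API caps limit at 100
                "fields": "title,abstract,year,citationCount,externalIds"},
        timeout=30,
    )
    resp.raise_for_status()  # surface rate limits (HTTP 429) instead of a KeyError
    return [{"title": p["title"], "abstract": p.get("abstract") or "",
             "year": p.get("year"),
             # externalIds can be null, so guard before reading the DOI
             "doi": (p.get("externalIds") or {}).get("DOI"),
             "cites": p.get("citationCount") or 0}
            for p in resp.json().get("data", [])]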
```
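
When results from several queries or sources are merged, the same paper tends to appear more than once. A minimal deduplication sketch follows, assuming records shaped like the ones returned above (`title` and `doi` keys); `dedupe_papers` is a hypothetical helper name, and matching on normalized titles is a rough fallback that can both miss and over-merge.

```python
def dedupe_papers(papers: list[dict]) -> list[dict]:
    """Keep the first record per DOI (case-insensitive), else per normalized title."""
    seen: set[str] = set()
    unique = []
    for p in papers:
        doi = (p.get("doi") or "").strip().lower()
        if doi:
            key = f"doi:{doi}"
        else:
            # Fallback: lowercase the title and collapse runs of whitespace.
            key = "title:" + " ".join((p.get("title") or "").lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique

merged = dedupe_papers([
    {"title": "RAG", "doi": "10.1/ABC"},
    {"title": "RAG (preprint)", "doi": "10.1/abc"},
    {"title": "No  DOI paper", "doi": None},
    {"title": "no doi paper", "doi": None},
])
```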

### Snowball expansion
```python
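import httpx

S2 = "https://api.semanticscholar.org/graph/v1/paper"

def snowball(seed_ids: list[str], depth: int = 2) -> set[str]:
    """Expand seeds along references (backward) and citations (forward)."""
    frontier, seen = set(seed_ids), set(seed_ids)
    # Each endpoint wraps the linked paper under a different key.
    edges = {"references": "citedPaper", "citations": "citingPaper"}
    for _ in range(depth):
        next_frontier = set()
        for pid in frontier:
            for endpoint, key in edges.items():
                r = httpx.get(f"{S2}/{pid}/{endpoint}",
                              params={"fields": "paperId", "limit": 100},  # first page only
                              timeout=30).json()
                next_frontier.update(
                    link[key]["paperId"]
                    for link in r.get("data") or []
                    if (link.get(key) or {}).get("paperId"))
        frontier = next_frontier - seen
        seen.update(frontier)
    return seen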
```

### Structured extraction prompt
```python
EXTRACT_PROMPT = """Extract from this paper as JSON:
{
  "claim": "main thesis in one sentence",
  "method": "how they tested it",
  "evidence": "key result with numbers",
  "n": "sample size",
  "limitations": ["limit1", "limit2"],
  "novelty": "what this adds vs prior work"
}
If field unknown, use null. Don't invent."""
```
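
Model replies to a prompt like this often wrap the JSON in extra prose or omit fields, so the output is worth validating before it is stored. The parser below is a sketch under that assumption; `parse_extraction` and `REQUIRED` are hypothetical names, and the brace-slicing heuristic assumes the reply contains exactly one JSON object.

```python
import json

REQUIRED = ("claim", "method", "evidence", "n", "limitations", "novelty")

def parse_extraction(raw: str) -> dict:
    """Parse a model reply into a dict holding exactly the expected fields."""
    # Slice from the first "{" to the last "}" to drop surrounding prose.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start:end + 1])  # raises json.JSONDecodeError if malformed
    record = {key: data.get(key) for key in REQUIRED}  # missing fields become None
    if record["limitations"] is None:
        record["limitations"] = []
    elif not isinstance(record["limitations"], list):
        record["limitations"] = [record["limitations"]]
    return record

reply = 'Here is the JSON:\n{"claim": "X beats Y", "method": null, "limitations": "small n"}'
record = parse_extraction(reply)
```

Unknown keys are dropped and absent ones are set to `None`, which mirrors the "use null, don't invent" rule in the prompt.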

### Steelman opposite (debias)
```python
def steelman(claim: str) -> str:
    return client.messages.create(
        model="claude-opus-4-7", max_tokens=1024,
        messages=[{"role": "user", "content":
            f"Claim: {claim}\n\nWrite the strongest argument AGAINST this, "
            f"citing actual contrary evidence. Be a hostile reviewer."}],
    ).content[0].text
```

### Zettelkasten note (atomic)
```markdown
---
id: 2026-05-10-1432
tags: [retrieval, rag]
source: [[Lewis-2020-RAG]]
---
# Dense retrieval beats BM25 only when query-doc lexical overlap is low

In Lewis 2020 (Table 3), DPR > BM25 on NaturalQuestions (+6 EM)
but BM25 ≥ DPR on TriviaQA where queries copy doc tokens.

→ Hybrid search is robust: pick BM25 for lexical, dense for paraphrase.

Connects to: [[Hybrid Search]] · [[BM25]] · [[Dense-Retrieval]]
```

## Decision criteria
| Situation | Approach |
|---|---|
| Quick scan (1 h) | Elicit / Consensus / Claude deep-research |
| Deep dive (1 week) | Manual snowball + AI extraction |
| Systematic review (PRISMA) | PRISMA flow + Covidence + AI screening |
| Cutting-edge (preprints) | arXiv-sanity + Twitter/Bluesky + Semantic Scholar alerts |
| Industry / OSS | GitHub trending + State of X reports + AI synthesis |

**Default**: Connected Papers seed → S2 snowball → AI extract → manual synthesis with steelman.

## 🔗 Graph
- Parent: [[Scientific Method]]
- Variants: [[Literature-Review]] · [[Tech-Radar]]
- Adjacent: [[Hallucination]] · [[RAG]]

## 🤖 LLM usage
**When to use**: literature scan, abstract screening, structured extraction, synthesis draft, steelmanning.
**When not to use**: the final assertion of a novelty claim (an LLM is not ground truth), quantitative meta-analysis (use proper stats software), and any citation that has not been verified.

## ❌ Anti-patterns
- **Cite-without-verify**: citing fake DOIs that the AI made up.
- **Single-source synthesis**: treating a single paper as truth and ignoring replication.
- **Recency bias**: reading only the latest preprints and missing foundational work.
- **No gap analysis**: a literature dump with no "what's missing", which leaves the contribution unclear.
- **Hypothesis fishing**: starting from the data and building a post-hoc theory (HARKing).

## 🧪 Verification / duplicates
- Verified (PRISMA 2020 statement, Semantic Scholar API docs, Claude Opus 4.7 deep research, Elicit methodology).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full rewrite covering methodology + AI-aided synthesis pipeline |