---
id: wiki-2026-0508-research
title: Research
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Research-Methodology, Literature-Review, Deep-Research]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [research, methodology, literature, ai-aided]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: anthropic-sdk
---

# Research

## In one line

> **"Every answer has already been reformulated by someone."** Research is a disciplined loop of question → literature → synthesis → novel contribution. In 2026, AI-aided synthesis (Claude Opus 4.7 deep research, GPT-5 with browsing, Elicit, Consensus, undermind.ai) can cut weeks of work down to hours.

## Core

### Phases

1. **Question framing**: vague curiosity → specific testable question (PICO, FINER criteria).
2. **Literature scoping**: keywords, citation graph (forward/backward), Connected Papers / Litmaps.
3. **Reading & extraction**: structured notes (Zettelkasten, claim-evidence-source).
4. **Synthesis**: themes, gaps, contradictions.
5. **Hypothesis / contribution**: what novel claim this work adds.
6. **Validation**: experiment / proof / case study.
7. **Communication**: paper, blog, talk.

### Modern toolchain (2026)

- **Search**: Semantic Scholar API, Google Scholar, OpenAlex.
- **Discovery**: Connected Papers, Litmaps, Inciteful (citation graph viz).
- **AI synthesis**: Claude Opus 4.7 deep-research mode, GPT-5 deep research, Elicit (extracts data per paper), Consensus (claim-level), undermind.ai (deep retrieval).
- **Notes**: Obsidian + Zotero integration; Logseq; Reflect.
- **Reproducibility**: Quarto, Jupyter Book, Code Ocean.

### AI-aided literature review pattern

1. Seed papers (3–5 known relevant) → Connected Papers graph.
2. Snowball (citations both ways) → ~100 candidates.
3. LLM screen abstracts: relevance score 0–10.
4. Top 30 → full-text PDF → AI structured extraction (claim, method, evidence, limitations).
5. AI cluster into themes; human reviews + writes synthesis.

### Safeguards (required)

- Hallucination: AI-generated fake citations are common → verifying every DOI is a must.
- Echo chamber: AI synthesis over-weights popular sources → deliberately sample diverse sources by hand.
- Confirmation bias: AI tends to align with the user's hypothesis → use an explicit "steelman the opposite" prompt.

### Applications

1. PhD literature review.
2. Industry tech radar / market research.
3. Due diligence (M&A, investment).
4. Pre-implementation prior-art search (patents, OSS).

## 💻 Patterns

### Claude deep-research synthesis (verify-first)

```python
import re

import httpx
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def synthesize(question: str, papers: list[dict]) -> str:
    """papers: [{title, abstract, doi, year}]"""
    # Number each paper so the model can cite it as [index].
    corpus = "\n\n".join(
        f"[{i}] {p['title']} ({p['year']}, doi:{p['doi']})\n{p['abstract']}"
        for i, p in enumerate(papers)
    )
    msg = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=4096,
        system=("Synthesize evidence. Cite EVERY claim with [index]. "
                "If evidence is weak/contradictory, say so explicitly. "
                "Never fabricate citations."),
        messages=[{"role": "user", "content": f"Q: {question}\n\nPapers:\n{corpus}"}],
    )
    return msg.content[0].text


def verify_dois(text: str, papers: list[dict]) -> list[str]:
    """Hallucination check: every cited DOI must exist in our set."""
    # Trim trailing punctuation so "doi:10.1/abc)." is read as "10.1/abc".
    cited = [d.rstrip(").,;:]") for d in re.findall(r"doi:\s*(10\.\d+/\S+)", text)]
    # DOIs are case-insensitive; skip papers that have no DOI.
    valid = {p["doi"].lower() for p in papers if p.get("doi")}
    return [d for d in cited if d.lower() not in valid]  # offenders
```

### Semantic Scholar fetch

```python
import httpx


def search_s2(query: str, limit: int = 50) -> list[dict]:
    resp = httpx.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit,
                "fields": "title,abstract,year,citationCount,externalIds"},
        timeout=30.0,
    )
    resp.raise_for_status()  # surface rate limits and other HTTP errors
    r = resp.json()
    # externalIds and abstract can be null in the response, hence the `or` guards.
    return [{"title": p["title"],
             "abstract": p.get("abstract") or "",
             "year": p.get("year"),
             "doi": (p.get("externalIds") or {}).get("DOI"),
             "cites": p.get("citationCount")}
            for p in r.get("data", [])]
```

### Snowball expansion

```python
def snowball(seed_ids: list[str], depth: int = 2) -> set[str]:
    """Backward snowball: follow references `depth` hops out from the seeds."""
    # For the forward direction, query /citations and read "citingPaper" instead.
    frontier, seen = set(seed_ids), set(seed_ids)
    for _ in range(depth):
        next_frontier = set()
        for pid in frontier:
            r = httpx.get(
                f"https://api.semanticscholar.org/graph/v1/paper/{pid}/references",
                params={"fields": "paperId", "limit": 100},
                timeout=30.0,
            ).json()
            for ref in r.get("data") or []:
                cited_id = (ref.get("citedPaper") or {}).get("paperId")
                if cited_id:  # unresolved references carry no paperId
                    next_frontier.add(cited_id)
        frontier = next_frontier - seen  # only expand papers not visited yet
        seen.update(frontier)
    return seen
```

### Structured extraction prompt

```python
EXTRACT_PROMPT = """Extract from this paper as JSON:
{
  "claim": "main thesis in one sentence",
  "method": "how they tested it",
  "evidence": "key result with numbers",
  "n": "sample size",
  "limitations": ["limit1", "limit2"],
  "novelty": "what this adds vs prior work"
}
If a field is unknown, use null. Don't invent."""
```

### Steelman opposite (debias)

```python
def steelman(claim: str) -> str:
    # Reuses the `client` created in the synthesis pattern above.
    return client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{"role": "user", "content":
                   f"Claim: {claim}\n\nWrite the strongest argument AGAINST this, "
                   f"citing actual contrary evidence. Be a hostile reviewer."}],
    ).content[0].text
```

### Zettelkasten note (atomic)

```markdown
---
id: 2026-05-10-1432
tags: [retrieval, rag]
source: [[Karpukhin-2020-DPR]]
---
# Dense retrieval beats BM25 only when query-doc lexical overlap is low

In Karpukhin 2020, DPR > BM25 on NaturalQuestions retrieval, but BM25 > DPR
on SQuAD, where questions reuse passage tokens.

→ Hybrid search is robust: pick BM25 for lexical, dense for paraphrase.

Connects to: [[Hybrid-Search]] · [[BM25]] · [[Dense-Retrieval]]
```

## Decision criteria

| Situation | Approach |
|---|---|
| Quick scan (1 h) | Elicit / Consensus / Claude deep-research |
| Deep dive (1 week) | Manual snowball + AI extraction |
| Systematic review (PRISMA) | PRISMA flow + Covidence + AI screening |
| Cutting-edge (preprints) | arXiv-sanity + Twitter/Bluesky + Semantic Scholar alerts |
| Industry / OSS | GitHub trending + State of X reports + AI synthesis |

**Default**: Connected Papers seed → S2 snowball → AI extract → manual synthesis with steelman.

## 🔗 Graph

- Parent: [[Scientific-Method]]
- Variants: [[Literature-Review]] · [[Tech-Radar]]
- Adjacent: [[Hallucination]] · [[RAG]]

## 🤖 LLM usage

**When**: literature scan, abstract screening, structured extraction, synthesis draft, steelmanning.

**When not**: the final assertion of a novelty claim (an LLM is not ground truth), quantitative meta-analysis (use proper stats software), any citation that has not been verified.

## ❌ Anti-patterns

- **Cite-without-verify**: fake DOIs invented by the AI.
- **Single-source synthesis**: treating one paper as truth while ignoring replication.
- **Recency bias**: reading only the latest preprints → ignorance of foundational work.
- **No gap analysis**: a literature dump with no "what's missing" → unclear contribution.
- **Hypothesis fishing**: starting from the data → post-hoc theory (HARKing).

## 🧪 Verification / duplicates

- Verified (PRISMA 2020 statement, Semantic Scholar API docs, Claude Opus 4.7 deep research, Elicit methodology).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full rewrite covering methodology + AI-aided synthesis pipeline |
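
The synthesis prompt in this note asks the model to cite every claim with `[index]`, while the DOI check only catches DOIs the model writes out in full. A minimal sketch of a complementary check follows, assuming citations appear as `[3]` or `[0, 2]`. The function name `check_index_citations` and the naive sentence splitter are illustrative assumptions, not part of any library.

```python
import re


def check_index_citations(text: str, n_papers: int) -> dict:
    """Flag out-of-range [index] citations and sentences that cite nothing."""
    # Assumption: citations look like [3] or [0, 2], as the system prompt requests.
    cite_re = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")
    cited = {int(n) for group in cite_re.findall(text) for n in group.split(",")}
    # Indices are 0-based, so anything >= n_papers points at a paper we never supplied.
    out_of_range = sorted(i for i in cited if i >= n_papers)
    # Naive split on ., ! or ? followed by whitespace; enough for a first pass.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    uncited = [s for s in sentences if not cite_re.search(s)]
    return {"out_of_range": out_of_range, "uncited": uncited}


if __name__ == "__main__":
    draft = ("Dense retrieval helps on paraphrased queries [0]. "
             "BM25 wins under high lexical overlap [1, 7]. "
             "Hybrid search is robust.")
    print(check_index_citations(draft, n_papers=3))
```

Out-of-range indices suggest a fabricated source. Uncited sentences are only candidates for human review, since transitions and summaries legitimately carry no citation.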