---
id: wiki-2026-0508-bibliometrics
title: Bibliometrics
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [계량서지학, citation analysis, h-index, impact factor, altmetrics, scientometrics]
duplicate_of: none
source_trust_level: B
confidence_score: 0.88
verification_status: applied
tags: [bibliometrics, citation, h-index, impact-factor, altmetrics, semantic-scholar, openalex, science-of-science]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Semantic Scholar API / OpenAlex / Scopus
---

# Bibliometrics

## 📌 One-Line Insight

> **"The economics of knowledge."** Bibliometrics quantifies research output through citation counts, the h-index, and the impact factor. Modern practice adds altmetrics and real-time signals from arXiv. Beware the Goodhart trap: metrics get gamed once they become targets (for example, citation cartels).

## 📖 Core Concepts

### Indicators

#### Citation count
- The number of times a paper has been cited.
- Simple, but it lags publication.

#### h-index (Hirsch)
- An author has h-index h when h of their papers have each been cited at least h times.
- Combines productivity and impact in one number.
- Limitations: inflated by self-citation, and not comparable across fields.

#### i10-index
- The number of papers with at least 10 citations.

#### Impact Factor (IF)
- A journal's average citations per article over a two-year window: citations received this year to items from the previous two years, divided by the citable items published in those two years.
- Often used as publisher marketing.
- Not valid for individual papers, because citation counts vary widely within a journal.

#### Eigenfactor
- PageRank-like.
- Weights each citation by the quality of the citing journal.

#### Altmetrics
- Mentions on Twitter, blogs, and news sites.
- Available immediately.
- Attention ≠ quality.

### Modern sources
- **Google Scholar**: broad coverage, noisy.
- **Scopus** (Elsevier): paid.
- **Web of Science** (Clarivate): paid.
- **Semantic Scholar** (AI2): free, AI-enriched.
- **OpenAlex**: open, 250M+ papers.
- **CrossRef**: DOI registry.
- **arXiv**: preprints.
- **PubMed**: biomedical.

### Modern issues

#### Citation cartel
- Groups that cite each other by arrangement.
- Runaway self-citation.
- Exposed cases can end in retraction.

#### Field difference
- Base citation rates differ between CS, biology, and literature.
- Normalization is required before comparing.

#### Time lag
- Citations take about 5 years to mature.
- Less applicable to fast-moving fields such as AI.

#### Predatory journal
- Paper mills.
- Fake impact factors.
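
The h-index and i10-index definitions from the indicator list earlier in this section can be made concrete with a small sketch. It works on a plain list of per-paper citation counts; the function names and the sample counts are illustrative assumptions, not part of any library or taken from a real author.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break     # counts only decrease from here, so no larger h exists
    return h


def i10_index(citations: Iterable[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts for one author (illustrative data only).
example = [25, 12, 10, 8, 5, 3, 0]
print(h_index(example))    # 5
print(i10_index(example))  # 3
```

Dropping self-citations from each count before calling `h_index` is one simple way to see how much of an author's score depends on them.
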

#### Impact of LLMs
- Explosion in paper volume.
- Peer review is saturated.
- Quality control breaks down.

### Modern alternatives
- **Open peer review**.
- **Replication score**.
- **Code/data availability**.
- **Twitter / Mastodon discussion**.
- **YouTube explainers**.
- **Cited by GitHub**.

### Applications
1. **Hiring / promotion**: academic careers.
2. **Funding**: grant evaluation.
3. **Library**: journal subscription decisions.
4. **National R&D**: country comparison.
5. **Trend analysis**: emerging topics.
6. **Knowledge graph**: citation networks.

## 💻 Patterns

### Semantic Scholar API

```python
import requests

S2_FIELDS = 'title,authors,year,citationCount,influentialCitationCount,references,citations'

def get_paper(paper_id):
    """Fetch a paper by Semantic Scholar ID or a prefixed ID such as 'DOI:10.x/y'."""
    r = requests.get(
        f'https://api.semanticscholar.org/graph/v1/paper/{paper_id}',
        params={'fields': S2_FIELDS},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()

paper = get_paper('DOI:10.48550/arXiv.2206.04615')
print(f"{paper['title']}: {paper['citationCount']} citations")
```

### OpenAlex (open citation data)

```python
import requests

def search(query, n=20):
    r = requests.get('https://api.openalex.org/works',
                     params={'search': query, 'per-page': n,
                             'select': 'id,title,publication_year,cited_by_count,authorships'},
                     timeout=30)
    r.raise_for_status()
    return r.json()['results']

# Author h-index
def author_h_index(author_id):
    # Sorting by citations means one page of 200 works is enough whenever h <= 200;
    # authors who could exceed that need cursor paging over all works.
    r = requests.get('https://api.openalex.org/works',
                     params={'filter': f'author.id:{author_id}', 'per-page': 200,
                             'sort': 'cited_by_count:desc', 'select': 'cited_by_count'},
                     timeout=30)
    r.raise_for_status()
    citations = sorted((w['cited_by_count'] for w in r.json()['results']), reverse=True)
    # With counts in descending order, h is the number of ranks whose count >= rank.
    return sum(1 for rank, c in enumerate(citations, start=1) if c >= rank)
```

### Citation network (NetworkX)

```python
from collections import deque

import networkx as nx

def build_citation_network(seed_paper_id, depth=2):
    """Breadth-first crawl of references; reuses get_paper() from the Semantic Scholar block."""
    G = nx.DiGraph()
    queue = deque([(seed_paper_id, 0)])
    seen = set()
    while queue:
        pid, d = queue.popleft()
        if pid in seen or d > depth:
            continue
        seen.add(pid)
        paper = get_paper(pid)
        G.add_node(pid, title=paper['title'], year=paper['year'])
        for ref in paper.get('references') or []:
            ref_id = ref.get('paperId')
            if ref_id is None:  # some references have no resolved paper ID
                continue
            G.add_edge(pid, ref_id)
            queue.append((ref_id, d + 1))
    return G

# PageRank as an influence score (depth=1 keeps the number of API requests small)
G = build_citation_network('DOI:10.48550/arXiv.2206.04615', depth=1)
pageranks = nx.pagerank(G)
top_influential = sorted(pageranks.items(), key=lambda x: -x[1])[:10]
```

### Altmetrics

```python
# Altmetric API (access terms may vary; an API key may be required)
import requests

def altmetric(doi):
    r = requests.get(f'https://api.altmetric.com/v1/doi/{doi}', timeout=30)
    if r.status_code != 200:
        return None  # no record for this DOI, or the request was refused
    data = r.json()
    return {
        'score': data.get('score'),
        'twitter': data.get('cited_by_tweeters_count'),
        'news': data.get('cited_by_msm_count'),
        'blog': data.get('cited_by_feeds_count'),
    }
```

### Field-normalized citation

```python
def field_normalized_citation_score(paper_citations, field_year_avg):
    """Normalize a citation count by the average for the same field and year."""
    if field_year_avg <= 0:
        return None  # no usable baseline for this field/year
    return paper_citations / field_year_avg

# RCR (Relative Citation Ratio) is the NIH's field-normalized metric.
```

### Trend detection

```python
def emerging_topic(papers_by_year, current_year=2026, recent_years=3):
    """Detect acceleration by comparing the recent window with the equally long window before it."""
    recent_start = current_year - recent_years + 1
    recent_count = sum(papers_by_year.get(y, 0)
                       for y in range(recent_start, current_year + 1))
    older_count = sum(papers_by_year.get(y, 0)
                      for y in range(recent_start - recent_years, recent_start))
    growth = (recent_count - older_count) / max(older_count, 1)
    return growth > 1.5  # recent window is more than 2.5x the previous one: emerging
```

### Predatory journal detector

```python
from dataclasses import dataclass

# Common warning signs (reference checklist; not used by the scoring below)
PREDATORY_INDICATORS = [
    'fee mentioned upfront',
    'no peer review',
    'bogus impact factor',
    'misleading scope',
    'spam emails',
]

@dataclass
class Journal:
    """Hypothetical record; field names, units, and thresholds are illustrative."""
    has_apc: bool
    apc: float                      # article processing charge
    peer_review_time: float         # typical review turnaround
    editorial_board_overlap: float  # board overlap with other journals
    in_doaj: bool

def assess_journal(journal):
    score = 0
    if journal.has_apc and journal.apc < 100: score += 1  # too cheap
    if journal.peer_review_time < 7: score += 1  # too fast
    if journal.editorial_board_overlap > 50: score += 1
    if journal.in_doaj: score -= 2  # whitelist
    return 'predatory' if score >= 2 else 'legitimate'
```

## 🤔 Decision Criteria

| Use case | Indicator |
|---|---|
| Single paper | Citations + altmetrics + influential citations |
| Author | h-index + i10 + field-normalized |
| Journal | Eigenfactor (NOT IF) |
| Trend | Year-over-year growth |
| Country | Field-normalized + collaboration |
| Hiring | Mix + qualitative review |

**Default**: OpenAlex / Semantic Scholar (free) + multiple metrics + qualitative review.

## 🔗 Graph
- Variants: [[Citation-Analysis]] · [[Altmetrics]] · [[Scientometrics]]
- Applications: [[H-Index]] · [[Impact-Factor]]
- Adjacent: [[Goodharts-Law]]

## 🤖 LLM Usage

**When to use**: literature review, trend detection, author / journal evaluation, knowledge graph construction.

**When not to use**: drawing quality conclusions from a single citation count; cross-field comparison without normalization.

## ❌ Anti-patterns
- **Applying IF to individual papers**: misleading.
- **h-index only**: open to manipulation.
- **No field normalization**: unfair across fields.
- **Ignoring self-citation**: inflated counts.
- **Evaluating recent papers by IF**: ignores citation lag.
- **Trusting predatory journals**: fake metrics.
- **Conflating citations with quality**: controversial papers can be highly cited.

## 🧪 Verification / Duplicates
- Verified (Hirsch h-index, NIH RCR, San Francisco DORA declaration).
- Trust level B.
- Related: [[Awards]] · [[Benchmarks]] · [[Goodharts-Law]] · [[Open-Science]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: indicators + altmetrics + OpenAlex / Semantic Scholar code + predatory detector |