Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
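Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed redirects straight at their targets, mapped concepts (~2,400 items)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections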

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00

---
id: wiki-2026-0508-bibliometrics
title: Bibliometrics
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [계량서지학, citation analysis, h-index, impact factor, altmetrics, scientometrics]
duplicate_of: none
source_trust_level: B
confidence_score: 0.88
verification_status: applied
tags: [bibliometrics, citation, h-index, impact-factor, altmetrics, semantic-scholar, openalex, science-of-science]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Semantic Scholar API / OpenAlex / Scopus
---
# Bibliometrics
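## 📌 One-line insight
> **"The economics of knowledge."** Bibliometrics quantifies research output through citation counts, the h-index, and impact factors. Modern practice adds altmetrics and real-time arXiv signals. Beware the Goodhart trap: once a metric becomes a target it gets gamed (for example, citation cartels).
## 📖 Core
### Indicators
#### Citation count
- The number of times a paper has been cited.
- Simple, but it lags publication.
#### h-index (Hirsch)
- An author has h-index h when h of their papers have each been cited at least h times, and h is the largest such number.
- Combines productivity and impact in one number.
- Limitations: inflated by self-citation, and not comparable across fields.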
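The h-index definition can be checked with a few lines of Python. This is a minimal sketch, assuming the input is a plain list of per-paper citation counts; the function name `h_index` is illustrative and not taken from any library.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only decrease from here, so stop
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```

Four papers have at least 4 citations each, but there are not five papers with at least 5, so the result is 4.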
#### i10-index
- The number of papers with at least 10 citations.
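As a small sketch under the same assumption (a list of per-paper citation counts), the i10-index is a simple threshold count. The `threshold` parameter is an illustrative generalization, not part of the standard definition.

```python
def i10_index(citations, threshold=10):
    """Number of papers cited at least `threshold` times (10 for the i10-index)."""
    return sum(1 for c in citations if c >= threshold)


print(i10_index([120, 15, 10, 9, 0]))  # 3
```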
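#### Impact Factor (IF)
- A journal-level metric: citations received in a given year to items the journal published in the previous two years, divided by the number of citable items from those two years.
- Widely used as publisher marketing.
- Not valid for an individual paper: citation counts vary widely within a single journal.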
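The two-year calculation can be sketched as follows. The function name, the dictionary-based inputs, and the numbers are all made up for illustration; real values come from a citation index, not from this code.

```python
def impact_factor(citations_in_year, citable_items, year):
    """Two-year impact factor for `year`.

    citations_in_year: {publication_year: citations received during `year`}
    citable_items:     {publication_year: citable items published that year}
    """
    window = (year - 1, year - 2)
    cites = sum(citations_in_year.get(y, 0) for y in window)
    items = sum(citable_items.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no citable items in the two-year window")
    return cites / items


# Hypothetical journal: 480 citations in 2025 to 240 items from 2023-2024
print(impact_factor({2023: 300, 2024: 180}, {2023: 100, 2024: 140}, 2025))  # 2.0
```

Because the result is a mean over a skewed distribution, a handful of highly cited papers can carry it, which is why it says little about any single paper.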
#### Eigenfactor
- PageRank-like: computed on the journal citation network.
- Quality-weighted: a citation from a highly ranked journal counts for more.
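To make "PageRank-like" concrete, here is a simplified PageRank-style power iteration over a journal-to-journal citation table. It is a sketch of the idea only, not the actual Eigenfactor algorithm (which, among other details, uses a five-year citation window and weights by article counts). The function name, the damping value, and the citation counts are assumptions for illustration.

```python
def journal_rank(cites, damping=0.85, iterations=100):
    """cites[a][b] = number of citations from journal a to journal b."""
    journals = sorted(set(cites) | {b for row in cites.values() for b in row})
    n = len(journals)
    score = {j: 1.0 / n for j in journals}
    for _ in range(iterations):
        new = {j: (1.0 - damping) / n for j in journals}
        for a in journals:
            # Ignore self-citations so a journal cannot boost itself.
            out = {b: w for b, w in cites.get(a, {}).items() if b != a and w > 0}
            total = sum(out.values())
            if total == 0:
                # Journal with no outgoing citations: spread its score evenly.
                for j in journals:
                    new[j] += damping * score[a] / n
            else:
                for b, w in out.items():
                    new[b] += damping * score[a] * w / total
        score = new
    return score


# Hypothetical citation counts between three journals
cites = {
    "A": {"B": 10, "C": 10},
    "B": {"A": 30},
    "C": {"A": 5, "B": 5},
}
scores = journal_rank(cites)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```

Journal A ranks first because it receives all of B's outgoing citations and half of C's, and each citing journal passes on a share of its own score rather than a flat count.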
#### Altmetrics
- Mentions on Twitter, blogs, and news.
- Immediate: signals appear long before citations accumulate.
- Attention ≠ quality.
### Modern sources
- **Google Scholar**: broad coverage, noisy.
- **Scopus** (Elsevier): paid.
- **Web of Science** (Clarivate): paid.
- **Semantic Scholar** (AI2): free, AI-enriched.
- **OpenAlex**: open, 250M+ papers.
- **CrossRef**: DOI registry.
- **arXiv**: preprints.
- **PubMed**: biomedical.
### Modern issues
#### Citation cartel
- Groups of authors or journals cite each other by arrangement.
- Self-citation balloons.
- Offending papers can be retracted.
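One rough way to surface mutual citation is to look for pairs of journals that each send a large share of their outgoing citations to the other. This is a heuristic sketch only: the function name, the `min_share` threshold of 0.3, and the data are hypothetical, and a flagged pair is a prompt for manual review, not evidence of misconduct.

```python
def suspicious_pairs(cites, min_share=0.3):
    """cites[a][b] = citations from a to b.

    Returns pairs where each side sends at least `min_share` of its
    outgoing (non-self) citations to the other.
    """
    def share(a, b):
        row = cites.get(a, {})
        total = sum(w for target, w in row.items() if target != a)
        return row.get(b, 0) / total if total else 0.0

    names = sorted(cites)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if share(a, b) >= min_share and share(b, a) >= min_share:
                flagged.append((a, b))
    return flagged


# Hypothetical citation counts between four journals
cites = {
    "J1": {"J2": 40, "J3": 10},
    "J2": {"J1": 35, "J3": 15},
    "J3": {"J1": 5, "J2": 5, "J4": 90},
    "J4": {"J3": 2, "J1": 50, "J2": 48},
}
print(suspicious_pairs(cites))  # [('J1', 'J2')]
```

J3 sends 90% of its citations to J4, but J4 barely cites J3 back, so only the reciprocal J1/J2 pair is flagged.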
#### Field difference
- Base citation rates differ across CS, biology, and literature.
- Normalization is required.
#### Time lag
- Citations take about 5 years to mature.
- Raw counts are therefore less informative in fast-moving fields such as AI.
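A common partial workaround for the lag is to compare papers on citations per year since publication instead of raw totals. The sketch below assumes whole calendar years and treats a paper published in the current year as one year old; the function name is illustrative.

```python
def citations_per_year(citation_count, publication_year, current_year):
    """Average citations per year since publication (minimum age of 1 year)."""
    age = max(current_year - publication_year, 1)
    return citation_count / age


print(citations_per_year(100, 2021, 2026))  # 20.0
```

This does not remove the lag, since very recent papers still have few citations, but it stops older papers from winning on age alone.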
#### Predatory journals
- Paper mills.
- Fake impact factors.
#### Impact of LLMs
- Paper volume explodes.
- Reviewer capacity saturates.
- Quality control breaks down.
### Modern alternatives
- **Open peer review**.
- **Replication score**.
- **Code/data availability**.
- **Twitter / Mastodon discussion**.
- **YouTube explainers**.
- **Cited by GitHub**.
### Applications
1. **Hiring / promotion**: academic careers.
2. **Funding**: grant evaluation.
3. **Library**: journal subscription decisions.
4. **National R&D**: country comparison.
5. **Trend analysis**: emerging topics.
6. **Knowledge graph**: citation networks.
## 💻 Patterns
### Semantic Scholar API
```python
import requests

def get_paper(doi):
    """Fetch one paper's metadata from the Semantic Scholar Graph API by DOI."""
    r = requests.get(
        f'https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}',
        params={'fields': 'title,authors,year,citationCount,influentialCitationCount,references,citations'},
        timeout=30,
    )
    r.raise_for_status()  # surface 404 / 429 instead of parsing an error payload
    return r.json()

paper = get_paper('10.48550/arXiv.2206.04615')
print(f"{paper['title']}: {paper['citationCount']} citations")
```
### OpenAlex (open citation data)
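```python
import requests

def search(query, n=20):
    """Search OpenAlex works; returns up to `n` results."""
    r = requests.get('https://api.openalex.org/works',
                     params={'search': query, 'per_page': n,
                             'select': 'id,title,publication_year,cited_by_count,authorships'},
                     timeout=30)
    r.raise_for_status()
    return r.json()['results']

# Author h-index
def author_h_index(author_id):
    """h-index from the author's 200 most-cited works (a single page).

    A result of 200 means the true value may be higher.
    """
    r = requests.get('https://api.openalex.org/works',
                     params={'filter': f'author.id:{author_id}', 'per_page': 200,
                             'sort': 'cited_by_count:desc', 'select': 'cited_by_count'},
                     timeout=30)
    r.raise_for_status()
    citations = sorted((w['cited_by_count'] for w in r.json()['results']), reverse=True)
    # Counts are descending, so the number of ranks with count >= rank is h.
    return sum(1 for i, c in enumerate(citations) if c >= i + 1)
```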
### Citation network (NetworkX)
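```python
from collections import deque

import networkx as nx
import requests

def fetch_paper(paper_id):
    """Fetch title, year, and references for a Semantic Scholar paper ID."""
    r = requests.get(f'https://api.semanticscholar.org/graph/v1/paper/{paper_id}',
                     params={'fields': 'title,year,references'}, timeout=30)
    r.raise_for_status()
    return r.json()

def build_citation_network(seed_paper_id, depth=2):
    """Breadth-first crawl of references, up to `depth` hops from the seed."""
    G = nx.DiGraph()
    queue = deque([(seed_paper_id, 0)])
    seen = set()
    while queue:
        pid, d = queue.popleft()
        if pid in seen or d > depth:
            continue
        seen.add(pid)
        paper = fetch_paper(pid)  # one request per node; the public API is rate-limited
        G.add_node(pid, title=paper.get('title'), year=paper.get('year'))
        for ref in paper.get('references') or []:
            if ref.get('paperId'):  # some references have no resolved paper ID
                G.add_edge(pid, ref['paperId'])  # edge direction: citing -> cited
                queue.append((ref['paperId'], d + 1))
    return G

# PageRank as an influence score
G = build_citation_network('DOI:10.48550/arXiv.2206.04615', depth=1)
pageranks = nx.pagerank(G)
top_influential = sorted(pageranks.items(), key=lambda x: -x[1])[:10]
```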
### Altmetrics
```python
# Altmetric API
import requests

def altmetric(doi):
    """Return attention counts for a DOI, or None if Altmetric has no record."""
    r = requests.get(f'https://api.altmetric.com/v1/doi/{doi}', timeout=30)
    if r.status_code != 200:
        return None
    data = r.json()
    return {
        'score': data.get('score'),
        'twitter': data.get('cited_by_tweeters_count'),
        'news': data.get('cited_by_msm_count'),
        'blog': data.get('cited_by_feeds_count'),
    }
```
### Field-normalized citation
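```python
def field_normalized_citation_score(paper_citations, field_year_avg):
    """Citations relative to the average for the same field and publication year.

    A score of 1.0 means the paper is cited as often as its field/year average.
    """
    if field_year_avg <= 0:
        raise ValueError("field_year_avg must be positive")
    return paper_citations / field_year_avg

# See also RCR (Relative Citation Ratio), the NIH's field-normalized metric
```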
### Trend detection
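```python
def emerging_topic(papers_by_year, current_year=2026, recent_years=3):
    """Detect acceleration: compare the last `recent_years` years with the
    equally long window just before them."""
    recent = range(current_year - recent_years + 1, current_year + 1)
    older = range(current_year - 2 * recent_years + 1, current_year - recent_years + 1)
    recent_count = sum(papers_by_year.get(y, 0) for y in recent)
    older_count = sum(papers_by_year.get(y, 0) for y in older)
    growth = (recent_count - older_count) / max(older_count, 1)
    return growth > 1.5  # growth above 1.5 means more than 2.5x the older window
```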
### Predatory journal detector
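```python
# Qualitative warning signs, kept for reference (not used in the score below)
PREDATORY_INDICATORS = [
    'fee mentioned upfront',
    'no peer review',
    'bogus impact factor',
    'misleading scope',
    'spam emails',
]

def assess_journal(journal):
    """Crude heuristic; `journal` is any object exposing the attributes used below."""
    score = 0
    if journal.has_apc and journal.apc < 100:
        score += 1  # article processing charge is suspiciously cheap
    if journal.peer_review_time < 7:
        score += 1  # review turnaround is too fast (presumably in days)
    if journal.editorial_board_overlap > 50:
        score += 1  # board largely shared with other journals (presumably a percentage)
    if journal.in_doaj:
        score -= 2  # listed in DOAJ, treated as a whitelist
    return 'predatory' if score >= 2 else 'legitimate'
```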
## 🤔 Decision criteria
| Use case | Indicator |
|---|---|
| Single paper | Citation + altmetric + influential citations |
| Author | h-index + i10 + field-normalized |
| Journal | Eigenfactor (NOT IF) |
| Trend | Year-over-year growth |
| Country | Field-normalized + collaboration |
| Hiring | Mix + qualitative review |
**Default**: OpenAlex / Semantic Scholar (free), several metrics together, plus qualitative review.
## 🔗 Graph
- Variants: [[Citation-Analysis]] · [[Altmetrics]] · [[Scientometrics]]
- Applications: [[H-Index]] · [[Impact-Factor]]
- Adjacent: [[Goodharts-Law]]
## 🤖 LLM use
**When**: literature reviews, trend detection, author or journal evaluation, building a knowledge graph.
**When not**: drawing quality conclusions from a single citation count; comparing across fields without normalization.
## ❌ Anti-patterns
- **Applying IF to individual papers**: misleading.
- **h-index alone**: open to manipulation.
- **No field normalization**: cross-field comparison is unfair.
- **Ignoring self-citation**: inflated counts.
- **Judging recent papers by IF**: ignores citation lag.
- **Trusting predatory journals**: fake metrics.
- **Conflating citations with quality**: controversial papers also attract many citations.
## 🧪 Verification / duplicates
- Verified (Hirsch h-index, NIH RCR, San Francisco Declaration on Research Assessment (DORA)).
- Trust level: B.
- Related: [[Awards]] · [[Benchmarks]] · [[Goodharts-Law]] · [[Open-Science]].
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: indicators, altmetrics, OpenAlex / Semantic Scholar code, predatory detector |