---
id: wiki-2026-0508-bibliometrics
title: Bibliometrics
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [계량서지학, citation analysis, h-index, impact factor, altmetrics, scientometrics]
duplicate_of: none
source_trust_level: B
confidence_score: 0.88
verification_status: applied
tags: [bibliometrics, citation, h-index, impact-factor, altmetrics, semantic-scholar, openalex, science-of-science]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: Semantic Scholar API / OpenAlex / Scopus
---

# Bibliometrics

## 📌 One-line insight

> **"The economics of knowledge."** Bibliometrics quantifies research through citation counts, the h-index, and the impact factor. Modern practice adds altmetrics and arXiv for near-real-time signals. The Goodhart trap applies: once a metric becomes a target, it gets gamed (citation cartels).

## 📖 Core

### Indicators

#### Citation count

- The number of times a paper has been cited.
- Simple, but it lags: citations take years to accumulate.

#### h-index (Hirsch)

- An author has h-index h when h of their papers have each been cited at least h times, with h the largest such number.
- Combines productivity and impact in one number.
- Limitations: inflated by self-citation, and not comparable across fields.

#### i10-index

- The number of papers with at least 10 citations.

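Both author-level definitions can be checked on a plain list of per-paper citation counts. A minimal sketch; the function names `h_index` and `i10_index` and the sample counts are illustrative assumptions, not part of any library:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only get smaller from here on
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


counts = [25, 8, 5, 3, 3, 0]  # hypothetical citation counts for one author
print(h_index(counts), i10_index(counts))
```

Note that the single 25-citation paper contributes no more to h than a 3-citation paper would, which is one reason the h-index is usually read alongside other indicators.
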
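#### Impact Factor (IF)

- A journal's average citations per paper over a 2-year window: citations received this year to items from the previous two years, divided by the number of citable items published in those two years.
- Heavily used in publisher marketing.
- Not valid for an individual paper: citation counts within a journal vary too widely.
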
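As a worked example of the 2-year window, the sketch below computes an impact factor from hypothetical counts and compares it with the median paper. All numbers and the function name `impact_factor` are made up for illustration.

```python
from statistics import median


def impact_factor(citations_this_year, citable_items):
    """Citations received this year to items from the previous two years,
    divided by the number of citable items published in those two years."""
    if citable_items == 0:
        raise ValueError("no citable items in the 2-year window")
    return citations_this_year / citable_items


# Hypothetical journal: 10 papers in the window, one of them a blockbuster.
per_paper = [120, 6, 4, 3, 2, 2, 1, 1, 1, 0]
jif = impact_factor(sum(per_paper), len(per_paper))

print(f"IF = {jif:.1f}, median paper = {median(per_paper)} citations")
```

Here the journal-level mean is 14 while the median paper has 2 citations, so the IF describes the journal's tail rather than a typical article.
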
#### Eigenfactor

- PageRank-like: computed on the journal citation network.
- Quality-weighted: a citation from an influential journal counts for more.

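To make "PageRank-like" concrete, here is a small power-iteration sketch over a journal-to-journal citation table. It is a simplification under stated assumptions, not the published Eigenfactor algorithm, which involves further steps such as a fixed citation window. The journal names, counts, and `damping` value are hypothetical.

```python
def journal_influence(cites, damping=0.85, iterations=100):
    """cites[a][b] = number of citations from journal a to journal b.
    Returns a score per journal; the scores sum to 1."""
    journals = sorted(set(cites) | {b for row in cites.values() for b in row})
    n = len(journals)
    score = {j: 1.0 / n for j in journals}

    for _ in range(iterations):
        new = {j: (1.0 - damping) / n for j in journals}
        for a in journals:
            # Outgoing citations, ignoring journal self-citations.
            out = {b: c for b, c in cites.get(a, {}).items() if b != a and c > 0}
            total = sum(out.values())
            if total == 0:
                # Journal cites nobody: spread its weight evenly.
                for j in journals:
                    new[j] += damping * score[a] / n
            else:
                for b, c in out.items():
                    new[b] += damping * score[a] * c / total
        score = new
    return score


# Hypothetical citation counts between three journals.
cites = {
    "A": {"B": 10, "C": 10},
    "B": {"A": 30},
    "C": {"A": 5, "B": 5},
}
scores = journal_influence(cites)
print(sorted(scores, key=scores.get, reverse=True))
```

A journal's score depends on who cites it, not only on how often: in this toy table journal A ranks first because B sends it all of its outgoing citations.
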
#### Altmetrics

- Mentions on Twitter, blogs, and news sites.
- Immediate: no waiting for citations to accumulate.
- Attention is not quality.

### Modern data sources

- **Google Scholar**: broad coverage, noisy data.
- **Scopus** (Elsevier): paid.
- **Web of Science** (Clarivate): paid.
- **Semantic Scholar** (AI2): free, AI-enriched.
- **OpenAlex**: open, 250M+ works.
- **CrossRef**: DOI registry.
- **arXiv**: preprints.
- **PubMed**: biomedical literature.

### Current issues

#### Citation cartel

- Groups of authors or journals cite each other by arrangement.
- Self-citation explodes.
- Affected papers can be retracted once the pattern is detected.

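One simple screening signal for mutual citation is how much of a journal's outgoing citation volume goes to a single partner that cites it back just as heavily. The sketch below flags such pairs. The threshold, function name, and counts are hypothetical, and a flagged pair is only a prompt for manual review, not proof of a cartel.

```python
def suspicious_pairs(cites, min_share=0.3):
    """cites[a][b] = citations from journal a to journal b.
    Returns pairs where each side sends at least `min_share` of its
    outgoing (non-self) citations to the other."""
    def share(a, b):
        out = {k: v for k, v in cites.get(a, {}).items() if k != a}
        total = sum(out.values())
        return out.get(b, 0) / total if total else 0.0

    journals = sorted(cites)
    flagged = []
    for i, a in enumerate(journals):
        for b in journals[i + 1:]:
            s_ab, s_ba = share(a, b), share(b, a)
            if s_ab >= min_share and s_ba >= min_share:
                flagged.append((a, b, round(s_ab, 2), round(s_ba, 2)))
    return flagged


# Hypothetical counts: J1 and J2 cite each other far more than anyone else.
cites = {
    "J1": {"J2": 80, "J3": 10, "J4": 10},
    "J2": {"J1": 70, "J3": 20, "J4": 10},
    "J3": {"J1": 20, "J2": 20, "J4": 10},
    "J4": {"J1": 20, "J2": 20, "J3": 10},
}
print(suspicious_pairs(cites))
```

Small, specialized fields can produce high mutual shares for legitimate reasons, so the threshold would need tuning per field.
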
#### Field difference

- Base citation rates differ between CS, biology, and literature.
- Normalization is mandatory.

#### Time lag

- Citations take about 5 years to mature.
- Less applicable in fast-moving fields such as AI.

#### Predatory journal

- Paper mills.
- Fake impact factors.

#### Impact of LLMs

- Paper volume explodes.
- Peer review is saturated.
- Quality control breaks down.

### Modern alternatives

- **Open peer review**.
- **Replication score**.
- **Code/data availability**.
- **Twitter / Mastodon discussion**.
- **YouTube explainers**.
- **Cited by GitHub**.

### Applications

1. **Hiring / promotion**: academic careers.
2. **Funding**: grant evaluation.
3. **Library**: journal subscription decisions.
4. **National R&D**: country comparison.
5. **Trend analysis**: emerging topics.
6. **Knowledge graph**: citation networks.

## 💻 Patterns

### Semantic Scholar API

```python
import requests

FIELDS = 'title,authors,year,citationCount,influentialCitationCount,references,citations'

def get_paper(doi):
    """Fetch one paper's metadata from the Semantic Scholar Graph API by DOI."""
    r = requests.get(f'https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}',
                     params={'fields': FIELDS}, timeout=30)
    r.raise_for_status()  # surface 404 / 429 instead of parsing an error body
    return r.json()

paper = get_paper('10.48550/arXiv.2206.04615')
print(f"{paper['title']}: {paper['citationCount']} citations")
```

### OpenAlex (open citation data)

```python
import requests

def search(query, n=20):
    """Full-text search over OpenAlex works."""
    r = requests.get('https://api.openalex.org/works',
                     params={'search': query, 'per-page': n,
                             'select': 'id,title,publication_year,cited_by_count,authorships'},
                     timeout=30)
    r.raise_for_status()
    return r.json()['results']

# Author h-index
def author_h_index(author_id):
    # Sorting by citations makes the first page (max 200 works) sufficient
    # to compute h exactly for any author with h <= 200.
    r = requests.get('https://api.openalex.org/works',
                     params={'filter': f'author.id:{author_id}', 'per-page': 200,
                             'sort': 'cited_by_count:desc',
                             'select': 'cited_by_count'},
                     timeout=30)
    r.raise_for_status()
    citations = sorted((w['cited_by_count'] for w in r.json()['results']), reverse=True)
    # In a descending list, position i (0-based) counts toward h while citations >= i + 1.
    h = sum(1 for i, c in enumerate(citations) if c >= i + 1)
    return h
```

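The helper above reads a single page (at most 200 works). When every work is needed, for example to total an author's citations, OpenAlex offers cursor paging: send `cursor=*` on the first request and pass back the `next_cursor` value from each response's `meta` block until it comes back empty. A sketch of that loop, with the page fetcher injected so it can be exercised without network access; `iter_results`, `fake_fetch`, and the sample pages are hypothetical:

```python
def iter_results(fetch_page):
    """Yield every result from a cursor-paged endpoint.

    `fetch_page(cursor)` must return the decoded JSON of one page: a dict
    with a 'results' list and a 'meta' dict holding 'next_cursor'.
    """
    cursor = "*"  # OpenAlex convention for "start of the result set"
    while cursor:
        page = fetch_page(cursor)
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")


# Stand-in for the network call, so the loop can be run offline.
PAGES = {
    "*": {"results": [{"cited_by_count": 12}, {"cited_by_count": 7}],
          "meta": {"next_cursor": "abc"}},
    "abc": {"results": [{"cited_by_count": 3}],
            "meta": {"next_cursor": None}},
}


def fake_fetch(cursor):
    return PAGES[cursor]


total = sum(w["cited_by_count"] for w in iter_results(fake_fetch))
print(total)
```

With `requests`, `fetch_page` could be a small function that adds `'cursor': cursor` to the query parameters and returns `r.json()`.
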
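### Citation network (NetworkX)

```python
from collections import deque

import networkx as nx
import requests

def fetch_paper(paper_id):
    """Fetch by any Semantic Scholar id (paperId, 'DOI:...', 'ARXIV:...')."""
    r = requests.get(f'https://api.semanticscholar.org/graph/v1/paper/{paper_id}',
                     params={'fields': 'title,year,references'}, timeout=30)
    r.raise_for_status()
    return r.json()

def build_citation_network(seed_paper_id, depth=2):
    """Breadth-first crawl of references; one API request per paper, so keep depth small."""
    G = nx.DiGraph()
    queue = deque([(seed_paper_id, 0)])
    seen = set()

    while queue:
        pid, d = queue.popleft()
        if pid in seen or d > depth:
            continue
        seen.add(pid)
        paper = fetch_paper(pid)
        G.add_node(pid, title=paper.get('title'), year=paper.get('year'))

        for ref in paper.get('references') or []:
            if ref.get('paperId'):  # unresolved references come back with paperId None
                G.add_edge(pid, ref['paperId'])
                queue.append((ref['paperId'], d + 1))

    return G

# PageRank as an influence score (the seed id is only an example)
G = build_citation_network('DOI:10.48550/arXiv.2206.04615', depth=1)
pageranks = nx.pagerank(G)
top_influential = sorted(pageranks.items(), key=lambda x: -x[1])[:10]
```
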
### Altmetrics

```python
# Altmetric API
import requests

def altmetric(doi):
    r = requests.get(f'https://api.altmetric.com/v1/doi/{doi}', timeout=30)
    if r.status_code != 200:  # e.g. no Altmetric record for this DOI
        return None
    data = r.json()
    return {
        'score': data.get('score'),
        'twitter': data.get('cited_by_tweeters_count'),
        'news': data.get('cited_by_msm_count'),
        'blog': data.get('cited_by_feeds_count'),
    }
```

### Field-normalized citation

```python
def field_normalized_citation_score(paper_citations, field_year_avg):
    """Normalize by the average citations of papers in the same field and year."""
    if field_year_avg <= 0:
        return None  # no baseline, so the ratio is undefined
    return paper_citations / field_year_avg

# See also RCR (Relative Citation Ratio), the NIH's field-normalized metric
```

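The expected value used for normalization has to come from somewhere. One common choice is the mean citation count of all papers sharing a field and publication year. The sketch below derives those baselines from a small in-memory corpus and scores each paper against its own group; the record layout, function names, and sample data are assumptions for illustration.

```python
from collections import defaultdict


def field_year_baselines(papers):
    """Mean citations per (field, year) group."""
    groups = defaultdict(list)
    for p in papers:
        groups[(p["field"], p["year"])].append(p["citations"])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def normalized_scores(papers):
    """Citations divided by the paper's own field/year mean (None if the mean is 0)."""
    base = field_year_baselines(papers)
    scores = {}
    for p in papers:
        expected = base[(p["field"], p["year"])]
        scores[p["id"]] = p["citations"] / expected if expected > 0 else None
    return scores


# Hypothetical corpus: the same raw count means different things per field.
papers = [
    {"id": "cs1", "field": "CS", "year": 2020, "citations": 40},
    {"id": "cs2", "field": "CS", "year": 2020, "citations": 120},
    {"id": "lit1", "field": "Literature", "year": 2020, "citations": 40},
    {"id": "lit2", "field": "Literature", "year": 2020, "citations": 10},
]
print(normalized_scores(papers))
```

Both `cs1` and `lit1` have 40 citations, yet they score 0.5 and 1.6 against their own fields, which is the effect normalization is meant to expose.
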
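### Trend detection

```python
def emerging_topic(papers_by_year, recent_years=3, current_year=2026):
    """Detect acceleration: compare the last `recent_years` years with the equally long window before."""
    recent = range(current_year - recent_years + 1, current_year + 1)
    # Equal-length windows; comparing 3 recent years against 4 older ones would understate growth.
    older = range(current_year - 2 * recent_years + 1, current_year - recent_years + 1)
    recent_count = sum(papers_by_year.get(y, 0) for y in recent)
    older_count = sum(papers_by_year.get(y, 0) for y in older)

    growth = (recent_count - older_count) / max(older_count, 1)
    return growth > 1.5  # recent window holds more than 2.5x the older window's papers
```
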
### Predatory journal detector

```python
# Qualitative warning signs (a checklist; not used by the score below)
PREDATORY_INDICATORS = [
    'fee mentioned upfront',
    'no peer review',
    'bogus impact factor',
    'misleading scope',
    'spam emails',
]

def assess_journal(journal):
    """Rough heuristic; `journal` is any object exposing the attributes used below."""
    score = 0
    if journal.has_apc and journal.apc < 100:
        score += 1  # too cheap
    if journal.peer_review_time < 7:
        score += 1  # too fast (presumably days)
    if journal.editorial_board_overlap > 50:
        score += 1  # heavy board overlap (presumably percent)
    if journal.in_doaj:
        score -= 2  # whitelist
    return 'predatory' if score >= 2 else 'legitimate'
```

## 🤔 Decision criteria

| Use case | Indicator |
|---|---|
| Single paper | Citation + altmetric + influential citations |
| Author | h-index + i10 + field-normalized |
| Journal | Eigenfactor (NOT IF) |
| Trend | Year-over-year growth |
| Country | Field-normalized + collaboration |
| Hiring | Mix + qualitative review |

**Default**: OpenAlex / Semantic Scholar (free), combined with multiple metrics and qualitative review.

## 🔗 Graph

- Variants: [[Citation-Analysis]] · [[Altmetrics]] · [[Scientometrics]]
- Applications: [[H-Index]] · [[Impact-Factor]]
- Adjacent: [[Goodharts-Law]]

## 🤖 LLM usage

**When to use**: literature reviews, trend detection, author or journal evaluation, knowledge-graph construction.

**When not to use**: drawing quality conclusions from a single citation count, or comparing across fields without normalization.

## ❌ Anti-patterns

- **Applying IF to an individual paper**: misleading.
- **Relying on h-index alone**: easy to manipulate.
- **No field normalization**: cross-field comparisons become unfair.
- **Ignoring self-citation**: counts are inflated.
- **Judging recent papers by IF**: citations have not had time to accumulate.
- **Trusting predatory journals**: their metrics are fake.
- **Conflating citations with quality**: controversial papers can be highly cited.

## 🧪 Verification / duplicates

- Verified against: Hirsch's h-index, NIH RCR, and the San Francisco Declaration on Research Assessment (DORA).
- Trust level: B.
- Related: [[Awards]] · [[Benchmarks]] · [[Goodharts-Law]] · [[Open-Science]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: indicators, altmetrics, OpenAlex / Semantic Scholar code, predatory detector |