[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -2,92 +2,265 @@
 id: wiki-2026-0508-bibliometrics
 title: Bibliometrics
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [P-Reinforce-AUTO-BIBM-001]
+aliases: [계량서지학, citation analysis, h-index, impact factor, altmetrics, scientometrics]
 duplicate_of: none
-source_trust_level: A
-confidence_score: 0.92
-tags: [auto-reinforced, bibliometrics, h-index, Research-impact, scientific-metrics, Big-Data]
+source_trust_level: B
+confidence_score: 0.88
+verification_status: applied
+tags: [bibliometrics, citation, h-index, impact-factor, altmetrics, semantic-scholar, openalex, science-of-science]
 raw_sources: []
-last_reinforced: 2026-04-20
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
 tech_stack:
-  language: unspecified
-  framework: unspecified
+  language: Python
+  framework: Semantic Scholar API / OpenAlex / Scopus
 ---

-# [[Bibliometrics|Bibliometrics]]
+# Bibliometrics

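-## 📌 One-Line Insight (The Karpathy Summary)
-> "Measuring the influence of knowledge: an economics of knowledge that quantifies paper citation counts, journal impact factors, and similar figures to show with data which research actually contributes to academia and society."
+## 📌 One-Line Insight
+> **"The economics of knowledge."** Bibliometrics quantifies research through citation counts, the h-index, and the impact factor. Modern practice adds altmetrics and real-time signals from arXiv. Beware the Goodhart trap: once a metric becomes a target, it gets gamed (e.g., citation cartels).

-## 📖 Structured Knowledge (Synthesized Content)
-Bibliometrics is the discipline that applies mathematical and statistical methods to analyze patterns in books and other media.
+## 📖 Core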

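-1.  **Key indicators**:
-    *   **Citation Count**: How often is the work cited? (direct evidence of influence)
-    *   **H-index**: An index that captures productivity and citation impact together.
-    *   **Impact Factor (IF)**: The average number of citations per year for a given journal.
-2.  **Core uses**:
-    *   Serves as an objective basis for allocating research funding, appointing faculty, and comparing national scientific and technological capacity.
-    *   Visualizes knowledge flows and interdisciplinary convergence.
+### Indicators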

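-## ⚠️ Contradictions & Updates
- **Conflict with past data**: Earlier policy focused simply on 'quantitative expansion', whereas modern policy has evolved into a 'multidimensional impact evaluation policy' that also covers 'citation quality' and social media mentions (Altmetrics) (RL Update).
- **Policy change (RL Update)**: Under policies that accelerate AI research, a 'real-time knowledge valuation policy' that collects community reputation as soon as a paper is posted to the archive (arXiv), before publication, has become a stronger signal than the formal publication system.
+#### Citation count
+- The number of times a paper has been cited.
+- Simple to compute, but it lags behind publication.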

-## 🔗 Knowledge Links (Graph)
- [[Assessment|Assessment]], [[Scientific Communication|Scientific Communication]], [[Knowledge synthesis|Knowledge Synthesis]], [[Big-Data|Big-Data]], [[Ps-Reinforce|Ps-Reinforce]]
- **Modern Tech/Tools**: Google Scholar, Scopus, Web of Science, Semantic Scholar API.
---
+#### h-index (Hirsch)
+- An author has an h-index of h when h of their papers have each been cited at least h times.
+- Combines productivity and impact in a single number.
+- Limitations: inflated by self-citation, and not comparable across fields.
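+
+As a compact restatement of the definition above (a sketch; the notation $c_{(k)}$ is introduced here for illustration and is not part of the original note): sort an author's papers by citation count in descending order and let $c_{(k)}$ be the count of the $k$-th paper. Then
+
+$$
+h = \max \{\, k : c_{(k)} \ge k \,\}
+$$
+
+with $h = 0$ when no paper has been cited. For instance, citation counts of 10, 8, 5, 4, 3 give $h = 4$: the fourth paper has 4 citations, but the fifth has only 3.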

-## 🤖 LLM Usage Hints (How to Use This Knowledge)
+#### i10-index
+- The number of papers with at least 10 citations.

-**When to use this knowledge:**
- *(TODO)*
+#### Impact Factor (IF)
+- A journal's average citations per paper over a two-year window.
+- Widely used by publishers as a marketing figure.
+- Not valid for an individual paper: citation counts vary widely within a journal.
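+
+The two-year window can be written out as follows. This is the commonly cited definition, given as a sketch; the symbols $C_y$ and $N_t$ are introduced here for illustration:
+
+$$
+\mathrm{IF}_y = \frac{C_y(y-1) + C_y(y-2)}{N_{y-1} + N_{y-2}}
+$$
+
+where $C_y(t)$ is the number of citations received in year $y$ by items the journal published in year $t$, and $N_t$ is the number of citable items it published in year $t$. As a hypothetical example, a journal with 200 citable items over the two prior years that drew 500 citations this year would have an IF of $500 / 200 = 2.5$. Because this is a journal-level mean over a skewed distribution, it says little about any single paper.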

-**When not to use it:**
- *(TODO)*
+#### Eigenfactor
+- A PageRank-like journal ranking.
+- Weights each citation by the quality of its source.

-## 🧪 Validation Status (Validation)
+#### Altmetrics
+- Mentions on Twitter, blogs, and news sites.
+- Available immediately after publication.
+- Attention is not the same as quality.

- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 automatic normalization. Body needs verification.)*
+### Modern data sources
+- **Google Scholar**: broad coverage, but noisy.
+- **Scopus** (Elsevier): paid.
+- **Web of Science** (Clarivate): paid.
+- **Semantic Scholar** (AI2): free, with AI-enriched metadata.
+- **OpenAlex**: open, with 250M+ papers.
+- **CrossRef**: DOI registry.
+- **arXiv**: preprints.
+- **PubMed**: biomedical literature.

-## 🧬 Duplicate Check
+### Modern issues

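- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (automatic normalization)
- **Reason:** Phase 1 normalization: old template and missing fields filled in.
+#### Citation cartel
+- Groups that cite one another by arrangement.
+- Explosive growth in self-citation.
+- Can lead to retraction.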

-## 🕓 Change History (Changelog)
+#### Field difference
+- Base citation rates differ across fields such as CS, biology, and literature.
+- Normalization is required before any comparison.
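+
+One simple way to normalize, sketched here under the assumption that a field-and-year baseline is available (the symbols are introduced for illustration):
+
+$$
+\mathrm{NCS}_i = \frac{c_i}{\bar{c}_{f(i),\,y(i)}}
+$$
+
+where $c_i$ is the citation count of paper $i$ and $\bar{c}_{f(i),\,y(i)}$ is the mean citation count of papers from the same field and publication year. A score of 1 means the paper is cited at its field's average rate; as a hypothetical example, 30 citations against a baseline of 12 gives 2.5.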

-| Date | Change | Handling | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
+#### Time lag
+- Citations take roughly 5 years to mature.
+- This window is less applicable to fast-moving fields such as AI.

-## 💻 Code Patterns
+#### Predatory journal
+- Paper mills.
+- Fake impact factors.

-**Pattern 1:** *(TODO: structural skeleton that reflects this project's conventions)*
+#### Impact of LLMs
+- Paper volume is exploding.
+- Peer review is saturated.
+- Quality control is breaking down.

-```text
-# TODO
+### Modern alternatives
+- **Open peer review**.
+- **Replication score**.
+- **Code/data availability**.
+- **Twitter / Mastodon discussion**.
+- **YouTube explainers**.
+- **Cited by GitHub**.
+
+### Applications
+1. **Hiring / promotion**: academic careers.
+2. **Funding**: grant evaluation.
+3. **Library**: journal subscription decisions.
+4. **National R&D**: cross-country comparison.
+5. **Trend analysis**: emerging topics.
+6. **Knowledge graph**: citation networks.
+
+## 💻 Patterns
+
+### Semantic Scholar API
+```python
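+import requests
+
+def get_paper(doi):
+    """Fetch paper metadata from the Semantic Scholar Graph API by DOI."""
+    r = requests.get(f'https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}',
+                     params={'fields': 'title,authors,year,citationCount,influentialCitationCount,references,citations'},
+                     timeout=30)
+    r.raise_for_status()  # surface HTTP errors such as 404 or rate limiting
+    return r.json()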
+
+paper = get_paper('10.48550/arXiv.2206.04615')
+print(f"{paper['title']}: {paper['citationCount']} citations")
 ```

-## 🤔 Decision Criteria
+### OpenAlex (open citation data)
+```python
+import requests

-**When to choose option A:**
- *(TODO)*
+def search(query, n=20):
+    r = requests.get('https://api.openalex.org/works',
+                     params={'search': query, 'per_page': n,
+                             'select': 'id,title,publication_year,cited_by_count,authorships'})
+    return r.json()['results']

-**When to choose option B:**
- *(TODO)*
+# Author h-index
+def author_h_index(author_id):
+    # Sorting by citations means the first page (max 200 works) suffices whenever h <= 200.
+    r = requests.get('https://api.openalex.org/works',
+                     params={'filter': f'author.id:{author_id}', 'per_page': 200,
+                             'sort': 'cited_by_count:desc', 'select': 'cited_by_count'})
+    citations = sorted([w['cited_by_count'] for w in r.json()['results']], reverse=True)
+    h = sum(1 for i, c in enumerate(citations) if c >= i + 1)
+    return h
+```

-**Default:**
-> *(TODO)*
+### Citation network (NetworkX)
+```python
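+import networkx as nx
+import requests

-## ❌ Anti-Patterns
+def build_citation_network(seed_paper_id, depth=2):
+    """Breadth-first crawl of references, starting from a Semantic Scholar paperId."""
+    G = nx.DiGraph()
+    queue = [(seed_paper_id, 0)]
+    seen = set()
+    
+    while queue:
+        pid, d = queue.pop(0)
+        if pid in seen or d > depth: continue
+        seen.add(pid)
+        # get_paper() prepends 'DOI:', so query the API by raw paperId here instead.
+        paper = requests.get(f'https://api.semanticscholar.org/graph/v1/paper/{pid}',
+                             params={'fields': 'title,year,references'}, timeout=30).json()
+        G.add_node(pid, title=paper.get('title'), year=paper.get('year'))
+        
+        for ref in paper.get('references') or []:
+            if not ref.get('paperId'): continue  # some references have no resolved paperId
+            G.add_edge(pid, ref['paperId'])  # edge direction: citing paper -> cited paper
+            queue.append((ref['paperId'], d + 1))
+    
+    return G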

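- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
+# Rank influence with PageRank
+G = build_citation_network(paper['paperId'])  # `paper` comes from the Semantic Scholar example
+pageranks = nx.pagerank(G)
+top_influential = sorted(pageranks.items(), key=lambda x: -x[1])[:10]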
+```
+
+### Altmetrics
+```python
+# Altmetric API
+import requests
+
+def altmetric(doi):
+    r = requests.get(f'https://api.altmetric.com/v1/doi/{doi}')
+    if r.status_code != 200: return None
+    data = r.json()
+    return {
+        'score': data.get('score'),
+        'twitter': data.get('cited_by_tweeters_count'),
+        'news': data.get('cited_by_msm_count'),
+        'blog': data.get('cited_by_feeds_count'),
+    }
+```
+
+### Field-normalized citation
+```python
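+def field_normalized_citation_score(paper_citations, field_avg, field_year_avg):
+    """Normalize a citation count by the average for the same field and year."""
+    # field_avg is accepted but unused; max(..., 1) guards against division by zero.
+    expected = field_year_avg
+    return paper_citations / max(expected, 1)
+    
+# RCR (Relative Citation Ratio) is the NIH's field-normalized metric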
+```
+
+### Trend detection
+```python
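+def emerging_topic(papers_by_year, recent_years=3, baseline_years=4, end_year=2026):
+    """Detect acceleration: compare the recent window with the baseline window before it."""
+    start = end_year - recent_years + 1  # defaults give 2024-2026 vs 2020-2023
+    recent_count = sum(papers_by_year.get(y, 0) for y in range(start, end_year + 1))
+    older_count = sum(papers_by_year.get(y, 0) for y in range(start - baseline_years, start))
+    
+    growth = (recent_count - older_count) / max(older_count, 1)
+    return growth > 1.5  # recent count is more than 2.5x the baseline count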
+```
+
+### Predatory journal detector
+```python
+PREDATORY_INDICATORS = [
+    'fee mentioned upfront',
+    'no peer review',
+    'bogus impact factor',
+    'misleading scope',
+    'spam emails',
+]
+
+def assess_journal(journal):
+    """Heuristic score; `journal` is any object exposing the attributes used below."""
+    score = 0
+    if journal.has_apc and journal.apc < 100: score += 1  # too cheap
+    if journal.peer_review_time < 7: score += 1  # too fast
+    if journal.editorial_board_overlap > 50: score += 1
+    if journal.in_doaj: score -= 2  # whitelist
+    return 'predatory' if score >= 2 else 'legitimate'
+```
+
+## 🤔 Decision criteria
+| Use case | Indicator |
+|---|---|
+| Single paper | Citation + altmetric + influential citations |
+| Author | h-index + i10 + field-normalized |
+| Journal | Eigenfactor (NOT IF) |
+| Trend | Year-over-year growth |
+| Country | Field-normalized + collaboration |
+| Hiring | Mix + qualitative review |
+
+**Default**: OpenAlex / Semantic Scholar (free) + multiple metrics + qualitative review.
+
+## 🔗 Graph
+- Parent: [[Science-of-Science]] · [[Library-Science]] · [[Knowledge-Management]]
+- Variants: [[Citation-Analysis]] · [[Altmetrics]] · [[Scientometrics]]
+- Applications: [[H-Index]] · [[Impact-Factor]] · [[Eigenfactor]] · [[RCR]]
+- Tool: [[Semantic-Scholar]] · [[OpenAlex]] · [[Scopus]] · [[Web-of-Science]]
+- Adjacent: [[Citation-Cartel]] · [[Predatory-Journal]] · [[Open-Peer-Review]] · [[Goodharts-Law]]
+
+## 🤖 LLM usage
+**When to use**: literature reviews, trend detection, author / journal evaluation, and knowledge graph construction.
+**When not to use**: drawing quality conclusions from a single citation count, or comparing across fields without normalization.
+
+## ❌ Anti-patterns
+- **Applying IF to an individual paper**: misleading.
+- **Using the h-index alone**: open to manipulation.
+- **No field normalization**: cross-field comparisons become unfair.
+- **Ignoring self-citation**: counts are inflated.
+- **Judging a recent paper by IF**: ignores the citation lag.
+- **Trusting predatory journals**: their metrics are fake.
+- **Conflating citations with quality**: controversial papers also attract many citations.
+
+## 🧪 Verification / duplicates
+- Verified (Hirsch h-index, NIH RCR, San Francisco DORA declaration).
+- Trust level B.
+- Related: [[Awards]] · [[Benchmarks]] · [[Goodharts-Law]] · [[Open-Science]].
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup: indicators + altmetrics + OpenAlex / Semantic Scholar code + predatory detector |