---
id: DATA-IR-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [data-science, information-retrieval, search-engine, ranking, nlp, rag]
last_reinforced: 2026-04-26
---

# Information Retrieval (IR)

## 📌 One-Line Insight (The Karpathy Summary)
> "From the sea of data, pull up not mere keywords but the 'essence of meaning' closest to the user's question." Information retrieval is the discipline and technology of finding, in unstructured datasets, the information that satisfies a user's information need (query) and presenting it ranked by relevance.

## 📖 Structured Knowledge (Synthesized Content)
- **Extracted pattern:** "Retrieve and Rank": a staged filtering pattern in which stage 1 quickly pulls a candidate set out of a large corpus (retrieval) and stage 2 uses a more precise model to decide the final order (ranking).
- **Core models:**
  - **Boolean Model:** checks only whether the keywords are present (the classical approach).
  - **Term-Weighting Models (TF-IDF, BM25):** score each document from term frequency and term rarity. TF-IDF is the classic vector space weighting, while BM25 comes from the probabilistic relevance framework.
  - **Neural IR (Dense Retrieval):** performs semantic search through similarity between embedding vectors (the modern approach).
- **Evaluation metrics:** Precision, Recall, and F1-Score, plus rank-aware metrics such as MRR and nDCG.
- **Significance:** the infrastructure that most directly determines the performance of RAG (Retrieval-Augmented Generation) systems, in which AI agents draw on external knowledge.

## ⚠️ Contradictions and Updates (Contradictions & RL Update)
- **Conflict with past data:** Search used to center on plain keyword matching. Semantic search, which understands the context of the user's question, and hybrid search (keyword + semantic) have now become the standard for search quality.
- **Policy change:** The Antigravity project uses a hybrid IR system that ensembles BM25 and dense vector search when searching its 1,174 wiki documents, with the aim of maximizing the accuracy of knowledge lookup.

## 🔗 Knowledge Links (Graph)
- [[Indexing-Strategies]], [[Vector-Database-Foundations]], [[NLP-Foundations]], [[Hallucination-in-LLMs]]
- **Raw Source:** 10_Wiki/Topics/AI/Information-Retrieval-IR.md
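
The hybrid BM25 + dense ensemble and the MRR metric mentioned in this note can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the note does not say how the Antigravity project fuses its two rankers, so reciprocal rank fusion (RRF) is swapped in as one common choice; the BM25 IDF uses the Lucene-style smoothed form; and the three-dimensional "embeddings" are hand-written toy vectors standing in for a real embedding model.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF (Lucene-style), always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / length_norm
        scores.append(score)
    return scores

def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between a query vector and each document vector."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))
    out = []
    for vec in doc_vecs:
        dot = sum(q * d for q, d in zip(query_vec, vec))
        denom = norm(query_vec) * norm(vec)
        out.append(dot / denom if denom else 0.0)
    return out

def ranking(scores):
    """Document indices sorted from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all input rankings."""
    fused = Counter()
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)

def reciprocal_rank(ranked_ids, relevant):
    """1 / rank of the first relevant document; MRR averages this over queries."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Hypothetical toy corpus and hand-written embeddings (not real model output).
docs = [
    "bm25 ranks documents by term frequency and rarity",
    "dense retrieval compares embedding vectors",
    "cooking pasta requires boiling water",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.2, 0.9, 0.0], [0.0, 0.1, 0.9]]
query, query_vec = "dense retrieval embedding", [0.3, 0.8, 0.0]

sparse = ranking(bm25_scores(query, docs))
dense = ranking(cosine_scores(query_vec, doc_vecs))
fused = rrf_fuse([sparse, dense])
print(fused)  # [1, 0, 2]
```

RRF works on ranks rather than raw scores, which sidesteps the problem that BM25 scores and cosine similarities live on different scales. A weighted sum of normalized scores is another option, at the cost of a weight that needs tuning.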