---
id: CV-OBJ-DET-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [computer-vision, ai, object-detection, yolo, bounding-box, localization]
last_reinforced: 2026-04-26
---

# Object Detection Foundations

## 📌 One-Line Insight (The Karpathy Summary)
> "In the flat space of an image, master both the 'what' and the 'where' of objects at once, so that a machine can logically take the visual world apart."
>
> Object detection is the computer vision technique that classifies the kinds of objects present in an image (Classification) and marks their positions with bounding boxes (Localization).

## 📖 Structured Knowledge (Synthesized Content)
- **Extracted pattern:** "Feature Pyramid and Anchors": to catch objects of varying sizes, extract features at several resolutions of the image, then predict each object's actual position by making fine adjustments to predefined rectangles (Anchors).
- **Main architectures:**
  - **One-stage Detectors (YOLO, SSD):** Scan the whole image once and produce results immediately. Very fast.
  - **Two-stage Detectors (R-CNN, Faster R-CNN):** First propose candidate regions, then verify each in detail. High accuracy.
- **Key metrics:**
  - **IoU (Intersection over Union):** Measures how much the ground-truth box and the predicted box overlap.
  - **mAP (mean Average Precision):** The standard evaluation metric for a model's overall detection performance.
- **Significance:** A core technology in every practical field that needs visual intelligence, such as obstacle recognition for autonomous vehicles, abnormal-behavior detection in CCTV, and defect detection in process automation.
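
The IoU metric listed above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes axis-aligned boxes in `(x1, y1, x2, y2)` corner format with `x1 <= x2` and `y1 <= y2`, and the names `Box` and `iou` are hypothetical helpers introduced only for this example.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) corners, with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes (illustrative helper)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle; clamp to 0 when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) inputs.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 2.0, 2.0)
    prediction = (1.0, 1.0, 3.0, 3.0)
    # Overlap area is 1, union is 4 + 4 - 1 = 7.
    print(iou(ground_truth, prediction))
```

A prediction is commonly counted as correct when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a frequent choice), and mAP is then computed from the resulting matches.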

## ⚠️ Contradictions & Updates (Contradictions & RL Update)
- **Conflict with past data:** Moving past the complexity of hand-designing anchor boxes (Anchors), anchor-free approaches that detect from points or centers without anchors (CenterNet and others) and the transformer-based DETR family have recently risen to the mainstream.
- **Policy change:** When analyzing an agent's visual interface, the Antigravity project detects UI elements on screen, such as buttons and text input fields, in real time, using a YOLOv8 architecture optimized for low-latency response.

## 🔗 Knowledge Links (Graph)
- Computer-Vision-Foundations, [[Image-Segmentation|Image-Segmentation]], Convolutional-Neural-Networks-CNN, [[Non-linear-Activation-Functions|Non-linear-Activation-Functions]]
- **Raw Source:** 10_Wiki/Topics/AI/Object-Detection-Foundations.md