2nd/10_Wiki/Topics/AI_and_ML/Pose-Estimation.md
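koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)
Replaced [[wiki links]] that differed only in notation with the canonical title of the target document,
reconnecting 1,200 broken links. Only exact title/filename normalization matches were applied; alias matching
was excluded because of the risk of over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs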

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00


id: wiki-2026-0508-pose-estimation
title: Pose Estimation
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Human Pose Estimation, HPE, Keypoint Detection
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: computer-vision, pose-estimation, deep-learning, keypoints
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: python; framework: pytorch, mmpose, mediapipe

Pose Estimation

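One-liner

"Detect the locations of human body keypoints (joints) in every image or video frame." OpenPose (2017) popularized multi-person bottom-up estimation, MediaPipe brought real-time pose estimation to mobile, and transformer models (ViTPose, SAM-style transformers) are state of the art as of 2024-2025.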

Core concepts

Paradigms

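  • Top-down: detect each person's bounding box → crop → estimate keypoints on the crop. Accurate, but slows down in crowds because the keypoint model runs once per person.
  • Bottom-up: detect all keypoints first → group them into persons (Part Affinity Fields (PAF) / associative embedding). Fast at scale, since the network runs once per image regardless of how many people are in it.
  • Single-stage (modern): YOLO-Pose and ED-Pose predict person detections and keypoints jointly in one network.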
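In the top-down paradigm the keypoint model only sees the person crop, so its predictions must be mapped back to full-image coordinates. The sketch below assumes an axis-aligned box (x0, y0, x1, y1) resized straight to a fixed crop size, with no aspect-ratio padding or rotation (libraries such as MMPose use a more general affine transform); the function names are hypothetical.

```python
import numpy as np

def _scales(bbox, crop_w, crop_h):
    """Per-axis scale from an (x0, y0, x1, y1) box to a crop_w x crop_h crop."""
    x0, y0, x1, y1 = bbox
    return crop_w / (x1 - x0), crop_h / (y1 - y0)

def image_to_crop(kpts_img, bbox, crop_w, crop_h):
    """Map (K, 2) image-pixel keypoints into crop-pixel coordinates."""
    sx, sy = _scales(bbox, crop_w, crop_h)
    k = np.asarray(kpts_img, dtype=float)
    return np.stack([(k[:, 0] - bbox[0]) * sx, (k[:, 1] - bbox[1]) * sy], axis=1)

def crop_to_image(kpts_crop, bbox, crop_w, crop_h):
    """Map (K, 2) keypoints predicted on the crop back to image pixels."""
    sx, sy = _scales(bbox, crop_w, crop_h)
    k = np.asarray(kpts_crop, dtype=float)
    return np.stack([k[:, 0] / sx + bbox[0], k[:, 1] / sy + bbox[1]], axis=1)
```

For example, with bbox (100, 50, 196, 178) and a 192x256 crop, `crop_to_image([[96, 128]], bbox, 192, 256)` gives `[[148., 114.]]`. In a top-down loop this crop/predict/map-back step runs once per detected person, which is where the crowd-size cost comes from.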

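Representations

  • 2D keypoints: (x, y, confidence) per joint; the COCO 17-keypoint layout is the standard.
  • 3D pose: (x, y, z) per joint, lifted from a single image/video or triangulated from multiple views.
  • SMPL / mesh: a parametric full-body model (shape and pose parameters → body mesh), e.g. VIBE, HMR, 4D-Humans.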
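For the multi-view route to 3D, a keypoint detected by two or more calibrated cameras can be triangulated. Below is a minimal sketch of linear (DLT) triangulation in NumPy. It assumes a known 3x4 projection matrix P = K[R|t] for each camera and undistorted pixel coordinates, and it is not taken from any particular library.

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Triangulate one 3D point from N >= 2 views with the linear DLT method.

    proj_mats: list of (3, 4) camera projection matrices.
    points_2d: list of (x, y) pixel observations, one per camera.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view gives two linear constraints on the homogeneous point X:
        # x * (P[2] @ X) - P[0] @ X = 0 and y * (P[2] @ X) - P[1] @ X = 0
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

def project(P, X):
    """Project a 3D point with a (3, 4) projection matrix to pixel coordinates."""
    xh = np.asarray(P, dtype=float) @ np.append(X, 1.0)
    return xh[:2] / xh[2]
```

In practice you run this independently for each of the 17 keypoints in every frame. Noisy 2D detections make the least-squares solution approximate rather than exact.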

Applications

  1. AR/VR avatar driving (Meta Quest, Apple Vision Pro).
  2. Fitness coaching (form correction).
  3. Sports analytics (gait, biomechanics).
  4. Markerless motion capture for animation.
  5. Surveillance / fall detection.
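Fitness form correction and gait analysis usually reduce keypoints to joint angles. Below is a minimal sketch that assumes keypoints are given as NumPy-compatible 2D or 3D points. The function name and the knee-angle example are illustrative and not taken from any specific app.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) between segments b->a and b->c.

    Works for 2D or 3D points, e.g. hip-knee-ankle for the knee angle.
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # clip guards against tiny floating-point overshoot outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

With COCO-17 keypoints `k`, the left knee angle is `joint_angle(k[11], k[13], k[15])` (left hip, left knee, left ankle). A fully straight leg gives 180 degrees. Any threshold a coaching app applies to this value is app-specific.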

💻 Patterns

MediaPipe (real-time, on-device)

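import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose  # legacy "Solutions" API

cap = cv2.VideoCapture(0)  # default webcam
# Using Pose as a context manager releases its resources on exit
with mp_pose.Pose(model_complexity=1, min_detection_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # 33 BlazePose landmarks; x, y are normalized to [0, 1]
            for lm in results.pose_landmarks.landmark:
                print(lm.x, lm.y, lm.visibility)
cap.release()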
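MediaPipe Pose returns 33 BlazePose landmarks in normalized [0, 1] coordinates, whereas the other tools here use the COCO 17-keypoint layout in pixels. The sketch below converts between the two. It assumes the standard BlazePose indexing (0 nose, 2/5 eyes, 7/8 ears, 11-16 arms, 23-28 legs). The helper is hypothetical, and it takes plain (x, y, visibility) tuples so it can be exercised without MediaPipe installed.

```python
import numpy as np

# BlazePose landmark index for each COCO-17 keypoint, in COCO order:
# nose, l/r eye, l/r ear, l/r shoulder, l/r elbow, l/r wrist, l/r hip, l/r knee, l/r ankle
BLAZEPOSE_TO_COCO17 = [0, 2, 5, 7, 8, 11, 12, 13, 14, 15, 16, 23, 24, 25, 26, 27, 28]

def mediapipe_to_coco17(landmarks, img_w, img_h):
    """Convert 33 (x, y, visibility) normalized landmarks to a (17, 3) pixel array.

    Column 2 carries MediaPipe's visibility score as the per-keypoint confidence.
    """
    lm = np.asarray(landmarks, dtype=float)  # (33, 3)
    out = lm[BLAZEPOSE_TO_COCO17].copy()
    out[:, 0] *= img_w  # x is normalized by image width
    out[:, 1] *= img_h  # y is normalized by image height
    return out
```

Usage inside the loop: `mediapipe_to_coco17([(l.x, l.y, l.visibility) for l in results.pose_landmarks.landmark], frame.shape[1], frame.shape[0])`.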

MMPose (research, ViTPose backbone)

from mmpose.apis import MMPoseInferencer

inferencer = MMPoseInferencer(pose2d='vitpose-h')
result = next(inferencer('image.jpg', show=False))
keypoints = result['predictions'][0][0]['keypoints']  # (17, 2)
scores = result['predictions'][0][0]['keypoint_scores']

YOLO-Pose (Ultralytics, single-stage)

from ultralytics import YOLO

model = YOLO('yolo11n-pose.pt')
results = model('image.jpg')
for r in results:
    kpts = r.keypoints.xy  # (n_persons, 17, 2)
    conf = r.keypoints.conf

3D lift (VideoPose3D-style)

import torch
import torch.nn as nn

# 2D (B, T, 17, 2) -> 3D (B, T, 17, 3) via a dilated temporal CNN.
# padding=d keeps the sequence length T unchanged, so the final reshape is valid.
class TemporalLift(nn.Module):
    def __init__(self, n_kpts=17, ch=1024):
        super().__init__()
        self.expand = nn.Conv1d(n_kpts * 2, ch, 3, padding=1)
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(ch, ch, 3, padding=d, dilation=d),
                nn.BatchNorm1d(ch), nn.ReLU(),
            ) for d in (3, 9, 27)
        ])
        self.head = nn.Conv1d(ch, n_kpts * 3, 1)

    def forward(self, x):  # x: (B, T, 17, 2)
        B, T = x.shape[:2]
        h = self.expand(x.reshape(B, T, -1).transpose(1, 2))  # (B, ch, T)
        for block in self.blocks:
            h = h + block(h)  # residual connection, VideoPose3D-style
        return self.head(h).transpose(1, 2).reshape(B, T, -1, 3)

COCO keypoint metric (OKS / mAP)

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO('person_keypoints_val2017.json')
dt = gt.loadRes('predictions.json')
e = COCOeval(gt, dt, 'keypoints')
e.evaluate(); e.accumulate(); e.summarize()
# AP averaged over OKS thresholds .50:.05:.95 is the standard metric
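The AP that COCOeval reports is built on Object Keypoint Similarity (OKS), which plays the role that IoU plays for boxes:

OKS = sum_i exp(-d_i^2 / (2 s^2 k_i^2)) [v_i > 0] / sum_i [v_i > 0]

Here d_i is the pixel distance for keypoint i, s^2 is the ground-truth object area, and k_i = 2 sigma_i. Below is a minimal NumPy sketch of that formula for one prediction/ground-truth pair, using the per-keypoint sigma values hard-coded in pycocotools. Matching and AP accumulation are left to COCOeval. Returning 0.0 when no keypoint is labeled is a simplification made for this sketch.

```python
import numpy as np

# Per-keypoint falloff constants used by pycocotools for the 17 COCO keypoints
COCO_SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
                        .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0

def oks(pred, gt, visible, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt: (K, 2) pixel coordinates; visible: (K,) bool (ground-truth v > 0);
    area: ground-truth object area in pixels (the s^2 scale term).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    vis = np.asarray(visible, dtype=bool)
    if not vis.any():
        return 0.0  # simplification: no labeled keypoints to compare
    d2 = np.sum((pred - gt) ** 2, axis=1)            # squared pixel distances
    k2 = (2 * sigmas) ** 2                            # k_i = 2 * sigma_i
    e = d2 / (2 * area * k2 + np.finfo(float).eps)    # d^2 / (2 s^2 k^2)
    return float(np.mean(np.exp(-e[vis])))            # average over labeled keypoints only
```

The per-keypoint sigmas make OKS forgiving on hips (large sigma) and strict on eyes (small sigma). Dividing by area makes the same pixel error cost less on larger people.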

SMPL mesh recovery (4D-Humans / HMR2)

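import torch
from hmr2.models import load_hmr2

model, model_cfg = load_hmr2('logs/checkpoints/epoch=35.ckpt')
model.eval()
# image_tensor: (B, 3, H, W) person crops preprocessed as in the 4D-Humans demo
with torch.no_grad():
    out = model({'img': image_tensor})  # HMR2 takes a batch dict, not a bare tensor
verts = out['pred_vertices']                    # (B, 6890, 3)
betas = out['pred_smpl_params']['betas']        # (B, 10) shape coefficients
pose  = out['pred_smpl_params']['body_pose']    # (B, 23, 3, 3) rotation matrices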

Decision criteria

| Situation | Approach |
|---|---|
| Mobile / web real-time | MediaPipe Pose |
| Highest accuracy, single image | ViTPose-H (MMPose) |
| Multi-person crowd | YOLO-Pose / ED-Pose (single-stage) |
| 3D from monocular video | 4D-Humans / WHAM |
| Animation mocap | SMPL / SMPL-X based |
| Edge device, < 10 ms | MoveNet Lightning, RTMPose-tiny |

Default: RTMPose for 2D, 4D-Humans for 3D mesh.

🔗 Graph

🤖 LLM usage

When to use: as input features for a vision-action pipeline, in fitness/AR apps, and for mocap automation. When not to use: for facial keypoints, use a face-specific model (MediaPipe Face Mesh, dlib); for hands, use MediaPipe Hands.

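Anti-patterns

  • Top-down without bbox tracking: re-detecting people every frame causes severe temporal jitter. Combine the detector with a tracker such as ByteTrack.
  • Direct (x, y) regression without heatmaps: usually lower accuracy. Heatmap supervision is the standard.
  • 3D from a single 2D pose: depth is ambiguous because many 3D poses project to the same 2D pose. Temporal context or multiple views are needed.
  • Ignoring camera intrinsics for 3D: the recovered metric scale is wrong.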
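Heatmap supervision replaces the (x, y) regression target with a 2D Gaussian centered on the keypoint, one channel per joint. Predictions are decoded by taking the peak. Below is a minimal NumPy sketch. It assumes one keypoint per heatmap, a fixed Gaussian sigma in heatmap pixels, and plain argmax decoding; production decoders such as MMPose's add sub-pixel refinement. The function names are hypothetical.

```python
import numpy as np

def encode_heatmap(x, y, width, height, sigma=2.0):
    """Render a (height, width) Gaussian training target peaked at keypoint (x, y)."""
    xs = np.arange(width)[None, :]   # (1, W) column coordinates
    ys = np.arange(height)[:, None]  # (H, 1) row coordinates
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

def decode_heatmap(heatmap):
    """Return (x, y, score) at the heatmap maximum (integer-pixel argmax)."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return float(col), float(row), float(heatmap[row, col])
```

Heatmaps are usually predicted at a fraction of the input resolution (e.g. 64x48 for a 256x192 crop in common COCO setups). Decoded coordinates are then rescaled back to the crop.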

🧪 Verification / duplicates

  • Verified (MMPose docs, Ultralytics YOLO11-pose, MediaPipe docs, COCO keypoint benchmark).
  • Confidence: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: pose estimation paradigms + modern stack (ViTPose, YOLO-Pose, 4D-Humans) |