OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Gines Hidalgo; Shih-En Wei; Tomas Simon; Yaser Sheikh; Zhe Cao

arXiv: 1812.08008 · v2 · pith:4UFXRBF7 · submitted 2018-12-18 · 💻 cs.CV



Keywords: body, realtime, part, pose, accuracy, estimation, foot, image


Abstract

Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training stages. We demonstrate that a PAF-only refinement rather than both PAF and body part location refinement results in a substantial increase in both runtime performance and accuracy. We also present the first combined body and foot keypoint detector, based on an internal annotated foot dataset that we have publicly released. We show that the combined detector not only reduces the inference time compared to running them sequentially, but also maintains the accuracy of each component individually. This work has culminated in the release of OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints.
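The abstract says that PAFs are used to associate body parts with individuals, but it does not show how that association is scored. The sketch below is one illustration. It assumes the formulation from the original PAF work: a candidate limb between two detected parts is scored by averaging the dot product of the predicted field with the unit vector joining the parts, sampled along the segment. Candidates are then paired greedily by score. The names `paf_association_score` and `greedy_match`, and the `paf(x, y)` callable that stands in for a predicted field map, are all hypothetical. The real system reads network output maps, and its matching step may differ from this simple greedy pass.

```python
import math
from typing import Callable, Dict, List, Tuple

Vec = Tuple[float, float]
# Hypothetical interface: paf(x, y) returns the 2D field vector at (x, y).
PafField = Callable[[float, float], Vec]


def paf_association_score(paf: PafField, p1: Vec, p2: Vec, num_samples: int = 10) -> float:
    """Approximate the PAF line integral along p1 -> p2 by uniform sampling.

    Returns the mean alignment between the field and the limb direction,
    so a field pointing exactly along the limb with unit length scores 1.0.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0  # degenerate limb: no direction to compare against
    ux, uy = dx / length, dy / length  # unit vector of the candidate limb
    total = 0.0
    for i in range(num_samples):
        # Sample points evenly from p1 to p2 (inclusive).
        t = i / (num_samples - 1) if num_samples > 1 else 0.5
        vx, vy = paf(p1[0] + t * dx, p1[1] + t * dy)
        total += vx * ux + vy * uy  # dot product with the limb direction
    return total / num_samples


def greedy_match(scores: Dict[Tuple[int, int], float]) -> List[Tuple[int, int]]:
    """Pair candidates of two part types, highest score first.

    Each candidate is used at most once, and non-positive scores are ignored.
    This is a simple stand-in for a bipartite matching step.
    """
    pairs: List[Tuple[int, int]] = []
    used_a, used_b = set(), set()
    for (i, j), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s <= 0 or i in used_a or j in used_b:
            continue
        pairs.append((i, j))
        used_a.add(i)
        used_b.add(j)
    return pairs


if __name__ == "__main__":
    # Toy field: every pixel points in +x with unit length.
    field: PafField = lambda x, y: (1.0, 0.0)
    print(paf_association_score(field, (0.0, 0.0), (10.0, 0.0)))  # aligned limb
    print(paf_association_score(field, (0.0, 0.0), (0.0, 10.0)))  # perpendicular limb
    print(greedy_match({(0, 0): 0.9, (0, 1): 0.5, (1, 0): 0.8, (1, 1): 0.7}))
```

With the toy field, a horizontal candidate limb scores 1.0 and a vertical one scores 0.0. The greedy pass keeps the strongest non-conflicting pairs, which here are `(0, 0)` and `(1, 1)`. Scoring by alignment with a direction field, instead of by part positions alone, is what lets a bottom-up method tell apart limbs of people who stand close together.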


Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Zero-Shot Sign Language Recognition: Can Textual Data Uncover Sign Languages?
cs.CV 2019-07 unverdicted novelty 6.0

Introduces ZSSLR problem and ASL-Text dataset; uses text embeddings with 3D-CNN and bi-LSTM video features for zero-shot sign recognition.
Linking Art through Human Poses
cs.CV 2019-07 unverdicted novelty 6.0

Human pose similarity matching with spatial verification outperforms standard content-based image retrieval for discovering composition transfers in art on a manually annotated dataset.
Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior
cs.HC 2026-04 unverdicted novelty 4.0

A pipeline uses OpenPose and Gaze-LLE to extract pose and gaze data from classroom videos, deletes the raw footage, and applies an LLM for zero-shot behavioral analysis of student attention.
Gesture Recognition in RGB Videos Using Human Body Keypoints and Dynamic Time Warping
cs.CV 2019-06 unverdicted novelty 4.0

A pipeline using OpenPose keypoints and DTW-1NN classifies gestures in RGB videos with flexibility to add new classes via few examples.
Real-Time Cellist Postural Evaluation With On-Device Computer Vision
cs.HC 2026-04 unverdicted novelty 3.0

Cello Evaluator is a real-time postural feedback system for cellists running on current Android phones via on-device computer vision, validated as user-friendly by experts.
Sequence-to-Sequence Natural Language to Humanoid Robot Sign Language
cs.RO 2019-07 unverdicted novelty 3.0

Applies established seq2seq neural networks to convert text to Spanish sign language for humanoid robot TEO, proposing OpenPose for skeleton data collection to handle sequence length differences and non-manual markers.