Introduction

3D body pose estimation from a single image aims to recover the 3D human body pose from a single RGB image. Current works are mostly data-driven, as shown in Figure 1. Data-driven methods, however, require large amounts of annotated data and do not generalize well to new environments.



To overcome these limitations, we propose a hybrid approach, as shown in Figure 2, that combines data with prior knowledge extracted from different sources to achieve efficient, robust, and generalizable 3D body pose estimation.



Our current proposed model

An overview of our proposed model is shown below.



The proposed model takes a human body image as input and outputs the parameters of a deformable 3D body model together with camera parameters.
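As a concrete sketch, the network's flat output vector can be split into these parameter groups. The 72/10/3 layout and the weak-perspective camera (scale plus 2D translation) are our assumptions for illustration, not the exact head design:

```python
import numpy as np

# Hypothetical output layout: 72 SMPL pose + 10 shape + 3 camera (s, tx, ty).
N_POSE, N_SHAPE, N_CAM = 72, 10, 3

def split_prediction(pred):
    """Split a flat regression vector into SMPL and camera parameters."""
    assert pred.shape[-1] == N_POSE + N_SHAPE + N_CAM
    pose = pred[..., :N_POSE]                   # axis-angle, 24 joints x 3
    shape = pred[..., N_POSE:N_POSE + N_SHAPE]  # PCA shape coefficients
    cam = pred[..., N_POSE + N_SHAPE:]          # weak-perspective (s, tx, ty)
    return pose, shape, cam

pose, shape, cam = split_prediction(np.zeros(85))
```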

Body representation

We use SMPL [2], a deformable mesh model with 6,890 vertices, to represent the 3D human body, as shown in Figure 5a. SMPL has 72 pose parameters (69 describing the relative rotations of the 23 body joints, plus 3 for the global rotation) and 10 shape parameters that characterize the variation of body height, body proportions, and weight.



The 3D positions of the body joints are a linear combination of the vertex positions. Meanwhile, the dense mesh model allows us to build correspondences between image pixels and body surface landmarks, as shown in Figure 5b.
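The linear joint regression can be sketched as a matrix product. The regressor weights below are random placeholders (in SMPL the regressor is a fixed, sparse matrix shipped with the model); only the dimensions follow SMPL:

```python
import numpy as np

# Minimal sketch of linear joint regression. Numbers follow SMPL:
# 6890 vertices, 24 joints. The regressor here is random for
# illustration; the real one is a fixed sparse matrix.
N_VERTS, N_JOINTS = 6890, 24

rng = np.random.default_rng(0)
vertices = rng.normal(size=(N_VERTS, 3))        # posed mesh vertices
J_regressor = rng.random((N_JOINTS, N_VERTS))   # placeholder weights
J_regressor /= J_regressor.sum(axis=1, keepdims=True)  # rows sum to 1

joints = J_regressor @ vertices                 # (24, 3) joint positions
```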



We describe body surface landmarks with UV coordinates and annotate the UV coordinates of the visible body pixels in the image. Like body joints, the image positions of body surface landmarks can then be obtained by predicting the UV coordinates of the visible image pixels. The mapping is shown above.
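One simple way to turn a predicted per-pixel UV map into a landmark's image position is a nearest-neighbour lookup in UV space. The function name and toy data below are illustrative assumptions, not the actual model output:

```python
import numpy as np

# Sketch: recover a surface landmark's image position by finding the
# visible pixel whose predicted UV is closest to the landmark's UV.
def locate_landmark(uv_map, mask, landmark_uv):
    """uv_map: (H, W, 2) predicted UV per pixel; mask: (H, W) visibility."""
    ys, xs = np.nonzero(mask)
    d = np.linalg.norm(uv_map[ys, xs] - landmark_uv, axis=1)
    i = np.argmin(d)                    # visible pixel with closest UV
    return int(xs[i]), int(ys[i])       # (x, y) image position

# Toy example: a 4x4 image where only pixel (x=3, y=2) maps to UV (0.5, 0.5).
H, W = 4, 4
uv = np.zeros((H, W, 2)); uv[2, 3] = (0.5, 0.5)
mask = np.ones((H, W), dtype=bool)
print(locate_landmark(uv, mask, np.array([0.5, 0.5])))  # -> (3, 2)
```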

Generic body pose constraints

Besides using SMPL parameters as direct supervision, we can further train the deep learning model by minimizing the reprojection errors of 2D body joints and dense body surface landmarks, for which rich and diverse annotations exist. Meanwhile, we leverage generic human body constraints to regularize the training and avoid unrealistic estimates.
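A reprojection loss of this kind can be sketched as below. The weak-perspective camera and the visibility-weighted L1 penalty are our assumptions, not necessarily the exact training objective:

```python
import numpy as np

# Sketch of a 2D reprojection loss under a weak-perspective camera
# (scale s, 2D translation t), averaged over visible joints.
def project(points3d, s, t):
    """Weak-perspective projection: drop z, then scale and translate."""
    return s * points3d[:, :2] + t

def reprojection_loss(points3d, s, t, target2d, visible):
    proj = project(points3d, s, t)
    err = np.abs(proj - target2d).sum(axis=1)   # per-joint L1 error
    return (err * visible).sum() / max(visible.sum(), 1)

# Toy check: joints at the origin projected with t = (1, 1) hit the targets.
joints3d = np.zeros((24, 3))
target = np.ones((24, 2))
vis = np.ones(24)
print(reprojection_loss(joints3d, 1.0, np.array([1.0, 1.0]), target, vis))  # -> 0.0
```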

The summarized constraints are listed below.

Constraints formulation

During training, the summarized constraints are formulated as follows.
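As one example of such a constraint term, a joint-angle-limit constraint can be written as a hinge penalty outside the valid range. The quadratic form and the placeholder limits are our assumptions; real limits come from biomechanical data:

```python
import numpy as np

# Sketch of a joint-angle-limit penalty: quadratic hinge loss that is
# zero inside [lo, hi] and grows with the violation outside it.
def angle_limit_loss(theta, lo, hi):
    """Penalize joint angles outside their valid range."""
    return np.maximum(theta - hi, 0.0) ** 2 + np.maximum(lo - theta, 0.0) ** 2

# Placeholder limits: only the first and last angles violate them.
theta = np.array([-0.5, 0.2, 2.0])
lo = np.array([0.0, 0.0, 0.0])
hi = np.array([1.5, 1.5, 1.5])
loss = angle_limit_loss(theta, lo, hi)
```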



The penetration loss is realized by first detecting colliding triangles, and then penalizing the colliding triangle pairs.
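The detect-then-penalize structure can be illustrated with a deliberately simplified sketch: approximate each triangle by its bounding sphere, flag overlapping sphere pairs as colliding, and penalize the overlap depth. A full implementation would use exact triangle intersection tests (typically accelerated with a BVH) and a distance-based penalty; this toy version only shows the two-stage structure:

```python
import numpy as np

# Simplified penetration penalty: bounding-sphere overlap as a stand-in
# for real triangle-triangle collision detection.
def bounding_sphere(tri):
    """Center and radius of a sphere enclosing a (3, 3) triangle."""
    c = tri.mean(axis=0)
    r = np.linalg.norm(tri - c, axis=1).max()
    return c, r

def penetration_loss(tris_a, tris_b):
    """Sum squared overlap depth over 'colliding' triangle pairs."""
    loss = 0.0
    for ta in tris_a:
        ca, ra = bounding_sphere(ta)
        for tb in tris_b:
            cb, rb = bounding_sphere(tb)
            depth = (ra + rb) - np.linalg.norm(ca - cb)
            if depth > 0:                # spheres overlap -> penalize
                loss += depth ** 2
    return loss
```

Distant triangles contribute nothing, while overlapping ones are penalized in proportion to how deeply they interpenetrate.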

Our previous work

We proposed a method for upper body pose estimation and tracking that exploits generic body pose constraints [1]. We use a Bayesian Network (BN) to represent the body pose, as shown in Figure 3.



The joints' degrees of freedom and the dependencies between joints are directly captured by the number of nodes and the links between them. For joint angle limits and non-penetration constraints, which cannot be directly imposed in the network structure, we embed the constraints by learning from generated pseudo-data.
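Pseudo-data of this kind can be generated by rejection sampling: draw candidate joint angles and keep only those that satisfy the constraints. The uniform proposal distribution and the box limits below are illustrative assumptions:

```python
import numpy as np

# Sketch of pseudo-data generation by rejection sampling: propose
# joint-angle vectors and reject any that violate the constraints.
def within_limits(theta, lo, hi):
    return bool(np.all(theta >= lo) and np.all(theta <= hi))

def generate_pseudo_data(n, lo, hi, rng):
    samples = []
    while len(samples) < n:
        # Propose from a range wider than the valid one, then filter.
        theta = rng.uniform(lo - 1.0, hi + 1.0, size=lo.shape)
        if within_limits(theta, lo, hi):
            samples.append(theta)
    return np.stack(samples)

rng = np.random.default_rng(0)
lo, hi = np.zeros(3), np.ones(3)
data = generate_pseudo_data(100, lo, hi, rng)   # (100, 3) valid poses
```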

Demo

Baseline

A state-of-the-art model from [3].




Ours

A real-time demo of our proposed method.



References

[1] Data-Free Prior Model for Upper Body Pose Estimation and Tracking

[2] SMPL: A Skinned Multi-Person Linear Model

[3] OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

[4] Coherent Reconstruction of Multiple Humans from a Single Image