Markerless 3D human pose detection from a single image is a severely underconstrained problem because different 3D poses can have similar image projections. In order to handle this ambiguity, current approaches rely on prior shape models that can only be correctly adjusted if 2D image features are accurately detected. Unfortunately, although current 2D part detector algorithms have shown promising results, they are not yet accurate enough to guarantee a complete disambiguation of the 3D inferred shape. In this paper, we introduce a novel approach for estimating 3D human pose even when observations are noisy. We propose a stochastic sampling strategy to propagate the noise from the image plane to the shape space. This provides a set of ambiguous 3D shapes, which are virtually undistinguishable from their image projections. Disambiguation is then achieved by imposing kinematic constraints that guarantee the resulting pose resembles a 3D human shape. We validate the method on a variety of situations in which state-of-the-art 2D detectors yield either inaccurate estimations or partly miss some of the body parts.


computer vision, pose estimation.

Scientific reference

E. Simo-Serra, A. Ramisa, G. Alenyà, C. Torras and F. Moreno-Noguer. Single image 3D human pose estimation from noisy observations, 2012 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012, Providence, RI, USA, pp. 2673-2680, IEEE Press.