Publication

VQ-HPS: Human pose and shape estimation in a vector-quantized latent space

Conference Article

Conference

European Conference on Computer Vision (ECCV)

Edition

18th

Pages

471-490

Doc link

https://doi.org/10.1007/978-3-031-72943-0_27


Authors

G. Fiche, S. Leglaive, X. Alameda, A. Agudo and F. Moreno-Noguer

Abstract

Previous works on Human Pose and Shape Estimation (HPSE) from RGB images fall broadly into two groups: parametric and non-parametric approaches. Parametric techniques leverage a low-dimensional statistical body model for realistic results, whereas recent non-parametric methods achieve higher precision by directly regressing the 3D coordinates of the human body mesh. This work introduces a novel paradigm for the HPSE problem, based on a low-dimensional discrete latent representation of the human mesh, which frames HPSE as a classification task. Instead of predicting body model parameters or 3D vertex coordinates, we predict the proposed discrete latent representation, which can be decoded into a registered human mesh. This paradigm offers two key advantages. First, predicting a low-dimensional discrete representation confines our predictions to the space of anthropomorphic poses and shapes, even when little training data is available. Second, framing the problem as a classification task lets us harness the discriminative power inherent in neural networks. The proposed model, VQ-HPS, predicts the discrete latent representation of the mesh. The experimental results demonstrate that VQ-HPS outperforms the current state-of-the-art non-parametric approaches while yielding results as realistic as those produced by parametric methods when trained with little data. VQ-HPS also shows promising results when trained on large-scale datasets, highlighting the significant potential of the classification approach for HPSE.
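The idea of predicting a discrete latent representation rather than continuous parameters can be illustrated with a minimal, generic sketch. It assumes a VQ-VAE-style setup in which each continuous latent token is snapped to its nearest entry in a learned codebook, so that a mesh is described by a short sequence of codebook indices. Predicting those indices then becomes per-token classification trained with cross-entropy. The codebook, token values, logits, and every function name below are hypothetical toy choices. The paper's image encoder, mesh autoencoder, and decoder are not reproduced, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence

# Hypothetical toy setup: a codebook of K vectors of dimension D.
# In a real VQ model the codebook is learned; here it is fixed for illustration.


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each continuous latent token to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def lookup(indices: Sequence[int],
           codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Replace each index by its codebook vector (the input a mesh decoder would receive)."""
    return [list(codebook[k]) for k in indices]


def predict_indices(logits: Sequence[Sequence[float]]) -> List[int]:
    """Classification view: pick the most likely codebook index for each token."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean per-token cross-entropy between predicted logits and target indices."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[t]
    return total / len(targets)


# Toy example: 3 codebook entries in 2-D, 3 latent tokens.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
latents = [[0.9, 0.1], [0.1, 0.2], [0.2, 0.8]]
targets = quantize(latents, codebook)
print(targets)  # -> [1, 0, 2]

# Hypothetical network outputs: one row of K logits per token.
logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 3.0]]
print(predict_indices(logits))  # -> [1, 0, 2]
print(lookup(predict_indices(logits), codebook))
print(cross_entropy(logits, targets))
```

Because every prediction is an index into a finite codebook, any output decodes to a combination of codebook entries, which reflects the abstract's point that a discrete representation constrains predictions to plausible poses and shapes.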

Categories

computer vision.

Author keywords

Human pose and shape estimation, human mesh recovery, vector quantized autoencoder, transformers.

Scientific reference

G. Fiche, S. Leglaive, X. Alameda, A. Agudo and F. Moreno-Noguer. VQ-HPS: Human pose and shape estimation in a vector-quantized latent space, 18th European Conference on Computer Vision, 2024, Milano, Italy, in Computer Vision – ECCV 2024, Vol 15110 of Lecture Notes in Computer Science, pp. 471-490, 2024.