Publication

PIRO: Permutation-invariant relational network for multi-person 3D pose estimation

Conference Article

Conference

International Conference on Computer Vision Theory and Applications (VISAPP)

Edition

19th

Pages

295-305

Doc link

https://researchr.org/publication/UgrinovicRASM24

Abstract

Recovering multi-person 3D poses from a single RGB image is an ill-conditioned problem due to the inherent 2D-3D depth ambiguity, inter-person occlusions, and body truncation. To tackle these issues, recent works have shown promising results by simultaneously reasoning about different individuals. However, in most cases this is done by considering only pairwise interactions between people or between pairs of body parts, which hinders a holistic scene representation able to capture long-range interactions. Approaches that jointly process all people in the scene require defining one individual as a reference, fixing a pre-defined person ordering, or limiting the number of individuals, and are therefore sensitive to these choices. In this paper, we overcome both limitations and propose an approach for multi-person 3D pose estimation that captures long-range interactions independently of the input order. We build a residual-like permutation-invariant network that successfully refines potentially corrupted initial 3D poses estimated by off-the-shelf detectors. The residual function is learned via a Set Attention mechanism. Although our model is relatively straightforward, a thorough evaluation demonstrates that our approach boosts the performance of the initially estimated 3D poses by large margins, achieving state-of-the-art results on two standardized benchmarks.
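The core idea of the abstract — refining a set of per-person pose estimates with a residual set-attention step whose output does not depend on input order — can be illustrated with a minimal numpy sketch. This is not the paper's PIRO architecture; the pose encoding (one flattened vector per person), the single-head attention, and all weight names are hypothetical, chosen only to demonstrate that self-attention treats the people as an unordered set: permuting the input rows permutes the refined outputs identically.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def set_attention_residual(poses, Wq, Wk, Wv):
    """One residual set-attention refinement step (illustrative sketch).

    poses: (N, D) array, one flattened 3D pose vector per person
           (hypothetical encoding, not the paper's).
    Because rows are processed as an unordered set, permuting the input
    rows permutes the output rows the same way (permutation equivariance).
    """
    Q, K, V = poses @ Wq, poses @ Wk, poses @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    # Residual connection: the attention output corrects the initial poses.
    return poses + attn @ V

# Demonstrate order-independence on random data.
rng = np.random.default_rng(0)
N, D = 4, 6                       # 4 people, 6-dim toy pose encoding
poses = rng.normal(size=(N, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

out = set_attention_residual(poses, Wq, Wk, Wv)
perm = rng.permutation(N)
out_perm = set_attention_residual(poses[perm], Wq, Wk, Wv)
order_independent = np.allclose(out[perm], out_perm)
```

Here `order_independent` is `True`: feeding the people in a different order yields the same per-person refinements, which is the property that removes the need for a reference person or a fixed ordering.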

Categories

computer vision, pattern recognition.

Author keywords

Human pose estimation, 3D, single-view.

Scientific reference

N. Ugrinovic, A. Ruiz, A. Agudo, A. Sanfeliu and F. Moreno-Noguer. PIRO: Permutation-invariant relational network for multi-person 3D pose estimation, 19th International Conference on Computer Vision Theory and Applications, 2024, Rome (Italy), pp. 295-305.