We present MultiPhys, a method that enables recovering multi-person 3D motion in a physically-aware
manner. State-of-the-art methods (SLAHMR, top row) for multi-person motion recovery
mostly rely on kinematic approaches, which typically ignore physical constraints, such as
body penetration. Moreover, while individual poses are kinematically coherent, their spatial
placement is often suboptimal, resulting in significant penetration errors. MultiPhys (bottom row)
incorporates physics constraints into the reconstruction process, yielding more physically
plausible results.
Abstract
We introduce MultiPhys, a method designed for recovering multi-person motion from monocular videos. Our
focus lies in capturing coherent spatial placement between pairs of individuals across varying degrees
of engagement. MultiPhys, being physically aware, exhibits robustness to jittering and occlusions, and
effectively eliminates penetration issues between the two individuals. We devise a pipeline in which the
motion estimated by a kinematic-based method is fed into a physics simulator in an autoregressive
manner. We introduce distinct components that enable our model to harness the simulator’s properties
without compromising the accuracy of the kinematic estimates. This results in final motion estimates
that are both kinematically coherent and physically compliant. Extensive evaluations on three
challenging datasets characterized by substantial inter-person interaction show that our method
significantly reduces errors associated with penetration and foot skating, while performing
competitively with the state-of-the-art on motion accuracy and smoothness.
Overview Video
* Best viewed with audio
Approach
Given an input video with multiple people (left), we first obtain initial kinematic estimates of the
camera poses and 3D human motion using SLAHMR. Using these initial motion estimates, our proposed
framework corrects them and makes them physically plausible (right).
Pipeline.
We use a policy π to control the humanoid agents, driving them toward the initial
kinematic poses. All agents are simulated simultaneously so that physics-based constraints apply to
the reconstructed motion. The policy computes features from both the current state of the simulation and
the target pose, and then generates the action signal a that controls the agents. Our loop-N
component is placed between the target poses that correspond to consecutive video frames.
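The autoregressive loop above can be sketched in a few lines. This is a minimal illustrative toy, not the authors' implementation: `Policy`, `Simulator`, and `rollout` are hypothetical names, the "simulator" is a stub that simply integrates states toward the actions, and the real method uses a full physics engine and learned policy.

```python
import numpy as np

class Policy:
    """Toy stand-in for the control policy pi (assumption, not the real model).
    The action is computed from the current simulation state and the target pose."""
    def act(self, sim_state, target_pose):
        # Here the "feature" is simply the residual between target and state.
        return target_pose - sim_state

class Simulator:
    """Stub simulator that steps all agents jointly (a real physics engine
    would also resolve contacts and penetrations between agents)."""
    def __init__(self, init_states):
        self.states = np.asarray(init_states, dtype=float)

    def step(self, actions, gain=0.5):
        # Apply every agent's action simultaneously.
        self.states = self.states + gain * np.asarray(actions)
        return self.states

def rollout(policy, sim, target_poses, n_loops=2):
    """Autoregressive rollout: for each video frame's target poses, run
    n_loops simulation sub-steps (the 'loop-N' idea) before advancing."""
    trajectory = []
    for targets in target_poses:           # one set of targets per frame
        for _ in range(n_loops):           # loop-N between consecutive frames
            actions = [policy.act(s, t) for s, t in zip(sim.states, targets)]
            sim.step(actions)
        trajectory.append(sim.states.copy())
    return np.stack(trajectory)

# Two agents with scalar "poses" for illustration only.
sim = Simulator([0.0, 1.0])
targets = [[1.0, 0.0], [2.0, -1.0]]        # targets for two video frames
traj = rollout(Policy(), sim, targets, n_loops=2)
```

The key property the sketch mirrors is that the simulator state, not the kinematic estimate, is carried forward between frames, so physics corrections accumulate autoregressively.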
Physics-aware Correction Module.
Results on ExPi, Hi4D, and CHI3D datasets
Results on ExPi dataset.
Results on Hi4D dataset.
Results on CHI3D dataset.
Results with more people
Results with 3 people.
Results with 4 people.
Citation
@inproceedings{ugrinovic2024multiphys,
author={Ugrinovic, Nicolas and Pan, Boxiao and Pavlakos, Georgios and Paschalidou, Despoina and Shen, Bokui and Sanchez-Riera, Jordi and Moreno-Noguer, Francesc and Guibas, Leonidas},
title={MultiPhys: Multi-Person Physics-aware 3D Motion Estimation},
booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
Acknowledgments
Despoina Paschalidou is supported by the Swiss National Science Foundation under grant number P500PT 206946.
Leonidas Guibas is supported by a Vannevar Bush Faculty Fellowship. This work is partially supported by
projects SMARTGAZEII CPP2021-008760 and MoHuCo PID2020-120049RB-I00.
This project page template is based on this
page. Icons: Flaticon