We present MultiPhys, a method that enables recovering multi-person 3D motion in a physically-aware
manner. State-of-the-art methods (SLAHMR, top row) for multi-person motion recovery
mostly rely on kinematic approaches, which typically ignore physical constraints, such as
body penetration. Moreover, while individual poses are kinematically coherent, their spatial
placement is often suboptimal, resulting in significant penetration errors. MultiPhys (bottom row)
incorporates physics constraints into the reconstruction process, yielding more physically
plausible results.
Abstract
We introduce MultiPhys, a method designed for recovering multi-person motion from monocular videos. Our
focus lies in capturing coherent spatial placement between pairs of individuals across varying degrees
of engagement. MultiPhys, being physically aware, exhibits robustness to jittering and occlusions, and
effectively eliminates penetration issues between the two individuals. We devise a pipeline in which the
motion estimated by a kinematic-based method is fed into a physics simulator in an autoregressive
manner. We introduce distinct components that enable our model to harness the simulator’s properties
without compromising the accuracy of the kinematic estimates. This results in final motion estimates
that are both kinematically coherent and physically compliant. Extensive evaluations on three
challenging datasets characterized by substantial inter-person interaction show that our method
significantly reduces errors associated with penetration and foot skating, while performing
competitively with the state-of-the-art on motion accuracy and smoothness.
Overview Video
* Best viewed with audio
Approach
Given an input video with multiple people (left), we first obtain initial kinematic estimates of the
camera poses and 3D human motion using SLAHMR. Using these initial motion estimates, our proposed
framework corrects them and makes them physically plausible (right).
Pipeline.
We use a policy π to control the humanoid agents, driving them toward the initial
kinematic poses. All agents are simulated simultaneously so that physics-based constraints apply to
the reconstructed motion. The policy computes features from both the current state of the simulation and
the target pose, and then generates the action signal a that controls the agents. Our loop-N
component is placed between the target poses that correspond to consecutive video frames.
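The autoregressive loop above can be sketched in a few lines. This is a minimal illustrative toy, not the authors' implementation: `Policy`, `Simulator`, and `rollout` are hypothetical names, the "simulator" is a stub that simply integrates states toward the actions, and the real method uses a full physics engine and learned policy.

```python
import numpy as np

class Policy:
    """Toy stand-in for the control policy pi (assumption, not the real model).
    The action is computed from the current simulation state and the target pose."""
    def act(self, sim_state, target_pose):
        # Here the "feature" is simply the residual between target and state.
        return target_pose - sim_state

class Simulator:
    """Stub simulator that steps all agents jointly (a real physics engine
    would also resolve contacts and penetrations between agents)."""
    def __init__(self, init_states):
        self.states = np.asarray(init_states, dtype=float)

    def step(self, actions, gain=0.5):
        # Apply every agent's action simultaneously.
        self.states = self.states + gain * np.asarray(actions)
        return self.states

def rollout(policy, sim, target_poses, n_loops=2):
    """Autoregressive rollout: for each video frame's target poses, run
    n_loops simulation sub-steps (the 'loop-N' idea) before advancing."""
    trajectory = []
    for targets in target_poses:           # one set of targets per frame
        for _ in range(n_loops):           # loop-N between consecutive frames
            actions = [policy.act(s, t) for s, t in zip(sim.states, targets)]
            sim.step(actions)
        trajectory.append(sim.states.copy())
    return np.stack(trajectory)

# Two agents with scalar "poses" for illustration only.
sim = Simulator([0.0, 1.0])
targets = [[1.0, 0.0], [2.0, -1.0]]        # targets for two video frames
traj = rollout(Policy(), sim, targets, n_loops=2)
```

The key property the sketch mirrors is that the simulator state, not the kinematic estimate, is carried forward between frames, so physics corrections accumulate autoregressively.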
Physics-aware Correction Module.
Results on ExPi, Hi4D, and CHI3D datasets
Results on ExPi dataset.
Results on Hi4D dataset.
Results on CHI3D dataset.
Results with more people
Results with 3 people.
Results with 4 people.
Citation
@inproceedings{ugrinovic2024multiphys,
author={Ugrinovic, Nicolas and Pan, Boxiao and Pavlakos, Georgios and Paschalidou, Despoina and Shen, Bokui and Sanchez-Riera, Jordi and Moreno-Noguer, Francesc and Guibas, Leonidas},
title={MultiPhys: Multi-Person Physics-aware 3D Motion Estimation},
booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
Acknowledgments
Despoina Paschalidou is supported by the Swiss National Science Foundation under grant number P500PT 206946.
Leonidas Guibas is supported by a Vannevar Bush Faculty Fellowship. This work is partially supported by
projects SMARTGAZEII CPP2021-008760 and MoHuCo PID2020-120049RB-I00.
This project page template is based on this
page. Icons: Flaticon