Technology Transfer Project
Amazon Research Award: Geometry-aware 3D Human Body Animation from Still Photos
Type
Technology Transfer Contract
Start Date
14/03/2019
End Date
31/12/2023
Staff
-
Agudo, Antonio
Researcher
-
Guo, Wen
PhD Student
-
Pumarola, Albert
Member
Project Description
This project is funded by a 2018 Amazon Research Award.
Being able to automatically generate 3D animations of the human body from a single image would open the door to many exciting new applications in areas such as the movie industry, photography, fashion, and e-commerce. Recent advances in Generative Adversarial Networks (GANs) have shown impressive results for the related task of facial expression synthesis. In [Pumarola ECCV 2018] we introduced one such approach, which anatomically encodes facial expressions in a continuous manifold. Automatic facial animation with GANs, however, has so far been addressed from a purely 2D perspective, and is thus limited to single viewpoints, typically frontoparallel or quasi-frontoparallel faces.
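To make the conditioning idea concrete, the sketch below shows a generator that takes an input image together with a continuous expression vector tiled over the spatial grid. This is a minimal, hypothetical PyTorch illustration; the layer sizes and names are ours and do not reproduce the actual architecture of [Pumarola ECCV 2018].

    # Minimal sketch, assuming a PyTorch setup; layers and dimensions are
    # illustrative, not the architecture of [Pumarola ECCV 2018].
    import torch
    import torch.nn as nn

    class ConditionalGenerator(nn.Module):
        def __init__(self, cond_dim=17):  # e.g. one scalar per facial action unit
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3 + cond_dim, 64, kernel_size=7, padding=3),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 3, kernel_size=7, padding=3),
                nn.Tanh(),
            )

        def forward(self, img, cond):
            # Tile the continuous condition vector over the image plane and
            # concatenate it with the RGB channels before convolving.
            b, _, h, w = img.shape
            cond_map = cond.view(b, -1, 1, 1).expand(-1, -1, h, w)
            return self.net(torch.cat([img, cond_map], dim=1))

    gen = ConditionalGenerator()
    animated = gen(torch.rand(1, 3, 128, 128), torch.rand(1, 17))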
In this project, we will extend this problem to the full human body and to varying viewpoints: given a single photo of a person, we will investigate approaches that forecast the person's motion and synthesize the associated images, even when these involve different body orientations and changing postures.
Compared to face animation, bringing still images of the full body to life involves dealing with a much larger variability of body configurations, and of appearances due to clothing. Concretely, addressing this complex endeavor requires resolving a number of sub-tasks, including foreground-background segmentation, single-image 3D human pose and shape estimation, action recognition, motion prediction, and photorealistic image synthesis. Each of these problems is, by itself, tremendously challenging.
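As a rough illustration of how these sub-tasks compose, the interface sketch below chains them into a single-image animation pipeline. Every function name and signature is hypothetical, and the bodies are deliberately left unimplemented.

    # Hypothetical interface sketch of the sub-task pipeline; all names and
    # signatures are illustrative, not an implemented system.
    from typing import List, Tuple
    import numpy as np

    def segment_person(img: np.ndarray) -> np.ndarray:
        """Foreground-background mask of shape (H, W)."""
        raise NotImplementedError

    def estimate_pose_shape(img: np.ndarray,
                            mask: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """Single-image 3D pose (joint angles) and body-shape parameters."""
        raise NotImplementedError

    def predict_motion(pose: np.ndarray, horizon: int) -> np.ndarray:
        """Future pose sequence of shape (horizon, pose_dim)."""
        raise NotImplementedError

    def render_frames(img: np.ndarray, mask: np.ndarray,
                      shape: np.ndarray, poses: np.ndarray) -> List[np.ndarray]:
        """Photorealistic frame for each predicted pose."""
        raise NotImplementedError

    def animate(img: np.ndarray, horizon: int = 30) -> List[np.ndarray]:
        mask = segment_person(img)
        pose, shape = estimate_pose_shape(img, mask)
        future_poses = predict_motion(pose, horizon)
        return render_frames(img, mask, shape, future_poses)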
We nevertheless aim to develop GAN architectures able to tackle all of them by integrating different sources of information: novel geometry-aware differentiable modules that estimate and predict human pose and shape, loss functions enforcing photorealism of the synthesized images, and attention mechanisms that focus on the region of the image where the person is located.
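The attention mechanism can be illustrated with the blending scheme of [Pumarola ECCV 2018], where the generator regresses a colour image together with a soft attention mask and the output retains the input pixels wherever the mask is close to one. Below is a minimal sketch; the tensor shapes are our assumptions.

    # Minimal sketch of attention-based blending: the mask decides, per pixel,
    # whether to keep the input image or the synthesized colours.
    import torch

    def attentive_blend(img: torch.Tensor, color: torch.Tensor,
                        attention: torch.Tensor) -> torch.Tensor:
        """img, color: (B, 3, H, W); attention: (B, 1, H, W), values in [0, 1]."""
        return attention * img + (1.0 - attention) * color

    img = torch.rand(1, 3, 128, 128)
    color = torch.rand(1, 3, 128, 128)                 # synthesized colours
    mask = torch.sigmoid(torch.randn(1, 1, 128, 128))  # soft attention
    out = attentive_blend(img, color, mask)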
An additional difficulty is that no dataset of human action video sequences annotated with accurate volumetric body-shape parameters and background segmentation masks is available. Since such a dataset is very difficult to acquire, we intend to develop approaches that require as little supervision as possible.
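One concrete way to lower the supervision requirement, borrowed from the cycle-consistency loss of [Pumarola ECCV 2018], is to demand that animating an image to a target pose and back reconstructs the input, which needs no paired ground truth. The sketch below is hypothetical; G stands for any conditional generator G(img, pose), e.g. the one sketched earlier.

    # Sketch of an unpaired cycle-consistency loss; G is any conditional
    # generator G(img, pose), such as the ConditionalGenerator above.
    import torch
    import torch.nn.functional as F

    def cycle_loss(G, img, pose_src, pose_tgt):
        fake = G(img, pose_tgt)       # animate the input to the target pose
        recon = G(fake, pose_src)     # map the result back to the source pose
        return F.l1_loss(recon, img)  # reconstruction should match the input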
Project Publications
Conference Publications
-
W. Guo, Y. Du, X. Shen, V. Lepetit, X. Alameda and F. Moreno-Noguer. Back to MLP: A simple baseline for human motion prediction. IEEE Winter Conference on Applications of Computer Vision (WACV), 2023, Waikoloa, Hawaii, pp. 4798-4808.
-
W. Guo, E. Corona, F. Moreno-Noguer and X. Alameda. PI-Net: Pose interacting network for multi-person monocular 3D pose estimation. IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, virtual, pp. 2795-2805.
-
A. Pumarola, J. Sanchez, G. P. T. Choi, A. Sanfeliu and F. Moreno-Noguer. 3DPeople: Modeling the geometry of dressed humans. IEEE International Conference on Computer Vision (ICCV), 2019, Seoul, pp. 2242-2251.