Technology Transfer Project

Amazon Research Award: Geometry-aware 3D Human Body Animation from Still Photos


Technology Transfer Contract

Start Date


End Date



Project Description

This project is funded by a 2018 Amazon Research Award.

Being able to automatically generate 3D animations of the human body from a single image would open the door to many exciting new applications, in areas as diverse as the movie industry, photography technology, and fashion and e-commerce. Recent advances in Generative Adversarial Networks (GANs) have shown impressive results for the related task of facial expression synthesis. In [Pumarola ECCV 2018] we introduced one such approach, which anatomically encodes facial expressions in a continuous manifold. Automatic facial animation with GANs, however, has been addressed from a purely 2D perspective. It is therefore limited to a single viewpoint, typically fronto-parallel or quasi-fronto-parallel faces.
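The idea of a continuous expression manifold can be made concrete. GANimation conditions the generator on a vector of facial Action Unit (AU) intensities, so new expressions, and smooth transitions between them, can be obtained by moving through that vector space. The Python sketch below illustrates only this idea, by linearly interpolating between two AU vectors. The function name `interpolate_expressions`, the normalization of intensities to [0, 1], and the example AU values are hypothetical. The generator that would render each frame is not shown.

```python
from typing import List, Sequence


def interpolate_expressions(
    source: Sequence[float], target: Sequence[float], steps: int
) -> List[List[float]]:
    """Linearly interpolate between two Action Unit (AU) intensity vectors.

    Hypothetical helper: each returned vector would condition one frame of
    the animation. Both endpoints are included in the output.
    """
    if len(source) != len(target):
        raise ValueError("source and target must contain the same AUs")
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    frames = []
    for k in range(steps):
        t = k / (steps - 1)  # interpolation weight, from 0.0 to 1.0
        frames.append([(1.0 - t) * s + t * g for s, g in zip(source, target)])
    return frames


# Hypothetical AU intensities (normalized to [0, 1]) for a neutral face and a smile.
neutral = [0.0, 0.0, 0.0]
smile = [0.0, 0.8, 1.0]
for frame in interpolate_expressions(neutral, smile, steps=5):
    print([round(v, 2) for v in frame])  # one conditioning vector per frame
```

Intermediate vectors describe intermediate expressions. This is what allows a smooth animation instead of a jump between discrete expression labels.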

In this project, we will extend single-image animation to the full human body and to varying viewpoints. Given a single photo of a person, we will research approaches to forecast his/her motion and synthesize the corresponding images, even when these involve different body orientations and changing postures.

Compared to face animation, bringing still images of the full body to life involves a much larger variability in body configurations, as well as in appearance due to clothing. Concretely, this complex endeavor requires solving a number of sub-tasks, including foreground-background segmentation, single-image 3D human pose and shape estimation, action recognition, motion prediction, and photorealistic image synthesis. Each of these problems is, by itself, tremendously challenging.

Yet, we aim to develop GAN architectures able to tackle all of them by integrating different sources of information. These include novel geometry-aware differentiable modules able to estimate and predict human pose and shape, loss functions that enforce photorealism in the synthesized images, and attention mechanisms that focus on the region of the image where the person is located.
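The attention mechanisms mentioned above can be illustrated with the per-pixel compositing used in GANimation. There, the generator predicts a color map C and an attention mask A, and the output image is A * I + (1 - A) * C, where I is the input photo. Pixels with A close to 1 are therefore copied unchanged from the input. The NumPy sketch below shows only this blending step, applied to a toy person region. The function name `attention_composite`, the array shapes, and the value ranges are our assumptions, not the project's implementation.

```python
import numpy as np


def attention_composite(
    input_img: np.ndarray, color_map: np.ndarray, attention: np.ndarray
) -> np.ndarray:
    """Blend a generated color map with the input image using a per-pixel mask.

    input_img, color_map: float arrays of shape (H, W, 3), values in [0, 1].
    attention: float array of shape (H, W); 1 keeps the input pixel,
    0 replaces it with the generated color.
    """
    if input_img.shape != color_map.shape:
        raise ValueError("input_img and color_map must have the same shape")
    if attention.shape != input_img.shape[:2]:
        raise ValueError("attention must have shape (H, W)")
    a = np.clip(attention, 0.0, 1.0)[..., np.newaxis]  # (H, W, 1), broadcast over RGB
    return a * input_img + (1.0 - a) * color_map


# Toy 4x4 example: black input photo, white generator output.
img = np.zeros((4, 4, 3))
gen = np.ones((4, 4, 3))
mask = np.ones((4, 4))   # keep the background unchanged...
mask[1:3, 1:3] = 0.0     # ...but let the generator repaint a hypothetical person region
out = attention_composite(img, gen, mask)
print(out[0, 0], out[1, 1])  # background pixel stays black, person pixel becomes white
```

One motivation for this formulation is that the generator does not need to re-synthesize pixels outside the attended region, such as the background. It can simply pass them through from the input.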

An additional difficulty is that there is no dataset of human action video sequences annotated with accurate volumetric body shape parameters and background segmentation masks. Since this type of dataset is very difficult to acquire, we intend to develop approaches that require as little supervision as possible.

Project Publications

Journal Publications

  • A. Pumarola, A. Agudo, A.M. Martinez, A. Sanfeliu and F. Moreno-Noguer. GANimation: One-shot anatomically consistent facial animation. International Journal of Computer Vision, 128: 698-713, 2020.

Conference Publications

  • W. Guo, E. Corona, F. Moreno-Noguer and X. Alameda-Pineda. PI-Net: Pose interacting network for multi-person monocular 3D pose estimation, 2021 IEEE Winter Conference on Applications of Computer Vision, 2021, Online, to appear.

  • A. Pumarola, J. Sanchez, G. P. T. Choi, A. Sanfeliu and F. Moreno-Noguer. 3DPeople: Modeling the geometry of dressed humans, 17th International Conference on Computer Vision, 2019, Seoul, pp. 2242-2251.