Recent advances in generative adversarial networks (GANs) have shown impressive results for the task of facial expression synthesis. The most successful architecture is StarGAN (Choi et al. in CVPR, 2018), that conditions GANs’ generation process with images of a specific domain, namely a set of images of people sharing the same expression. While effective, this approach can only generate a discrete number of expressions, determined by the content and granularity of the dataset. To address this limitation, in this paper, we introduce a novel GAN conditioning scheme based on action units (AU) annotations, which describes in a continuous manifold the anatomical facial movements defining a human expression. Our approach allows controlling the magnitude of activation of each AU and combining several of them. Additionally, we propose a weakly supervised strategy to train the model, that only requires images annotated with their activated AUs, and exploit a novel self-learned attention mechanism that makes our network robust to changing backgrounds, lighting conditions and occlusions. Extensive evaluation shows that our approach goes beyond competing conditional generators both in the capability to synthesize a much wider range of expressions ruled by anatomically feasible muscle movements, as in the capacity of dealing with images in the wild. The code of this work is publicly available at


computer vision.

Author keywords

GAN, Face Animation, Action-Unit, Condition

Scientific reference

A. Pumarola, A. Agudo, A.M. Martinez, A. Sanfeliu and F. Moreno-Noguer. GANimation: One-shot anatomically consistent facial animation. International Journal of Computer Vision, 128: 698-713, 2020.