The problem of predicting human motion given a sequence of past observations is at the core of many applications in robotics and computer vision. Current state-of-the-art formulates this problem as a sequence-to-sequence task, in which a historical of 3D skeletons feeds a Recurrent Neural Network (RNN) that predicts future movements, typically in the order of 1 to 2 seconds. However, one aspect that has been obviated so far, is the fact that human motion is inherently driven by interactions with objects and/or other humans in the environment. In this paper, we explore this scenario using a novel context-aware motion prediction architecture. We use a semantic-graph model where the nodes parameterize the human and objects in the scene and the edges their mutual interactions. These interactions are iteratively learned through a graph attention layer, fed with the past observations, which now include both object and human body motions. Once this semantic graph is learned, we inject it to a standard RNN to predict future movements of the human/s and object/s. We consider two variants of our architecture, either freezing the contextual interactions in the future of updating them. A thorough evaluation in the Whole-Body Human Motion Database shows that in both cases, our context-aware networks clearly outperform baselines in which the context information is not considered.


pattern recognition.

Scientific reference

E. Corona, A. Pumarola, G. Alenyà and F. Moreno-Noguer. Context-aware human motion prediction, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, Seattle, WA, USA (Virtual), pp. 6990-6999.