In this paper, we present an efficient and reliable deep-learning approach that allows users to communicate with robots through hand gestures. In contrast to other works that rely on external devices such as gloves [1] or joysticks [2] to tele-operate robots, the proposed approach uses only visual information to recognize users' instructions, which are encoded in a set of pre-defined hand gestures. Specifically, the method consists of two modules that work sequentially: the first extracts the 2D landmarks of the hands (i.e., joint positions), and the second predicts the hand gesture from a temporal representation of these landmarks. The approach has been validated on a recent state-of-the-art dataset, where it outperformed methods that require multiple pre-processing steps such as optical flow and semantic segmentation, achieving an accuracy of 87.5% while running at 10 frames per second. Finally, we conducted real-life experiments with our IVO robot to validate the framework during the interaction process.
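The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: stage 1 (the learned landmark detector) is simulated with synthetic (T, 21, 2) landmark sequences, and the nearest-centroid classifier stands in for the paper's learned temporal model. The gesture labels, window length, and all function names are hypothetical.

```python
import numpy as np

N_FRAMES = 16      # length of the temporal window (assumed value)
N_LANDMARKS = 21   # standard 2D hand-skeleton joint count
GESTURES = ["stop", "come", "point"]  # illustrative labels, not the paper's set


def landmarks_to_feature(seq):
    """Flatten a (T, 21, 2) landmark sequence into one temporal feature vector."""
    seq = np.asarray(seq, dtype=np.float64)
    # Center each frame on the wrist (joint 0) for translation invariance
    # before stacking frames over time.
    centered = seq - seq[:, :1, :]
    return centered.reshape(-1)


class NearestCentroidGesture:
    """Toy stage-2 classifier: one centroid per gesture in feature space."""

    def __init__(self):
        self.centroids = {}

    def fit(self, sequences, labels):
        feats = np.stack([landmarks_to_feature(s) for s in sequences])
        for g in set(labels):
            idx = [i for i, lab in enumerate(labels) if lab == g]
            self.centroids[g] = feats[idx].mean(axis=0)

    def predict(self, seq):
        f = landmarks_to_feature(seq)
        return min(self.centroids,
                   key=lambda g: np.linalg.norm(f - self.centroids[g]))


# Synthetic data: each gesture is a distinct random motion pattern plus noise.
rng = np.random.default_rng(0)
train, y = [], []
for gi, g in enumerate(GESTURES):
    base = rng.normal(gi, 0.05, size=(N_FRAMES, N_LANDMARKS, 2))
    for _ in range(5):
        train.append(base + rng.normal(0, 0.01, base.shape))
        y.append(g)

clf = NearestCentroidGesture()
clf.fit(train, y)
pred = clf.predict(train[0])
```

In practice, the per-frame landmarks would come from a hand-pose detector running on camera frames, and a learned temporal model would replace the centroid classifier; the sketch only shows how the two stages hand data to each other.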


learning (artificial intelligence), service robots.

Author keywords

Human-robot interaction, hand gesture recognition

Scientific reference

M. Peral, A. Sanfeliu and A. Garrell Zulueta. Efficient hand gesture recognition for human-robot interaction. IEEE Robotics and Automation Letters, 7(4): 10272-10279, 2022.