Publication

Action recognition based on efficient deep feature learning in the spatio-temporal domain

Journal Article (2016)

Journal

IEEE Robotics and Automation Letters

Pages

984-991

Volume

1

Number

2

Doc link

http://dx.doi.org/10.1109/LRA.2016.2529686

Abstract

Hand-crafted feature functions are usually designed based on domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalizes better in uncontrolled environments. We present a simple, yet robust, 2D convolutional neural network extended to a concatenated 3D network that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit, as a descriptor, a pretrained network that yielded the best results on the largest and most challenging ILSVRC-2014 dataset. Experimental results on commonly used video benchmark datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time, without requiring any preprocessing (e.g., optical flow) or a priori knowledge of data capture (e.g., camera motion estimation), which makes our approach more general and flexible than other approaches. Our implementation has been made available.
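The abstract describes the architecture only at a high level. What follows is a minimal NumPy sketch of the general idea, assuming one plausible reading of it: a 2D convolutional feature extractor is applied frame by frame, its per-frame feature maps are stacked along the time axis, and a 3D convolution then operates over that spatio-temporal volume. It is not the authors' implementation. The single random-weight 2D layer merely stands in for the pretrained ILSVRC-2014 network, and all function names (conv2d_valid, frame_features_2d, conv3d_valid, video_descriptor, predict_action), filter sizes, the ReLU and global-average-pooling choices, and the linear classifier are illustrative assumptions.

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Naive 'valid' 2D cross-correlation.
    image: (H, W, C); kernels: (kh, kw, C, K) -> (H-kh+1, W-kw+1, K)."""
    kh, kw, _, k = kernels.shape
    h, w, _ = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1, k))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]  # (kh, kw, C)
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out

def frame_features_2d(frame, kernels):
    """Hypothetical stand-in for a pretrained 2D CNN: one conv layer + ReLU."""
    return np.maximum(conv2d_valid(frame, kernels), 0.0)

def conv3d_valid(volume, kernels):
    """Naive 'valid' 3D cross-correlation over (time, height, width).
    volume: (T, H, W, C); kernels: (kt, kh, kw, C, K) -> (T-kt+1, H-kh+1, W-kw+1, K)."""
    kt, kh, kw, _, k = kernels.shape
    t, h, w, _ = volume.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1, k))
    for a in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = volume[a:a + kt, i:i + kh, j:j + kw, :]  # (kt, kh, kw, C)
                out[a, i, j] = np.tensordot(patch, kernels,
                                            axes=([0, 1, 2, 3], [0, 1, 2, 3]))
    return out

def video_descriptor(frames, k2d, k3d):
    """Per-frame 2D features, stacked along time, then a 3D conv + ReLU
    and global average pooling into one fixed-length vector."""
    per_frame = np.stack([frame_features_2d(f, k2d) for f in frames])  # (T, H', W', K2)
    spatio_temporal = np.maximum(conv3d_valid(per_frame, k3d), 0.0)    # (T'', H'', W'', K3)
    return spatio_temporal.mean(axis=(0, 1, 2))                         # (K3,)

def predict_action(desc, weights, bias):
    """Hypothetical linear classifier on the video descriptor; returns a class index."""
    scores = desc @ weights + bias
    return int(np.argmax(scores))

# Usage on toy random data (sizes chosen only to keep the naive loops fast).
rng = np.random.default_rng(0)
frames = rng.random((8, 16, 16, 3))          # 8 RGB frames of 16x16 pixels
k2d = rng.standard_normal((3, 3, 3, 4))      # 4 hypothetical 2D filters
k3d = rng.standard_normal((3, 3, 3, 4, 5))   # 5 hypothetical 3D filters
desc = video_descriptor(frames, k2d, k3d)
print(desc.shape)  # (5,)
label = predict_action(desc, rng.standard_normal((5, 10)), np.zeros(10))
```

Stacking 2D outputs before the 3D stage is what lets a network pretrained on still images be reused unchanged; only the 3D part has to model motion across frames. In practice one would use an optimized deep learning library rather than these explicit loops.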

Categories

artificial intelligence, computer vision, pattern classification.

Author keywords

Computer vision for automation, recognition, visual learning

Scientific reference

F. Husain, B. Dellen and C. Torras. Action recognition based on efficient deep feature learning in the spatio-temporal domain. IEEE Robotics and Automation Letters, 1(2): 984-991, 2016.