Automation based on artificial intelligence becomes necessary when agents such as robots are deployed to perform complex tasks. Detailed representation of a scene makes robots better aware of their surroundings, thereby making it possible to accomplish different tasks in a successful and safe manner. Tasks that involve planning of actions and manipulation of objects require identification and localization of different surfaces in dynamic environments. The usage of structured-light-based depth-sensing devices has gained much attention in the past decade. This is because they are low-cost and capture data in the form of dense depth maps, in addition to color images. Convolutional Neural Networks (CNNs) provide a robust way to extract useful information from the data acquired using these devices. In this chapter we discuss the basic idea behind standard feedforward CNNs, and their application to semantic segmentation and action recognition.


computer vision, feature extraction, learning (artificial intelligence), object recognition, robot vision.

Author keywords

Scene understanding, Deep learning, Semantic labeling, Action recognition

Scientific reference

F. Husain, B. Dellen and C. Torras. Scene understanding using deep learning. In Handbook of Neural Computation, 373-382. Elsevier, 2017.