Multimodal feedback fusion of laser, image and temporal information

Conference Article


International Conference on Distributed Smart Cameras (ICDSC)







In this paper, we propose a highly accurate and robust people detector that works well under highly variable and uncertain conditions, such as occlusions, false positives and false detections. These adverse conditions, which originally motivated this research, occur when a robotic platform navigates in an urban environment; although the scope is originally within the robotics field, we believe our contributions can be extended to other fields. To this end, we propose a multimodal information fusion of laser and monocular camera information. Laser information is modelled using a set of weak classifiers (AdaBoost) to detect people. Camera information is processed using HOG descriptors and a linear SVM to classify person/non-person. A multi-hypothesis tracker follows the position and velocity of each target, providing temporal information to the fusion and allowing detections to be recovered even when the laser segmentation fails. Experimental results show that our feedback-based system outperforms previous state-of-the-art methods in performance and accuracy, and that near-real-time detection can be achieved.
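The abstract describes a late-fusion scheme in which per-modality detector confidences (laser AdaBoost, camera HOG+SVM) are combined with a temporal prior from the tracker, so that a tracked target can still be accepted when the laser segmentation misses it. The sketch below illustrates that idea only; the weights, threshold and function names are illustrative assumptions, not the authors' actual formulation.

```python
# Hypothetical sketch of a weighted late-fusion step, assuming each
# modality yields a confidence in [0, 1] for a candidate region and the
# multi-hypothesis tracker supplies a temporal prior for that region.
# All weights and the threshold are illustrative, not from the paper.

def fuse(laser_conf, image_conf, track_prior,
         w_laser=0.4, w_image=0.4, w_track=0.2):
    """Weighted combination of per-modality confidences."""
    return w_laser * laser_conf + w_image * image_conf + w_track * track_prior

def accept(laser_conf, image_conf, track_prior, threshold=0.5):
    """Accept a candidate when the fused score clears the threshold.

    A strong camera response plus a confident track prediction can
    recover a target even when the laser confidence is zero, which is
    the feedback behaviour the abstract describes.
    """
    return fuse(laser_conf, image_conf, track_prior) >= threshold

# Laser segmentation fails (confidence 0.0), but the camera detector
# and the tracker's temporal prior recover the detection:
print(accept(0.0, 0.9, 1.0))  # → True
```

Any monotone fusion rule (e.g. a learned combination instead of fixed weights) would fit the same pattern; the key point is that the tracker's prediction enters the decision as an extra evidence source rather than as a hard gate.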


image classification, object recognition, robot vision.

Scientific reference

I. Huerta, G. Ferrer, F. Herrero, A. Prati and A. Sanfeliu. Multimodal feedback fusion of laser, image and temporal information. In Proceedings of the 2014 International Conference on Distributed Smart Cameras (ICDSC), Venice, Italy, 2014, pp. 25:1-25:6.