Sensor-agnostic multimodal fusion for multiple object tracking from camera, radar, lidar and V2X

Conference Article


FISITA 2023 World Congress (FISITA)





Automated vehicles rely on a variety of sensors to detect and track other vehicles and road users over time, so that they can plan and execute safe trajectories. Sensor characteristics are diverse and will probably become more so in the future. In this work we present a sensor-agnostic multimodal fusion framework for multiple object tracking that can seamlessly integrate information from different object detectors, sensors, and vehicle-to-everything (V2X) messages, sent either by other road users or by the infrastructure. All received information is converted to a standardized set of detections, which are then combined using a Kalman filter with a constant velocity model. To ensure robustness, we propose methods to handle classification errors and incorrect bounding box reconstruction, two problems that are often ignored in the academic literature although they are very relevant in practice. We evaluate our framework in three diverse and challenging scenarios. First, we present results for a camera- and radar-based perception system that was integrated into a prototype traffic jam chauffeur function. Second, we show qualitative results for a highway traffic monitoring application with multiple cameras, lidars and a radar. Finally, we show how our framework can integrate vehicle-to-everything messages to improve the safety of vulnerable road users, such as pedestrians and cyclists, with an autonomous emergency braking function on proving grounds.
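The fusion step described in the abstract (standardized detections combined by a Kalman filter with a constant velocity model) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Detection`, `CVTrack1D` and `fuse`, the single-axis state, the position-only measurements, and the simplified process noise are all assumptions made for illustration. Each sensor contributes a detection with its own noise variance, which is how one filter can absorb heterogeneous sources.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical standardized detection: one position measurement on one axis."""
    t: float       # timestamp in seconds
    z: float       # measured position in metres
    r: float       # measurement noise variance (sensor dependent)
    source: str    # e.g. "camera", "radar", "lidar", "v2x" (informational only)


@dataclass
class CVTrack1D:
    """Constant-velocity Kalman filter for one axis, state x = [p, v]."""
    p: float             # position estimate
    v: float             # velocity estimate
    P00: float = 1.0     # covariance [[P00, P01], [P01, P11]]
    P01: float = 0.0
    P11: float = 100.0   # assumption: velocity is initially almost unknown

    def predict(self, dt: float, q: float = 0.1) -> None:
        # State transition x' = F x with F = [[1, dt], [0, 1]].
        self.p += dt * self.v
        # Covariance P' = F P F^T + Q, with a simplified Q = q * dt * I (assumption).
        P00 = self.P00 + 2.0 * dt * self.P01 + dt * dt * self.P11 + q * dt
        P01 = self.P01 + dt * self.P11
        P11 = self.P11 + q * dt
        self.P00, self.P01, self.P11 = P00, P01, P11

    def update(self, z: float, r: float) -> None:
        # Measurement model z = H x + noise, with H = [1, 0] and noise variance r.
        y = z - self.p            # innovation
        s = self.P00 + r          # innovation variance
        k0 = self.P00 / s         # Kalman gain for position
        k1 = self.P01 / s         # Kalman gain for velocity
        self.p += k0 * y
        self.v += k1 * y
        # Covariance update P = (I - K H) P.
        P00 = (1.0 - k0) * self.P00
        P01 = (1.0 - k0) * self.P01
        P11 = self.P11 - k1 * self.P01
        self.P00, self.P01, self.P11 = P00, P01, P11


def fuse(track: CVTrack1D, detections: List[Detection], t0: float = 0.0) -> CVTrack1D:
    """Process detections from any mix of sensors in timestamp order."""
    t_prev = t0
    for det in sorted(detections, key=lambda d: d.t):
        dt = det.t - t_prev
        if dt > 0.0:
            track.predict(dt)     # advance the track to the detection time
            t_prev = det.t
        track.update(det.z, det.r)
    return track
```

In this sketch a noisier sensor simply passes a larger `r`, so its detections pull the estimate less. A full tracker would also need data association, track management, and the handling of classification and bounding box errors that the abstract mentions, none of which is covered here.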


mobile robots.

Author keywords

Multimodal, Detection, Tracking, Sensor Fusion, Cooperative perception.

Scientific reference

M. Pérez and A. Agudo. Sensor-agnostic multimodal fusion for multiple object tracking from camera, radar, lidar and V2X. FISITA 2023 World Congress, 2023, Barcelona, to appear.