Vehicle pose estimation using G-Net: Multi-class localization and depth estimation

Conference Article


Catalan Conference on Artificial Intelligence (CCIA)







In this paper, we present a new network architecture, called G-Net, for 3D pose estimation from RGB images, which is trained in a weakly supervised manner. We introduce a two-step pipeline based on region-based convolutional neural networks (CNNs) for feature localization, bounding box refinement based on non-maximum suppression, and depth estimation. G-Net estimates depth from single monocular images using a self-tuned loss function. Combining this predicted depth with the presented two-step localization allows the 3D pose of the object to be extracted. We show in experiments that our method achieves good results compared to other state-of-the-art approaches that are trained in a fully supervised manner.
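
As an illustration of two generic building blocks the abstract names, the following minimal sketch shows greedy non-maximum suppression over candidate bounding boxes and the back-projection of a pixel with a predicted depth into a 3D camera-frame point. It is not the authors' implementation: the function names, the IoU threshold of 0.5, and the pinhole intrinsics (fx, fy, cx, cy) are assumptions made for illustration only, and the sketch recovers a 3D position rather than a full pose with orientation.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only
    if it does not overlap an already kept box above the threshold.
    The threshold value is an assumption, not taken from the paper."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float
                 ) -> Tuple[float, float, float]:
    """Pinhole camera model: map pixel (u, v) at the given depth to a
    3D point in the camera frame. Intrinsics are assumed to be known."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    kept = nms(boxes, scores)
    for i in kept:
        # Use the box center as the localized image point.
        u = (boxes[i][0] + boxes[i][2]) / 2
        v = (boxes[i][1] + boxes[i][3]) / 2
        print(i, back_project(u, v, 10.0, 500.0, 500.0, 320.0, 240.0))
```

In this sketch the depth passed to `back_project` is a fixed placeholder; in a pipeline of the kind described above it would come from the network's depth prediction for the detected object.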


pattern recognition.

Author keywords

Deep Learning, pose estimation, vehicle detection

Scientific reference

J. Garcia, A. Agudo and F. Moreno-Noguer. Vehicle pose estimation using G-Net: Multi-class localization and depth estimation, 21st Catalan Conference on Artificial Intelligence, 2018, Roses, in Artificial Intelligence Research and Development, Vol. 308 of Frontiers in Artificial Intelligence and Applications, pp. 355-364, 2018, IOS Press.