We propose a robust and efficient method to estimate the pose of a camera with respect to complex 3D textured models of the environment that can potentially contain more than 100,000 points. To tackle this problem we follow a top-down approach in which we combine high-level deep network classifiers with low-level geometric methods to obtain a solution that is fast, robust and accurate. Given an input image, we first use a pre-trained deep network to compute a rough estimate of the camera pose. This initial estimate constrains the set of 3D model points that can be seen from the camera viewpoint. We then establish 3D-to-2D correspondences between these potentially visible model points and the 2D features detected in the image. Accurate pose estimation is finally obtained from these correspondences using a novel PnP algorithm that rejects outliers without resorting to a RANSAC strategy, and that is between 10 and 100 times faster than methods that use one. Two real experiments dealing with very large and complex 3D models demonstrate the effectiveness of the approach.
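The visibility-pruning step described above can be illustrated with a minimal sketch: given a rough pose, keep only the model points that project in front of the camera and inside the image bounds. The function name and the simple frustum test below are illustrative assumptions, not the paper's actual visibility criterion.

```python
import numpy as np

def potentially_visible_points(points_3d, R, t, K, image_size):
    """Return indices of 3D model points whose projection under the rough
    pose (R, t) with intrinsics K falls inside the image.

    Illustrative only: the paper's visibility test (e.g. handling of
    occlusions in the textured model) is not reproduced here.
    """
    # Transform model points into the camera frame: X_c = R X + t.
    cam = points_3d @ R.T + t
    # Keep points in front of the camera (positive depth).
    in_front = cam[:, 2] > 0
    # Project with the pinhole model and dehomogenize.
    proj = cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    # Keep projections that land inside the image rectangle.
    w, h = image_size
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & \
             (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return np.where(in_front & inside)[0]
```

Restricting matching to this subset is what keeps correspondence search tractable for models with over 100,000 points.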


computer vision, pose estimation.

Author keywords

camera pose estimation, deep learning, complex 3D models

Scientific reference

A. Rubio, M. Villamizar, L. Ferraz, A. Penate-Sanchez, A. Ramisa, E. Simo-Serra, A. Sanfeliu and F. Moreno-Noguer. Efficient monocular pose estimation for complex 3D models, 2015 IEEE International Conference on Robotics and Automation, 2015, Seattle, WA, USA, pp. 1397-1402.