Discriminative learning of deep convolutional feature point descriptors

Conference Article


International Conference on Computer Vision (ICCV)






Deep learning has revolutionized image-level tasks such as classification, but patch-level tasks, such as correspondence, still rely on hand-crafted features, e.g. SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of potential pairs by combining stochastic sampling of the training set with an aggressive mining strategy biased towards patches that are hard to classify. By using the L2 distance during both training and testing we develop 128-D descriptors whose Euclidean distances reflect patch similarity, and which can be used as a drop-in replacement for any task involving SIFT. We demonstrate consistent performance gains over the state of the art, and generalize well against scaling and rotation, perspective transformation, non-rigid deformation, and illumination changes. Our descriptors are efficient to compute, amenable to modern GPUs, and publicly available.
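The training objective sketched in the abstract — an L2 distance that is small for corresponding patches and pushed above a margin for non-corresponding ones, with mining biased towards hard pairs — can be illustrated with a minimal NumPy sketch. The function names, the margin value, and the mining fraction below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def pair_loss(d1, d2, match, margin=4.0):
    """Hinge-style pairwise loss on descriptor vectors (illustrative).

    Matching pairs are penalized by their L2 distance; non-matching
    pairs only incur a loss when closer than the margin.
    """
    dist = np.linalg.norm(d1 - d2)
    return dist if match else max(0.0, margin - dist)

def mine_hard_pairs(batch, frac=0.25):
    """Keep the fraction of sampled pairs with the highest loss.

    `batch` is a list of (descriptor_a, descriptor_b, is_match) tuples;
    the hardest pairs are the ones used to update the network.
    """
    losses = [pair_loss(a, b, m) for a, b, m in batch]
    k = max(1, int(len(batch) * frac))
    hardest = np.argsort(losses)[::-1][:k]
    return [batch[i] for i in hardest]
```

In the paper's setting the descriptors would come from the two branches of the Siamese CNN rather than being fixed vectors; the sketch only shows why aggressive mining concentrates gradient updates on pairs the current descriptors misjudge.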



Author keywords

computer vision, deep learning

Scientific reference

E. Simo-Serra, E. Trulls Fortuny, L. Ferraz, I. Kokkinos, P. Fua and F. Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors, 15th International Conference on Computer Vision, 2015, Santiago de Chile, pp. 118-126, IEEE.