Optimization of robust loss functions for weakly-labeled image taxonomies

Journal Article (2013)


International Journal of Computer Vision









The recently proposed ImageNet dataset consists of several million images, each annotated with a single object category. These annotations may be imperfect, in the sense that many images contain multiple objects belonging to the label vocabulary. In other words, we have a multilabel problem but the annotations include only a single label (which is not necessarily the most prominent). Such a setting motivates the use of a robust evaluation measure, which allows a limited number of labels to be predicted and counts the overall prediction as correct so long as one of the predicted labels is correct. This is indeed the type of evaluation measure used to assess algorithm performance in a recent competition on ImageNet data. Optimizing such performance measures presents several hurdles even with existing structured output learning methods. Indeed, many of the current state-of-the-art methods optimize the prediction of only a single output label, ignoring this ‘structure’ altogether. In this paper, we show how to directly optimize continuous surrogates of such performance measures using structured output learning techniques with latent variables. We use the output of existing binary classifiers as input features in a new learning stage which optimizes the structured loss corresponding to the robust performance measure. We present empirical evidence that this allows us to ‘boost’ the performance of binary classification on a variety of weakly-supervised labeling problems defined on image taxonomies.
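The robust evaluation measure described in the abstract can be illustrated with a minimal sketch. It assumes a flat label set, one real-valued score per class, and a 0/1 loss that is zero whenever the single annotated label appears among the K highest-scoring classes. The function names (`top_k_labels`, `robust_loss`, `mean_robust_loss`) and the toy scores are hypothetical and are not taken from the paper; the sketch covers only the evaluation measure, not the structured learning procedure or any taxonomy-aware variant.

```python
from typing import List, Sequence


def top_k_labels(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring classes (assumed flat label set)."""
    order = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return order[:k]


def robust_loss(scores: Sequence[float], true_label: int, k: int) -> int:
    """0 if the single annotated label is among the k predicted labels, else 1."""
    return 0 if true_label in top_k_labels(scores, k) else 1


def mean_robust_loss(all_scores: Sequence[Sequence[float]],
                     true_labels: Sequence[int], k: int) -> float:
    """Average the 0/1 robust loss over a set of images."""
    losses = [robust_loss(s, y, k) for s, y in zip(all_scores, true_labels)]
    return sum(losses) / len(losses)


# Toy example: four classes, the image is annotated with class 1,
# but class 3 (another object in the image, say) scores highest.
scores = [0.1, 0.7, 0.2, 0.9]
loss_k1 = robust_loss(scores, true_label=1, k=1)  # only class 3 is predicted
loss_k2 = robust_loss(scores, true_label=1, k=2)  # classes 3 and 1 are predicted
```

With K = 1 the prediction is penalized even though the top-scoring class may really be present in the image; with K = 2 the annotated label is covered and the loss is zero, which is the tolerance to incomplete annotation that the abstract argues for.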


computer vision, image classification, object recognition, optimisation.

Author keywords

image labeling, image tagging, image taxonomies, structured learning

Scientific reference

J.J. McAuley, A. Ramisa and T.S. Caetano. Optimization of robust loss functions for weakly-labeled image taxonomies. International Journal of Computer Vision, 104(3): 343-361, 2013.