Publication

Semantic tuples for evaluation of image sentence generation

Conference Article

Conference

Workshop on Vision and Language (VL)

Edition

4th

Doc link

https://www.cs.cmu.edu/~ark/EMNLP-2015/proceedings/VL/pdf/VL06.pdf


Abstract

The automatic generation of image captions has received considerable attention, but the problem of evaluating caption generation systems remains comparatively unexplored. We propose a novel evaluation approach based on comparing the underlying visual semantics of the candidate and ground-truth captions. With this goal in mind, we have defined a semantic-tuple (ST) representation for visually descriptive language and have augmented a subset of the Flickr-8K dataset with semantic annotations. Our evaluation metric (BAST) can be used not only to compare systems but also to perform error analysis and gain a better understanding of the kinds of mistakes a system makes. Computing BAST requires predicting the semantic representation of the automatically generated captions, so we use the Flickr-ST dataset to train classifiers that predict STs, allowing evaluation to be fully automated.
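The abstract does not spell out how BAST is computed; as a rough illustration of the general idea, a candidate caption can be scored by the overlap between its predicted semantic tuples and those of the ground-truth captions. The sketch below is a hypothetical simplification (function name, tuple format, and the F1 scoring choice are all assumptions, not the paper's actual metric).

```python
# Hypothetical sketch of tuple-overlap scoring. The exact definition of
# BAST is given in the paper; this only illustrates comparing candidate
# and ground-truth captions via their semantic tuples.

def tuple_f1(predicted, gold):
    """F1 between two sets of semantic tuples.

    Each tuple might encode e.g. (predicate, subject, object)
    extracted from a caption; this format is an assumption.
    """
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    overlap = len(predicted & gold)
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: one tuple matches out of two on each side.
gold = {("play", "dog", "ball"), ("run", "dog", None)}
pred = {("play", "dog", "ball"), ("sit", "cat", None)}
print(tuple_f1(pred, gold))  # 0.5
```

A set-based F1 ignores how often a tuple occurs; a multiset or weighted variant would credit repeated annotations across multiple ground-truth captions.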

Categories

computer vision

Author keywords

computer vision, natural language processing

Scientific reference

L.D. Ellebracht, A. Ramisa, P. Swaroop, J.A. Cordero, F. Moreno-Noguer and A. Quattoni. Semantic tuples for evaluation of image sentence generation, 4th Workshop on Vision and Language, 2015, Lisbon.