Publication

Visual semantic relatedness dataset for image captioning

Conference Article

Conference

IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Edition

2023

Pages

5598-5606

Doc link

http://dx.doi.org/10.1109/CVPRW59228.2023.00592

Authors

Ahmed Sabir, Francesc Moreno-Noguer, Lluís Padró

Abstract

Modern image captioning systems rely heavily on extracting knowledge from images to capture the concept of a static story. In this paper, we propose a textual visual context dataset for captioning, in which the publicly available dataset COCO Captions [30] has been extended with information about the scene (such as the objects in the image). Since this information has a textual form, it can be used to leverage any NLP task, such as text similarity or semantic relatedness, in captioning systems, either as an end-to-end training strategy or as a post-processing approach.
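
As a rough illustration of the post-processing idea mentioned in the abstract, the sketch below re-ranks candidate captions by their semantic relatedness to the textual visual context (e.g., object labels detected in the image). It is not the authors' implementation; the embedding model, the example context, and the candidate captions are illustrative assumptions.

    # Hypothetical sketch: score candidate captions against the textual visual context.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model could be used

    visual_context = "zebra grass field"  # object labels extracted from the image (assumed example)
    candidates = [
        "a zebra standing on top of a lush green field",
        "a group of people riding bikes down a street",
    ]

    # Embed the visual context and the candidate captions, then score by cosine similarity.
    ctx_emb = model.encode(visual_context, convert_to_tensor=True)
    cand_emb = model.encode(candidates, convert_to_tensor=True)
    scores = util.cos_sim(ctx_emb, cand_emb)[0]

    # Keep the caption most semantically related to the scene information.
    best_caption = candidates[int(scores.argmax())]
    print(best_caption)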

Categories

Computer vision

Author keywords

Training, Visualization, Computer vision, Conferences, Semantics, Pattern recognition, Task analysis

Scientific reference

A. Sabir, F. Moreno-Noguer and L. Padró. Visual semantic relatedness dataset for image captioning. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, Canada, pp. 5598-5606.