Publication

Multi-modal fashion product retrieval

Conference Article

Conference

Workshop on Vision and Language (VL)

Edition

6th

Pages

43-45

Doc link

http://www.aclweb.org/anthology/W17-2007

Abstract

Finding a product in the fashion world can be a daunting task. Every day, e-commerce sites are updated with thousands of images and their associated metadata (textual information), deepening the problem. In this paper, we leverage both the images and the textual metadata and propose a joint multi-modal embedding that maps both text and images into a common latent space. Distances in the latent space correspond to similarity between products, allowing us to effectively perform retrieval in this latent space. We compare against existing approaches and show significant improvements in retrieval tasks on a large-scale e-commerce dataset.
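
To make the idea concrete, below is a minimal sketch (in PyTorch) of this kind of joint embedding: two linear projections map pre-extracted image and text features into a shared, L2-normalized latent space; a margin-based ranking loss pulls matching image-text pairs together; and retrieval reduces to a nearest-neighbor lookup. All names, feature dimensions, and the margin value here are illustrative assumptions for exposition, not the architecture or hyperparameters used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Projects pre-extracted image and text features into a shared latent space.
    Dimensions are illustrative placeholders, not the paper's actual sizes."""

    def __init__(self, img_dim=4096, txt_dim=300, latent_dim=128):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, latent_dim)  # e.g. CNN features -> latent
        self.txt_proj = nn.Linear(txt_dim, latent_dim)  # e.g. text features -> latent

    def forward(self, img_feats, txt_feats):
        # L2-normalize so dot products in the latent space equal cosine similarity.
        img_z = F.normalize(self.img_proj(img_feats), dim=-1)
        txt_z = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return img_z, txt_z

def ranking_loss(img_z, txt_z, margin=0.2):
    # Matching image/text pairs sit on the diagonal of the similarity matrix;
    # each should beat every mismatched pair in its row by at least `margin`.
    sims = img_z @ txt_z.t()
    pos = sims.diag().unsqueeze(1)
    cost = (margin + sims - pos).clamp(min=0)
    mask = torch.eye(sims.size(0), dtype=torch.bool)
    return cost.masked_fill(mask, 0.0).mean()

# Retrieval: embed a text query and rank all product images by similarity.
model = JointEmbedding()
with torch.no_grad():
    img_feats = torch.randn(1000, 4096)  # stand-in for CNN features of 1000 products
    txt_feats = torch.randn(1, 300)      # stand-in for one encoded text query
    img_z, txt_z = model(img_feats, txt_feats)
    ranking = (txt_z @ img_z.t()).argsort(descending=True)  # best matches first
```

Because the embeddings are unit-normalized, ranking by inner product is equivalent to ranking by Euclidean distance in the latent space, which is what makes the final retrieval step a simple nearest-neighbor search.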

Categories

image classification; learning (artificial intelligence)

Author keywords

image-text embedding; deep learning; fashion

Scientific reference

A. Rubio, L. Yu, E. Simo-Serra and F. Moreno-Noguer. Multi-modal fashion product retrieval, 6th Workshop on Vision and Language, 2017, Valencia, pp. 43-45.