Publication
Multi-modal embedding for main product detection in fashion
Conference Article
Conference
ICCV Workshop on Computer Vision for Fashion (CVF)
Edition
2017
Pages
2236-2242
Doc link
https://doi.org/10.1109/ICCVW.2017.261
Authors
A. Rubio, L. Yu, E. Simo-Serra and F. Moreno-Noguer
Abstract
We present an approach to detect the main product in fashion images by exploiting the textual metadata associated with each image. Our approach is based on a Convolutional Neural Network and learns a joint embedding of object proposals and textual metadata to predict the main product in the image. We additionally use several complementary classification and overlap losses in order to improve training stability and performance. Our tests on a large-scale dataset taken from eight e-commerce sites show that our approach outperforms strong baselines and is able to accurately detect the main product in a wide diversity of challenging fashion images.
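The core idea described above — projecting object proposals and textual metadata into a common embedding space and selecting the proposal that best matches the text — can be illustrated with a minimal sketch. This is not the authors' CNN implementation: the linear projections, dimensions, and feature values below are toy assumptions standing in for learned image and text encoders.

```python
# Illustrative sketch (not the paper's model): score object proposals
# against textual metadata in a shared embedding space and pick the
# best-matching one as the "main product". All weights are random toys.
import math
import random

random.seed(0)

IMG_DIM, TXT_DIM, EMB_DIM = 8, 6, 4

# Hypothetical learned linear projections into the common embedding space.
W_img = [[random.uniform(-1, 1) for _ in range(IMG_DIM)] for _ in range(EMB_DIM)]
W_txt = [[random.uniform(-1, 1) for _ in range(TXT_DIM)] for _ in range(EMB_DIM)]

def project(W, x):
    """Linear projection followed by L2 normalisation."""
    y = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
    norm = math.sqrt(sum(v * v for v in y)) or 1.0
    return [v / norm for v in y]

def main_product(proposal_feats, text_feat):
    """Index of the proposal whose embedding has the highest cosine
    similarity with the text embedding (vectors are unit-normalised,
    so the dot product is the cosine similarity)."""
    t = project(W_txt, text_feat)
    scores = [sum(a * b for a, b in zip(project(W_img, p), t))
              for p in proposal_feats]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: three proposal feature vectors, one metadata feature vector.
proposals = [[random.uniform(0, 1) for _ in range(IMG_DIM)] for _ in range(3)]
text = [random.uniform(0, 1) for _ in range(TXT_DIM)]
print("main product proposal:", main_product(proposals, text))
```

In the paper the projections are learned jointly with a CNN, and the ranking objective is supplemented by classification and overlap losses; this sketch only shows the scoring step at inference time.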
Categories
computer vision, learning (artificial intelligence)
Author keywords
common embedding, multi-modal embedding, deep learning
Scientific reference
A. Rubio, L. Yu, E. Simo-Serra and F. Moreno-Noguer. Multi-modal embedding for main product detection in fashion, 2017 ICCV Workshop on Computer Vision for Fashion, 2017, Venice, Italy, pp. 2236-2242.