Publication

Seeing and hearing egocentric actions: How much can we learn?

Conference Article

Conference

ICCV Workshop on Egocentric Perception, Interaction and Computing (EPIC)

Edition

2019

Pages

4470-4480

Doc link

https://doi.org/10.1109/ICCVW.2019.00548

Authors

A. Cartas, J. Luque, P. Radeva, C. Segura and M. Dimiccoli

Abstract

Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed by focusing on a single modality; in particular, only a limited number of works have considered integrating the visual and audio modalities for this purpose. In this work, we propose a multimodal approach for egocentric action recognition in a kitchen environment that relies on audio and visual information. Our model combines a sparse temporal sampling strategy with a late fusion of audio, spatial, and temporal streams. Experimental results on the EPIC-Kitchens dataset show that multimodal integration leads to better performance than unimodal approaches. In particular, we achieved a 5.18% improvement over the state of the art on verb classification.
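To make the fusion scheme described in the abstract concrete, below is a minimal PyTorch sketch of late fusion over sparsely sampled segments. It assumes pre-extracted per-segment features for each stream (the actual model uses CNN backbones over RGB frames, optical flow, and audio); the feature dimension, class count, and equal-weight averaging are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class LateFusionHead(nn.Module):
    """Toy late-fusion classifier over pre-extracted per-segment features.

    Each input has shape (batch, num_segments, feat_dim), mimicking sparse
    temporal sampling: a few segments are drawn per video and their scores
    are averaged (segment consensus) before fusing the three streams.
    """

    def __init__(self, feat_dim: int = 512, num_classes: int = 125):
        super().__init__()
        self.spatial_fc = nn.Linear(feat_dim, num_classes)   # RGB stream
        self.temporal_fc = nn.Linear(feat_dim, num_classes)  # optical-flow stream
        self.audio_fc = nn.Linear(feat_dim, num_classes)     # audio stream

    def forward(self, spatial, temporal, audio):
        # Segment consensus: average per-segment class scores within each stream.
        s = self.spatial_fc(spatial).mean(dim=1)
        t = self.temporal_fc(temporal).mean(dim=1)
        a = self.audio_fc(audio).mean(dim=1)
        # Late fusion: combine stream-level scores (equal weights, an assumption here).
        return (s + t + a) / 3.0


if __name__ == "__main__":
    batch, segments, feat_dim = 2, 3, 512
    model = LateFusionHead(feat_dim=feat_dim, num_classes=125)
    spatial = torch.randn(batch, segments, feat_dim)   # stand-in RGB features
    temporal = torch.randn(batch, segments, feat_dim)  # stand-in flow features
    audio = torch.randn(batch, segments, feat_dim)     # stand-in audio features
    scores = model(spatial, temporal, audio)
    print(scores.shape)  # torch.Size([2, 125])
```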

Categories

Pattern recognition

Scientific reference

A. Cartas, J. Luque, P. Radeva, C. Segura and M. Dimiccoli. Seeing and hearing egocentric actions: How much can we learn?, 2019 ICCV Workshop on Egocentric Perception, Interaction and Computing, Seoul, South Korea, pp. 4470-4480.