Publication
3M-Transformer: A multi-stage multi-stream multimodal transformer for embodied turn-taking prediction
Conference Article
Conference
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Edition
2024
Pages
8050-8054
Doc link
https://doi.org/10.1109/ICASSP48485.2024.10448136
Abstract
Predicting turn-taking in multiparty conversations has many practical applications in human-computer/robot interaction. However, the complexity of human communication makes it a challenging task. Recent advances have shown that synchronous multi-perspective egocentric data can significantly improve turn-taking prediction compared to asynchronous, single-perspective transcriptions. Building on this research, we propose a new multimodal transformer-based architecture for predicting turn-taking in embodied, synchronized multi-perspective data. Our experimental results on the recently introduced EgoCom dataset show a substantial performance improvement of up to 14.01% on average compared to existing baselines and alternative transformer-based approaches.
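The abstract describes a transformer that fuses audio, video, and text streams from synchronized egocentric recordings. As a rough illustration of that idea, the sketch below shows a minimal multi-stream cross-modal fusion model with a binary turn-taking head. All module names, dimensions, the pairwise attention scheme, and the three-stage layout are assumptions made for exposition; this is not the authors' 3M-Transformer implementation.

```python
# Illustrative sketch only: a minimal multi-stream cross-modal fusion model
# for binary turn-taking prediction. Dimensions and the fusion scheme are
# assumptions for exposition, NOT the published 3M-Transformer architecture.
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    """One stream attends to another via multi-head cross-attention."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, query, context):
        # Residual cross-attention: `query` tokens attend to `context` tokens.
        attended, _ = self.attn(query, context, context)
        x = self.norm1(query + attended)
        return self.norm2(x + self.ff(x))

class TurnTakingFusion(nn.Module):
    """Fuses per-frame audio, video, and text features and emits a single
    binary logit for an upcoming turn change (hypothetical head)."""
    def __init__(self, audio_dim=128, video_dim=512, text_dim=768, dim=256):
        super().__init__()
        # Stage 1: project each modality stream into a shared embedding space.
        self.proj = nn.ModuleDict({
            "audio": nn.Linear(audio_dim, dim),
            "video": nn.Linear(video_dim, dim),
            "text": nn.Linear(text_dim, dim),
        })
        # Stage 2: cross-modal attention enriching the audio stream with
        # visual, then textual context (one of many possible pairings).
        self.audio_video = CrossModalBlock(dim)
        self.audio_text = CrossModalBlock(dim)
        # Stage 3: a shallow self-attention encoder over the fused sequence.
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 1)  # binary turn-change logit

    def forward(self, audio, video, text):
        a = self.proj["audio"](audio)
        v = self.proj["video"](video)
        t = self.proj["text"](text)
        fused = self.audio_text(self.audio_video(a, v), t)
        fused = self.encoder(fused)
        # Mean-pool over time, then predict the turn-taking logit.
        return self.head(fused.mean(dim=1)).squeeze(-1)

if __name__ == "__main__":
    model = TurnTakingFusion()
    # Toy batch: 2 clips, 50 time steps per modality stream.
    logits = model(torch.randn(2, 50, 128),
                   torch.randn(2, 50, 512),
                   torch.randn(2, 50, 768))
    print(logits.shape)  # torch.Size([2])
```

In this sketch the audio stream is treated as the query and the other modalities as context; the paper's multi-stage multi-stream design may combine the streams differently.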
Categories
Pattern recognition
Author keywords
cross-modal transformer, turn-taking prediction, embodied multi-perspective data, audio-video-text
Scientific reference
M. Fatan, E. Mincato, D. Pintzou and M. Dimiccoli. 3M-Transformer: A multi-stage multi-stream multimodal transformer for embodied turn-taking prediction. 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, 2024, pp. 8050-8054.