Publication
Learning priors of human motion with vision transformers
Conference Article
Conference
IEEE International Conference on Computers, Software, and Applications (COMPSAC)
Edition
48th
Pages
382-389
Doc link
http://dx.doi.org/10.1109/COMPSAC61105.2024.00060
Authors
-
Falqueto, Placido
-
Sanfeliu Cortés, Alberto
-
Palopoli, Luigi
-
Fontanelli, Daniele
Abstract
A clear understanding of where humans move in a scenario, their usual paths and speeds, and where they stop is very important for different applications, such as mobility studies in urban areas or robot navigation tasks within human-populated environments. In this article, we propose a neural architecture based on Vision Transformers (ViTs) to provide this information. This solution can arguably capture spatial correlations more effectively than Convolutional Neural Networks (CNNs). In the paper, we describe the methodology and the proposed neural architecture and present experimental results on a standard dataset. We show that the proposed ViT architecture improves the metrics compared to a CNN-based method.
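To illustrate the general idea, the sketch below shows a minimal ViT that maps a scene image to a per-pixel occupancy prior. This is not the authors' architecture: the class name, hyper-parameters (224x224 input, 16x16 patches, 6 encoder layers), and the single-channel sigmoid output are all illustrative assumptions, and the masked-autoencoder pretraining mentioned in the keywords is not reproduced here.

import torch
import torch.nn as nn

class OccupancyPriorViT(nn.Module):
    """Toy ViT: scene image in, dense occupancy prior out (hypothetical)."""
    def __init__(self, img_size=224, patch=16, in_ch=3, dim=256, depth=6, heads=8):
        super().__init__()
        self.patch = patch
        n_patches = (img_size // patch) ** 2
        # Patch embedding: split the map into patches and project to tokens.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Decode each token back to a patch of occupancy logits.
        self.head = nn.Linear(dim, patch * patch)

    def forward(self, x):
        b, _, h, w = x.shape
        tokens = self.embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = self.encoder(tokens + self.pos)
        patches = self.head(tokens)                         # (B, N, patch*patch)
        # Reassemble per-token patches into a full-resolution heatmap.
        gh, gw = h // self.patch, w // self.patch
        out = patches.view(b, gh, gw, self.patch, self.patch)
        out = out.permute(0, 1, 3, 2, 4).reshape(b, 1, h, w)
        return torch.sigmoid(out)                           # prior in [0, 1]

model = OccupancyPriorViT()
prior = model(torch.randn(1, 3, 224, 224))                  # (1, 1, 224, 224)

The self-attention over patch tokens is what lets every patch attend to every other one in a single layer, which is the intuition behind the paper's claim that ViTs capture spatial correlations more effectively than the local receptive fields of CNNs.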
Categories
Automation
Author keywords
vision transformers, human motion prediction, semantic scene understanding, masked autoencoders, occupancy priors
Scientific reference
P. Falqueto, A. Sanfeliu, L. Palopoli and D. Fontanelli. Learning priors of human motion with vision transformers, 48th IEEE International Conference on Computers, Software, and Applications (COMPSAC), 2024, Osaka, Japan, pp. 382-389.