Master Thesis

Self-supervised learning for action segmentation using a Transformer architecture

Student/s

Supervisor/s

Information

  • Started: 01/04/2023
  • Finished: 21/09/2023

Description

The focus of this project is the problem of Temporal Action Segmentation (TAS), which consists in temporally segmenting and classifying fine-grained actions in untrimmed videos. Improving on this task is a significant yet intricate challenge: different actions can occur at different speeds or durations, and some actions can be ambiguous or overlap. Successfully addressing it can yield substantial advances in various domains, including robotics, medical support technologies, surveillance, and many more. Currently, the best-performing state-of-the-art methods are fully supervised; consequently, they incur a huge annotation cost, do not scale, and are ill-suited to applications where data collection is costly. To alleviate this problem, we propose a self-supervised transformer-based method for action segmentation that does not require action labels, and we demonstrate the effectiveness of the learned weights in a weakly-supervised setting. Specifically, we built a Siamese architecture based on an improved version of an existing Transformer architecture. To validate our approach, we performed an ablation study and compared our results with the state of the art.
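As an illustration of the idea behind such a self-supervised Siamese setup, the sketch below implements a toy objective that pulls together the per-frame embeddings of two augmented views of the same clip. This is a minimal, hypothetical example (the similarity measure, shapes, and augmentation are assumptions for illustration), not the thesis's actual architecture or loss.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project each per-frame feature vector onto the unit sphere.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def siamese_cosine_loss(view_a, view_b):
    """Toy Siamese objective: average (1 - cosine similarity) between the
    per-frame embeddings of two views of the same video (frames x dim)."""
    a = l2_normalize(view_a)
    b = l2_normalize(view_b)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 32))                     # per-frame features of a clip
noisy = feats + 0.05 * rng.normal(size=feats.shape)   # lightly augmented view
unrelated = rng.normal(size=(16, 32))                 # features of another clip

# Identical views incur (near-)zero loss; an unrelated clip incurs a larger one.
assert siamese_cosine_loss(feats, feats) < 1e-6
assert siamese_cosine_loss(feats, noisy) < siamese_cosine_loss(feats, unrelated)
```

In a real setting the two views would come from temporal or spatial augmentations of the same untrimmed video, and the loss would be minimized over the Transformer's parameters; no action labels enter the objective, which is what makes the pretraining self-supervised.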

Link

The work is under the scope of the following projects:

  • GREAT: Beyond Graph Neural Networks: Joint graph topology learning and graph-based inference for computer vision (web)