Master's Thesis

Unsupervised learning of audio-visual representations for action recognition

  • If you are interested in this proposal, please contact the supervisors.


Self-supervised methods learn features without human supervision by training a model to solve a pretext task derived from the input data itself. These methods are particularly attractive because they do not require time-consuming and expensive data annotation.

In this project, the student will develop a multimodal self-supervised method that learns a shared embedding space for the audio and visual streams of video, with strong temporal reasoning capacity. The learned representation will be evaluated on several downstream tasks, including action recognition.
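As a rough illustration of the kind of objective such a method might use (this is an assumption, not the project's prescribed approach), the sketch below implements a symmetric InfoNCE-style contrastive loss in numpy: embeddings of the audio and visual streams of the same clip are treated as positives, while all other pairings in the batch act as negatives. All function and variable names here are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def infonce_loss(video_emb, audio_emb, temperature=0.1):
    """Symmetric InfoNCE: matching (video_i, audio_i) pairs are positives;
    every other pairing in the batch is a negative."""
    v = l2_normalize(video_emb)
    a = l2_normalize(audio_emb)
    logits = v @ a.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(v))              # positives lie on the diagonal

    def xent(lg):
        # Numerically stable cross-entropy against the diagonal labels.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(lg)), labels].mean()

    # Average the video->audio and audio->video directions.
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy check: correlated audio/video pairs should yield a lower loss
# than pairs whose correspondence has been broken.
rng = np.random.default_rng(0)
B, D = 8, 16
shared = rng.normal(size=(B, D))            # latent content shared by both modalities
video = shared + 0.05 * rng.normal(size=(B, D))
audio = shared + 0.05 * rng.normal(size=(B, D))
aligned = infonce_loss(video, audio)
shuffled = infonce_loss(video, audio[::-1])  # break the audio-video correspondence
print(aligned < shuffled)
```

In a real system the `video_emb` and `audio_emb` arrays would come from learned encoders (e.g. convolutional or transformer networks over frames and spectrograms), and the loss would be minimized by gradient descent; the numpy version only demonstrates the pairing logic.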

Student profile:
The main requirement for this project is fluency in Python. A strong background in deep learning will be a plus. For any questions, please contact Mariella Dimiccoli.