Master Thesis

Self-supervised deep learning of multimodal representations




  • If you are interested in the proposal, please contact the supervisors.


Recently, supervised deep learning approaches have achieved impressive performance on a large variety of AI tasks. However, these methods rely on the availability of huge amounts of labeled data. In several applications, labeled data may be scarce or difficult to acquire, e.g. because annotation requires expert knowledge.
To cope with this problem, self-supervised approaches have emerged as a new deep learning paradigm that allows training a model on a proxy task with pseudo-labels that come for free from the data themselves, hence without requiring any manual annotation.
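As a toy illustration of how pseudo-labels can come for free from the data, the sketch below generates labels for a temporal-order verification pretext task on video frames. The task, function name, and parameters are illustrative assumptions, not the method this proposal will develop.

```python
import random

def make_order_pseudo_labels(clip, shuffle_prob=0.5, rng=None):
    """Given a clip (a list of frame identifiers in temporal order),
    return (frames, label): label 1 if the temporal order is kept,
    0 if the frames were shuffled. The label is derived from the data
    itself, so no manual annotation is needed (illustrative sketch)."""
    rng = rng or random.Random()
    frames = list(clip)
    if len(frames) > 1 and rng.random() < shuffle_prob:
        shuffled = frames[:]
        # Reshuffle until the order actually changes
        while shuffled == frames:
            rng.shuffle(shuffled)
        return shuffled, 0
    return frames, 1
```

A model trained to predict this label must learn temporal structure in the video, which is the kind of temporal reasoning this project targets.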

In this project, the student will develop a multimodal self-supervised approach able to learn an embedded space for audio and visual video data with strong temporal reasoning capabilities. The learned representation will be validated on several downstream tasks, including action retrieval and action recognition.
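One common way to obtain free supervision for a joint audio-visual embedding space is audio-visual correspondence: the model predicts whether an audio track belongs to a video clip. A minimal sketch of building such training triples is shown below; the data layout and names are assumptions for illustration, not the proposal's actual pipeline.

```python
import random

def make_av_correspondence_triples(clips, mismatch_prob=0.5, rng=None):
    """Build (video, audio, label) training triples from a list of
    (video, audio) clip pairs: label 1 when the audio belongs to the
    video, 0 when it was swapped with the audio of another clip.
    Illustrative sketch of an audio-visual correspondence pretext task."""
    rng = rng or random.Random()
    triples = []
    for i, (video, audio) in enumerate(clips):
        if len(clips) > 1 and rng.random() < mismatch_prob:
            # Pick a different clip and use its audio as a negative
            j = rng.randrange(len(clips))
            while j == i:
                j = rng.randrange(len(clips))
            triples.append((video, clips[j][1], 0))
        else:
            triples.append((video, audio, 1))
    return triples
```

A two-stream network trained on these triples is pushed to map matching audio and video near each other in the shared embedded space, which can then be evaluated on downstream tasks such as action retrieval.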

Student profile:
The main requirement for this project is fluency in Python. A strong background in deep learning will be a plus. For any questions, please contact Mariella Dimiccoli.