Learning semantic representations through multimodal graph neural networks

Work default illustration



  • If you are interested in the proposal, please contact with the supervisors.


Semantic knowledge provides foundation for many of our everyday nonverbal behaviors. To crack egg on bowl for example, one must recognize the egg and bowl, infer their unobserved qualities (egg is hard-shell and liquid inside, bowl is rigid, etc.) and deploy the appropriate praxis in service of the current goal. All tasks that require semantic knowledge about both the objects and the actions.

To provide robotic systems with the knowledge about objects required to interact with them, this project will address the problem of learning semantic object representations jointly from multiple modalities. The focus will be primarily on the language and vision modalities, that together ensure both semantic relatedness and perceptual grounding to the object representations.

The student is expected to work with Graph Convolutional Neural Networks that have proved to be successful in dealing with graph structured data.

Student profile

Artificial intelligence enthusiast. Previous experience with deep learning frameworks is a plus.