Simultaneous recognition and temporal localization of object interactions from videos.

Work default illustration



  • If you are interested in the proposal, please contact with the supervisors.


Most of existing algorithms for object interaction recognition take as input a single video shot, that is a sequence of temporally adjacent frames corresponding to an action, and output an action label, i.e. “taking a bottle”. However, this approach assumes that the video has been previously temporally segmented into temporal intervals corresponding to different actions. This assumption is unrealistic since manual temporal segmentation requires a considerable annotation effort and would be unfeasible for real world applications.
The purpose of this work is to develop a novel Deep Learning based algorithm for simultaneously recognizing and localizing in time object manipulations, such as “opening a fridge” or “taking a kettle”, from long video data captured by a wearable camera, i.e. GoPro.

Student profile
Bachelor or master’s student in Industrial, Telecommunication, or Computer Science Engineering. Specific knowledge or interest in Computer Vision and Machine Learning is appreciated. Excellent programming skills in python are required.