Master Thesis

Enhancing egocentric action recognition by exploiting extra-shot information




  • If you are interested in this proposal, please contact the supervisors.


Wearable cameras such as the popular GoPro offer the opportunity to capture naturally occurring activities from an egocentric perspective (i.e. from the subject's own point of view). This egocentric paradigm is particularly useful for analyzing activities that involve object manipulation, since actions and objects tend to appear in the center of the image and object occlusions are naturally minimized.
State-of-the-art algorithms for action recognition typically take a single shot (a sequence of frames sharing the same action label) as input and output a vector of action probabilities. However, egocentric videos consist of a continuous sequence of shots corresponding to a sequence of actions (e.g. put glass, take kettle, pour water) aimed at achieving a given goal (e.g. preparing tea). The goal of this project is to develop a new deep learning based approach that estimates actions based not only on the current shot but also on contextual information provided by previous shots and on the set of possible target goals.
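To illustrate what extra-shot context can add, here is a minimal sketch, assuming hand-set probability tables rather than a learned model. It is not the deep learning approach the project aims to develop. The action and goal names, the transition table TRANSITION (P(action | previous action)), the goal table GOAL_ACTION (P(action | goal)) and the function contextual_action_probs are all hypothetical. The sketch fuses a single-shot classifier output with a prediction from the previous shot and a prediction from the current goal belief by multiplying the three distributions and renormalizing (a simple product-of-experts heuristic).

```python
import numpy as np

# Hypothetical label sets (illustrative only).
ACTIONS = ["put glass", "take kettle", "pour water"]
GOALS = ["prepare tea", "wash dishes"]

# Hypothetical P(a_t | a_{t-1}); row = previous action, column = current action.
TRANSITION = np.array([
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.2, 0.2],
])

# Hypothetical P(a | g); row = goal, column = action.
GOAL_ACTION = np.array([
    [0.3, 0.3, 0.4],
    [0.7, 0.2, 0.1],
])


def normalize(v):
    """Rescale a non-negative vector so it sums to 1."""
    return v / v.sum()


def contextual_action_probs(shot_probs, prev_action_probs, goal_probs):
    """Fuse single-shot action probabilities with context from the previous
    shot and from the goal belief (product-of-experts heuristic)."""
    from_previous = prev_action_probs @ TRANSITION  # action distribution predicted from shot t-1
    from_goal = goal_probs @ GOAL_ACTION            # action distribution predicted from the goal belief
    return normalize(shot_probs * from_previous * from_goal)


# The shot classifier alone cannot separate "put glass" from "pour water".
shot_probs = np.array([0.4, 0.2, 0.4])
prev_action = np.array([0.0, 1.0, 0.0])  # previous shot: "take kettle"
goal_belief = np.array([0.9, 0.1])       # most likely goal: "prepare tea"

fused = contextual_action_probs(shot_probs, prev_action, goal_belief)
print(ACTIONS[int(np.argmax(fused))])  # prints "pour water"
```

In this toy example the previous shot ("take kettle") and the likely goal resolve an ambiguity that the single-shot classifier cannot. A learned model, for instance a recurrent network over per-shot features, would replace the fixed tables and avoid the independence assumption implicit in multiplying the three distributions.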