Master Thesis

Tracking and approaching persons using Deep Learning techniques

Work default illustration


  • Started: 08/11/2017
  • Finished: 17/09/2018


This project proposes a solution that enables a social robot to approach and follow a specific person using a vision-based system. The idea is to introduce these features to the robot so that, in the future, it will be able to interact with people autonomously. To do so, the algorithms developed in the project use Convolutional Neural Networks to locate the person in images provided by the robot's camera. Two algorithms are created to accomplish these tasks: an object detector based on the YOLO algorithm and an object tracker based on a Siamese network. To fully understand how these algorithms work, the methods and architectures on which they are based are explained. An attempt is also made to train one of the algorithms with Google Colaboratory, although the training results are not used in the final implementation.

Several recordings are filmed by teleoperating the robot to simulate real approaching and following operations. Each recording is then labelled frame by frame so the data can be used for training. Since the training results are not used in the final implementation, the recordings serve instead to test how well the final implementation tracks the target in a frame. Different metrics are therefore evaluated on the recordings, separating the "approaching operation" videos from the "following operation" videos, thus obtaining separate results for the two operations. Moreover, since the recordings were filmed under different light conditions, it is possible to analyze how light variations affect the results of both approaching and following operations.

These algorithms are run and tested on a Jetson TX2, using the GPU of the embedded device to enhance their performance. The implementation is built using PyTorch.
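The abstract does not name the exact metrics used to evaluate tracking quality on the labelled recordings. A common choice for comparing a predicted bounding box against a frame's ground-truth label is Intersection-over-Union (IoU); the sketch below is a minimal, hypothetical illustration of such a per-frame score, assuming axis-aligned boxes in `(x1, y1, x2, y2)` format, and is not taken from the thesis implementation.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2).

    Returns a value in [0, 1]: 1.0 for identical boxes, 0.0 for disjoint ones.
    """
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Example: a tracker's box versus the hand-labelled box for one frame.
predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
score = iou(predicted, ground_truth)  # intersection 1, union 7 -> 1/7
```

Averaging such a per-frame score over each recording, kept separate for the "approaching" and "following" videos, would yield the kind of per-operation comparison the abstract describes.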