Learning the semantics of object-action relations by observation

Journal Article (2011)


The International Journal of Robotics Research







  • Aksoy, Eren Erdal

  • Abramov, Alexey

  • Dörr, Johannes

  • Ning, Kejun

  • Dellen, Babette

  • Wörgötter, Florentin

Recognizing manipulations performed by a human, and transferring them to a robot for execution, is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. In this way, we encode the essential changes in a visual scene in a condensed form such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this, we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments changes in a discontinuous way, and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing different manipulations. Using simple sub-string search algorithms, semantic event chains can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.
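
The idea of keeping only the moments at which a relation changes, and then comparing the resulting chains by sub-string search, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the relation labels ("N" for not touching, "T" for touching, "A" for absent), the per-frame dictionary input, the function names, and the similarity score are all assumptions made for illustration. Segment tracking and graph construction are taken as already done.

```python
from difflib import SequenceMatcher


def build_event_chain(frames):
    """Compress per-frame relations into an event chain (hypothetical format).

    frames: list of dicts mapping a segment pair to a one-character
    relation label. Only frames where some relation differs from the
    previously kept frame are retained.
    """
    pairs = sorted({pair for frame in frames for pair in frame})
    columns = []
    for frame in frames:
        # "A" (absent) is an assumed default for pairs missing in a frame.
        column = tuple(frame.get(pair, "A") for pair in pairs)
        if not columns or column != columns[-1]:
            columns.append(column)
    # One row per segment pair, one character per retained transition.
    return {pair: "".join(col[i] for col in columns)
            for i, pair in enumerate(pairs)}


def row_similarity(a, b):
    """Length of the longest common substring, normalised to [0, 1]."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))


def chain_similarity(chain_a, chain_b):
    """Average, over rows of chain_a, of the best-matching row in chain_b.

    Rows are matched by content, not by segment name, so no object
    identities are needed. This scoring rule is an assumption.
    """
    if not chain_a or not chain_b:
        return 0.0
    best = [max(row_similarity(ra, rb) for rb in chain_b.values())
            for ra in chain_a.values()]
    return sum(best) / len(best)
```

Because rows are compared by their label strings rather than by segment names, two recordings with different objects and different durations yield the same chain whenever the sequence of relational changes is the same.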


E.E. Aksoy, A. Abramov, J. Dörr, K. Ning, B. Dellen and F. Wörgötter. Learning the semantics of object-action relations by observation. The International Journal of Robotics Research, 30(10): 1229-1249, 2011.