Adrià Colomé and Carme Torras
Robotic manipulation often requires adaptation to changing environments. Such changes can be represented by a number of contextual variables that may be observed or sensed in different ways. When learning and representing robot motion, usually with movement primitives, it is desirable to adapt the learned behaviors to the current context. Moreover, different actions or motions can be considered within the same framework, using contextualization to decide which action applies to which situation. Such frameworks, however, can easily become high-dimensional, thus requiring a reduction of both the dimensionality of the parameter space and the amount of data needed to build and improve the model over experience.
In this paper, we propose an approach to obtain a generative model from a set of actions that share a common feature. This feature, namely a contextual variable, is plugged into the model to generate motion. We encode the data with a Gaussian Mixture Model in the parameter space of Probabilistic Movement Primitives (ProMPs), after performing Dimensionality Reduction (DR) on that parameter space, in a similar fashion as in prior work. We append the contextual variable to the parameter space and obtain the number of Gaussian components, i.e., different actions in a dataset, through Persistent Homology. Then, using multimodal Gaussian Mixture Regression (GMR), we can retrieve the most likely actions given a contextual situation and execute them. After actions are executed, we use Reward-Weighted Responsibility GMM (RWR-GMM) to update the model after each execution. Experimentation in three scenarios shows that the method drastically reduces the dimensionality of the parameter space, thus implementing both action selection and adaptation to a changing situation in an efficient way.
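The GMR step above conditions a joint Gaussian mixture over [parameters; context] on an observed context to recover per-component motion parameters. The following is a minimal sketch of that conditioning, assuming a GMM has already been fitted over the stacked vector x = [w; s] (the function name `gmr_condition` and the array layout are illustrative, not the paper's implementation):

```python
import numpy as np

def gmr_condition(means, covs, weights, s, d_w):
    """Condition a GMM over x = [w; s] on an observed context s.

    means: (K, d) component means; covs: (K, d, d) covariances; weights: (K,) priors.
    The first d_w entries of x are the motion parameters w, the rest the context s.
    Returns per-component responsibilities p(k|s) and conditional means E[w | s, k].
    """
    K = len(weights)
    resp = np.zeros(K)
    cond_means = np.zeros((K, d_w))
    for k in range(K):
        mu_w, mu_s = means[k, :d_w], means[k, d_w:]
        S_ws = covs[k, :d_w, d_w:]   # cross-covariance between w and s
        S_ss = covs[k, d_w:, d_w:]   # marginal covariance of the context
        diff = s - mu_s
        sol = np.linalg.solve(S_ss, diff)
        # Unnormalized responsibility: prior times Gaussian density of s under component k
        logdet = np.linalg.slogdet(2 * np.pi * S_ss)[1]
        resp[k] = weights[k] * np.exp(-0.5 * (diff @ sol) - 0.5 * logdet)
        # Conditional mean of w given s for component k
        cond_means[k] = mu_w + S_ws @ sol
    resp /= resp.sum()
    return resp, cond_means
```

Because the regression is multimodal, one can execute the conditional mean of the most responsible component rather than the blended mean, which keeps distinct actions (e.g., different feeding motions) from being averaged together.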
In the following video, we present one experiment from the paper, in which we scaled up the dimensionality of the problem with a feeding experiment. We placed a mannequin's head on a table, together with two plates, one with pieces of apple and another with small balls simulating soup. With a Kinect camera tracking the positions of the head and the two plates through QR codes attached to them, the aim of the experiment was to teach the robot how to feed the mannequin either of the two types of food (apple or soup).
We kinesthetically taught the robot 20 motions for feeding each food (40 demonstrations in total), changing the position of the head and plates in every demonstration. For the robot state, we considered the pose of the end-effector (6-DoF), with 15 Gaussians per DoF, totaling 90 parameters, which were mapped onto a 27-dimensional latent space through our proposed DR. Regarding contextual representation, we considered the x, y coordinates on the table of both plates and the head, plus a choice variable which was 1 for soup and 2 for apples. In total, the latent parameter vector ω has dim(ω) = 27 and the context s has dim(s) = 7, yielding a state x = [ω; s] of dim(x) = 34. The video shows a good qualitative behavior of the resulting model when conditioning on a given context.
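The parameter count above follows from the ProMP representation: each DoF is a weighted sum of 15 Gaussian basis functions over normalized time, so 6 DoF yield 90 weights before DR. A minimal sketch of this bookkeeping, with an assumed basis bandwidth (the heuristic width below is illustrative, not necessarily the paper's choice):

```python
import numpy as np

def gaussian_basis(t, n_basis=15):
    """Normalized Gaussian basis activations at phase t in [0, 1] (ProMP-style)."""
    centers = np.linspace(0.0, 1.0, n_basis)
    width = 1.0 / n_basis ** 2  # assumed bandwidth heuristic
    phi = np.exp(-0.5 * (t - centers) ** 2 / width)
    return phi / phi.sum()

n_dof, n_basis = 6, 15
w = np.random.randn(n_dof * n_basis)  # 90 ProMP weights, as in the experiment
# Reconstruct a 100-step end-effector trajectory: one weighted basis sum per DoF
traj = np.array([
    [gaussian_basis(t) @ w[d * n_basis:(d + 1) * n_basis] for d in range(n_dof)]
    for t in np.linspace(0.0, 1.0, 100)
])  # shape (100, 6)
```

Conditioning the learned mixture on the 7-dimensional context then yields a 27-dimensional latent weight vector, which is projected back to the 90 ProMP weights before trajectory reconstruction.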