Active garment recognition and target grasping point detection using deep learning

Enric Corona, Guillem Alenyà, Toni Gabàs and Carme Torras

Abstract: Identification and bi-manual handling of deformable objects, like textiles, are among the most challenging tasks in the field of industrial and service robotics. Their unpredictable shape and pose make it very difficult to identify the type of garment and locate the most relevant parts that can be used for grasping. In this paper, we propose an algorithm that first identifies the type of garment and then searches for the two grasping points that allow a robot to bring the garment to a known pose. We show that, using an active search strategy, it is possible to grasp a garment directly from predefined grasping points, as opposed to the usual approach based on multiple re-graspings of the lowest hanging parts. Our approach uses a hierarchy of three Convolutional Neural Networks (CNNs) with different levels of specialization, trained with both synthetic and real images. The results obtained in the three steps (recognition, first grasping point, second grasping point) are promising. Experiments with real robots show that most of the errors are due to unsuccessful grasps and not to the localization of the grasping points; thus, a more robust grasping strategy is required.

System overview

We present a pipeline based on Convolutional Neural Networks (CNNs) that identifies garments and brings them to a known configuration. A piece of cloth is in a known configuration when it is grasped from two predefined reference points, so that a task such as folding the garment or dressing a person can be performed. The process has the following steps:

  1. Garment classification: A first CNN, trained on real and simulated data, classifies a hanging garment grasped by a robot.
  2. First grasping point: A second CNN, trained on simulated data, predicts whether the two reference points are visible and, if so, their location. If either point is visible, the robot grasps it. If not, the garment is turned until a point becomes visible.
  3. Second grasping point: Once the garment is grasped from the first reference point, a third CNN, also trained on simulated data, predicts the visibility and position of the second reference point.

The pipeline needs one classifier CNN plus two more CNNs per garment type. The towel is the exception: its vertices can be found by grasping the lowest point in the image while it hangs. Once the reference points for each garment are indicated, the whole process can be automated, since the grasping-point CNNs are trained on simulated images.
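The three steps above can be sketched as a short control loop. This is only an illustrative outline, not the authors' implementation: the `robot` object (with `capture`, `rotate` and `grasp` methods), the `classify` callable, the per-garment network dictionaries and the `max_turns` limit are all hypothetical names introduced here, and reusing the rotation search for the second point is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pixel = Tuple[int, int]


@dataclass
class Prediction:
    """Output of a grasping-point network for one reference point (hypothetical format)."""
    visible: bool
    pixel: Optional[Pixel] = None


def find_visible(predictions: List[Prediction]) -> Optional[Pixel]:
    """Return the first reference point predicted as visible, or None."""
    for p in predictions:
        if p.visible and p.pixel is not None:
            return p.pixel
    return None


def search_point(robot, net, max_turns: int) -> Optional[Pixel]:
    """Turn the hanging garment until the network reports a visible point."""
    for _ in range(max_turns):
        pixel = find_visible(net(robot.capture()))
        if pixel is not None:
            return pixel
        robot.rotate()
    return None


def run_pipeline(robot, classify, first_nets, second_nets, max_turns: int = 8):
    """Return (garment, first_pixel, second_pixel), or None if a search fails."""
    # Step 1: classify the hanging garment.
    garment = classify(robot.capture())

    # Step 2: look for either reference point and grasp it.
    first = search_point(robot, first_nets[garment], max_turns)
    if first is None:
        return None
    robot.grasp(first)

    # Step 3: look for the remaining reference point.
    # Assumption: the same rotation search is reused here.
    second = search_point(robot, second_nets[garment], max_turns)
    if second is None:
        return None
    robot.grasp(second)
    return garment, first, second
```

The towel case, which needs no grasping-point CNNs, is left out of this sketch; it would replace steps 2 and 3 with grasps of the lowest visible point.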

Demonstration: Bringing garments to a known configuration

We use a pair of jeans to illustrate the whole process in the following video and to evaluate performance during manipulation. In the left column of the video, the jeans are initially grasped from a random point. The ground truth, predefined at the waist, and the predictions are shown as white and green points, respectively. Note that in this step the garment can be in an unlimited range of poses. If no points are predicted, the robot rotates the garment until at least one point is visible. The middle column then shows the jeans being grasped at the point whose coordinates are more accurately localized in the point cloud. A last CNN then predicts the second grasping point, which is grasped as shown in the third column.
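The text does not say how localization accuracy in the point cloud is measured. As one possible reading, the sketch below scores each candidate pixel by how much valid depth data surrounds it and how much that depth varies, then picks the best-supported candidate. The scoring rule, the `(row, col)` pixel convention, the window size and the function names are all assumptions made for illustration.

```python
import numpy as np


def localization_score(depth, pixel, half_window=3):
    """Return (valid_fraction, depth_spread) in a window around pixel (row, col).

    `depth` is an organized depth image in metres; missing readings are
    assumed to be NaN, infinite, or non-positive.
    """
    r, c = pixel
    r0, r1 = max(r - half_window, 0), min(r + half_window + 1, depth.shape[0])
    c0, c1 = max(c - half_window, 0), min(c + half_window + 1, depth.shape[1])
    patch = depth[r0:r1, c0:c1]
    # Map NaN and infinities to 0 so that a single comparison marks valid readings.
    valid = np.nan_to_num(patch, nan=0.0, posinf=0.0, neginf=0.0) > 0
    if not valid.any():
        return 0.0, float("inf")
    return float(valid.mean()), float(patch[valid].std())


def best_localized(depth, candidates, half_window=3):
    """Pick the candidate with the most valid depth support, then the lowest spread."""
    def key(pixel):
        fraction, spread = localization_score(depth, pixel, half_window)
        return (-fraction, spread)

    return min(candidates, key=key)
```

With two predicted reference points, `best_localized(depth, [p1, p2])` would return the one whose neighbourhood in the depth image is better populated.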

We ran experiments covering the whole process of bringing a grasped real garment to a known configuration. Our setup includes two Barrett WAM robot arms and an Xtion camera. The predictions are not as accurate as in simulation; still, for each garment the process leads to a pose similar to the reference one. Regarding robot execution, we observed that the grasping action is a critical aspect. Most of the failures were caused by defective grasps, mainly because the robot gripper sometimes collides with the garment along the approach trajectory, which changes the position of the grasping point. We think that a more elaborate grasping strategy would help, for example using a specific grasping orientation for each point. This orientation could be either predicted with the CNN or computed from the garment point cloud. Moreover, our gripper is generic, and a gripper specialized for garment manipulation may help.
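As an illustration of the second option, computing an orientation from the point cloud, a common approach is to fit a plane to the points near the grasping point and approach along its normal. The sketch below does this with a singular value decomposition. It was not part of the reported experiments, and the function name, the neighbourhood `radius` and the camera-at-origin convention are assumptions.

```python
import numpy as np


def approach_direction(points, grasp_point, radius=0.05, camera_origin=(0.0, 0.0, 0.0)):
    """Estimate a unit surface normal at grasp_point, oriented toward the camera.

    points: (N, 3) array of garment points in the camera frame, in metres.
    """
    points = np.asarray(points, dtype=float)
    grasp_point = np.asarray(grasp_point, dtype=float)

    # Keep only the points within `radius` of the grasping point.
    near = points[np.linalg.norm(points - grasp_point, axis=1) <= radius]
    if len(near) < 3:
        raise ValueError("not enough neighbours to fit a plane")

    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the normal of the best-fit plane.
    centered = near - near.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]

    # Flip the normal so that it points from the surface toward the camera.
    if np.dot(normal, np.asarray(camera_origin, dtype=float) - grasp_point) < 0:
        normal = -normal
    return normal / np.linalg.norm(normal)
```

A gripper could then approach along the negated normal. On real cloth the local surface is rarely planar, so the radius would need tuning.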


  1. E. Corona, G. Alenyà, T. Gabàs, and C. Torras, “Active garment recognition and target grasping point detection using deep learning,” Pattern Recognition, vol. 74, pp. 629–641, 2018.