Clothing part dataset

The dataset consists of images and depth point clouds of everyday clothing items lying on a table, with semantically meaningful parts, such as collars or sleeves, manually annotated with polygons. Data is provided as 640x480 PNG images and PCD v.7 plain-text point cloud files. The data was acquired with a Kinect 3D camera, so the color and depth images are registered. Since the dataset was not acquired over a simplified, easy-to-segment surface, foreground/background segmentation masks are provided; these masks select only the main clothing object in the scene and are given as binary images. This dataset is unique in that it includes hundreds of registered RGB and 3D scans of deformable clothing items with accurately annotated parts.
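Since the point clouds are distributed as plain-text (ASCII) PCD files, they can be read without external libraries. Below is a minimal sketch of a parser for the ASCII PCD format, demonstrated on a small inline example; the field layout shown (`x y z`) is an assumption for illustration, and actual files in the dataset may carry additional fields such as color.

```python
# Minimal parser for a plain-text (ASCII) PCD file. The sample string below
# is a made-up two-point cloud used only to demonstrate the format; real
# files in the dataset are organized 640x480 Kinect clouds.

def parse_pcd(text):
    """Parse an ASCII PCD string into (header_dict, list_of_point_tuples)."""
    header = {}
    points = []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip comments and blank lines
        if in_data:
            points.append(tuple(float(v) for v in line.split()))
            continue
        key, _, value = line.partition(" ")
        header[key] = value
        if key == "DATA":                 # header ends at the DATA line
            in_data = True
    return header, points

sample = """\
VERSION .7
FIELDS x y z
SIZE 4 4 4
TYPE F F F
COUNT 1 1 1
WIDTH 2
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 2
DATA ascii
0.10 0.20 0.30
0.40 0.50 0.60
"""

header, pts = parse_pcd(sample)
print(header["WIDTH"], len(pts), pts[0])
```

For registered clouds, WIDTH and HEIGHT give the image dimensions, so a point at image coordinates (x, y) is found at index y * WIDTH + x in the point list.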

Dataset Characteristics

The dataset currently contains more than a thousand annotated parts in 627 scans, and we plan to extend it to 1000 scans for the final submission, especially with more test data. The current size of the dataset is 5 GB, or 1 GB once compressed. We annotated instances of eleven clothing parts across six types of garments. Most categories include images of two distinct objects (e.g. two different shirts). The following figure shows the different clothing items with their corresponding annotations (color coded).

[Figures: collar and sleeves; hips and hemline of the pant-legs; collar and sleeves; hood and sleeves; collar and sleeves]

Annotation Criteria

Only what is clearly distinguishable to a human is annotated. In general, we annotated slightly more space to make sure all of the discriminative area is included in the polygon, defined by a collection of points P = [X, Y]. Instances that are particularly difficult or have occluded areas are flagged as such. Occluded areas of the object are never annotated. If a part is seen from the back it is still annotated, but this circumstance is indicated with the appropriate flag.

The choice of a polygon over a typical bounding box comes from the need to precisely define the area of interest. If a bounding box is still necessary (e.g. for training a Bag of Features sliding window detector), the equivalent bounding box can be obtained as: top-left (min(X), min(Y)) and bottom-right (max(X), max(Y)).
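The bounding-box rule above is straightforward to apply in code. The sketch below computes the equivalent axis-aligned bounding box from a polygon annotation; the collar polygon coordinates are made up for illustration.

```python
# Derive the equivalent axis-aligned bounding box from a polygon annotation,
# following the rule stated above: top-left is (min(X), min(Y)) and
# bottom-right is (max(X), max(Y)).

def polygon_to_bbox(points):
    """points: list of (x, y) polygon vertices -> (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical collar annotation in 640x480 image coordinates.
collar_polygon = [(210, 95), (250, 80), (305, 88), (318, 140), (260, 160), (215, 150)]
print(polygon_to_bbox(collar_polygon))   # -> (210, 80, 318, 160)
```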

Part Annotation Criteria
1 Shirt_collar Around the collar. The annotation goes down to approximately the first button of the frontal opening.
2 Shirt_sleeves What is annotated are the cuffs. The annotation is adjusted to the boundaries of the cuff (leaving some small extra space to ensure all relevant area is inside).
3 T-shirt_collar The annotation is drawn from the border to slightly below the hemline of the collar (approximately double the space between the border and the hemline).
4 T-shirt_sleeves Similar criteria to those for the T-shirt_collar. Annotations are drawn around the hemline of the sleeve (leaving approximately double space in the inner part).
5 Jeans_hips Annotated to completely cover the belt loops (and a tiny bit more). If present, the pocket openings and the zipper hemlines are included too.
6 Jeans_pants hemline The area between the bottom of the pant leg and slightly above the hemline (approximately two thirds of the space between the bottom of the pant leg and the hemline).
7 Polo_collar Around the collar. Annotated down to approximately the first button of the frontal opening.
8 Polo_sleeves Same criteria as for T-shirt_sleeves.
9 Sweater_hood The annotation starts at the beginning of the hood (without much extra space). If the lace extends outside the hood area, it is ignored. The hood is annotated even if seen from the back.
10 Sweater_sleeves Same criteria as for Shirt_sleeves.
11 Dress_collar The top part of the dress, including the arm holes.

Each annotation includes a visibility flag which describes the difficulty of that particular instance. The visibility flag can take one of the following values:

Visibility Flags
Normal The default flag, used when there are no major visibility problems.
Occluded For part instances that have more than 10% of their area occluded by another part of the textile or outside the visual field.
Hard For parts viewed from a non-canonical viewpoint (e.g. a shirt collar or jeans hips seen from the back) or with a very high degree of deformation/self-occlusion.
Bold If less than 10% of the part is visible, or if a part that would have been marked Hard is also occluded.
Disabled For clothing parts that have been annotated for the sake of completeness, but that do not belong to any training or testing set.
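When building a training set, these flags can be used to exclude the least usable instances. The sketch below filters annotations by visibility flag; the record structure (dicts with "part" and "visibility" keys) is a hypothetical stand-in for however the annotations are actually stored, and the choice of which flags to keep is up to the user.

```python
# Filter annotations by visibility flag when assembling a training set.
# The flag values are the ones defined in the table above; the record
# layout is illustrative only.

TRAINABLE = {"Normal", "Occluded", "Hard"}   # exclude Bold and Disabled

annotations = [
    {"part": "Shirt_collar",  "visibility": "Normal"},
    {"part": "Shirt_sleeves", "visibility": "Bold"},
    {"part": "Jeans_hips",    "visibility": "Hard"},
    {"part": "Sweater_hood",  "visibility": "Disabled"},
]

train = [a for a in annotations if a["visibility"] in TRAINABLE]
print([a["part"] for a in train])   # -> ['Shirt_collar', 'Jeans_hips']
```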

Test Images and Point Clouds

The dataset also includes a test set, acquired under the same conditions. The clothing items are presented in a wide variety of arrangements, from well folded to very wrinkled; we have also acquired images with multiple clothing items and annotated all of the relevant visible parts.





If you use this dataset please cite:

  1. A. Ramisa, G. Alenyà, F. Moreno-Noguer and C. Torras; A 3D descriptor to detect task-oriented grasping points in clothing, Pattern Recognition, Accepted for publication. doi:10.1016/j.patcog.2016.07.003
  2. A. Ramisa, G. Alenyà, F. Moreno-Noguer and C. Torras; Learning RGB-D descriptors of garment parts for informed robot grasping, Engineering Applications of Artificial Intelligence, 35: 246-258, 2014
  3. A. Ramisa, G. Alenyà, F. Moreno-Noguer and C. Torras; Using Depth and Appearance Features for Informed Robot Grasping of Highly Wrinkled Clothes, In Proceedings of the IEEE International Conference on Robotics and Automation, pp. 1703-1708, 2012


This work has been partially supported by the JAE-DOC grant from the CSIC and the FSE.