MV-RED Object Dataset


Introduction

In our track, we will release a real-world object dataset with multi-view and multimodal information, named the Multi-view RGB-D Object Dataset (MV-RED), recorded by the Multimedia Institute at Tianjin University. The MV-RED dataset consists of 505 objects drawn from everyday categories such as apple, cap, scarf, cup, mushroom, and toy. For each object, both RGB and depth information were recorded. Some example views are provided in Fig. 1. The dataset will be made publicly available to enable rapid progress in multi-view and multimodal object analysis.

Fig. 1. RGB and depth images of a banana object.


Dataset

The dataset was recorded under two different settings.

(1) 202 objects were recorded in the first setting, which was completed in March 2014. The data were captured with three Kinect sensors (1st generation) mounted as shown in Fig. 2. Camera 1 and Camera 2 each captured 360 RGB and 360 depth images while the table was uniformly rotated by a step motor, and Camera 3 captured one RGB image and one depth image from the top-down view. In this way, each object has 721 RGB images and 721 depth images in total, each at a resolution of 640 × 480. We also uniformly sampled the images from Camera 1 and Camera 2 at a step of 10 degrees to provide a compact version with 73 images each for RGB and depth data: 36 images from Camera 1, 36 from Camera 2, and 1 from Camera 3.
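To make the relation between the full 360-view sequences and the 73-image concise version concrete, the Python sketch below enumerates the views kept by a 10-degree subsampling: 36 views per rotating camera plus the single top-down view from Camera 3. The file naming scheme in the sketch is a hypothetical illustration only, not the official MV-RED layout.

```python
import os

# Sketch of the 10-degree subsampling described above.
# The file names (e.g. "camera1_rgb_000.png") are hypothetical.

def compact_view_indices(num_views=360, step_deg=10):
    """Indices of the uniformly sampled views (36 for a 360-view sequence)."""
    return list(range(0, num_views, step_deg))

def compact_file_list(object_dir):
    """Collect the 73 RGB file names of the concise version:
    36 from Camera 1, 36 from Camera 2, and 1 top-down view from Camera 3."""
    files = []
    for cam in (1, 2):
        for idx in compact_view_indices():
            files.append(os.path.join(object_dir, f"camera{cam}_rgb_{idx:03d}.png"))
    files.append(os.path.join(object_dir, "camera3_rgb_topdown.png"))
    return files

if __name__ == "__main__":
    names = compact_file_list("object_001")
    print(len(names))  # 73 = 36 + 36 + 1
```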

Fig. 2. Recording environment of the first version of the data

(2) 303 objects belonging to 58 categories were recorded under the second setting, which was completed in November 2014. The data were likewise captured with three Kinect sensors (1st generation) mounted as shown in Fig. 3. Camera 1 and Camera 2 each captured 360 RGB and 360 depth images while the table was uniformly rotated by a step motor, and Camera 3 captured one RGB image and one depth image from the top-down view. Each object therefore again has 721 RGB images and 721 depth images, each at a resolution of 640 × 480, and the same 10-degree subsampling yields the compact version with 73 images each for RGB and depth data (36 from Camera 1, 36 from Camera 2, 1 from Camera 3).
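Since the depth file format is not specified here, the following is only a minimal loading sketch under two assumptions: that each 640 × 480 depth frame is stored as a 16-bit single-channel PNG, and that, as is typical for Kinect v1 recordings, values are in millimetres with 0 marking missing measurements.

```python
import numpy as np
from PIL import Image

# Minimal depth-loading sketch. 16-bit PNG storage in millimetres is an
# assumption (common for Kinect v1 data), not a documented MV-RED property.

def load_depth(path):
    """Read a depth image and return a float32 array of distances in metres."""
    depth_mm = np.asarray(Image.open(path), dtype=np.uint16)
    assert depth_mm.shape == (480, 640), "expected a 640 x 480 frame"
    depth_m = depth_mm.astype(np.float32) / 1000.0   # millimetres -> metres
    depth_m[depth_mm == 0] = np.nan                  # 0 marks missing measurements
    return depth_m

if __name__ == "__main__":
    d = load_depth("camera1_depth_000.png")          # hypothetical file name
    print(d.shape, np.nanmin(d), np.nanmax(d))
```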

The two settings differ in the directions of view acquisition, which increases the difficulty of view matching.

We performed foreground segmentation on the RGB images of both versions, and the mask of each object will be provided for this contest, as illustrated in Fig. 4 (a minimal usage sketch is given after Fig. 4). In summary, the MV-RED dataset contains two versions of data, summarized in Table 1.



Version               Complete Version    Concise Version
# of RGB Images       721                 73
# of Depth Images     721                 73

Table 1. Number of images per object in the two versions of the dataset


Fig. 3. Recording environment of the second version of the data

Fig. 4. Results of foreground segmentation
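As a minimal usage sketch, the snippet below applies a provided foreground mask to the corresponding RGB view to blank out the background. The file names are hypothetical placeholders, not the distributed naming scheme.

```python
import numpy as np
from PIL import Image

# Sketch of applying a provided foreground mask to an RGB view.
# "view_rgb.png" and "view_mask.png" are hypothetical placeholders.

def apply_mask(rgb_path, mask_path):
    """Zero out the background of an RGB image using a binary mask."""
    rgb = np.asarray(Image.open(rgb_path).convert("RGB"))
    mask = np.asarray(Image.open(mask_path).convert("L")) > 0   # foreground = nonzero
    segmented = rgb * mask[:, :, None]                          # broadcast over 3 channels
    return Image.fromarray(segmented.astype(np.uint8))

if __name__ == "__main__":
    apply_mask("view_rgb.png", "view_mask.png").save("view_foreground.png")
```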



Download MV-RED Object Dataset


People

Weizhi Nie Anan Liu Yuting Su Xixi Li
Qun Cao Zhongyang Wang Xiaorong Zhu Ning Xu
Fan Yu Yang Li Xiaoxue Li Yaoyao Liu
Fuwu Li Yang Shi Yahui Hao Zhengyu Zhao


Publication

Please note that the data we provide may be used for research purposes only. If you use our data, please cite the following paper.


Acknowledgement

National Natural Science Foundation of China (61472275,61100124);
Tianjin Research Program of Application Foundation and Advanced Technology;
Grant of Elite Scholar Program of Tianjin University.


Last update: 23/12/2014