Multi-View TJU Dataset


Human action analysis has been an active research topic in recent years, and many real-world human action recognition tasks involve data that can be factorized into multiple views. To promote related research, we contribute a public dataset with 7040 action sequences in two views. To the best of our knowledge, this is the largest multiple-view video database, with 22 human actions captured in 2 different scenarios.


1) The dataset has two views, and the angle between the two views is about 65 degrees.

2) The most popular multiple-view human action dataset is the IXMAS Dataset, but it does not contain depth data or 3D skeleton data.

Data Description

The dataset was captured from two views (the front view and the side view) and contains 22 actions per view. Each action was performed four times by 20 subjects (10 males and 10 females) in both light and dark environments. In total there are 7040 (22×20×4×2×2) action samples. Two Kinect cameras were placed at the two viewpoints to record each action performance as RGB images, depth data and 3D skeleton data at a resolution of 640×480. To the best of our knowledge, this is the largest multiple-view video database with sequences of human actions captured in different scenarios.
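The sample count can be checked by enumerating every combination of the recording factors. The following is a minimal sketch in Python; the constant and function names are illustrative and are not part of the dataset.

```python
from itertools import product

# Recording factors as stated in the data description.
NUM_ACTIONS = 22      # action categories
NUM_SUBJECTS = 20     # 10 males and 10 females
NUM_REPETITIONS = 4   # each action performed four times
NUM_LIGHTING = 2      # light and dark environments
NUM_VIEWS = 2         # front view and side view


def count_samples() -> int:
    """Count samples by enumerating every factor combination."""
    combos = product(
        range(NUM_ACTIONS),
        range(NUM_SUBJECTS),
        range(NUM_REPETITIONS),
        range(NUM_LIGHTING),
        range(NUM_VIEWS),
    )
    return sum(1 for _ in combos)


total = count_samples()
per_view = total // NUM_VIEWS
print(total, per_view)  # 7040 3520
```

Each view therefore holds 3520 samples, half of them recorded in the light environment and half in the dark environment.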

All action samples are saved in the corresponding action folders and divided into "Depth_Image", "RGB_Image", "Txt_Skel" and "Txt_Skel2D" portions, including "View1" and "View2" sets. "Txt_Skel" denotes the skeleton data (3D coordinates of 20 joints per frame) and "Txt_Skel2D" denotes the planar skeleton data (2D coordinates of 20 joints per frame). Action samples are named with the action index, the subject index, and the repetition index. For example, "a01_p01_t01" denotes the first action performed by the first person for the first time. Additionally, a name without a prefix, such as "a01_p01_t01", denotes a sample captured in the light environment, and the prefix "N_", as in "N_a01_p01_t01", denotes the same sample captured in the dark environment.
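The naming convention can be decoded programmatically. The sketch below is one possible way to do so in Python; `SampleInfo`, `parse_sample_name` and `build_sample_name` are hypothetical helper names, and the code assumes that the name is given without any file extension and that indices are zero-padded to two digits, as in the example "a01_p01_t01".

```python
import re
from typing import NamedTuple


class SampleInfo(NamedTuple):
    action: int    # action index, 1-based
    subject: int   # subject (person) index, 1-based
    trial: int     # repetition index, 1-based
    dark: bool     # True if captured in the dark environment


# An optional "N_" prefix marks the dark environment.
_NAME_RE = re.compile(r"^(N_)?a(\d+)_p(\d+)_t(\d+)$")


def parse_sample_name(name: str) -> SampleInfo:
    """Split a sample name into its action, subject, trial and lighting."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    prefix, action, subject, trial = match.groups()
    return SampleInfo(int(action), int(subject), int(trial), prefix is not None)


def build_sample_name(info: SampleInfo) -> str:
    """Rebuild the sample name (assumes two-digit zero padding)."""
    base = f"a{info.action:02d}_p{info.subject:02d}_t{info.trial:02d}"
    return f"N_{base}" if info.dark else base


info = parse_sample_name("N_a01_p01_t01")
print(info.action, info.subject, info.trial, info.dark)  # 1 1 1 True
```

Parsing names into a structured record makes it straightforward to pair the light and dark recordings of the same performance, or to split the data by subject.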

In total, the MV-TJU dataset contains the following data: RGB data (image sequence samples: 20.96 GB); depth data (image sequence samples: 245.15 GB); 3D skeleton data (259.78 MB); 2D skeleton data (106.42 MB).

Action Categories

The dataset contains 22 action categories:
1. Boxing; 2. Side boxing; 3. One hand wave; 4. Two hands wave; 5. Hand clap; 6. Side bend; 7. Forward bend; 8. Draw X; 9. Draw tick; 10. Draw circle; 11. Tennis serve; 12. Tennis swing; 13. Walking; 14. Side walking; 15. Jogging; 16. Running; 17. Jacks; 18. Jump; 19. Jump in place; 20. Forward kick; 21. Side kick; 22. Sit down. (These actions are identical to those in the TJU Dataset.)
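For labeling, the category list can be kept as a lookup table. The sketch below assumes that the action index in a sample name (for example, the "01" in "a01") follows the numbering of the list above; this mapping is an assumption and should be checked against the data. `ACTION_NAMES` and `action_label` are illustrative names.

```python
# Action categories in the order listed above (index 1 = first entry).
ACTION_NAMES = [
    "Boxing", "Side boxing", "One hand wave", "Two hands wave",
    "Hand clap", "Side bend", "Forward bend", "Draw X",
    "Draw tick", "Draw circle", "Tennis serve", "Tennis swing",
    "Walking", "Side walking", "Jogging", "Running",
    "Jacks", "Jump", "Jump in place", "Forward kick",
    "Side kick", "Sit down",
]


def action_label(index: int) -> str:
    """Return the category name for a 1-based action index."""
    if not 1 <= index <= len(ACTION_NAMES):
        raise ValueError(f"action index out of range: {index}")
    return ACTION_NAMES[index - 1]


print(action_label(1))   # Boxing
print(action_label(22))  # Sit down
```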


We have set up an FTP server for downloading MV-TJU. Researchers who email us to register will be sent a username and password for downloading the dataset.

Note: Participants must also download and fill in the Agreement and Disclaimer Form and send it back to us with their registration email.

The Recording Scene

RGB and Depth Images