Human Activity/Action Databases

1. RPI-ISL Activity Datasets

 

a. Parking Lot Dataset:

The dataset consists of 108 sequences of 7 actions captured in a parking lot. The actions include walking, running, leaving a car, entering a car, bending down, throwing, and looking around. They are performed by two people, and the sequences contain scale variation, viewpoint changes, and shadow interference.

 

Action            # Examples
Walking           20
Running           16
Leaving Car       8
Entering Car      7
Bending Down      22
Throwing          21
Looking Around    14
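The per-class counts in the table add up to the 108 sequences stated in the description. The short Python check below is only a convenience sketch: the dictionary simply transcribes the table and is not a file shipped with the dataset. It confirms the total and shows how unbalanced the classes are.

```python
# Per-class example counts transcribed from the Parking Lot table (not a file shipped with the dataset).
parking_lot_counts = {
    "Walking": 20,
    "Running": 16,
    "Leaving Car": 8,
    "Entering Car": 7,
    "Bending Down": 22,
    "Throwing": 21,
    "Looking Around": 14,
}

total = sum(parking_lot_counts.values())
# The table should account for every one of the 108 sequences in the description.
assert total == 108, f"expected 108 sequences, got {total}"

# Print each class with its share of the data, largest class first.
for action, n in sorted(parking_lot_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{action:<15}{n:>4}  {n / total:6.1%}")
```

Entering Car and Leaving Car have roughly a third as many examples as Bending Down, so per-class accuracy may be a more informative summary than overall accuracy on this set.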

 

b. Complex Activity Dataset:

The complex activity dataset consists of 15 video sequences covering 5 complex daily-life activities: Shaking hands, Talking, Chasing, Boxing and Wrestling. Each activity is performed by two interacting subjects, who combine 5 basic individual actions: Standing, Running, Making a fist, Clinching, and Reaching out. The following table lists the basic action performed by each subject for each complex activity. Each sequence contains all 5 complex activities performed in succession, so there are 15 examples of each complex activity.

 

Activity         Action of Subject 1    Action of Subject 2
Shaking hands    Reaching out           Reaching out
Talking          Standing               Standing
Chasing          Running                Running
Boxing           Making a fist          Making a fist
Wrestling        Clinching              Clinching
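Because each complex activity is defined by the pair of basic actions in the table, the table can be read as a lookup from action pairs to activities. The sketch below is a toy illustration of that structure only, assuming the per-subject basic actions have already been recognized; it is not the dataset authors' recognition method, and `label_complex_activity` and the other names are hypothetical.

```python
from typing import Optional

# (subject 1 action, subject 2 action) for each complex activity, transcribed from the table.
COMPLEX_ACTIVITIES = {
    "Shaking hands": ("Reaching out", "Reaching out"),
    "Talking": ("Standing", "Standing"),
    "Chasing": ("Running", "Running"),
    "Boxing": ("Making a fist", "Making a fist"),
    "Wrestling": ("Clinching", "Clinching"),
}

# Invert the table: a pair of basic actions identifies one complex activity.
PAIR_TO_ACTIVITY = {pair: activity for activity, pair in COMPLEX_ACTIVITIES.items()}


def label_complex_activity(action1: str, action2: str) -> Optional[str]:
    """Return the complex activity for two recognized basic actions, or None if the pair is not in the table."""
    return PAIR_TO_ACTIVITY.get((action1, action2))


# Every sequence contains each complex activity once, so examples per activity = number of sequences.
NUM_SEQUENCES = 15
examples_per_activity = {activity: NUM_SEQUENCES for activity in COMPLEX_ACTIVITIES}

print(label_complex_activity("Running", "Running"))   # Chasing
print(label_complex_activity("Running", "Standing"))  # None
print(sum(examples_per_activity.values()))            # 75
```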

*** The datasets are available upon request.

 

c. VIRAT-like Complex Activity Dataset:

 

Location: Eran's blue external hard drive and Kitware's \\videonas2 server

 

Details:

The VIRAT-like dataset was captured from the 4th-floor stairwell of the parking garage overlooking the metered parking lot. Several actors with three different vehicles performed scripted person-vehicle activities, and background pedestrian activity was captured as well. An HD camcorder with a resolution of 1920x1080 was used to capture 3 hours of video of the following actions:

 

Actions:

-   Person getting into vehicle

-   Person getting out of vehicle

-   Person loading vehicle

-   Person unloading vehicle

-   Person opening trunk

-   Person closing trunk

-   Person walking

-   Person carrying object

-   Vehicle parking

-   Vehicle picking up person

-   Vehicle dropping off person

-   Person exchanging object from one vehicle to another

-   Person handing an object to another person

-   Group of people gathering

-   Group of people dispersing

-   Group of people moving


Files:

-   The original video files are in MTS format but were converted to MPEG for annotation in VIPER

-   The annotations are in the xgtf (XML) format; only two of the eight video clips have been partially annotated:

o  Rpi_dc1_4_7_10_00006.mpeg and …_00007.mpeg are partially annotated

o  Only the starting bounding box and the temporal labels of each person-vehicle activity are labeled

o  Some single-person and group activities are partially labeled, but they were not the focus at the time
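Since only the starting bounding box and the temporal extent of each activity are labeled, the temporal labels are the main thing to pull out of the xgtf files. The sketch below assumes the usual ViPER XML layout, in which each annotated `<object>` carries a `framespan` attribute such as `120:480` (several spans separated by spaces). The namespace URI, element names, and the trimmed sample document are assumptions for illustration, and the descriptor names in the sample are made up; check them against the real files before relying on this.

```python
import xml.etree.ElementTree as ET

# Assumed ViPER default namespace; confirm it against the <viper> root of the real .xgtf files.
VIPER_NS = "http://lamp.cfar.umd.edu/viper#"


def parse_framespan(span: str) -> list[tuple[int, int]]:
    """Parse a framespan such as '120:480 900:1010' into (start, end) frame pairs."""
    intervals = []
    for part in span.split():
        start, end = part.split(":")
        intervals.append((int(start), int(end)))
    return intervals


def object_intervals(xgtf_text: str) -> list[tuple[str, list[tuple[int, int]]]]:
    """Return (descriptor name, frame intervals) for every <object> element in an xgtf document."""
    root = ET.fromstring(xgtf_text)
    return [
        (obj.get("name"), parse_framespan(obj.get("framespan", "")))
        for obj in root.iter(f"{{{VIPER_NS}}}object")
    ]


# Hypothetical, heavily trimmed xgtf content; the descriptor names are made up for illustration.
SAMPLE = """<viper xmlns="http://lamp.cfar.umd.edu/viper#">
  <data>
    <sourcefile filename="Rpi_dc1_4_7_10_00006.mpeg">
      <object framespan="120:480" id="0" name="Person_getting_into_vehicle"/>
      <object framespan="900:1010 1200:1260" id="1" name="Person_walking"/>
    </sourcefile>
  </data>
</viper>"""

for name, intervals in object_intervals(SAMPLE):
    print(name, intervals)
```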

 

Clip Example:

 

dc00013_fr112670000.png


2. Weizmann Dataset 

    90 low-resolution (180x144) video clips of 9 different subjects, each performing 10 basic actions.

 

 

3. KTH Dataset       

   6 human actions performed several times by 25 subjects in four different scenarios, for a total of 2391 sequences.

 

 

4. UCF Datasets

 

Location: http://server.cs.ucf.edu/~vision/

 

Details:

a. Sports action dataset:

Close to 200 sequences of 9 sports activities at a resolution of 720x480.

This dataset consists of a set of actions collected from various sports that are typically featured on broadcast television channels such as the BBC and ESPN. The video sequences were obtained from a wide range of stock footage websites, including BBC Motion Gallery and GettyImages.

This new dataset contains close to 200 video sequences at a resolution of 720x480. The collection represents a natural pool of actions featured in a wide range of scenes and viewpoints. By releasing the dataset we hope to encourage further research into this class of action recognition in unconstrained environments.

 

Actions:

Diving (16 videos)
Golf swinging (25 videos)
Kicking (25 videos)
Lifting (15 videos)
Horseback riding (14 videos)
Running (15 videos)
Skating (15 videos)
Swinging (35 videos)
Walking (22 videos)
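The per-class counts listed above sum to 182 videos, somewhat fewer than the "close to 200" quoted in the description. A quick tally, using a dictionary that simply transcribes the list (a convenience sketch, not part of the dataset release):

```python
# Per-class video counts transcribed from the Actions list above.
ucf_sports_counts = {
    "Diving": 16,
    "Golf swinging": 25,
    "Kicking": 25,
    "Lifting": 15,
    "Horseback riding": 14,
    "Running": 15,
    "Skating": 15,
    "Swinging": 35,
    "Walking": 22,
}

total = sum(ucf_sports_counts.values())
print(f"{len(ucf_sports_counts)} classes, {total} videos")  # 9 classes, 182 videos

# Largest and smallest classes, to gauge imbalance before choosing an evaluation protocol.
largest = max(ucf_sports_counts, key=ucf_sports_counts.get)
smallest = min(ucf_sports_counts, key=ucf_sports_counts.get)
print(largest, smallest)  # Swinging Horseback riding
```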

 

Acknowledgements:

If you use this dataset, please cite the following paper:

Mikel D. Rodriguez, Javed Ahmed, and Mubarak Shah. Action MACH: A Spatio-temporal Maximum Average Correlation Height Filter for Action Recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

 

 

b. Aerial action dataset:  

This dataset features video sequences obtained using an R/C-controlled blimp equipped with an HD camera mounted on a gimbal. The collection represents a diverse pool of actions featured at different heights and aerial viewpoints. Multiple instances of each action were recorded at flying altitudes ranging from 400 to 450 feet and were performed by different actors.

 

Actions:

-   Walking, Running, Digging, Picking up object, Kicking, Opening car door, Closing car door, Opening car trunk, Closing car trunk

 

Files:

-   MPG for video, xgtf for annotations, and zip for balloon parameters

 


 

 

c. YouTube action dataset: 11 action categories; the clips in each category are organized into 25 groups, each containing more than 4 action clips.

 

All actions are annotated in the VIPER format.

 

 

5. CAVIAR/BEHAVE Dataset 

 

Location:    http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/

 

Details:

For the CAVIAR project, a number of video clips were recorded of actors performing the scenarios of interest. These include people walking alone, meeting with others, window shopping, entering and exiting shops, fighting and passing out, and, last but not least, leaving a package in a public place.

 

The first set of video clips was filmed for the CAVIAR project with a wide-angle camera lens in the entrance lobby of the INRIA Labs at Grenoble, France. The video is half-resolution PAL (384 x 288 pixels, 25 frames per second), compressed with MPEG2. Most files are between 6 and 12 MB, with a few up to 21 MB.

A typical labeled frame from the image sequences contains three individual boxes (yellow) and one group box (green). Several people in the video sequence are not boxed because they do not move over the course of the sequence.
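Because the ground truth is given per frame, it is often handy to convert between frame indices and timestamps. The sketch below uses the 25 fps and 384 x 288 figures stated above; the 0-based frame numbering is an assumption to check against the ground-truth files, and the helper names are hypothetical.

```python
FPS = 25                  # frames per second, from the description above
WIDTH, HEIGHT = 384, 288  # half-resolution PAL frame size


def frame_to_seconds(frame_index: int, fps: int = FPS) -> float:
    """Timestamp (in seconds) of a frame, assuming frame numbering starts at 0."""
    return frame_index / fps


def seconds_to_frame(seconds: float, fps: int = FPS) -> int:
    """Index of the frame on screen at a given time, assuming 0-based numbering."""
    return int(seconds * fps)


# Size of one decoded 8-bit RGB frame, i.e. before MPEG2 compression.
raw_frame_bytes = WIDTH * HEIGHT * 3

print(frame_to_seconds(250))   # 10.0
print(seconds_to_frame(10.0))  # 250
print(raw_frame_bytes)         # 331776
```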

 

Actions:

-   Walking alone, meeting others, window shopping, entering shops, fighting, passing out, leaving a package

 

Files:

-   MPEG2 video or separate JPEG frames

-   The ground truth for these sequences was obtained by hand-labeling the images, as in the typical frame described above. The Java programs for the interactive labeller (unsupported) and its user guide are available from the CAVIAR website.

 

Acknowledgements:

-       If you publish results using the data, please acknowledge the data as coming from the EC Funded CAVIAR project/IST 2001 37540, found at URL: http://homepages.inf.ed.ac.uk/rbf/CAVIAR/.

 

Clip Example:

 

 

Six basic scenarios were acted out by the CAVIAR team members: Walking, Browsing, Resting, Leaving bags behind, People meeting/walking together/splitting up, and Two people fighting. There are about 3 to 6 clips for each scenario.

 

6. HOHA Datasets

 

a. Hollywood Human Actions dataset:  contains 8 classes of human actions from 32 Hollywood movies: AnswerPhone, GetOutCar, HandShake, HugPerson, Kiss, SitDown, SitUp, StandUp.

 

b. Hollywood-2 Human Actions and Scenes dataset:  an extension of the previous dataset. It contains 12 classes of human actions and 10 classes of scenes, with 3669 video sequences and approximately 20.1 hours of video in total.

 

7. LSCOM Event/Activity Dataset

    Events/activities labeled with 24 LSCOM concepts from the TRECVID 2005 benchmark. Each event has more than 60 samples.

 

8. UMN Dataset

   An unusual crowd activity dataset consisting of 11 different scenarios of escape events in 3 different scenes.

 

9. UIUC Pair-activity Dataset

   This dataset consists of five classes of pair-activities: chasing, following, together, meeting, and independent. There are 131 to 203 video clips in each class, and 867 video clips in total.

 

10. CANTATA Project

 

Location:    http://www.hitech-projects.com/euprojects/cantata/datasets_cantata/dataset.html

 

Details:  This site contains details and links to all of the PETS databases as well as the "Behave" and "Traffic" databases.

 

Actions: pedestrian and vehicle activities, but the focus is on tracking

 

Files: Some datasets include annotations

 

Acknowledgements: See individual datasets

 


 

 

 

11. Traffic Intersection Video

 

Location:    http://i21www.ira.uka.de/image_sequences/

 

Details:  This site contains a collection of video clips of traffic intersections on Berthold-Straße in Karlsruhe, captured with stationary cameras, some in grayscale and some in color.

 

Actions: typical vehicle activities

 

Files: MPEG videos of different qualities and zip files

 

Acknowledgements: See individual datasets

 

Clip Example: Screen shot of some of the video clips

 

 

12. ICPR 2010 contest on Semantic Description of Human Activities

 

Location:    http://cvrc.ece.utexas.edu/SDHA2010/Wide_Area_Activity.html#Data

To obtain the Videoweb dataset, go to its website, http://vwdata.ee.ucr.edu/, and follow its protocol (which requires signing a release form).

 

Details:  The Videoweb dataset consists of about 2.5 hours of video observed from 4-8 cameras. The data is divided into a number of scenes that were collected over many days. Each scene is observed by a camera network, and the number of cameras varies by scene depending on the nature of the scene. For each scene, the videos from all of its cameras are available, along with an annotation whose convention is described in the dataset. The annotation identifies the frame numbers and camera ID for each annotated activity. The videos from the cameras are approximately synchronized. The videos contain several types of activities, including throwing a ball, shaking hands, standing in a line, handing out forms, running, limping, getting into/out of a car, and cars making turns. The number of instances of each activity varies widely.
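Since the annotation gives frame numbers and a camera ID per activity, and the cameras are only approximately synchronized, a common first step is to check which cameras saw each activity and whether per-camera annotations line up in time. The sketch below assumes the annotations have already been loaded into a hypothetical `Annotation` record, and that frame numbers are comparable across cameras (same frame rate and a shared start); both are assumptions to verify against the dataset's own annotation convention. The records are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical flattened annotation record; the real Videoweb convention is documented with the dataset."""
    activity: str
    camera_id: int
    start_frame: int
    end_frame: int


def cameras_per_activity(annotations):
    """Map each activity label to the sorted IDs of the cameras in which it is annotated."""
    cams = defaultdict(set)
    for ann in annotations:
        cams[ann.activity].add(ann.camera_id)
    return {activity: sorted(ids) for activity, ids in cams.items()}


def overlaps(a: Annotation, b: Annotation, slack: int = 0) -> bool:
    """True if two annotations overlap in time, tolerating `slack` frames of offset
    because the cameras are only approximately synchronized."""
    return a.start_frame <= b.end_frame + slack and b.start_frame <= a.end_frame + slack


# Made-up records for illustration only.
records = [
    Annotation("shaking hands", camera_id=1, start_frame=100, end_frame=180),
    Annotation("shaking hands", camera_id=3, start_frame=185, end_frame=260),
    Annotation("running", camera_id=2, start_frame=400, end_frame=520),
]

print(cameras_per_activity(records))               # {'shaking hands': [1, 3], 'running': [2]}
print(overlaps(records[0], records[1]))            # False
print(overlaps(records[0], records[1], slack=10))  # True
```

The `slack` parameter is one simple way to absorb small synchronization offsets; how large it should be depends on how closely the cameras are actually aligned.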

 

Actions:

1.   High-Level Interaction Recognition: hand shaking, hugging, kicking, pointing, punching, and pushing

2.   Aerial view activity classification: pointing, standing, digging, walking, carrying, running, wave1, wave2, and jumping

3.   Wide area search and recognition: …

 

Files: MPEG videos with annotations

 

Acknowledgements: If you make use of the Videoweb dataset in any form, please cite the following reference.

@misc{UCR-Videoweb-Data,
      author = "C. Ding and A. Kamal and G. Denina and H. Nguyen and A. Ivers and B. Varda and C. Ravishankar and B. Bhanu and A. Roy-Chowdhury",
      title = "{V}ideoweb {A}ctivities {D}ataset, {ICPR} contest on {S}emantic {D}escription of {H}uman {A}ctivities ({SDHA})",
      year = "2010",
      howpublished = "http://cvrc.ece.utexas.edu/SDHA2010/Wide\_Area\_Activity.html"
}

 

Clip Example: Screen shot of some of the video clips