Human Activity/Action Databases
1. RPI-ISL Activity Datasets
a. Parking Lot Dataset:
The dataset consists of 108 sequences for 7 actions captured in a parking lot. The actions include walking, running, leaving a car, entering a car, bending down, throwing, and looking around. The action examples are performed by two people with scale variation, view change, and shadow interference.
| Action | # Examples |
| Walking | 20 |
| Running | 16 |
| Leaving Car | 8 |
| Entering Car | 7 |
| Bending Down | 22 |
| Throwing | 21 |
| Looking Around | 14 |
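As a quick consistency check, the per-action counts in the table can be summed and compared with the stated total of 108 sequences. The following is a minimal Python sketch; the dictionary name is only illustrative and the numbers are copied from the table.

```python
# Per-action example counts, copied from the Parking Lot Dataset table.
parking_lot_counts = {
    "Walking": 20,
    "Running": 16,
    "Leaving Car": 8,
    "Entering Car": 7,
    "Bending Down": 22,
    "Throwing": 21,
    "Looking Around": 14,
}

# Totals implied by the table.
total_sequences = sum(parking_lot_counts.values())
num_actions = len(parking_lot_counts)

print(f"{num_actions} actions, {total_sequences} sequences")
```

The sum matches the 108 sequences and 7 actions quoted in the description.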
b. Complex Activity Dataset:
The complex activity dataset consists of 15 video sequences covering 5 complex activities in daily life: shaking hands, talking, chasing, boxing, and wrestling. These activities are conducted by two interacting subjects who perform 5 basic individual actions: standing, running, making a fist, clinching, and reaching out. The following table describes the basic actions performed by the two subjects for each complex activity. In each sequence, the 5 complex activities are performed sequentially, so there are 15 examples of each complex activity.
| Activity | Action of Subject 1 | Action of Subject 2 |
| Shaking hands | Reaching out | Reaching out |
| Talking | Standing | Standing |
| Chasing | Running | Running |
| Boxing | Making a fist | Making a fist |
| Wrestling | Clinching | Clinching |
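The activity table can also be held as a small lookup structure, which makes the example counts explicit. This is only an illustrative Python sketch: the variable names are invented here, and the mapping simply restates the table.

```python
# Basic actions of (subject 1, subject 2) for each complex activity,
# restated from the table.
complex_activities = {
    "Shaking hands": ("Reaching out", "Reaching out"),
    "Talking": ("Standing", "Standing"),
    "Chasing": ("Running", "Running"),
    "Boxing": ("Making a fist", "Making a fist"),
    "Wrestling": ("Clinching", "Clinching"),
}

# Every one of the 15 sequences contains each complex activity once.
NUM_SEQUENCES = 15
examples_per_activity = {name: NUM_SEQUENCES for name in complex_activities}
total_examples = sum(examples_per_activity.values())

# Distinct basic actions that appear anywhere in the table.
basic_actions = sorted({a for pair in complex_activities.values() for a in pair})

print(total_examples, basic_actions)
```

With 15 sequences and 5 activities per sequence, this gives 75 complex-activity examples built from 5 distinct basic actions.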
***
The datasets are available upon request.
c. VIRAT-like Complex Activity Dataset:
Location: Eran's blue external hard drive and Kitware's \\videonas2 server
Details:
The VIRAT-like dataset was captured from the 4th floor stairwell of the parking garage overlooking the metered parking lot. Several actors with three different vehicles performed scripted person-vehicle activities; the background activities of pedestrians were captured as well. An HD camcorder with a resolution of 1920x1080 was used to capture 3 hours of video for the following actions:
Actions:
- Person getting into vehicle
- Person getting out of vehicle
- Person loading vehicle
- Person unloading vehicle
- Person opening trunk
- Person closing trunk
- Person walking
- Person carrying object
- Vehicle parking
- Vehicle picking up person
- Vehicle dropping off person
- Person exchanging object from one vehicle to another
- Person handing an object to another person
- Group of people gathering
- Group of people dispersing
- Group of people moving
Files:
- The original video files are in MTS format but were converted to MPEG for input to VIPER for annotation
- The annotated files are in the xgtf (XML) format; only two of the eight video clips have been partially annotated
  o Rpi_dc1_4_7_10_00006.mpeg and …_00007.mpeg are partially annotated
  o Only the starting bounding box and the temporal labels of each person-vehicle activity are labeled
  o Some single-person and group activities are partially labeled, but that wasn't the focus at the time
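Because the xgtf annotations are XML, the starting bounding box and temporal span of each labeled activity can be read with a standard XML parser. The following is a minimal Python sketch, assuming the files follow the usual ViPER-GT layout (a `viper` root, `sourcefile` and `object` elements with a `framespan` attribute, and `data:bbox` elements). The namespace URIs, the `GettingIntoVehicle` descriptor name, the `Location` attribute name, and every value in the sample are assumptions for illustration and should be checked against the real files.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for ViPER-GT files; verify against the header of a real .xgtf.
NS = {
    "v": "http://lamp.cfar.umd.edu/viper#",
    "d": "http://lamp.cfar.umd.edu/viperdata#",
}

# Hypothetical sample; names and numbers are invented for illustration only.
SAMPLE_XGTF = """<viper xmlns="http://lamp.cfar.umd.edu/viper#"
       xmlns:data="http://lamp.cfar.umd.edu/viperdata#">
  <data>
    <sourcefile filename="example.mpeg">
      <object name="GettingIntoVehicle" id="0" framespan="120:310">
        <attribute name="Location">
          <data:bbox framespan="120:120" x="640" y="410" width="60" height="150"/>
        </attribute>
      </object>
    </sourcefile>
  </data>
</viper>
"""


def parse_framespan(text):
    """Turn a framespan string such as '1:10 20:30' into [(1, 10), (20, 30)]."""
    spans = []
    for part in text.split():
        start, end = part.split(":")
        spans.append((int(start), int(end)))
    return spans


def read_activities(xgtf_text):
    """Return label, temporal spans, and first bounding box for each object."""
    root = ET.fromstring(xgtf_text)
    activities = []
    for obj in root.iterfind(".//v:sourcefile/v:object", NS):
        first_box = obj.find(".//d:bbox", NS)
        box = None
        if first_box is not None:
            box = tuple(int(first_box.get(k)) for k in ("x", "y", "width", "height"))
        activities.append({
            "label": obj.get("name"),
            "framespans": parse_framespan(obj.get("framespan", "")),
            "start_box": box,  # (x, y, width, height) or None
        })
    return activities


for activity in read_activities(SAMPLE_XGTF):
    print(activity)
```

Keeping only the first bounding box mirrors the annotation policy described here, where each person-vehicle activity has a starting box plus temporal labels rather than a box on every frame.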
2. Weizmann Dataset
90 low-resolution (180x144) video clips of 9 different subjects, each of whom performs 10 basic actions.
3. KTH Dataset
6 human actions performed several times by 25 subjects in four different scenarios. It contains 2391 sequences in total.
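The figures above give a rough sense of what "several times" means. The sketch below is a back-of-the-envelope Python calculation under one assumption that the description does not state explicitly: that every subject performs every action in every scenario.

```python
# Figures quoted in the dataset description.
NUM_ACTIONS = 6
NUM_SUBJECTS = 25
NUM_SCENARIOS = 4
TOTAL_SEQUENCES = 2391

# Assumption (not stated in the description): every subject performs
# every action in every scenario.
combinations = NUM_ACTIONS * NUM_SUBJECTS * NUM_SCENARIOS

# Average number of sequences per subject/action/scenario combination.
sequences_per_combination = TOTAL_SEQUENCES / combinations

print(combinations)
print(f"{sequences_per_combination:.2f}")
```

Under that assumption there are 600 combinations, so each one is covered by just under 4 sequences on average.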
4. UCF Datasets
Location: http://server.cs.ucf.edu/~vision/
Details:
a. Sports action dataset: close to 200 sequences for 9 sports activities at a resolution of 720x480.
This dataset consists of a set of actions collected from various sports that are typically featured on broadcast television channels such as the BBC and ESPN. The video sequences were obtained from a wide range of stock footage websites, including BBC Motion Gallery and GettyImages.
The collection represents a natural pool of actions featured in a wide range of scenes and viewpoints. By releasing the dataset we hope to encourage further research into this class of action recognition in unconstrained environments.
Actions:
- Diving (16 videos)
- Golf swinging (25 videos)
- Kicking (25 videos)
- Lifting (15 videos)
- Horseback riding (14 videos)
- Running (15 videos)
- Skating (15 videos)
- Swinging (35 videos)
- Walking (22 videos)
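The per-action counts can be tallied to see how they relate to the "close to 200" figure quoted for this dataset. The following is a minimal Python sketch; the dictionary name is illustrative and the counts are copied from the list above.

```python
# Per-action video counts, copied from the Sports action dataset list.
ucf_sports_counts = {
    "Diving": 16,
    "Golf swinging": 25,
    "Kicking": 25,
    "Lifting": 15,
    "Horseback riding": 14,
    "Running": 15,
    "Skating": 15,
    "Swinging": 35,
    "Walking": 22,
}

total_videos = sum(ucf_sports_counts.values())
largest_class = max(ucf_sports_counts, key=ucf_sports_counts.get)

print(len(ucf_sports_counts), total_videos, largest_class)
```

The listed counts add up to 182 videos across 9 actions, which is consistent with the approximate total, and the classes are unbalanced (14 to 35 videos each), which is worth keeping in mind when splitting the data for training and testing.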
Acknowledgements:
If you use this data set, please cite the paper:
Mikel D. Rodriguez, Javed Ahmed, and Mubarak Shah, "Action MACH: A Spatio-temporal Maximum Average Correlation Height Filter for Action Recognition."
b. Aerial action dataset:
This dataset features video sequences that were obtained using a radio-controlled (R/C) blimp equipped with an HD camera mounted on a gimbal. The collection represents a diverse pool of actions featured at different heights and aerial viewpoints. Multiple instances of each action were recorded at different flying altitudes, which ranged from 400 to 450 feet, and were performed by different actors.
Actions:
- Walking, running, digging, picking up object, kicking, opening car door, closing car door, opening car trunk, closing car trunk
Files:
- MPG for video, xgtf for annotations, and zip for balloon parameters
c. YouTube action dataset: 11 action categories. The clips in each category are organized into 25 groups, each containing more than 4 action clips.
All actions are annotated using the VIPER format.
5. CAVIAR Dataset
Location: http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/
Details:
For the CAVIAR project, a number of video clips were recorded acting out the different scenarios of interest. These include people walking alone, meeting with others, window shopping, entering and exiting shops, fighting and passing out and, last but not least, leaving a package in a public place.
The first section of video clips was filmed for the CAVIAR project with a wide-angle camera lens in the entrance lobby of the INRIA Labs at Grenoble, France. The resolution is half-resolution PAL standard (384 x 288 pixels, 25 frames per second), compressed using MPEG2. The file sizes are mostly between 6 and 12 MB, with a few up to 21 MB.
A typical annotated frame from the image sequences shows three individual boxes (yellow) and one group box (green). Several people in the video sequence are not boxed because they do not move over the course of the sequence.
Actions:
- Walking alone, meeting others, window shopping, entering and exiting shops, fighting, passing out, leaving a package
Files:
- MPEG2 or separate JPEGs
- The ground truth for these sequences was found by hand-labeling the images. The Java programs for the interactive labeller (unsupported) and a user guide are also provided.
Acknowledgements:
- If you publish results using the data, please acknowledge the data as coming from the EC Funded CAVIAR project/IST 2001 37540, found at URL: http://homepages.inf.ed.ac.uk/rbf/CAVIAR/.
Six basic scenarios acted out by the CAVIAR team members: walking, browsing, resting, leaving bags behind, people meeting/walking together/splitting up, and two people fighting. There are about 3 to 6 clips for each scenario.
6. Hollywood Datasets
a. Hollywood Human Actions dataset: contains 8 classes of human actions from 32 Hollywood movies: AnswerPhone, GetOutCar, HandShake, HugPerson, Kiss, SitDown, SitUp, StandUp.
b. Hollywood-2 Human Actions and Scenes dataset: an extension of the previous dataset. It contains 12 classes of human actions and 10 classes of scenes. There are 3669 video sequences and approximately 20.1 hours of video in total.
7. LSCOM Event/Activity Dataset
Events/activities labeled with 24 LSCOM concepts from the TRECVID 2005 benchmark. Each event has more than 60 samples.
8. UMN Dataset
An unusual crowd activity dataset consisting of 11 different scenarios of escape events in 3 different scenes.
9. UIUC Pair-activity Dataset
This dataset consists of five classes of pair-activities: chasing, following, together, meeting, and independent. There are 131 to 203 video clips in each class, and 867 video clips in total.
10. CANTATA Project
Location: http://www.hitech-projects.com/euprojects/cantata/datasets_cantata/dataset.html
Details:
This site contains details and links to all of the PETS databases as well as the "Behave" and "Traffic" databases.
Actions: pedestrian and vehicle activities, but the focus is on tracking
Files: Some have annotations
Acknowledgements: See individual datasets
11. Traffic Intersection Video
Location: http://i21www.ira.uka.de/image_sequences/
Details:
This site contains a collection of video clips of traffic intersections in Berthold-Straße in Karlsruhe, captured with stationary cameras in grayscale and some in color.
Actions: typical vehicle activities
Files: MPEG videos of different qualities and zip files
Acknowledgements: See individual datasets
Clip Example: Screen shot of some of the video clips

12. ICPR 2010 Contest on Semantic Description of Human Activities
Location: http://cvrc.ece.utexas.edu/SDHA2010/Wide_Area_Activity.html#Data
To obtain the Videoweb dataset, you must go to its website, http://vwdata.ee.ucr.edu/, and follow its protocol (which requires you to sign a release form).
Details: The Videoweb dataset consists of about 2.5 hours of video observed from 4-8 cameras. The data is divided into a number of scenes that were collected over many days. Each scene is observed by a camera network in which the actual number of cameras varies by scene due to the nature of the scene. For each scene, the videos from the cameras are available. Annotation is available for each scene, and the annotation convention is described in the dataset. It identifies the frame numbers and camera ID for each activity that is annotated. The videos from the cameras are approximately synchronized. The videos contain several types of activities, including throwing a ball, shaking hands, standing in a line, handing out forms, running, limping, getting into/out of a car, and cars making turns. The number of examples for each activity varies widely.
Actions:
1. High-Level Interaction Recognition: hand shaking, hugging, kicking, pointing, punching, and pushing
2. Aerial view activity classification: pointing, standing, digging, walking, carrying, running, wave1, wave2, and jumping
3. Wide area search and recognition: …
Files: MPEG videos with annotations
Acknowledgements: If you make use of the Videoweb dataset in any form, please cite the following reference.
@misc{UCR-Videoweb-Data,
  author = "C. Ding and A. Kamal and G. Denina and H. Nguyen and A. Ivers and B. Varda and C. Ravishankar and B. Bhanu and A. Roy-Chowdhury",
  title = "{V}ideoweb {A}ctivities {D}ataset, {ICPR} contest on {S}emantic {D}escription of {H}uman {A}ctivities ({SDHA})",
  year = "2010",
  howpublished = "http://cvrc.ece.utexas.edu/SDHA2010/Wide\_Area\_Activity.html"
}
Clip Example: Screen shot of some of the video clips
