Edinburgh office monitoring video dataset


This web page contains video data and ground truth for 20 days of monitoring a person in their office. Twelve days are from one office, with two or three additional days each from three other offices. All people observed in the videos gave their consent to be recorded. The data was recorded from Feb 18 to April 29, 2016 in offices in the School of Informatics at The University of Edinburgh.

This dataset is low frame rate video of people doing their normal activities in an office setting. The data was acquired with a fixed camera as a set of 1280x720 pixel color images captured at roughly 1 frame per second. Typical frames from two of the offices are shown below, together with the corresponding empty-office frames. Mostly the images show one person working, or an empty office; occasionally, however, several other people are in the office for a meeting.

[Figures: an empty frame and a typical frame from two of the offices]

This dataset is interesting because there are about 450K labeled frames of people doing standard office activities. This allows analysis of normal activities, and thus monitoring for unusual events, such as a person falling down or remaining in one position without moving for a long period of time (e.g. unconscious). Other interesting aspects of the dataset are:

  1. There are four different participants in four different settings.
  2. Most frames contain either no people or a single person in largely the same position; at most 3 people appear in a frame.
  3. The video is taken over a significant period of time, so the lighting can change considerably, and at times strong external light enters the scene.
  4. Colored lighting of varying hues contributes to the illumination in the first 12 days.

We attempted to mark the position of each person in each image with a bounding box and a behavior label. Given the number of images, not all annotations are guaranteed to be correct. If you find errors, please let us know, giving: (day, frame, person, bounding box upper left position, bounding box height, width, behavior). Thanks.

The ground truth for each day's video is saved in a mat file named after that day: dayN.mat (N = 01...20) contains a cell array 'labels', and NumFrames = length(labels) gives the number of frames recorded for day N. The bounding boxes are stored in the same order as the image frames. labels{x} is a 3x5 matrix for video frame x, with one row for each person in the room (at most three people are present at a time). In each row, the first four values describe a bounding box around the person and the 5th value is the behavior label. The first two values are the column and row coordinates of the pixel at the top left of the bounding box; the 3rd and 4th values are its width and height. If there is no person (or no second/third person) present, then all values in that row are 0.
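
As an illustration, here is a minimal Matlab sketch of reading one day's ground truth and printing the annotations for a single frame. It assumes day01.mat (from the table below) is in the current directory:

    % Read one day's ground truth and print the annotations for frame x.
    load('day01.mat');               % loads the cell array 'labels'
    numFrames = length(labels);      % number of frames recorded this day
    x = 1;                           % frame to inspect (1..numFrames)
    L = labels{x};                   % 3x5 matrix: one row per person slot
    for p = 1:size(L, 1)
        if any(L(p, :))              % all-zero rows mean no person here
            col = L(p, 1); row = L(p, 2);   % top-left corner of the box
            w   = L(p, 3); h   = L(p, 4);   % box width and height
            beh = L(p, 5);                  % behavior code (see below)
            fprintf('frame %d, person %d: box at (%d,%d), %dx%d, behavior %d\n', ...
                    x, p, col, row, w, h, beh);
        end
    end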

The behavior code labels are:

       0 --- Room is empty (the position values are also 0)
       1 --- Person is standing/walking
       2 --- Person is sitting
       3 --- Two or three people are talking to each other
       4 --- Person in room has fallen

The 20 days of video can be downloaded from the table below. For each day there are the individual frames (TAR), an AVI made from the frames (AVI), a ground truth file (GT) containing, for each frame, a bounding box for each person present and their activity (one of the 4 activity states listed above), and a file (FRAME NAMES) listing the image frame file names that correspond to the ground truth.

Data summary: there are 456714 frames in total: 134110 with no one in the room, 249956 with 1 person, 63013 with 2 people, and 9635 with 3 people. There are 337 frames with a fallen person (days 6 and 11).
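
These counts can be recomputed from the ground truth; here is a minimal sketch, assuming all 20 dayN.mat files are in the current directory:

    % Tally frames by the number of people present, across all 20 days.
    counts = zeros(1, 4);                      % frames with 0,1,2,3 people
    for n = 1:20
        S = load(sprintf('day%02d.mat', n));
        for x = 1:length(S.labels)
            people = sum(any(S.labels{x}, 2)); % non-zero rows = people
            counts(people + 1) = counts(people + 1) + 1;
        end
    end
    fprintf('0 people: %d, 1: %d, 2: %d, 3: %d\n', counts);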

ID  DATE        FRAMES  AVI         AVI SIZE  TAR         TAR SIZE  GT         FRAME NAMES
 1  2016_02_18   10047  day_1.avi   1.2GB     day_1.tar   1.2GB     day01.mat  Name01.mat
 2  2016_02_19    6719  day_2.avi   0.7GB     day_2.tar   0.7GB     day02.mat  Name02.mat
 3  2016_02_22   27235  day_3.avi   3.1GB     day_3.tar   3.0GB     day03.mat  Name03.mat
 4  2016_02_23   13082  day_4.avi   1.6GB     day_4.tar   1.6GB     day04.mat  Name04.mat
 5  2016_02_24   21793  day_5.avi   2.4GB     day_5.tar   2.4GB     day05.mat  Name05.mat
 6  2016_02_25   12652  day_6.avi   1.5GB     day_6.tar   1.5GB     day06.mat  Name06.mat
 7  2016_02_26   31706  day_7.avi   3.7GB     day_7.tar   3.7GB     day07.mat  Name07.mat
 8  2016_02_29   29002  day_8.avi   3.4GB     day_8.tar   3.3GB     day08.mat  Name08.mat
 9  2016_03_01   16250  day_9.avi   1.6GB     day_9.tar   1.6GB     day09.mat  Name09.mat
10  2016_03_02   16253  day_10.avi  1.6GB     day_10.tar  1.6GB     day10.mat  Name10.mat
11  2016_03_03   29568  day_11.avi  3.2GB     day_11.tar  3.1GB     day11.mat  Name11.mat
12  2016_03_04   22344  day_12.avi  2.4GB     day_12.tar  2.3GB     day12.mat  Name12.mat
13  2016_04_13   25717  day_13.avi  2.6GB     day_13.tar  2.5GB     day13.mat  Name13.mat
14  2016_04_14   29004  day_14.avi  2.2GB     day_14.tar  2.2GB     day14.mat  Name14.mat
15  2016_04_20   23300  day_15.avi  1.9GB     day_15.tar  1.9GB     day15.mat  Name15.mat
16  2016_04_21   31706  day_16.avi  2.7GB     day_16.tar  2.7GB     day16.mat  Name16.mat
17  2016_04_22   22622  day_17.avi  1.9GB     day_17.tar  1.9GB     day17.mat  Name17.mat
18  2016_04_27   28016  day_18.avi  2.8GB     day_18.tar  2.8GB     day18.mat  Name18.mat
19  2016_04_28   27993  day_19.avi  2.9GB     day_19.tar  2.8GB     day19.mat  Name19.mat
20  2016_04_29   31706  day_20.avi  3.2GB     day_20.tar  3.2GB     day20.mat  Name20.mat

Here is a sketch of Matlab code showing how to access the data and draw a bounding box around each person in a frame. The frame file name handling is an assumption: the actual names are listed in the NameNN.mat files, and the variable name inside them may differ.
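
    % Draw the ground-truth boxes for one frame of day 1.
    load('day01.mat');                   % cell array 'labels'
    N = load('Name01.mat');              % frame file names for day 1
    x = 500;                             % frame index to display
    img = imread(N.names{x});            % 'names' is a placeholder field
    imshow(img); hold on;
    L = labels{x};
    for p = 1:size(L, 1)
        if any(L(p, :))                  % skip empty (all-zero) rows
            % Position is [x y w h]: top-left column/row, width, height
            rectangle('Position', L(p, 1:4), 'EdgeColor', 'g', 'LineWidth', 2);
        end
    end
    hold off;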

Accuracy of the data

With up to 3 people in the scene and 456715 frames across the 20 days, there are undoubtedly some position and behavior labeling errors. We developed automatic consistency checks and fixed the errors they identified. To assess the residual error rate, we then randomly chose 100 frames from each of the 20 days and found no errors. A position was considered correct if the bounding box contained most of the person. A behavior was considered correct if that behavior was occurring in that frame.

The bounding boxes should intersect a major portion of the person's body; they were largely found automatically.


Acknowledgments

Thanks to Paul Anderson, Robert Fisher, Jane Hillston, and Kami Vaniea for agreeing to have a video camera in their offices. Thanks also to the various students who agreed to be videoed during the recording days. The ground truth was prepared by Kuangzheng Ye, Peter Stefanov, Nanbo Li and Tehreem Qasim, who also developed the correctness checking code (and made many corrections). Ye did initial investigations of using a fully connected HMM to recognise the current states of the main participant in the videos.

Use of the low resolution videos, images, and ground truth: any use of the video data is under the Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license. Public use of the videos should include this acknowledgment: "We thank the University of Edinburgh for the use of the low resolution video and ground truth data." This paper should be cited: T. Qasim, R. B. Fisher, N. Bhatti; Ground-truthing Large Human Behavior Monitoring Datasets, Proc. 2020 Int. Conf. on Pattern Recognition, online, 2021.

Contact

Email: Robert Fisher at rbf -a-t- inf.ed.ac.uk.

School of Informatics, Univ. of Edinburgh
1.26 Informatics Forum, 10 Crichton St
Edinburgh EH8 9AB, UK
Tel: +44-(131)-651-3441 (direct line), +44-(131)-651-3443 (secretary)
Fax: +44-(131)-650-6899
