This web page contains video data and ground truth for 20 days of monitoring a person in their office. Twelve days come from one office, and the remaining eight from three other offices (two or three days each). All people observed in the videos gave their consent to be recorded. The data was recorded Feb 18 - April 29, 2016 in offices in the School of Informatics at The University of Edinburgh.
This dataset is low frame rate video of people doing their normal activities in an office setting. The data was acquired with a fixed camera as a set of 1280x720 pixel color images captured at an average of about 1 FPS. Typical frames from two of the offices are shown below, together with the corresponding empty-office frames. Most images show one person working or an empty office; occasionally several other people are in the office for a meeting.
[Figure: Office 1 empty frame | Office 1 typical frame]
[Figure: Office 2 empty frame | Office 2 typical frame]
This dataset is interesting because there are about 450K labeled frames of people doing standard office activities. This allows analysis of normal activities, and thus monitoring for interesting events, such as a person falling down or remaining in one position without moving for a long period of time (e.g. unconscious). Other interesting aspects of the dataset are:
We attempted to mark the position of each person in each image with a bounding box and a behavior. Given the number of images, not all marks are guaranteed to be correct. If errors are found, please let us know: (day, frame, person, bounding box upper-left position, bounding box height, width, behavior). Thanks.
The ground truth for each individual day's video is saved in a .mat file named after that day: dayN.mat (N = 01...20) contains a cell array 'labels'. NumFrames = length(labels) gives the number of frames recorded for day N, and the entries of 'labels' are in the same order as the image frames. labels{x} is a 3x5 matrix for video frame x, with one row for each person in the room (at most three people are present at a time). In each row, the first four values describe a bounding box around the person and the 5th value is the behavior label: the first two values are the column and row coordinates of the pixel at the top left of the bounding box, and the 3rd and 4th values are its width and height. If there is no person (or no second/third person) present, then all values in that row are 0.
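As a concrete illustration, here is a minimal Matlab sketch of decoding one frame's labels; the day file and frame index are arbitrary examples, not part of the dataset documentation.

```matlab
% Minimal sketch: decode the ground truth for one frame.
% Assumes day01.mat is in the current directory; the frame
% index 500 is just an example.
load('day01.mat');            % loads the cell array 'labels'
x = 500;
L = labels{x};                % 3x5 matrix, one row per possible person
for p = 1:3
    if any(L(p,:))            % an all-zero row means no person in this slot
        col = L(p,1);         % column of the box's top-left pixel
        row = L(p,2);         % row of the box's top-left pixel
        w   = L(p,3);         % bounding box width
        h   = L(p,4);         % bounding box height
        b   = L(p,5);         % behavior code (listed below)
        fprintf('person %d: box at (%d,%d), %dx%d, behavior %d\n', ...
                p, col, row, w, h, b);
    end
end
```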
The behavior code labels are:
0 --- Room is empty (the position values are also 0)
1 --- Person is standing/walking
2 --- Person is sitting
3 --- Two or three people are talking to each other
4 --- Person in room has fallen
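As a sketch of how these codes can be used, the following Matlab fragment tallies room occupancy and fall frames for one day; the choice of day06.mat is just an example (days 6 and 11 are the days containing falls).

```matlab
% Minimal sketch: tally room occupancy and fall frames for one day.
% Assumes day06.mat is in the current directory.
load('day06.mat');
numFrames = length(labels);
nPeople = zeros(numFrames, 1);
hasFall = false(numFrames, 1);
for x = 1:numFrames
    L = labels{x};
    nPeople(x) = sum(any(L(:,1:4), 2));  % rows with a nonzero bounding box
    hasFall(x) = any(L(:,5) == 4);       % behavior code 4 = fallen person
end
fprintf('empty: %d, 1 person: %d, 2 people: %d, 3 people: %d, fall frames: %d\n', ...
        sum(nPeople==0), sum(nPeople==1), sum(nPeople==2), ...
        sum(nPeople==3), sum(hasFall));
```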
There are 20 days of video that can be downloaded from the table below. For each day, the individual frames are available as a TAR file (TAR), and an AVI made from the frames is also available (AVI). A ground truth file (GT) contains, for each frame, a record giving a bounding box for each main participant and their activity (one of the 4 activity states listed above). A final file (FRAME NAMES) gives the list of image frame file names that correspond to the ground truth.
Data summary: There are in total 456714 frames, 134110 with no one in the room, 249956 with 1 person, 63013 with 2 people, and 9635 with 3 people. There are 337 frames with a fallen person (days 6 and 11).
Here is example Matlab code that shows how to access the data and draw a bounding box around the person.
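The sketch below is one minimal version of such code, assuming the ground truth file and an image frame are available locally; the frame file name is hypothetical, and the FRAME NAMES file should be used to map ground-truth entries to the actual image files.

```matlab
% Minimal sketch: overlay the ground-truth boxes on one frame.
% The image file name below is hypothetical; use the FRAME NAMES
% file to map ground-truth entries to the actual frame files.
load('day01.mat');
x = 500;
frame = imread('frame0500.jpg');          % hypothetical file name
imshow(frame); hold on;
L = labels{x};
for p = 1:3
    if any(L(p,1:4))                      % skip empty (all-zero) rows
        % 'Position' is [col row width height], matching the label format
        rectangle('Position', L(p,1:4), 'EdgeColor', 'r', 'LineWidth', 2);
        text(L(p,1), L(p,2)-10, sprintf('behavior %d', L(p,5)), 'Color', 'r');
    end
end
hold off;
```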
With up to 3 people in the scene and 456714 frames across the 20 days, there are undoubtedly some position and behavior labeling errors. We developed some automatic consistency checks and fixed the identified errors. Afterwards, to assess the remaining level of error, we chose 100 frames randomly from each of the 20 days and checked them by hand; no errors were found. A position was considered correct if the bounding box contained most of the person. A behavior was considered correct if that behavior was occurring in that frame.
The bounding boxes, which were largely found automatically, should each intersect a major portion of the person's body.
Thanks to Paul Anderson, Robert Fisher, Jane Hillston, and Kami Vaniea for agreeing to have a video camera in their offices. Thanks also to the various students who agreed to be videoed during the recording days. The ground truth was prepared by Kuangzheng Ye, Peter Stefanov, Nanbo Li and Tehreem Qasim, who also developed the correctness checking code (and made many corrections). Ye did initial investigations of using a fully connected HMM to recognise the current state of the main participant in the videos.
Use of the low resolution videos, images, and ground truth:
Any use of the video data is under the Attribution-NonCommercial-ShareAlike (aka CC BY-NC-SA) license.
Public use of the videos should include this acknowledgment: "We thank the University of Edinburgh for the use of the low resolution video and ground truth data."
This paper should be cited:
T. Qasim, R. B. Fisher, N. Bhatti, "Ground-truthing Large Human Behavior Monitoring Datasets", Proc. 2020 Int. Conf. on Pattern Recognition, online, 2021.
Email: Robert Fisher at rbf -a-t- inf.ed.ac.uk.
School of Informatics, Univ. of Edinburgh