This web page contains video data and ground truth for 20 days of monitoring a person in their office. Twelve days come from one office, and the remaining eight from three other offices (two or three days each). All people observed in the videos gave their consent to be recorded. The data was recorded Feb 18 - April 29, 2016 in offices in the School of Informatics at The University of Edinburgh.
This dataset is low frame rate video of people doing their normal activities in an office setting. The data was acquired with a fixed camera as a set of 1280x720 pixel color images captured at an average of about 1 FPS. Typical frames from two of the offices are shown below, together with the corresponding empty-office frames. Most images show one person working or an empty office; occasionally several other people are in the office for a meeting.
[Figure: Office 1 empty frame | Office 1 typical frame]
[Figure: Office 2 empty frame | Office 2 typical frame]
This dataset is interesting because there are about 450K labeled frames of people doing standard office activities. This allows analysis of normal activities, and thus monitoring for interesting events, such as a person falling down or remaining in one position without moving for a long period of time (e.g. unconscious). Other interesting aspects of the dataset are:
We attempted to mark the position of each person in each image with a bounding box and a behavior. Given the number of images, not all marks are guaranteed to be correct. If errors are found, please let us know: (day, frame, person, bounding box upper-left position, bounding box height, width, behavior). Thanks.
The ground truth for each individual day's video is saved in a .mat file named after that day: dayN.mat (N = 01...20) contains a cell array 'labels'. NumFrames = length(labels) gives the number of frames recorded for day N, and the entries of 'labels' are in the same order as the image frames. labels{x} is a 3x5 matrix for video frame x, with one row for each person in the room (at most three people are present at a time). In each row, the first four values describe a bounding box around the person and the 5th value is the behavior label: the first two values are the column and row coordinates of the pixel at the top left of the bounding box, and the 3rd and 4th values are its width and height. If there is no person (or no second/third person) present, then all values in that row are 0.
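As a concrete illustration, here is a minimal Matlab sketch of decoding one frame's labels; the day file and frame index are arbitrary examples, not part of the dataset documentation.

```matlab
% Minimal sketch: decode the ground truth for one frame.
% Assumes day01.mat is in the current directory; the frame
% index 500 is just an example.
load('day01.mat');            % loads the cell array 'labels'
x = 500;
L = labels{x};                % 3x5 matrix, one row per possible person
for p = 1:3
    if any(L(p,:))            % an all-zero row means no person in this slot
        col = L(p,1);         % column of the box's top-left pixel
        row = L(p,2);         % row of the box's top-left pixel
        w   = L(p,3);         % bounding box width
        h   = L(p,4);         % bounding box height
        b   = L(p,5);         % behavior code (listed below)
        fprintf('person %d: box at (%d,%d), %dx%d, behavior %d\n', ...
                p, col, row, w, h, b);
    end
end
```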
The behavior code labels are:
0 --- Room is empty (the position values are also 0)
1 --- Person is standing/walking
2 --- Person is sitting
3 --- Two or three people are talking to each other
4 --- Person in room has fallen
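As a sketch of how these codes can be used, the following Matlab fragment tallies room occupancy and fall frames for one day; the choice of day06.mat is just an example (days 6 and 11 are the days containing falls).

```matlab
% Minimal sketch: tally room occupancy and fall frames for one day.
% Assumes day06.mat is in the current directory.
load('day06.mat');
numFrames = length(labels);
nPeople = zeros(numFrames, 1);
hasFall = false(numFrames, 1);
for x = 1:numFrames
    L = labels{x};
    nPeople(x) = sum(any(L(:,1:4), 2));  % rows with a nonzero bounding box
    hasFall(x) = any(L(:,5) == 4);       % behavior code 4 = fallen person
end
fprintf('empty: %d, 1 person: %d, 2 people: %d, 3 people: %d, fall frames: %d\n', ...
        sum(nPeople==0), sum(nPeople==1), sum(nPeople==2), ...
        sum(nPeople==3), sum(hasFall));
```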
There are 20 days of video that can be downloaded from the table below. For each day, the individual frames are available as a TAR file (TAR), and an AVI made from the frames is also available (AVI). A ground truth file (GT) contains, for each frame, a record giving a bounding box for each main participant and their activity (one of the 4 activity states listed above). A final file (FRAME NAMES) gives the list of image frame file names that correspond to the ground truth.
Data summary: There are in total 456714 frames, 134110 with no one in the room, 249956 with 1 person, 63013 with 2 people, and 9635 with 3 people. There are 337 frames with a fallen person (days 6 and 11).
Here is example Matlab code that shows how to access the data and draw a bounding box around the person.
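The sketch below is one minimal version of such code, assuming the ground truth file and an image frame are available locally; the frame file name is hypothetical, and the FRAME NAMES file should be used to map ground-truth entries to the actual image files.

```matlab
% Minimal sketch: overlay the ground-truth boxes on one frame.
% The image file name below is hypothetical; use the FRAME NAMES
% file to map ground-truth entries to the actual frame files.
load('day01.mat');
x = 500;
frame = imread('frame0500.jpg');          % hypothetical file name
imshow(frame); hold on;
L = labels{x};
for p = 1:3
    if any(L(p,1:4))                      % skip empty (all-zero) rows
        % 'Position' is [col row width height], matching the label format
        rectangle('Position', L(p,1:4), 'EdgeColor', 'r', 'LineWidth', 2);
        text(L(p,1), L(p,2)-10, sprintf('behavior %d', L(p,5)), 'Color', 'r');
    end
end
hold off;
```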
With up to 3 people in the scene and 456714 frames across the 20 days, there are undoubtedly some position and behavior labeling errors. We developed some automatic consistency checks and fixed the identified errors. Afterwards, to assess the remaining level of error, we chose 100 frames randomly from each of the 20 days and checked them by hand; no errors were found. A position was considered correct if the bounding box contained most of the person. A behavior was considered correct if that behavior was occurring in that frame.
The bounding boxes, which were largely found automatically, should each intersect a major portion of the person's body.
Thanks to Paul Anderson, Robert Fisher, Jane Hillston, and Kami Vaniea for agreeing to have a video camera in their offices. Thanks also to the various students who agreed to be videoed during the recording days. The ground truth was prepared by Kuangzheng Ye, Peter Stefanov, Nanbo Li and Tehreem Qasim, who also developed the correctness checking code (and made many corrections). Ye did initial investigations of using a fully connected HMM to recognise the current state of the main participant in the videos.
Use of the low resolution videos, images, and ground truth:
Any use of the video data is under the Attribution-NonCommercial-ShareAlike (aka CC BY-NC-SA) license.
Public use of the videos should include this acknowledgment: "We thank the University of Edinburgh for the use of the low resolution video and ground truth data."
This paper should be cited:
T. Qasim, R. B. Fisher, N. Bhatti, "Ground-truthing Large Human Behavior Monitoring Datasets", Proc. 2020 Int. Conf. on Pattern Recognition, online, 2021.
Email: Robert Fisher at rbf -a-t- inf.ed.ac.uk.
School of Informatics, Univ. of Edinburgh