Edinburgh Pig Behavior Video Dataset

The pig behavior dataset consists of 23 days (over 6 weeks) of daytime pig video captured from a nearly overhead camera. Registered color and depth videos were captured at 6 frames per second and stored in batches of 1800 frames (5 minutes). Most frames show 8 pigs.

An example of the color and depth images can be seen here:

[Images: color pig image; depth pig image]

The feeder is visible at the bottom center, and the two water sources are at the bottom left (visible) and bottom right (not visible behind the pig in this image) of the pen. Altogether, there are approximately 3,429,000 data frames captured.
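As a quick consistency check on the figures quoted above, the following minimal sketch works through the arithmetic. It uses only the numbers stated in the text; because the total frame count is given as approximate, the derived clip count and recording time are approximate too.

```python
FPS = 6                     # capture rate stated above
FRAMES_PER_CLIP = 1800      # frames per stored batch
TOTAL_FRAMES = 3_429_000    # approximate total quoted above

clip_seconds = FRAMES_PER_CLIP / FPS          # duration of one batch in seconds
approx_clips = TOTAL_FRAMES / FRAMES_PER_CLIP # implied number of 5-minute batches
approx_hours = TOTAL_FRAMES / FPS / 3600      # implied total recording time

print(clip_seconds, approx_clips, approx_hours)
```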

More detailed background

The dataset was collected between 5 Nov and 11 Dec (2019, 6 weeks) in a single pigpen (5.8 m x 1.9 m) with 8 growing pigs at the SRUC (Scotland's Rural College) research pig unit (near Edinburgh, UK). The pigs were mixed intact males and females weighing around 30 kg at the start of the study. They were given a 3-space feeder with ad libitum commercial pig feed, two nipple water drinkers and a plastic enrichment device (Porcichew, East Riding Farm Services Ltd, Yorkshire, UK) suspended at pig height from a chain (see yellow object at top center). Pigs were also given straw and shredded paper on a part-slatted floor. Color and depth data were collected using an Intel RealSense D435i camera positioned 2.5 meters above the ground. Both RGB and depth information were acquired at 6 fps with a resolution of 1280×720, and acquisition was limited to daytime (from 7AM to 7PM) because there was no artificial light at night.

Ground truthed sequences

Twelve of the sequences have been manually ground-truthed, with axis-aligned bounding boxes, a persistent tracking identifier, and a behavior label. The labels are applied to every third image, so there are 600 labeled frames in each sequence, for a total of 7200 labeled frames, each with 8 labeled pigs.

An example of the ground-truth bounding boxes can be seen here, along with the mask.jpg mask file that indicates the usable pen area:

[Images: color pig image with labels; mask image]

The ground truth files are in annotated.tar (3.2 Gb), which consists of one folder for each sequence. All sequences are associated with a folder called a 'CLIP'. The clips that are ground-truthed are:

5/11/2019: 000002, 000009
10/12/2019: 000060, 000078

Each CLIP folder contains:

Source video files

There are 25 files, for 23 collection days. 23 of the files are zip files, the other 2 are tar files. They can be downloaded from:


When uncompressed (tar or zip), there is one folder for each video clip. Each day file contains from 10 to 116 video clips in separate subfolders, numbered 000000, 000001, and so on. There may be a few missing subfolders in a sequence. Each subfolder contains one 5-minute video clip of 1800 frames, along with the other files described below:

  1. background.png - a single color frame originally planned for change detection, but not used
  2. background_depth.png - a single depth frame originally planned for change detection, but not used
  3. color.mp4 - 1800 frames of registered color
  4. depth.mp4 - 1800 frames of registered depth
  5. depth_scale.npy - scale from pixel value to cm for the depth sensor
  6. inverse_intrinsic.npy - inverse of the camera intrinsic parameters, used to map depth points into 3D points
  7. mask.png - a file originally planned to be a detection zone mask, but not used
  8. rot.npy - the rotation and translation parameters for the camera. Not used.
  9. times.txt - the time that each frame was captured
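Files 5 and 6 above are enough to turn a depth pixel into a 3D point. The sketch below assumes a standard pinhole model in which inverse_intrinsic.npy holds a 3x3 matrix and depth_scale.npy holds a single scalar. Neither shape is stated above, and the way depth.mp4 encodes raw depth values is not specified either, so the values used here are synthetic stand-ins and the function name backproject is hypothetical.

```python
import numpy as np

def backproject(u, v, raw_depth, inv_K, depth_scale):
    """Map pixel (u, v) = (column, row) with a raw depth value to a 3D camera-frame point."""
    z = raw_depth * depth_scale                 # scaled depth along the optical axis
    ray = inv_K @ np.array([u, v, 1.0])         # pinhole ray with unit z component
    return ray * z

# With real data these would (presumably) come from the clip folder, e.g.
#   inv_K = np.load('FILE-PATH/inverse_intrinsic.npy')
#   depth_scale = np.load('FILE-PATH/depth_scale.npy')
# Synthetic stand-ins (ASSUMED values, not taken from the dataset):
K = np.array([[900.0, 0.0, 640.0],
              [0.0, 900.0, 360.0],
              [0.0, 0.0, 1.0]])
inv_K = np.linalg.inv(K)
depth_scale = 0.1

center_point = backproject(640.0, 360.0, 2500, inv_K, depth_scale)
offset_point = backproject(730.0, 360.0, 2500, inv_K, depth_scale)
print(center_point, offset_point)
```

A pixel at the principal point maps onto the optical axis, which gives a simple sanity check once the real matrix is loaded.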

Automatically detected bounding boxes and associated behavior classifications

Bounding boxes and behaviors have been automatically computed for all pigs in all frames. This data can be downloaded from the file results_dataset.tar.gz (4.1 Gb). The results were computed using the algorithms described in the paper: L. Bergamini, S. Pini, A. Simoni, R. Vezzani, S. Calderara, R. B. D'Eath, R. B. Fisher; Extracting Accurate Long-Term Behavior Changes from a Large Pig Dataset, Proc. Int. Conf. on Computer Vision Theory and Applications (VISAPP 2021), online 8-10 February 2021, which also describes the results computed from this dataset. No guarantee is given about the correctness of the results.

When unpacked, this file consists of a number of date and clip folders and subfolders. The results for the days listed above can be found below. The clip numbers here match the clip numbers in the raw data.


Each clip folder contains 4 NumPy (.npy) files: tracklets.npy and three behaviour_F.npy files (F = 15, 20, 25).

The contents of each file are described below in the section Format of automatically detected pigs files.


The main work was done by the authors of this paper:

L. Bergamini, S. Pini, A. Simoni, R. Vezzani, S. Calderara, R. B. D'Eath, R. B. Fisher; Extracting Accurate Long-Term Behavior Changes from a Large Pig Dataset, Proc. Int. Conf. on Computer Vision Theory and Applications (VISAPP 2021), online 8-10 February 2021.
which should be cited if the pig data is used in published research. Our thanks go to SRUC technician Mhairi Jack, and farm staff Peter Finnie and Phil O’Neill. SRUC’s contribution to this work was funded by the Rural and Environment Science and Analytical Services Division of the Scottish Government. Ethical approval was obtained for the pig experiments.


Email: Prof. Robert Fisher at rbf -a-t- inf.ed.ac.uk.

School of Informatics, Univ. of Edinburgh
1.11 Bayes Centre, 47 Potterrow, Edinburgh EH8 9BT, UK
Tel: +44-(131)-651-3441 (direct line), +44-(131)-651-3443 (secretary)

CC-BY-NC

Ground-truth and detection JSON file format

The ground-truth and detection JSON file format consists of a header and a descriptor for each detected and tracked pig in each frame. NOTE 1: the ground truth labels only every third frame (up to 600 frames per sequence), so the frame counter only goes to 600, and ground-truth frame f corresponds to raw video frame 3*f. NOTE 2: a new descriptor is added only when the description changes, so a pig may have no descriptor in some ground-truth frames, e.g. if it is sleeping without moving. For such frames, bounding boxes are interpolated and behaviors are forward propagated (e.g. frames 2, 3, 4 will have the same behavior as frame 1).

Header format

The header and wrapper structure consists of:



There is one FRAMELIST for each tracked pig. In the case of the ground truth, this means 8 pigs. In the case of real data, there may be more or fewer than 8, depending on the number of pigs in the pen (usually 8 but may be fewer) and the success of tracking (a track might break and start a new one). Below, "PIGID" is the persistent identifier of the pig in the FRAMEDATA, in the range 0, 1, ...



There is one entry for each detected pig:

  1. FNUM: first frame number where the data changes since the previous FRAMEDATA entry in the FRAMELIST. This is because the pig may not move, so the annotator may not change anything.
  2. XXX: image column of the upper left of the bounding box
  3. YYY: image row of the upper left of the bounding box, where 0 is at the top of the image
  4. WWW: width in pixels of the bounding box
  5. HHH: height in pixels of the bounding box
  6. "isGroundTruth":true - this annotation is part of the ground truth
  7. "visible":true - the bounding box was drawn when using the labeling tool
  8. BBB: one of the allowable behaviors from { "chase", "drink", "eat", "fight", "investigating", "jump-on-top-of", "lying", "mount", "nose-to-nose", "nose-poke-elsewhere", "play-with-toy", "run", "sitting", "sleep", "standing", "tail-biting", "walk" }.
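Because descriptors are stored only when something changes, a dense per-frame track has to be rebuilt before use. The following is a minimal sketch of that expansion, not the authors' tool. It assumes each descriptor has already been parsed into a dictionary with the hypothetical keys "frame", "x", "y", "w", "h" and "behavior" (corresponding to FNUM, XXX, YYY, WWW, HHH and BBB), that interpolation between descriptors is linear, and that the last box is held constant after the final descriptor. None of these details are confirmed above.

```python
BOX_KEYS = ("x", "y", "w", "h")   # hypothetical key names for XXX, YYY, WWW, HHH
LABEL_STRIDE = 3                  # ground-truth frame f corresponds to raw frame 3*f

def expand_track(keyframes, last_frame):
    """Return one record per ground-truth frame, from the first descriptor to last_frame."""
    keyframes = sorted(keyframes, key=lambda k: k["frame"])
    dense = []
    for i, cur in enumerate(keyframes):
        nxt = keyframes[i + 1] if i + 1 < len(keyframes) else None
        end = nxt["frame"] if nxt is not None else last_frame + 1
        for f in range(cur["frame"], end):
            if nxt is None:
                # After the final descriptor: hold the last box (assumption).
                box = tuple(float(cur[k]) for k in BOX_KEYS)
            else:
                # Linear interpolation between consecutive descriptors (assumption).
                t = (f - cur["frame"]) / (nxt["frame"] - cur["frame"])
                box = tuple(cur[k] + t * (nxt[k] - cur[k]) for k in BOX_KEYS)
            dense.append({
                "frame": f,
                "raw_frame": LABEL_STRIDE * f,
                "box": box,
                "behavior": cur["behavior"],   # forward-propagated behavior
            })
    return dense

# Synthetic example: the pig walks right, then stands from frame 5 onward.
keyframes = [
    {"frame": 1, "x": 100, "y": 50, "w": 40, "h": 20, "behavior": "walk"},
    {"frame": 5, "x": 140, "y": 50, "w": 40, "h": 20, "behavior": "standing"},
]
dense = expand_track(keyframes, last_frame=6)
for record in dense:
    print(record)
```

Boxes stay in the (upper-left column, upper-left row, width, height) convention of the list above; the automatically detected files instead store corner coordinates.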

Format of automatically detected pigs files

This section describes the format of the automatically detected pigs files associated with each video clip. Each clip folder contains 4 NumPy (.npy) files: tracklets.npy and three behaviour_F.npy files, one for each of F = 15, 20 and 25.

As the same detection and tracking information is in both tracklets.npy and behaviour_F.npy, you only need to use one of the three behaviour_F.npy files and can ignore tracklets.npy.


You can load tracklets.npy and inspect its contents with the following code. You need Python 3.

import numpy as np
track = np.load('FILE-PATH/tracklets.npy', allow_pickle=True)  # 4-element array
fps = track[0]          # frame rate (FPS) used while generating the tracklets of a single video
mean_length = track[1]  # mean tracklet length
resolution = track[2]   # video output resolution (should always be 1280x720 pixels)
tracklets = track[3]    # list of N entries (N = number of tracklets); each entry is an array
                        # of M rows (M = tracklet length) representing a single tracklet
# Each row of a tracklet array has 6 values:
#   [frameID, bbox_X_min, bbox_Y_min, bbox_X_max, bbox_Y_max, tracklet_state]
# tracklet_state can have 3 values:
#   0 = new tracklet, 1 = updated tracklet, 2 = wrong tracklet or no available detection


You can load a behaviour_F.npy file and inspect its contents with the following code. You need Python 3. F = 15, 20 or 25 is the number of frames taken into account in the temporal analysis. The minimum movement threshold is fixed at 25mm per frame for the moving behaviour and 110mm for the running behaviour.

import numpy as np
track = np.load('FILE-PATH/behaviour_F.npy', allow_pickle=True)  # N-dimensional array (N = number of tracklets)
# track[i] is an array of shape M x 14 (M = tracklet length).
# For track i and frame m of the track, the 14 values are, in order:
#   track[i][m][0]  --> frameID
#   track[i][m][1]  --> bbox_X_min
#   track[i][m][2]  --> bbox_Y_min
#   track[i][m][3]  --> bbox_X_max
#   track[i][m][4]  --> bbox_Y_max
#   track[i][m][5]  --> tracklet_state: 0 = new tracklet, 1 = updated tracklet, 2 = wrong tracklet or no available detection
#   track[i][m][6]  --> temporal_analysis, i.e. mean absolute difference of the bounding box center pixel position with respect to the previous F (15, 20 or 25) frames
#   track[i][m][7]  --> theta, i.e. angle of orientation of the pig
#   track[i][m][8]  --> center_of_mass_X
#   track[i][m][9]  --> center_of_mass_Y
#   track[i][m][10] --> length of the pig computed with moments
#   track[i][m][11] --> width of the pig computed with moments
#   track[i][m][12] --> zone in the pen where the center of mass of the pig is seen (0 = other, 1 = feeder, 2 = water bottom left, 3 = water bottom right, 4 = playing with yellow toy)
#   track[i][m][13] --> behavior code (0 = unknown, 1 = not moving, 2 = moving, 3 = running, 4 = eating, 5 = drinking, 6 = playing, 101 = standing, 102 = lying)
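As an illustration of how the 14-column layout above might be used, the following minimal sketch counts how many frames fall under each behavior code across all tracklets. It runs on synthetic rows rather than a real file. The helper names (make_row, behavior_counts) are hypothetical, and skipping rows with tracklet_state 2 is a design choice of this sketch, not something the dataset prescribes.

```python
import numpy as np
from collections import Counter

# Behavior codes as listed for column 13 above.
BEHAVIOR_NAMES = {0: "unknown", 1: "not moving", 2: "moving", 3: "running",
                  4: "eating", 5: "drinking", 6: "playing",
                  101: "standing", 102: "lying"}

def make_row(frame_id, state, code):
    """Build one synthetic 14-value row; only frameID, state and behavior code vary."""
    return [frame_id, 0, 0, 10, 10, state, 0.0, 0.0, 5, 5, 10, 5, 0, code]

def behavior_counts(tracks):
    """Count frames per behavior name, ignoring rows whose tracklet_state is 2."""
    counts = Counter()
    for tracklet in tracks:
        rows = np.asarray(tracklet, dtype=float)
        valid = rows[rows[:, 5] != 2]              # column 5 = tracklet_state
        for code in valid[:, 13].astype(int):      # column 13 = behavior code
            counts[BEHAVIOR_NAMES.get(int(code), "unlisted")] += 1
    return counts

# Synthetic stand-in for np.load('FILE-PATH/behaviour_F.npy', allow_pickle=True)
tracks = [
    np.array([make_row(0, 0, 4), make_row(1, 1, 4), make_row(2, 2, 0)]),
    np.array([make_row(0, 0, 102), make_row(1, 1, 102), make_row(2, 1, 5)]),
]
counts = behavior_counts(tracks)
print(dict(counts))
```

Dividing each count by the 6 fps capture rate gives a rough time per behavior in seconds, provided the tracklets cover consecutive frames.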