Frank Keller, University of Edinburgh

Code

Neural Attention Tradeoff (NEAT) Model: This implementation contains the code for the neural attention model of human reading presented in Hahn and Keller (2016). The model captures skipping behavior and reading times as recorded in eye-tracking data, and has been evaluated against the Dundee corpus.

Psycholinguistically Motivated Tree-Adjoining Grammar (PLTAG) Parser: This implementation contains a fully incremental PLTAG parser, with incremental semantic role labeling capability and discriminative reranking. The parser is described in Demberg et al. (2013) and in subsequent papers.

Datasets

MultiSense Dataset: 9,504 images paired with translation-ambiguous verbs. Each image is annotated with an English verb and its translations in German and Spanish. The data can be used to train multilingual, multimodal sense disambiguation models. A 995-image subset of MultiSense is also annotated with English descriptions and their German translations. This subset can be used to evaluate the sense disambiguation capabilities of multimodal translation models. The dataset is described in Gella et al. (2019).

Verb Senses in Images (VerSe) Dataset: 3,518 images, each annotated with one of 90 verbs and with the OntoNotes sense realized for the verb in the image. The images are taken from two existing multimodal datasets (COCO and TUHOI). The dataset is described in Gella et al. (2019).

Pascal 2007 Center-click Annotation Dataset: This dataset provides center-click annotations collected on Amazon Mechanical Turk for all 20 classes of the whole trainval set of Pascal VOC 2007. Each image is annotated by two different annotators for each class in the image. This results in 14,612 clicks in total for the 5,011 trainval images. We also provide the localizations produced by our center-click object localization approach. The approach and the dataset are described in Papadopoulos et al. (2017).

Pascal Objects Eye Tracking (POET) Dataset: 6,270 images from ten Pascal VOC 2012 object classes (cat, dog, bicycle, motorbike, boat, aeroplane, horse, cow, sofa, diningtable). Each image is annotated with the eye movement records of five participants, whose task was to identify which object class was present in the image. The dataset is described in Papadopoulos et al. (2014).

Comparing Image Description Measures: This is the dataset and code used to estimate how well different text-based evaluation measures for automatic image description correlate with human judgments on the Flickr8K dataset. The measures compared include BLEU4, TER, Meteor, and ROUGE-SU4. The work is described in Elliott and Keller (2014).

Visual and Linguistic Treebank: 2,424 images with human-generated image descriptions; 341 of these images are also annotated with object boundaries and Visual Dependency Representations. The dataset is described in Elliott and Keller (2013).

Object Naming Dataset: 100 images with eye-tracking data from 24 participants performing an object naming task. The data includes manually annotated object boundaries and object labels produced by participants. The dataset is described in Clarke et al. (2013).

Bigram Plausibility Dataset: Plausibility judgments for seen and unseen adjective-noun, noun-noun, and verb-object bigrams (90 items each). Magnitude estimation judgments of plausibility were obtained in a web-based experiment from 27 to 40 participants per item. The dataset is described in Keller and Lapata (2003).