Data & Resources

Taxonomy Induction Using Hierarchical Random Graphs PDF
[Download]

tar.gz

This dataset contains 43 taxonomies over a fixed set of 12 basic-level concepts. Anonymous participants from Amazon Mechanical Turk were presented with a list of the 12 concepts and a mouse-driven, web-based taxonomy construction tool and asked to organise the concepts into a hierarchy.

The download consists of a number of XML files, one per participant, along with a short README describing the structure of the XML and explaining the elicitation study in more detail. It also provides a Ruby script for converting the XML-encoded taxonomies into GraphViz DOT format.
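For readers who would rather not use Ruby, the DOT-writing half of that conversion can be sketched in Python. This is only an illustration, not the bundled script: it assumes the taxonomy has already been read from the XML into (parent, child) pairs by following the README, and the function name `edges_to_dot` and the concept labels in the usage line are hypothetical.

```python
from typing import Iterable, Tuple


def edges_to_dot(edges: Iterable[Tuple[str, str]], name: str = "taxonomy") -> str:
    """Render (parent, child) pairs as a GraphViz DOT digraph.

    Hypothetical helper: extracting the pairs from the XML files is left
    to the reader, since the XML structure is documented in the README.
    """
    lines = ["digraph " + name + " {"]
    for parent, child in edges:
        # Escape double quotes so labels stay valid DOT quoted strings.
        p = parent.replace('"', '\\"')
        c = child.replace('"', '\\"')
        lines.append('  "' + p + '" -> "' + c + '";')
    lines.append("}")
    return "\n".join(lines)


# Placeholder labels, not taken from the dataset.
example_dot = edges_to_dot([("root", "concept_a"), ("root", "concept_b")])
```

The resulting string can be written to a `.dot` file and rendered with the GraphViz `dot` command.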

Meaning Representation in Natural Language Categories PDF
[Download]

tar.gz

This dataset extends the feature norms of McRae et al. with category information. For every word already present in the norms we have added category labels (e.g. "apple" is a FRUIT) and the corresponding typicality ratings (e.g. "grape" and "avocado" have typicality ratings of 6.6 and 3.5 respectively among members of FRUIT). Categories and ratings were collected from Mechanical Turk.

The download is in the YAML data format (http://www.yaml.org/), for which parsers exist in a large number of programming languages. Once loaded, the file mcrae_typicality.yaml should give a dictionary of dictionaries: the keys of the outer dictionary are human-produced category labels, and each inner dictionary maps exemplars (words already present in the McRae norms) to their average human-produced typicality ratings. McRae-style feature norms for the 41 category labels are also provided, in both YAML and CSV formats.
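As a rough sketch of working with that nested structure in Python: the loader below assumes the third-party PyYAML package is installed, and the helper names `load_typicality` and `ranked_exemplars` are hypothetical. The sample dictionary only mirrors the two FRUIT ratings quoted above; the exact spelling and casing of keys in the real file is an assumption.

```python
from typing import Dict, List, Tuple

# category label -> {exemplar -> average typicality rating}
Norms = Dict[str, Dict[str, float]]


def load_typicality(path: str = "mcrae_typicality.yaml") -> Norms:
    """Load the typicality file as a dictionary of dictionaries."""
    import yaml  # PyYAML (assumed installed); imported lazily on purpose

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def ranked_exemplars(norms: Norms, category: str) -> List[Tuple[str, float]]:
    """Return a category's exemplars, most typical first."""
    return sorted(norms[category].items(), key=lambda item: item[1], reverse=True)


# Stand-in for the loaded file, using only the ratings quoted in the text.
# The key "FRUIT" is an assumed spelling of the category label.
sample: Norms = {"FRUIT": {"avocado": 3.5, "grape": 6.6}}
fruit_ranking = ranked_exemplars(sample, "FRUIT")
```

With the real file in place, `ranked_exemplars(load_typicality(), label)` would give the same kind of ranking for any category label present in the data.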