IE Relation Extraction Data Sets
The following table lists information extraction
(IE) corpora and the IE subtasks that each one
addresses. Relation annotation is highlighted
with a blue background. The focus of this list
is data sets that contain relation information.
The key for the column labels follows the table.
Table Key
The following abbreviations are used for
attributes describing IE subtasks:
- NER: Data contains named entity annotation.
- CRF: Data contains abbreviation/alias/coreference/normalisation annotation.
- RLD: Data contains relation detection annotation.
- RLC: Data contains relation characterisation annotation.
- TME: Data contains date/time annotation.
- EVT: Data contains event annotation.
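For anyone who wants to filter corpora by these attributes programmatically, the key could be encoded as bit flags. The following is only a minimal sketch: the `Subtask` class, the `with_relations` helper and the corpus names are hypothetical and are not rows from the table.

```python
from enum import Flag, auto


class Subtask(Flag):
    """IE subtask attributes, named after the abbreviations in the table key."""
    NER = auto()  # named entity annotation
    CRF = auto()  # abbreviation/alias/coreference/normalisation annotation
    RLD = auto()  # relation detection annotation
    RLC = auto()  # relation characterisation annotation
    TME = auto()  # date/time annotation
    EVT = auto()  # event annotation


# Hypothetical entries for illustration only; these are NOT rows from the table.
corpora = {
    "example-corpus-a": Subtask.NER | Subtask.RLD | Subtask.RLC,
    "example-corpus-b": Subtask.NER | Subtask.TME,
    "example-corpus-c": Subtask.NER | Subtask.CRF | Subtask.RLD,
}


def with_relations(entries):
    """Return the sorted names of corpora with any relation annotation (RLD or RLC)."""
    relation = Subtask.RLD | Subtask.RLC
    return sorted(name for name, tasks in entries.items() if tasks & relation)


print(with_relations(corpora))
```

Combining RLD and RLC into a single mask mirrors the focus of this list, which is any data set containing relation information.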
Contribute To This List!
Please contact me if you have
something to add. I am especially interested in
other domains and other languages.
Other IE Data
Some relevant IE data that I could not find
relation annotation for:
Other Lists of IE Data
Kevin Cohen's group at the Center for
Computational Pharmacology at UColorado maintain
a list of corpora for biomedical NLP. Also
check out their survey of corpus design and usage.
Other lists of corpora for IE/BioNLP can be found at:
Benjamin Hachey