N³ - A collection of datasets for named entity recognition and disambiguation in the NLP interchange format

Link:

https://www.fis.uni-hamburg.de/publikationen/detail.html?id=8eed349f-78a5-415d-95d8-1d6af6637165

Author:

Contributors:

Calzolari, Nicoletta
Choukri, Khalid
Goggi, Sara
Declerck, Thierry
Mariani, Joseph
Maegaard, Bente
Moreno, Asuncion
Odijk, Jan
Mazo, Helene
Piperidis, Stelios
Loftsson, Hrafn

Publisher/Organization:

European Language Resources Association (ELRA)

Year of publication:

2014

Media type:

Text

Keywords:

Datasets
Named entity detection
Named entity disambiguation
NLP interchange format

Description:

Extracting Linked Data following the Semantic Web principle from unstructured sources has become a key challenge for scientific research. Named Entity Recognition and Disambiguation are two basic operations in this extraction process. One step towards the realization of the Semantic Web vision and the development of highly accurate tools is the availability of data for validating the quality of processes for Named Entity Recognition and Disambiguation as well as for algorithm tuning. This article presents three novel, manually curated and annotated corpora (N³). All of them are based on a free license and stored in the NLP Interchange Format to leverage the Linked Data character of our datasets.

Licenses:

info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/

Source system:

Research Information System of the UHH

Internal metadata

Source record: oai:www.edit.fis.uni-hamburg.de:publications/8eed349f-78a5-415d-95d8-1d6af6637165