CWITR: A Corpus for Automatic Complex Word Identification in Turkish Texts

Link:

https://doi.org/10.1145/3582768.3582802

Author(s):

Publisher/Corporate body:

Association for Computing Machinery (ACM)

Year of publication:

2023

Media type:

Text

Keywords:

Text simplification
Lexical complexity
Crowdsourcing
Complex word identification

Description:

The Complex Word Identification (CWI) task aims to help remove accessibility barriers for people who experience difficulties due to cognitive, language, and learning disabilities. The task is concerned with detecting and identifying complex words, that is, words that are unusual or difficult for certain target groups to understand. CWI systems have a large impact on the output of Text Simplification (TS) systems. This paper revisits the CWI task by creating a new CWI corpus that extends the available datasets. In this study, we collect a new CWI dataset for Turkish (CWITR) of complex single- and multi-token words drawn from different text genres, and prepare it for investigating computational methods that discriminate between complex and non-complex word forms.

License:

info:eu-repo/semantics/closedAccess

Source system:

Research Information System of UHH (Forschungsinformationssystem der UHH)

Internal metadata

Source record: oai:www.edit.fis.uni-hamburg.de:publications/2867b51f-1dfb-4f3e-9117-34a5f43e1853