CWITR: A Corpus for Automatic Complex Word Identification in Turkish Texts

Link:
Author:
Publisher/Institution:
Association for Computing Machinery (ACM)
Year of publication:
2023
Media type:
Text
Keywords:
  • Text simplification
  • Lexical complexity
  • Crowdsourcing
  • Complex word identification
Description:
  • The Complex Word Identification (CWI) task aims to help remove accessibility barriers for people with cognitive, language, and learning disabilities. The task concerns detecting words that are unusual and difficult for certain target groups to understand. CWI systems have a large impact on the output of Text Simplification (TS) systems. This paper revisits the CWI task and extends the available datasets with a new CWI corpus. In this study, we collect a new CWI dataset (CWITR) of complex single- and multi-token words drawn from different text genres in Turkish and prepare it for investigating computational methods that discriminate between complex and non-complex word forms.
License:
  • info:eu-repo/semantics/restrictedAccess
Source system:
Research Information System of the UHH

Internal metadata
Source record
oai:www.edit.fis.uni-hamburg.de:publications/2867b51f-1dfb-4f3e-9117-34a5f43e1853