The DGS Corpus Project: Development of a Corpus Based Electronic Dictionary German Sign Language – German

Link:
Author:
Publisher/Institution:
Universität Hamburg
Year of publication:
2010
Media type:
Text
Description:
  • With a target size of 400 hours of video from 300 informants, the DGS (German Sign Language) Corpus Project (2009-2023) is the first project that will create a DGS corpus comparable in size to spoken language corpora. In addition to making a large language resource available for research, the project will develop a comprehensive DGS dictionary based on corpus data, significantly advancing the state of the art in corpus-based sign language lexicography.

    Data collection is done with a mobile studio that will be moved to twelve different locations all over Germany over the course of three years. To obtain language data that come as close as possible to natural use of DGS, two informants from the same region interact with each other in a large variety of tasks lasting six hours altogether. The mix of tasks (including warm-ups etc.) is the result of a pilot phase with extensive testing of various elicitation formats and materials.

    The studio setup was chosen to make the informants feel as comfortable as possible without compromising recording quality. The seven-camera setup (HD and stereoscopic cameras) promises to deliver videos suitable for both manual transcription and image processing. Over the course of the project, image processing is expected to deliver semi-automatic annotation that increases the effectiveness of the transcription process.

    As a first step towards making the corpus data accessible, translations into German will be produced. The next step is a basic transcription, providing a sign-by-sign segmentation and type-token matching. More detailed transcription (including grammatical information, use of space, eye gaze) will be carried out in a third phase. As limited resources do not allow the whole corpus to be transcribed in detail, the lexicographic workflow will mainly determine which parts of the corpus need to be transcribed in detail.
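
    Type-token matching can be illustrated with a minimal sketch. It assumes that each segmented token is a time span assigned to one sign type; the tuple layout, the glosses, and the function name are hypothetical and do not reflect the project's actual data model.

```python
from collections import defaultdict

# Hypothetical basic transcription: one (start_ms, end_ms, type_label)
# tuple per segmented sign token. Labels are illustrative glosses only.
tokens = [
    (0, 400, "HOUSE"),
    (400, 900, "GO"),
    (900, 1300, "HOUSE"),
]

def group_tokens_by_type(tokens):
    """Collect the time spans of all tokens matched to each sign type."""
    by_type = defaultdict(list)
    for start, end, sign_type in tokens:
        by_type[sign_type].append((start, end))
    return dict(by_type)

index = group_tokens_by_type(tokens)
```

    An index of this kind lets a lexicographer retrieve every occurrence of a sign type, which is what makes a corpus-based dictionary entry possible.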

    Both the transcription and the lexicographic work will be carried out within the iLex environment, which will be extended steadily over the course of the project in order to make use of synergies with other projects running in parallel (such as Dicta-Sign on semi-automated annotation) and to meet new challenges arising from new linguistic research questions. With more than 20 people working concurrently with corpus data, quality assurance clearly has a central role in the project. Intensive transcriber training and coding manuals, as well as experiments on formalizing inter- and intra-transcriber agreement for the coding conventions used (such as HamNoSys), are only the first steps taken. In addition, researchers as well as student coworkers are invited to carry out pilot experiments on annotation data and metadata to see whether data analyses are possible within the existing data model and annotation conventions, long before enough data become available to make these studies really feasible. Feedback from these experiments allows us to continually evolve the transcription process and adapt the transcription environment.
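
    One common way to quantify inter-transcriber agreement is Cohen's kappa, which corrects the observed agreement for agreement expected by chance. The sketch below is an illustration only: the source does not say which agreement measure the project uses, and the function name and sample glosses are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two transcribers labelling the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of tokens on which both transcribers chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each transcriber's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both transcribers used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical type labels assigned by two transcribers to four tokens.
transcriber_1 = ["HOUSE", "GO", "HOUSE", "TREE"]
transcriber_2 = ["HOUSE", "GO", "TREE", "TREE"]
kappa = cohens_kappa(transcriber_1, transcriber_2)
```

    Intra-transcriber agreement can be computed the same way by comparing two passes of one transcriber over the same material.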

    It is essential for the success of the project that the language community is involved beyond those people participating as informants. The task of the contact persons in each region is therefore not limited to finding informants for the data recordings, but also includes raising public awareness within the Deaf community. A web portal focusing on the community’s interests in the project (including viewing the corpus video material as a resource for cultural heritage) will also encourage people to provide feedback. In the dictionary context, this might include feedback on the regional distribution of individual signs. Ideas for making this portal attractive not only to contact persons and informants but to the community at large include offering consultation hours for questions about the language.

Relations:
DOI 10.25592/uhhfdm.8262
Licenses:
  • https://creativecommons.org/licenses/by/4.0/legalcode
  • info:eu-repo/semantics/openAccess
Source system:
Research Data Repository of UHH (Forschungsdatenrepositorium der UHH)

Internal metadata
Source record
oai:fdr.uni-hamburg.de:8263