INEL Enets Corpus

Link:

https://doi.org/10.25592/uhhfdm.16182

Author:

Contributors:

Arkhipov, Alexandre
Wagner-Nagy, Beáta
Lazarenko, Elena
Riaposov, Aleksandr
Lehmberg, Timm

Publisher/Institution:

Universität Hamburg

Year of publication:

2024

Media type:

Dataset

Keywords:

Uralic
Samoyedic
Enets
Forest Enets
Tundra Enets
endangered language
language contact
language documentation
legacy data
INEL
AdWHH
text corpus
speech corpus
parallel texts
folklore
tales
narrative
dialogue
song
transcription
time-aligned
audio
video
morphological glossing
part-of-speech
borrowings
code-switching
English translation
Russian translation
EXMARaLDA
ELAN
XML
ISO/TEI

Description:

Corpus Citation

Shluinsky, Andrey; Khanina, Olesya; Wagner-Nagy, Beáta. 2024. INEL Enets Corpus. Version 1.0. Publication date 2024-11-30. https://hdl.handle.net/11022/0000-0007-FE1D-C. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1

Corpus Description

The INEL Enets corpus was created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033.

The corpus includes texts recorded between 1962 and 2017 in both Enets lects, Forest Enets and Tundra Enets. The sources of the corpus (for details, see the user documentation, section 2.2) are:
- Audio recordings made by Olesya Khanina, Maria Ovsjannikova, Andrey Shluinsky, Natalia Stoynova and Sergey Trubetskoy,
- Legacy audio recordings made by Vera Bettu, Nina N. Bolina, Dar`ya S. Bolina, Zoya N. Bolina, Oksana E. Dobzhanskaya, Valentin Gusev, Eugene Helimski†, Kazimir I. Labanauskas†, Larisa Leisiö, Marina Lyublinskaya, Kaur Mägi, Viktor N. Pal`chin, Marina N. Pal`china, Irina P. Sorokina†, Anna Urmanchieva, Beáta Wagner-Nagy and possibly other people,
- Published audio recordings,
- Texts published by Dar`ya S. Bolina, Yaroslav A. Gluxij† and Vasilij A. Susekov†, Eugene Helimski†, Kazimir I. Labanauskas†, Tibor Mikola†, János Pusztay, Irina P. Sorokina†, Anna Urmanchieva,
- Legacy manuscript transcriptions and self-transcriptions made and/or edited by Dar`ya S. Bolina, Galina S. Bolina, Zoya N. Bolina, Valentin Gusev, Eugene Helimski†, Kazimir I. Labanauskas†, Larisa Leisiö, Marina Lyublinskaya, Vasilij F. Ly`rmin†, Anton N. Pal`chin, Viktor N. Pal`chin, Ivan I. Silkin†, Irina P. Sorokina†, Natal`ya M. Tereščenko†, Anna Urmanchieva and possibly other people.

All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translations into English and Russian. All texts for which audio recordings were accessible are time-aligned with them. Video recordings are also included in the corpus where available.

Corpus size
- Forest Enets: 541 texts, 41,396 sentences, 173,379 tokens
- Tundra Enets: 137 texts, 12,737 sentences, 45,331 tokens
- Total: 678 texts, 54,133 sentences, 218,710 tokens
- Total duration of audio: 43 hours 26 minutes

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Preliminary glossing work included in this corpus was supported by the Endangered Languages Documentation Programme (ELDP) and by the Max Planck Institute for Evolutionary Anthropology (MPI-EVA). For details on financial support, see the documentation file below, section 1.6.

Contributions/Acknowledgements

Dozens of people and many institutions contributed to the corpus (see more details in the documentation file below, section 1.6). We are especially grateful to:
- Enets speakers who generously shared their knowledge, especially those who spent many days working with us: Aleksandr S. Bolin†, Leonid D. Bolin†, Viktor N. Bolin, Nadezhda K. Bolina, Nina N. Bolina, Ekaterina S. Glibchenko, Gennadij A. Ivanov†, Irina P. Koshkaryova†, Valentina P. Nader, Lyudmila P. Novosyolova, Svetlana A. Roslyakova†, Ivan I. Silkin†, Nikolaj I. Silkin, Alevtina S. Silkina, Zoya A. Turutina, Tat`yana Ch. Yar,
- In particular, Zoya N. Bolina and Viktor N. Pal`chin, who also collaborated in the ELDP project and extensively transcribed Enets recordings,
- Natalia Stoynova, Sergey Trubetskoy and, above all, Maria Ovsjannikova, who made recordings and transcriptions of Enets texts,
- Institutions and private individuals who shared legacy data: the Institute for Linguistic Studies RAS, the Taymyr House of National Arts, the Dudinka branch of GTRK “Norilsk”; Dar`ya S. Bolina, Oksana E. Dobzhanskaya, Valentin Gusev, Larisa Leisiö, Viktor N. Pal`chin, Irina P. Sorokina†, Anna Urmanchieva,
- Marina Lyublinskaya and Anna Urmanchieva, who kindly permitted texts they had processed to be included in the corpus,
- Dar`ya S. Bolina, who provided extensive consultation during the compilation of the corpus.

Searching the corpus

The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN.
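
For users who prefer scripting to a graphical tool, a downloaded transcription can also be inspected with a few lines of Python. The following is a minimal sketch, not part of the corpus tooling: it assumes the files follow the EXMARaLDA basic-transcription XML layout (a common timeline of `tli` items plus `tier` elements containing `event` elements). The inline sample, the tier categories `tx` and `fe`, and the helper names `read_tiers` and `search` are illustrative assumptions; the actual tier inventory is described in the user documentation, section 3.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an EXMARaLDA basic transcription (.exb).
# Tier categories "tx" / "fe" and all content are illustrative assumptions.
SAMPLE_EXB = """<basic-transcription>
  <head/>
  <basic-body>
    <common-timeline>
      <tli id="T0" time="0.0"/>
      <tli id="T1" time="1.5"/>
      <tli id="T2" time="3.0"/>
    </common-timeline>
    <tier id="tx" category="tx" type="t">
      <event start="T0" end="T1">word1 </event>
      <event start="T1" end="T2">word2</event>
    </tier>
    <tier id="fe" category="fe" type="a">
      <event start="T0" end="T2">A placeholder translation.</event>
    </tier>
  </basic-body>
</basic-transcription>
"""


def read_tiers(root):
    """Return {tier category: [(start_time, end_time, text), ...]}."""
    # Map timeline ids to seconds; a timeline item may lack a time value.
    times = {}
    for tli in root.iter("tli"):
        raw = tli.get("time")
        times[tli.get("id")] = float(raw) if raw is not None else None

    tiers = {}
    for tier in root.iter("tier"):
        events = []
        for ev in tier.iter("event"):
            text = (ev.text or "").strip()
            events.append((times.get(ev.get("start")), times.get(ev.get("end")), text))
        tiers[tier.get("category")] = events
    return tiers


def search(tiers, category, needle):
    """Case-insensitive substring search within one tier category."""
    needle = needle.lower()
    return [ev for ev in tiers.get(category, []) if needle in ev[2].lower()]


root = ET.fromstring(SAMPLE_EXB)
tiers = read_tiers(root)
hits = search(tiers, "fe", "placeholder")
for start, end, text in hits:
    print(start, end, text)
```

To try this on a real file, replace `ET.fromstring(SAMPLE_EXB)` with `ET.parse(path).getroot()`, where `path` points to a downloaded `.exb` file, and adjust the tier category to one that actually occurs in the corpus.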

Online search with the Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/EnetsCorpus/search.

Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php#search).

See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags.
Find further information and links on the Enets Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/enets/.

relatedIdentifier:

Cited by: Handle 11022/0000-0007-FE1D-C DOI 10.25592/uhhfdm.16181

Licenses:

https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
info:eu-repo/semantics/openAccess

Source system:

UHH research data repository (Forschungsdatenrepositorium der UHH)

Internal metadata

Source record: oai:fdr.uni-hamburg.de:16182