The Spoken Wikipedia Corpora

Link:
Author:
Contributors:
  • Stegen, Florian
  • Baumann, Timo
  • Köhn, Arne
Publisher/Institution:
Universität Hamburg
Publication year:
2017
Media type:
Dataset
Keywords:
  • linguistics
  • English
  • German
  • Dutch
Description:
  • The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. Hundreds of spoken articles in multiple languages are available to users who are – for one reason or another – unable or unwilling to consume the written version of the article. Our resource, the Spoken Wikipedia Corpus, consolidates the Spoken Wikipediae, adding text segmentation, normalization, time-alignment and further annotations, making it accessible for research and fostering new ways of interacting with the material.

    Timo Baumann, Arne Köhn, and Felix Hennig. 2018. The Spoken Wikipedia Corpus Collection: Harvesting, Alignment and an Application to Hyperlistening. In Language Resources and Evaluation, Special Issue representing significant contributions of LREC 2016.

    Arne Köhn, Florian Stegen, and Timo Baumann. 2016. Mining the Spoken Wikipedia for Speech Data and Beyond. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).
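
One basic operation that time-aligned text enables is mapping a playback position back to the word being read, for example to highlight or navigate the text while listening. The following is a minimal Python sketch of that lookup. It assumes a hypothetical in-memory representation (`AlignedToken` with millisecond start and end times, sorted and non-overlapping) and invented toy data; it does not parse the corpus's actual annotation files, whose format is not described in this record.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AlignedToken:
    """Hypothetical record: one normalized word and its audio span in milliseconds."""
    text: str
    start_ms: int
    end_ms: int


def token_at(tokens: List[AlignedToken], time_ms: int) -> Optional[AlignedToken]:
    """Return the token being read at time_ms, or None if time_ms falls in a pause.

    Assumes `tokens` is sorted by start_ms and that spans do not overlap.
    """
    starts = [t.start_ms for t in tokens]
    # Index of the last token that starts at or before time_ms.
    i = bisect_right(starts, time_ms) - 1
    if i >= 0 and time_ms < tokens[i].end_ms:
        return tokens[i]
    return None


# Toy data invented for illustration, not taken from the corpus.
sample = [
    AlignedToken("the", 0, 180),
    AlignedToken("spoken", 180, 620),
    AlignedToken("wikipedia", 700, 1400),
]

print(token_at(sample, 300).text)  # prints "spoken"
print(token_at(sample, 650))       # prints "None" (pause between 620 and 700 ms)
```

Binary search keeps each lookup logarithmic in the number of tokens; for many repeated lookups on one recording, the list of start times could be built once rather than on every call.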

    CLARIN Metadata summary for The Spoken Wikipedia Corpora (CMDI-based)

    Title: The Spoken Wikipedia Corpora
    Publication date: 2017
    Data owner: Timo Baumann, Universität Hamburg
    Contributors: Timo Baumann (author), Arne Köhn (author), Florian Stegen (author)
    Languages: English (eng), German (deu), Dutch (nld)
    Size: 5,397 articles, 1,005 hours
    Segmentation units: other
    Genre: encyclopedia
    Modality: spoken
    References:
      Timo Baumann; Arne Köhn; Felix Hennig (2018). The Spoken Wikipedia Corpus Collection: Harvesting, Alignment and an Application to Hyperlistening
      Arne Köhn; Florian Stegen; Timo Baumann (2016). Mining the Spoken Wikipedia for Speech Data and Beyond
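
The size figures give a rough sense of scale. A quick back-of-the-envelope calculation of the mean recording length per article, assuming the article count and the hour count refer to the same set of articles across all three languages:

```python
# Figures from the "Size" field of the CLARIN metadata summary.
ARTICLES = 5397
HOURS = 1005

# Mean audio length per article in minutes, assuming both figures cover the same articles.
minutes_per_article = HOURS * 60 / ARTICLES
print(f"{minutes_per_article:.1f} min per article")  # prints "11.2 min per article"
```

This is only an average over the whole collection; it says nothing about how recording lengths are distributed across articles or languages.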

Relations:
DOI 10.25592/uhhfdm.1874
Licenses:
  • https://creativecommons.org/licenses/by-sa/4.0/legalcode
  • info:eu-repo/semantics/openAccess
Source system:
Forschungsdatenrepositorium der UHH (research data repository of Universität Hamburg)

Internal metadata
Source record
oai:fdr.uni-hamburg.de:1875