On Domain-specific Topic Modelling Using the Case of a Humanities Journal

Link:
Author:
Contributors:
  • Melzer, Sylvia
  • Peukert, Hagen
  • Thiemann, Stefan
Publisher/Corporate body:
CEUR-WS.org
Year of publication:
2023
Media type:
Text
Keywords:
  • Domain-specific corpora
  • LDA
  • Topic modelling
Description:
  • Topic modelling techniques have been an important tool for meaningful information retrieval. They also have the potential to help researchers in fields such as the humanities explore corpora on different topics in an automated way. One prominent method, latent Dirichlet allocation (LDA), describes documents as distributions over topics and topics as distributions over words. Most applications of LDA focus on sets of tweets, news articles, Wikipedia entries, or academic publications covering a wide range of topics in a large corpus. In this article, LDA is applied in the opposite setting: a domain-specific, small-scale corpus in the form of an academic journal devoted to the study of modern and ancient manuscripts. From this case study, we derive steps specific to working with domain-specific corpora.
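The description summarises LDA as modelling documents as distributions over topics and topics as distributions over words. As an illustration only, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. This record does not say which toolkit or inference method the authors used; the function name `lda_gibbs`, the hyperparameters (`alpha`, `beta`, `iterations`, `seed`) and the toy corpus are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical sketch: fit LDA to tokenised documents by collapsed Gibbs sampling.

    Returns (theta, phi, vocab): theta[d][k] estimates P(topic k | document d),
    phi[k][v] estimates P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Assign every token a random initial topic.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token's assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates (from the final sample) of the two distributions.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


if __name__ == "__main__":
    # Hypothetical toy corpus, already tokenised and stop-word filtered.
    docs = [
        ["ink", "parchment", "scribe", "codex"],
        ["scribe", "codex", "ink", "binding"],
        ["papyrus", "greek", "fragment", "scroll"],
        ["greek", "papyrus", "scroll", "fragment"],
    ]
    theta, phi, vocab = lda_gibbs(docs, num_topics=2)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
        print(f"topic {k}:", [vocab[v] for v in top])
```

Each row of `theta` is one document's distribution over topics and each row of `phi` is one topic's distribution over the vocabulary, which mirrors the description of LDA above. On a small, domain-specific corpus the number of topics and the priors `alpha` and `beta` are the main settings to tune; the values here are placeholders, not settings reported in the paper.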

License:
  • info:eu-repo/semantics/restrictedAccess
Source system:
Research information system of Universität Hamburg (UHH)

Internal metadata
Source record
oai:www.edit.fis.uni-hamburg.de:publications/d45762bd-cd53-446e-b5f9-c3ba23a803a3