Augmenting and automating corpus enrichment

Link:
Author:
Publisher/Corporate body:
IEEE
Year of publication:
2020
Media type:
Text
Keywords:
  • "Embedding; Named Entity Recognition; Entailment"
  • "Semantics; Models; Recommender Systems"
  • Text mining
  • Automating corpus enrichment
  • Subjective content descriptions
  • Document type detection
Description:
  • An agent in pursuit of a task may work with a reference library containing documents with linked subjective content descriptions. Faced with a new document, the agent has to decide whether to include it in its reference library. Basing the decision only on words, topics, or entities has been shown not to yield balanced performance across varying documents. Even a combination of words and descriptions does not yield a single indicator and still requires manual post-processing. Therefore, in this paper, we build a single indicator by detecting the type of a new document using sequential information about descriptions, thus automating the decision. Specifically, an ensemble of hidden Markov models, one for each document type, detects the type of a document. The agent then bases its decision on the detected type. Using hidden Markov models also allows for identifying positions of interest within a new document. A case study shows the effectiveness of our approach.
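The type-detection step in the description (one hidden Markov model per document type, combined as an ensemble) can be sketched in Python as below. This is a minimal sketch under assumptions the record does not state: each document is encoded as a sequence of discrete symbol IDs (for example, the index of the subjective content description matched to each sentence), each type's HMM is already trained, and the ensemble picks the type whose model gives the highest log-likelihood under the scaled forward algorithm. The names `DiscreteHMM`, `detect_type`, `should_include`, `type_a`, `type_b` and all probabilities are hypothetical; this is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass
class DiscreteHMM:
    """HMM over discrete symbols (hypothetical encoding: one SCD index per sentence)."""

    start: List[float]        # start[i] = P(first state is i)
    trans: List[List[float]]  # trans[i][j] = P(next state j | current state i)
    emit: List[List[float]]   # emit[i][k] = P(symbol k | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Scaled forward algorithm: returns log P(obs | model)."""
        n = len(self.start)
        log_lik = 0.0
        alpha: List[float] = []
        for t, symbol in enumerate(obs):
            if t == 0:
                alpha = [self.start[i] * self.emit[i][symbol] for i in range(n)]
            else:
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][symbol]
                    for j in range(n)
                ]
            scale = sum(alpha)  # equals P(obs[t] | obs[:t])
            if scale == 0.0:
                return float("-inf")  # sequence impossible under this model
            log_lik += math.log(scale)
            alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        return log_lik  # empty sequence -> log(1) = 0.0


def detect_type(models: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    """Ensemble step (assumed rule): the type whose HMM best explains obs."""
    return max(models, key=lambda doc_type: models[doc_type].log_likelihood(obs))


def should_include(models: Dict[str, DiscreteHMM], obs: Sequence[int], accepted: Set[str]) -> bool:
    """Hypothetical decision rule: include the document if its detected type is accepted."""
    return detect_type(models, obs) in accepted


# Toy ensemble with two hypothetical types over a two-symbol vocabulary.
sticky = [[0.9, 0.1], [0.1, 0.9]]
type_a = DiscreteHMM(start=[0.5, 0.5], trans=sticky, emit=[[0.9, 0.1], [0.8, 0.2]])
type_b = DiscreteHMM(start=[0.5, 0.5], trans=sticky, emit=[[0.1, 0.9], [0.2, 0.8]])
models = {"type_a": type_a, "type_b": type_b}

print(detect_type(models, [0, 0, 1, 0, 0]))  # type_a
print(should_include(models, [1, 1, 1, 0, 1], accepted={"type_a"}))  # False
```

Picking the maximum-likelihood model is one common way to combine per-type HMMs; the paper may weight, threshold, or otherwise combine the models differently. Normalizing `alpha` at every step keeps long documents from underflowing to zero.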
License:
  • info:eu-repo/semantics/closedAccess
Source system:
Forschungsinformationssystem der UHH

Internal metadata
Source record
oai:www.edit.fis.uni-hamburg.de:publications/f35b920c-6abe-4385-bf86-c6facbc67c99