Augmenting and automating corpus enrichment

Link:

https://doi.org/10.1109/ICSC.2020.00017

Author:

Publisher/Corporate body:

IEEE

Year of publication:

2020

Media type:

Text

Keywords:

"Embedding; Named Entity Recognition; Entailment"
"Semantics; Models; Recommender Systems"
Text mining
Automating corpus enrichment
Subjective content descriptions
Document type detection

Description:

An agent in pursuit of a task may work with a reference library containing documents with linked subjective content descriptions. Faced with a new document, the agent has to decide whether to include it in its reference library. Basing the decision only on words, topics, or entities has been shown not to lead to balanced performance across varying documents. Even a combination of words and descriptions does not yield a single indicator, so manual post-processing is required. Therefore, in this paper, we build a single indicator by detecting the type of a new document using sequential information about descriptions, thus automating the decision. Specifically, an ensemble of hidden Markov models, one per document type, detects the type of a document. The agent then bases its decision on the detected type. Using hidden Markov models also allows for identifying positions of interest within a new document. A case study shows the effectiveness of our approach.
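The type detection described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes discrete hidden Markov models with hand-picked parameters, two hypothetical document types ("similar" and "unrelated"), and a hypothetical encoding in which each text window of a new document is mapped to symbol 0 (its content matches descriptions already in the library) or 1 (no match). Each model scores the symbol sequence with the forward algorithm, and the type whose model gives the highest likelihood is chosen.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiscreteHMM:
    start: List[float]        # start[i]    = P(first state is i)
    trans: List[List[float]]  # trans[i][j] = P(next state is j | current state is i)
    emit: List[List[float]]   # emit[i][k]  = P(symbol k | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Scaled forward algorithm: returns log P(obs | this model)."""
        if not obs:
            raise ValueError("observation sequence must not be empty")
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        log_p = 0.0
        for t in range(len(obs)):
            if t > 0:
                # Propagate one step, then weight by the emission probability.
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n))
                    * self.emit[j][obs[t]]
                    for j in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # sequence is impossible under this model
            log_p += math.log(scale)
            alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        return log_p


def detect_type(ensemble: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    """Return the type whose HMM assigns the highest likelihood to obs."""
    return max(ensemble, key=lambda name: ensemble[name].log_likelihood(obs))


# Hypothetical ensemble: type names and all probabilities are illustrative
# assumptions, not values from the paper.
ENSEMBLE = {
    "similar": DiscreteHMM(
        start=[0.8, 0.2],
        trans=[[0.9, 0.1], [0.3, 0.7]],
        emit=[[0.9, 0.1], [0.2, 0.8]],
    ),
    "unrelated": DiscreteHMM(
        start=[0.2, 0.8],
        trans=[[0.6, 0.4], [0.1, 0.9]],
        emit=[[0.7, 0.3], [0.05, 0.95]],
    ),
}

if __name__ == "__main__":
    mostly_matching = [0, 0, 0, 1, 0, 0]
    print(detect_type(ENSEMBLE, mostly_matching))
```

In a real setting the model parameters would be estimated from documents of known type rather than set by hand, and the agent's include-or-reject decision would follow from the detected type.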

License:

info:eu-repo/semantics/closedAccess

Source system:

Research Information System of UHH

Internal metadata

Source record: oai:www.edit.fis.uni-hamburg.de:publications/f35b920c-6abe-4385-bf86-c6facbc67c99