Evaluating indeterministic duplicate detection results

Link:
Author:
Contributor:
  • Hüllermeier, Eyke
Publisher/Corporate body:
Springer
Year of publication:
2012
Media type:
Text
Keywords:
  • Database systems
  • Algorithms
  • Privacy-preserving record
  • Ontology
  • Query Processing
  • quality evaluation
  • entity resolution
  • probabilistic clustering
  • indeterministic duplicate detection
  • probabilistic duplicate detection
Description:
  • Duplicate detection is an important process for cleaning or integrating data. Since real-life data is often polluted, detecting duplicates usually comes along with uncertainty. To handle duplicate uncertainty in an appropriate way, indeterministic duplicate detection approaches, i.e. approaches in which ambiguous duplicate decisions are probabilistically modeled in the resultant data, have been developed. To rate the goodness of a duplicate detection approach, its detection results need to be evaluated in their quality. In this paper, we propose several semantics to apply traditional quality evaluation measures to indeterministic duplicate detection results and exemplarily present an efficient evaluation for one of these semantics. Finally, we present some experimental results.
License:
  • info:eu-repo/semantics/restrictedAccess
Source system:
Forschungsinformationssystem der UHH

Internal metadata
Source record
oai:www.edit.fis.uni-hamburg.de:publications/d8698c63-578d-4099-8502-b051d5616cea