Evaluating indeterministic duplicate detection results

Link:

https://doi.org/10.1007/978-3-642-33362-0_33

Author:

Contributor:

Hüllermeier, Eyke

Publisher/Corporate body:

Springer

Year of publication:

2012

Media type:

Text

Keywords:

Database systems
Algorithms
Privacy-preserving record
Database Systems
Ontology
Query Processing
quality evaluation
entity resolution
probabilistic clustering
indeterministic duplicate detection
probabilistic duplicate detection

Description:

Duplicate detection is an important process for cleaning or integrating data. Since real-life data is often polluted, detecting duplicates usually involves uncertainty. To handle this uncertainty appropriately, indeterministic duplicate detection approaches have been developed, i.e. approaches in which ambiguous duplicate decisions are modeled probabilistically in the resulting data. To rate a duplicate detection approach, the quality of its detection results needs to be evaluated. In this paper, we propose several semantics for applying traditional quality evaluation measures to indeterministic duplicate detection results and, as an example, present an efficient evaluation for one of these semantics. Finally, we present some experimental results.

License:

info:eu-repo/semantics/restrictedAccess

Source system:

Research Information System of UHH (Forschungsinformationssystem der UHH)

Internal metadata

Source record: oai:www.edit.fis.uni-hamburg.de:publications/d8698c63-578d-4099-8502-b051d5616cea