
QEMAP - Additional material

This repository contains additional material related to Quality Estimation for Matching and Alignment Problems (QEMAP):

Abstract

Integrating information from multiple heterogeneous data sources is a key challenge in the Web and Big Data era. One of its essential tasks is the detection of corresponding elements between data sources, either at the model level (schema or ontology matching) or at the data level (entity matching or record linkage). Many solutions have been proposed to solve this task, and they can be compared and evaluated using dedicated benchmarks. However, most real-world datasets that need to be matched do not include expert-validated correspondences (ground truth), which makes it difficult to evaluate the quality of the matching. In this paper, we propose a novel approach for estimating the quality achieved by a matching approach. It is based on the intuition that a correspondence is more likely to be correct when it is detected by several matchers that use different similarity metrics. We therefore propose to estimate matching quality by exploiting both the dissimilarity between matchers and their outputs. In addition, our approach helps identify good matchers for a given dataset. An extensive set of experiments on 6 datasets supports our assumption.

Keywords: Evaluation/methodology, Database integration, Benchmarking, Matching quality, Data integration, Entity Matching, Ontology Alignment
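
The intuition stated in the abstract can be illustrated with a small sketch. This is not the method from the paper: the scoring rule, the function name `agreement_scores`, the matcher names, and the dissimilarity values below are all assumptions made up for illustration. Each correspondence is scored by summing the dissimilarity of every pair of matchers that both detected it, normalized by the total dissimilarity over all pairs of matchers.

```python
from itertools import combinations


def agreement_scores(outputs, dissimilarity):
    """Score each correspondence by how dissimilar its supporting matchers are.

    outputs: dict mapping a matcher name to the set of correspondences it found.
    dissimilarity: dict mapping frozenset({m1, m2}) to a value in [0, 1].
    Returns a dict mapping each correspondence to a score in [0, 1].
    Hypothetical scoring rule, not the one defined in the paper.
    """
    matchers = sorted(outputs)
    # Total dissimilarity over all pairs of matchers, used for normalization.
    total = sum(dissimilarity[frozenset(p)] for p in combinations(matchers, 2))
    scores = {}
    for corr in set().union(*outputs.values()):
        found_by = [m for m in matchers if corr in outputs[m]]
        # Agreement between dissimilar matchers counts more than
        # agreement between near-identical ones.
        agree = sum(dissimilarity[frozenset(p)] for p in combinations(found_by, 2))
        scores[corr] = agree / total if total else 0.0
    return scores


# Made-up example: three matchers with assumed pairwise dissimilarities.
outputs = {
    "edit_distance": {("name", "fullname"), ("addr", "address")},
    "jaccard": {("name", "fullname"), ("addr", "address")},
    "embedding": {("name", "fullname"), ("zip", "postcode")},
}
dissimilarity = {
    frozenset({"edit_distance", "jaccard"}): 0.25,
    frozenset({"edit_distance", "embedding"}): 0.75,
    frozenset({"jaccard", "embedding"}): 0.75,
}
scores = agreement_scores(outputs, dissimilarity)
for corr, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(corr, round(score, 3))
```

In this toy setup, a correspondence found by all three matchers receives the highest score, one found only by the two similar string-based matchers receives a low score, and one found by a single matcher receives zero. Averaging such scores over a matcher's output would give one possible, equally hypothetical, estimate of that matcher's quality without ground truth.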

Authors

Wei Yan, Fabien Duchateau and Franck Favetta