A branch-and-cut algorithm for the continuous error localization problem in data cleaning

A branch-and-cut algorithm for the continuous error localization problem in data cleaning

0.00 Avg rating0 Votes
Article ID: iaor20083477
Country: United Kingdom
Volume: 34
Issue: 9
Start Page Number: 2790
End Page Number: 2804
Publication Date: Sep 2007
Journal: Computers and Operations Research
Authors: ,
Keywords: programming: integer, programming: branch and bound
Abstract:

Data collected by statistical agencies may contain mistakes made during the acquisition, transcription and coding process. Thus, before using all this information to infer statistical properties, the statistical agencies must check the consistency of their information. To this end, each record has to be tested on a set of consistency rules. Therefore, if one of these records does not meet all the consistency rules, it must be established which fields have to be modified in order to make the new record valid with respect to that set of consistency rules. Among all the possible solutions, statistical agencies are interested in finding one in which the number of fields that should be modified is minimum, thus leading to a combinatorial optimization problem known as the Error Localization Problem. This article approaches the optimization problem of finding the smallest set of fields whose values must be changed in order to satisfy a given set of consistency rules. With this purpose in mind an Integer Linear Programming model is proposed for the particular case in which the fields are continuous values and the consistency rules are given by linear inequalities. This model is solved through a branch-and-cut approach based on a Benders' decomposition. The new proposal is compared to others previously published in the literature and tested on benchmark instances. The overall performance of these new algorithms succeeded in solving to optimality difficult instances with up to 100 fields in a record in about 1 min on a personal computer.

Reviews

Required fields are marked *. Your email address will not be published.