Record matching in data warehouses: A decision model for data consolidation

Article ID: iaor20033040
Country: United States
Volume: 51
Issue: 2
Start Page Number: 240
End Page Number: 254
Publication Date: Mar 2003
Journal: Operations Research
Authors:
Keywords: programming: integer, datamining, computers: information
Abstract:

The notion of a data warehouse for integrating operational data into a single repository is rapidly becoming popular in modern organizations. An important issue in the integration process is how to deal with the identifier mismatch problem when combining similar data from disparate sources. A real-world entity may be represented by different identifiers in different operational data sources, and matching them is often difficult using simple database operations expressed, say, as a Structured Query Language (SQL) query. Record-by-record manual matching is also impractical because the databases may be large. A decision model is presented that combines probability-based automated matching with manual matching in a cost minimization formulation. A heuristic approach is proposed for solving the decision model. Both the model and the heuristic solution approach have been tested on real data. The test results indicate that the model can be used effectively in real-world situations.
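As a rough illustration of the trade-off the abstract describes, the sketch below picks, for a single candidate record pair, the cheapest of three actions: accept the match automatically, reject it automatically, or send it to manual review. This is not the paper's formulation (which the abstract does not spell out, and which the keywords suggest is an integer program solved heuristically); it is a minimal per-pair expected-cost rule. The function name, the three cost parameters, and the assumption that manual review is always correct are all hypothetical.

```python
def choose_action(match_probability, cost_false_match,
                  cost_false_nonmatch, cost_manual_review):
    """Return the action with the lowest expected cost for one record pair.

    Illustrative assumptions (not taken from the paper):
    - match_probability is an estimate that both records describe
      the same real-world entity.
    - A manual review always reaches the correct decision and costs
      a fixed amount per pair.
    """
    expected_costs = {
        # Accepting automatically is wrong when the pair is not a true match.
        "auto_match": (1.0 - match_probability) * cost_false_match,
        # Rejecting automatically is wrong when the pair is a true match.
        "auto_nonmatch": match_probability * cost_false_nonmatch,
        # Manual review incurs only the review cost under the assumption above.
        "manual_review": cost_manual_review,
    }
    return min(expected_costs, key=expected_costs.get)


# Hypothetical cost settings: a false match is twice as costly as a missed one.
costs = dict(cost_false_match=10.0, cost_false_nonmatch=5.0,
             cost_manual_review=1.0)

for p in (0.98, 0.50, 0.05):
    print(p, choose_action(p, **costs))
```

Under these made-up costs, pairs with very high or very low match probability are decided automatically, and only the uncertain middle band goes to a human reviewer, which is the intuition behind combining automated and manual matching.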
