Article ID: | iaor2008915 |
Country: | Netherlands |
Volume: | 42 |
Issue: | 3 |
Start Page Number: | 1494 |
End Page Number: | 1502 |
Publication Date: | Dec 2006 |
Journal: | Decision Support Systems |
Authors: | Parssian Amir |
Keywords: | data mining |
Managers use aggregate data produced by decision support systems in their decision-making process to run or improve their firm's operations. Data residing in corporate databases and data warehouses are often far from perfect, and their imperfections affect decision quality and outcome. Knowledge of the effect of data errors on aggregate data could therefore lead to more informed decisions, reduced risks, and competitive advantage. In this paper, we present a methodology to estimate the effects of accuracy and completeness, two important data quality dimensions, on the relational aggregate functions Count, Sum, Average, Max, and Min. Our methodology defines a set of attribute value types and deploys sampling strategies to determine the maximum likelihood estimates of each value type. We show the effect of data error rates on the scalar values returned by the aggregate functions and demonstrate the efficiency of our estimates by Monte Carlo simulations.
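
To make the idea in the abstract concrete, the following is a minimal Monte Carlo sketch of how missing and inaccurate values can shift the five aggregate functions. It is not the paper's methodology: the error model (drop a record with probability `p_missing`, otherwise rescale it by a random factor with probability `p_inaccurate`) and the names `corrupt`, `aggregates`, and `simulate` are all assumptions chosen purely for illustration.

```python
import random
import statistics


def corrupt(values, p_inaccurate, p_missing, rng):
    """Return a degraded copy of `values` under a hypothetical error model.

    Each value is dropped with probability p_missing (incompleteness);
    a surviving value is perturbed with probability p_inaccurate (inaccuracy).
    """
    observed = []
    for v in values:
        if rng.random() < p_missing:
            continue  # record is absent from the stored data
        if rng.random() < p_inaccurate:
            v = v * rng.uniform(0.5, 1.5)  # assumed form of a value error
        observed.append(v)
    return observed


def aggregates(values):
    """Compute Count, Sum, Average, Max, and Min of a non-empty list."""
    return {
        "count": len(values),
        "sum": sum(values),
        "avg": statistics.fmean(values),
        "max": max(values),
        "min": min(values),
    }


def simulate(true_values, p_inaccurate, p_missing, trials=1000, seed=0):
    """Average deviation (observed minus true) of each aggregate over trials.

    Assumes at least one record survives in every trial, which holds with
    overwhelming probability for small p_missing and a reasonably large input.
    """
    rng = random.Random(seed)
    truth = aggregates(true_values)
    totals = {name: 0.0 for name in truth}
    for _ in range(trials):
        observed = aggregates(corrupt(true_values, p_inaccurate, p_missing, rng))
        for name in truth:
            totals[name] += observed[name] - truth[name]
    return {name: totals[name] / trials for name in truth}


if __name__ == "__main__":
    data = [float(i) for i in range(1, 201)]
    for name, deviation in simulate(data, p_inaccurate=0.05, p_missing=0.1).items():
        print(f"{name}: mean deviation {deviation:+.3f}")
```

Under this assumed error model, incompleteness alone pulls Count and Sum down and can only lower Max or raise Min, while inaccuracy can move Sum, Average, Max, and Min in either direction. A different error model would give different deviations.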