Article ID: | iaor20073040 |
Country: | United Kingdom |
Volume: | 17 |
Issue: | 1 |
Start Page Number: | 4 |
End Page Number: | 9 |
Publication Date: | Mar 2004 |
Journal: | OR Insight |
Authors: | Winkler Atai |
Keywords: | practice |
Many companies and organisations spend large amounts of money and time collecting and analysing the lifestyle, demographic and transaction data they hold about their customers. The volume of this data is growing rapidly, yet one important aspect of it has received, and continues to receive, very little attention: missing data. How should missing values be treated? Should the problem be avoided altogether by discarding records that contain missing values, or should the available data be analysed to impute ‘good’ values for those that are missing? This article describes the problem of missing values and then discusses some of the methods used to impute them, particularly those suited to the very large datasets held by many companies and organisations. It concludes by presenting imputation results obtained with k-nearest neighbours, a powerful imputation method ideally suited to large datasets.
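The abstract does not say which k-nearest-neighbours variant the article uses. The sketch below is therefore only an illustration of the general technique, built on assumptions of its own. Distance is Euclidean over the features that both records observe, rescaled by the fraction of features shared. Each missing value is filled with the unweighted mean of that feature across the k nearest records that do observe it. The names `knn_impute` and `distance` and the default `k=3` are hypothetical.

```python
import math
from typing import List, Optional

# A record is a list of numeric features; None marks a missing value.
Record = List[Optional[float]]


def distance(a: Record, b: Record) -> float:
    """Euclidean distance over features observed in both records, rescaled
    by the fraction of shared features (an assumed convention, not the
    article's)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common features: treat as infinitely far
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))


def knn_impute(data: List[Record], k: int = 3) -> List[Record]:
    """Return a copy of data in which each missing value is replaced by the
    mean of that feature over the k nearest records that observe it.
    Donors are taken from the original data, so earlier imputations do not
    influence later ones."""
    result = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other records that observe feature j.
            donors = [
                (distance(row, other), other[j])
                for idx, other in enumerate(data)
                if idx != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            if not donors:
                continue  # no usable donor: leave the value missing
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result


if __name__ == "__main__":
    customers = [
        [1.0, 2.0, 3.0],
        [1.1, 2.1, None],
        [5.0, 6.0, 7.0],
        [1.2, 1.9, 3.2],
    ]
    print(knn_impute(customers, k=2))
```

With `k=2`, the missing value in the second record comes from the first and fourth records, its two nearest neighbours on the shared features. It is imputed as roughly 3.1, and the distant third record is ignored. This naive version compares every incomplete record against every other record. For the very large datasets the article has in mind, a practical implementation would need a faster neighbour search. How the article itself handles that is not stated in the abstract.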