Article ID: | iaor201529118 |
Volume: | 66 |
Issue: | 11 |
Start Page Number: | 1895 |
End Page Number: | 1905 |
Publication Date: | Nov 2015 |
Journal: | Journal of the Operational Research Society |
Authors: | Bravo Cristián, Maldonado Sebastián |
Keywords: | datamining, sets, simulation |
Dataset shift is present in almost all real‐world applications, since most of them are constantly dealing with changing environments. Detecting fractures in datasets in time allows the models to be recalibrated before a significant decrease in the model’s performance is observed. Since small changes are normal in most applications and do not justify the effort that a model recalibration requires, we are only interested in identifying those changes that are critical for the correct functioning of the model. In this work, we propose a model‐dependent backtesting strategy designed to identify significant changes in the covariates, relating a confidence zone of the change to a maximal deviance measure obtained from the coefficients of the model. Using logistic regression as a predictive approach, we performed experiments on simulated data and on a real‐world credit scoring dataset. The results show that the proposed method performs better than traditional approaches, consistently identifying major changes in variables while taking into account important characteristics of the problem, such as sample sizes and variances, and uncertainty in the coefficients.
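
The abstract does not give the formula of the proposed measure, so the sketch below is not the authors' method. It only illustrates the general idea of flagging a covariate when the confidence zone of its change lies entirely beyond a tolerated deviation, so that small shifts are ignored while sample sizes and variances are still taken into account. The function names, the large-sample normal approximation, and the `max_shift` tolerance are all assumptions made for illustration; in the paper the tolerated deviation is derived from the model coefficients, whereas here it is simply supplied by the caller.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def shift_interval(reference, recent, confidence=0.95):
    """Approximate confidence interval for mean(recent) - mean(reference).

    Assumption: a large-sample normal approximation is adequate.
    """
    diff = mean(recent) - mean(reference)
    # Welch-style standard error: each sample keeps its own variance and size.
    se = sqrt(variance(reference) / len(reference) + variance(recent) / len(recent))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


def is_critical_shift(reference, recent, max_shift, confidence=0.95):
    """Flag the covariate only if the whole interval lies outside
    [-max_shift, max_shift].

    Assumption: max_shift is a hypothetical, caller-chosen tolerance.
    """
    low, high = shift_interval(reference, recent, confidence)
    return low > max_shift or high < -max_shift
```

Under these assumptions, a shift whose interval still overlaps the tolerated band is treated as normal drift, and only a shift that clears the band with the stated confidence would prompt a recalibration.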