Article ID: | iaor20012648 |
Country: | United States |
Volume: | 26 |
Issue: | 6 |
Start Page Number: | 803 |
End Page Number: | 818 |
Publication Date: | Nov 1995 |
Journal: | Decision Sciences |
Authors: | VonPuelz A., Sobol M.G. |
Keywords: | validation |
The widespread use of regression analysis as a business forecasting tool, together with renewed interest in cross-validation as an aid to regression model selection, makes it essential that decision makers understand how cross-validation is used in forecasting, along with its advantages and limitations. Only with this understanding can managers correctly interpret statistical cross-validation results when judging the robustness of regression forecasting models. Using a multiple regression analysis of a large insurance company's customer database, the Herzberg equation for determining the criterion of validity, and samples of different sizes drawn from the two regions covered by the database, we illustrate the use of statistical cross-validation and test a set of factors hypothesized to be related to the statistical accuracy of validation. We find that increasing sample size generally increases reliability. However, when the magnitude of the differences between the population models is small, validation results are unreliable, and increasing sample size has little or no effect on reliability. In addition, the relative fit of the model in the derivation sample and in the validation sample affects validation accuracy, and it should be used as an indicator of when further analysis is needed. Finally, we find that the probability distribution of the population independent variables has no effect on validation accuracy.
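The abstract does not spell out the validation procedure itself. As a hedged illustration of split-sample cross-validation in general, and explicitly not a reproduction of the authors' Herzberg-based criterion, the Python sketch below fits a multiple regression on a random derivation sample, then compares its fit there with its fit on the held-out validation sample. Fit is measured as the squared correlation between observed and predicted values. All function names (`fit_ols`, `split_sample_validation`, etc.), the 50/50 split, and the synthetic data are hypothetical choices made for illustration.

```python
import random

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    X is a list of predictor rows; an intercept column is added here."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for k in range(p):
            if k != col:
                f = A[k][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] for i in range(p)]  # [intercept, b1, b2, ...]

def predict(b, X):
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]

def squared_correlation(a, b):
    """Squared Pearson correlation; equals R^2 for an OLS fit with intercept."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov * cov / (va * vb)

def split_sample_validation(X, y, derivation_fraction=0.5, seed=0):
    """Fit on a random derivation sample; score on it and on the holdout."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * derivation_fraction)
    der, val = idx[:cut], idx[cut:]
    b = fit_ols([X[i] for i in der], [y[i] for i in der])
    r2_der = squared_correlation([y[i] for i in der], predict(b, [X[i] for i in der]))
    r2_val = squared_correlation([y[i] for i in val], predict(b, [X[i] for i in val]))
    return r2_der, r2_val

# Hypothetical synthetic data: y = 2 + 1.5*x1 - 0.5*x2 + noise
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 + 1.5 * a - 0.5 * c + rng.gauss(0, 1) for a, c in X]
r2_der, r2_val = split_sample_validation(X, y)
print(f"derivation R^2 = {r2_der:.3f}, validation r^2 = {r2_val:.3f}, "
      f"shrinkage = {r2_der - r2_val:.3f}")
```

The gap between the two fit values (often called shrinkage) is one simple way to look at the relative fit of the model in the derivation and validation samples that the abstract mentions; rerunning with different `seed` values or sample sizes shows how much that gap varies from split to split.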