Article ID: | iaor20013123 |
Country: | Netherlands |
Volume: | 16 |
Issue: | 4 |
Start Page Number: | 437 |
End Page Number: | 450 |
Publication Date: | Oct 2000 |
Journal: | International Journal of Forecasting |
Authors: | Tashman, Leonard J. |
In evaluations of forecasting accuracy, including forecasting competitions, researchers have paid attention to the selection of time series and to the appropriateness of forecast-error measures. However, they have not formally analyzed choices in the implementation of out-of-sample tests, making it difficult to replicate and compare forecasting accuracy studies. In this paper, I (1) explain the structure of out-of-sample tests, (2) provide guidelines for implementing these tests, and (3) evaluate the adequacy of out-of-sample tests in forecasting software. The issues examined include series-splitting rules, fixed versus rolling origins, updating versus recalibration of model coefficients, fixed versus rolling windows, single versus multiple test periods, diversification through multiple time series, and design characteristics of forecasting competitions. For individual time series, the efficiency and reliability of out-of-sample tests can be improved by employing rolling-origin evaluations, recalibrating coefficients, and using multiple test periods. The results of forecasting competitions would be more generalizable if based upon precisely described groups of time series, in which the series are homogeneous within group and heterogeneous between groups. Few forecasting software programs adequately implement out-of-sample evaluations, especially general statistical packages and spreadsheet add-ins.
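A minimal sketch of the rolling-origin, out-of-sample evaluation the abstract recommends, with the model recalibrated at each origin. The simple exponential smoothing model, the smoothing parameter, the 80/20 series split, and the use of mean absolute error are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def ses_forecast(history, alpha=0.3):
    """Fit simple exponential smoothing on `history` (the recalibration step)
    and return a one-step-ahead forecast. The alpha value is a hypothetical choice."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def rolling_origin_mae(series, first_origin):
    """Roll the forecast origin forward one period at a time, refitting on all
    data available at each origin and scoring the one-step-ahead forecast."""
    errors = []
    for origin in range(first_origin, len(series) - 1):
        fit_data = series[: origin + 1]        # data known at this origin
        forecast = ses_forecast(fit_data)      # recalibrate, then forecast t+1
        actual = series[origin + 1]
        errors.append(abs(actual - forecast))
    return np.mean(errors)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=60)) + 100   # synthetic series for illustration
    split = int(0.8 * len(y))                  # series-splitting rule (assumed 80/20)
    print(f"Rolling-origin one-step MAE: {rolling_origin_mae(y, split):.3f}")
```

Compared with a single fixed-origin split, rolling the origin yields one forecast error per test period rather than one per horizon, which is the efficiency gain the abstract refers to; recalibrating at each origin mimics how the model would actually be used in practice.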