Article ID: | iaor201526077 |
Volume: | 18 |
Issue: | 2 |
Start Page Number: | 156 |
End Page Number: | 165 |
Publication Date: | Jun 2015 |
Journal: | Health Care Management Science |
Authors: | Jones Simon, Bottle Alex, Aylin Paul, Gaudoin Ren, Montana Giovanni |
Keywords: | risk |
The aims of supervised machine learning (ML) applications fall into three broad categories: classification, ranking, and calibration/probability estimation. Many ML methods and evaluation techniques relate to the first two. Nevertheless, there are many applications where having an accurate probability estimate is of great importance. Deriving accurate probabilities from the output of an ML method is therefore an active area of research, resulting in several methods to turn a ranking into class probability estimates. In this manuscript we present a method, splined empirical probabilities, based on the receiver operating characteristic (ROC) to complement existing algorithms such as isotonic regression. Unlike most other methods it works with a cumulative quantity, the ROC curve, and as such can be tagged onto an ROC analysis with minor effort. On a diverse set of measures of the quality of probability estimates (Hosmer‐Lemeshow, Kullback‐Leibler divergence, differences in the cumulative distribution function) using simulated and real health care data, our approach compares favourably with the standard calibration method, the pool adjacent violators algorithm used to perform isotonic regression.
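
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the pool adjacent violators algorithm applied to binary outcomes. It illustrates only the standard isotonic-regression calibration step, not the splined empirical probabilities method of the paper. The function name `pava_calibrate` and the toy scores and labels are hypothetical, and the sketch assumes distinct scores (tied scores are not pooled beforehand).

```python
def pava_calibrate(scores, labels):
    """Isotonic calibration of binary labels against scores via PAVA.

    Returns the scores in ascending order and a non-decreasing list of
    probability estimates, one per score. Hypothetical helper; assumes
    the scores are distinct.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block is [sum of labels, number of points]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Pool adjacent blocks while the earlier mean exceeds the later mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    probs = []
    for s, n in blocks:
        probs.extend([s / n] * n)  # every point in a block gets the block mean
    return [scores[i] for i in order], probs


# Toy data (invented for illustration): classifier scores and 0/1 outcomes.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 0, 1, 1]
sorted_scores, probs = pava_calibrate(scores, labels)
```

In this toy case the outcomes at scores 0.2, 0.3 and 0.4 violate monotonicity, so they are pooled into one block and share a single estimate equal to their mean outcome. The resulting step function is the piecewise-constant calibration map against which the abstract compares the spline-based approach.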