Article ID: | iaor20073438 |
Country: | United States |
Volume: | 50 |
Issue: | 3 |
Start Page Number: | 597 |
End Page Number: | 604 |
Publication Date: | May 2004 |
Journal: | Management Science |
Authors: | Hora, Stephen C.
Expert judgment elicitation is often required in probabilistic decision making and the evaluation of risk. One measure of the quality of probability distributions given by experts is calibration: the faithfulness of the stated probabilities in an empirically verifiable sense. A method of measuring calibration for continuous probability distributions is presented here, together with a discussion of the impact of using linear rules to combine such judgments and an empirical demonstration based on data collected from experts participating in a large-scale risk study. A theoretical argument shows that combining the well-calibrated distributions of individual experts with linear rules can only degrade calibration. In contrast, it is demonstrated, both by example and empirically, that an equally weighted linear combination of experts who tend to be overconfident can produce distributions that are better calibrated than the experts' individual distributions. Using data from training exercises, it is shown that calibration improves rapidly as the number of experts increases from one to five or six, but only modestly as experts are added beyond that point.
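The combination scheme and calibration idea summarized in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method or data: the normal expert distributions, the degree of overconfidence, and the uniform-deviation score are illustrative assumptions. It checks calibration with the probability integral transform F_i(x_i) of each expert's distribution at the realized value, and pools experts with an equally weighted linear opinion pool (the pooled CDF is the average of the individual CDFs).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical overconfident experts: each reports a normal distribution
# centred near the truth but with a spread smaller than the actual error.
n_items, n_experts = 200, 6
truth = rng.normal(0.0, 1.0, n_items)
centres = truth[:, None] + rng.normal(0.0, 1.0, (n_items, n_experts))
spread = 0.5  # reported sd (0.5) is smaller than the true error sd (1.0)

# Probability integral transform: F_i(x) for each expert and each item.
# A well-calibrated assessor yields values that look uniform on [0, 1].
pit_single = stats.norm.cdf(truth[:, None], loc=centres, scale=spread)

# Equal-weight linear opinion pool: the pooled CDF evaluated at the
# realized value is the average of the experts' CDFs at that value.
pit_pool = pit_single.mean(axis=1)

def calibration_score(p):
    """Mean squared deviation of sorted PIT values from uniform quantiles
    (an illustrative miscalibration measure; smaller is better)."""
    p = np.sort(np.asarray(p))
    u = (np.arange(1, len(p) + 1) - 0.5) / len(p)
    return np.mean((p - u) ** 2)

print("single expert:", calibration_score(pit_single[:, 0]))
print("6-expert pool:", calibration_score(pit_pool))
```

Under these assumptions, the pooled PIT values are noticeably closer to uniform than those of any single overconfident expert, matching the abstract's observation that equal-weight linear pooling can improve calibration when individual experts are overconfident.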