Article ID: | iaor20023200 |
Country: | United Kingdom |
Volume: | 53 |
Issue: | 4 |
Start Page Number: | 457 |
End Page Number: | 467 |
Publication Date: | Apr 2002 |
Journal: | Journal of the Operational Research Society |
Authors: | Johnson D.G. |
Keywords: | statistics: distributions |
This paper examines further the problem of approximating the distribution of a continuous random variable based on three key percentiles, typically the median (50th percentile) and the 5% points (5th and 95th percentiles). This usually involves the two main distribution parameters, the mean and standard deviation, and, if possible, the distribution function concerned. Previous research has shown that the Pearson–Tukey formulae provide highly accurate estimates of the mean and standard deviation of a beta distribution (of the first kind), and that simple modifications to the standard deviation formula will improve the accuracy even further. However, little work has been done to establish the accuracy of these formulae for other distributions, or to examine the accuracy of alternative formulae based on triangular distribution approximations. We show that the Pearson–Tukey mean approximation remains highly accurate for a range of unbounded distributions, although the accuracy in these cases can be improved by a slightly different 3:10:3 weighting of the 5%, 50% and 95% points. In contrast, the Pearson–Tukey standard deviation formula is much less accurate for unbounded distributions, and can be bettered by a triangular approximation whose parameters are estimated from simple linear combinations of the three percentile points. In addition, triangular approximations allow the underlying distribution function to be estimated by a triangular cdf. It is shown that simple formulae for estimating the triangular parameters, involving weights of 23:–6:–1, –13:42:–13 and –1:–6:23, give not only universally accurate mean and standard deviation estimates, but also provide a good fit to the distribution function with a Kolmogorov–Smirnov statistic which averages 0.1 across a wide range of distributions, and an even better fit for distributions which are not highly skewed.
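The weightings quoted in the abstract can be illustrated with a minimal sketch. It assumes each set of weights is normalised by its sum (16 in every case) and that the three triangular weight sets give, in order, the lower limit, mode and upper limit; the abstract does not state either point explicitly. The Pearson–Tukey mean weights 0.185:0.63:0.185 are the commonly cited values and do not appear in the abstract. All function names are hypothetical, and the mean, variance and cdf expressions are the standard triangular distribution results.

```python
import math


def pearson_tukey_mean(x05, x50, x95):
    """Pearson-Tukey three-point mean (commonly cited weights, not from the abstract)."""
    return 0.185 * x05 + 0.63 * x50 + 0.185 * x95


def mean_3_10_3(x05, x50, x95):
    """Mean estimate with the 3:10:3 weighting, assumed normalised by 16."""
    return (3 * x05 + 10 * x50 + 3 * x95) / 16


def triangular_params(x05, x50, x95):
    """Triangular (lower, mode, upper) from the quoted weights.

    Assumption: weights are divided by their sum (16) and map to
    lower limit, mode and upper limit in the order given.
    """
    lower = (23 * x05 - 6 * x50 - x95) / 16
    mode = (-13 * x05 + 42 * x50 - 13 * x95) / 16
    upper = (-x05 - 6 * x50 + 23 * x95) / 16
    return lower, mode, upper


def triangular_mean_sd(lower, mode, upper):
    """Standard mean and standard deviation of a triangular distribution."""
    mean = (lower + mode + upper) / 3
    var = (lower**2 + mode**2 + upper**2
           - lower * mode - lower * upper - mode * upper) / 18
    return mean, math.sqrt(var)


def triangular_cdf(x, lower, mode, upper):
    """Triangular cdf; assumes lower <= mode <= upper with lower < upper."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    if x <= mode:
        return (x - lower) ** 2 / ((upper - lower) * (mode - lower))
    return 1.0 - (upper - x) ** 2 / ((upper - lower) * (upper - mode))


# Usage: 5th, 50th and 95th percentiles of a standard normal variable.
z = 1.6449
lower, mode, upper = triangular_params(-z, 0.0, z)
mean, sd = triangular_mean_sd(lower, mode, upper)
print(mean, sd)  # mean is 0 and sd is about 1.007 (true values: 0 and 1)
```

Under these assumptions the mean of the fitted triangle, (lower + mode + upper)/3, reduces algebraically to the 3:10:3 weighted mean, so the two estimates coincide. For strongly skewed inputs the estimated mode may fall outside the interval from lower to upper; the sketch does not guard against that case.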