Article ID: | iaor2000340 |
Country: | United States |
Volume: | 10 |
Issue: | 4 |
Start Page Number: | 399 |
End Page Number: | 416 |
Publication Date: | Sep 1998 |
Journal: | INFORMS Journal On Computing |
Authors: | Ghosh Deb, Sarkar Sumit |
Keywords: | decision theory |
The ability of a computerized system to model the reasoning process of humans has become an important area of research. This research considers a probabilistic reasoning system for applications that require decision making under uncertain conditions. The reasoning system captures the uncertainty associated with different feasible outcomes and, based on historical data, provides users with a measure of this uncertainty. To make accurate predictions, the scheme requires that predictive variables that are not conditionally independent of each other given the outcome be grouped into compound attributes for the purpose of estimating probabilities. These compound attributes partition the entire set of predictive attributes into disjoint sets. An important design issue, then, is that the appropriate partitioning scheme be obtained before the reasoning scheme is used in practice. We formulate the problem of finding the optimal partitioning scheme, and present five different (although related) heuristic techniques to induce partitions from historical cases. Using simulated data, all five techniques are shown to accurately capture the underlying dependencies across attributes when a reasonable amount of historical data is available for analysis. In situations where few historical cases are available, the induced structures are less accurate. In such situations, the performance of induced structures for making probability predictions is nevertheless found to be as good as that obtained using the true structure. Finally, we test for external validity by applying the techniques to a real-world dataset of bank credit applications. We show, using this dataset, that (i) the classificatory performance of the reasoning system using structures generated by the heuristic techniques is as good as that of the widely used decision tree induction algorithm C4.5, and (ii) the induced structures are able to provide reliable probability estimates for making decisions in environments with asymmetric misclassification costs.
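The abstract does not give the paper's estimation formulas, but the idea of grouping dependent predictive variables into compound attributes can be sketched as a partitioned variant of naive Bayes: outcome probabilities factor as P(class) times the product, over each compound attribute, of P(compound value | class), with each factor estimated from historical counts. The sketch below assumes Laplace smoothing and categorical attributes; the function names, toy credit data, and smoothing choice are illustrative, not taken from the paper.

```python
from collections import Counter, defaultdict

def fit(records, labels, partition, alpha=1.0):
    """Count statistics for a partitioned probability model.

    `partition` is a list of tuples of attribute indices; each tuple is one
    compound attribute whose member variables need not be conditionally
    independent of one another given the outcome.
    """
    class_counts = Counter(labels)
    group_counts = Counter()          # (group, class, value-tuple) -> count
    group_values = defaultdict(set)   # group -> distinct value-tuples seen
    for x, y in zip(records, labels):
        for g, idxs in enumerate(partition):
            v = tuple(x[i] for i in idxs)
            group_counts[(g, y, v)] += 1
            group_values[g].add(v)
    return class_counts, group_counts, group_values

def predict_proba(x, partition, model, alpha=1.0):
    """Return P(class | x) under the compound-attribute factorisation."""
    class_counts, group_counts, group_values = model
    n = sum(class_counts.values())
    scores = {}
    for c, nc in class_counts.items():
        p = nc / n  # class prior
        for g, idxs in enumerate(partition):
            v = tuple(x[i] for i in idxs)
            # Laplace-smoothed estimate of P(compound value | class)
            p *= (group_counts[(g, c, v)] + alpha) / (nc + alpha * len(group_values[g]))
        scores[c] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Illustrative mini-dataset (hypothetical, in the spirit of credit screening):
records = [("low", "young"), ("high", "old"), ("high", "young"), ("low", "old")]
labels = ["bad", "good", "good", "bad"]
partition = [(0,), (1,)]  # here every attribute is its own (singleton) group

model = fit(records, labels, partition)
probs = predict_proba(("high", "young"), partition, model)
```

A partition such as `[(0, 1)]` would instead treat both attributes as a single compound attribute, which is what the paper's heuristics would induce when the two variables are not conditionally independent given the outcome.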