Article ID: | iaor20012377 |
Country: | United Kingdom |
Volume: | 11 |
Issue: | 1 |
Start Page Number: | 1 |
End Page Number: | 18 |
Publication Date: | Jan 2000 |
Journal: | IMA Journal of Mathematics Applied in Business and Industry |
Authors: | Chang K.C., Fung Robert, Lucas Alan, Oliver Robert, Shikaloff Nina |
Keywords: | credit scoring, Bayesian modelling |
The objectives of this paper are to apply the theory and numerical algorithms of Bayesian networks to risk scoring, and to compare the results with those of traditional methods for computing scores and posterior predictions of performance variables. Model identification, inference, and prediction of random variables using Bayesian networks have been successfully applied in a number of areas, including medical diagnosis, equipment failure, information retrieval, rare-event prediction, and pattern recognition. The ability to graphically represent conditional dependencies and independencies among random variables may also be useful in credit scoring. Although several papers that use graphical models for model identification have already appeared in the literature, as far as we know there have been no explicit experimental results that compare a traditionally computed risk score with predictions based on Bayesian learning algorithms. In this paper, we examine a database of credit-card applicants and attempt to ‘learn’ the graphical structure of the characteristics or variables that make up the database. We identify representative Bayesian networks in a development sample, as well as the associated Markov blankets and the clique structures within each Markov blanket. Once we obtain the structure of the underlying conditional independencies, we are able to estimate the probabilities of each node conditional on its direct predecessor node(s). We then calculate the posterior probabilities and scores of a performance variable for the development sample. Next, we calculate the receiver operating characteristic (ROC) curves and relative profitability of scorecards based on these identifications. The results of the different models and methods are compared on both development and validation samples. Finally, we report on a statistical entropy calculation that measures the degree to which cliques identified in the Bayesian network are independent of one another.
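The steps named in the abstract (estimating each node's probabilities conditional on its direct predecessors, computing the posterior of a performance variable, and summarising discrimination with the ROC curve) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The node names (`bad`, `owner`, `delinq`), the six toy records, the fixed network structure, and the Laplace smoothing constant `alpha` are all assumptions made for illustration; the paper learns its structure from real credit-card data, which is not reproduced here.

```python
from collections import Counter

def fit_cpts(rows, parents, alpha=1.0):
    """Estimate P(node | parents) by counting, with Laplace smoothing (assumed)."""
    values = {n: sorted({r[n] for r in rows}) for n in parents}
    cpts = {}
    for node, pa in parents.items():
        joint = Counter((tuple(r[p] for p in pa), r[node]) for r in rows)
        marg = Counter(tuple(r[p] for p in pa) for r in rows)
        k = len(values[node])
        # Bind the counters as defaults so each node keeps its own table.
        def prob(value, pa_vals, joint=joint, marg=marg, k=k):
            return (joint[(pa_vals, value)] + alpha) / (marg[pa_vals] + alpha * k)
        cpts[node] = prob
    return cpts, values

def posterior(evidence, target, parents, cpts, values):
    """P(target | all other nodes observed): normalise the joint over target values."""
    weights = {}
    for t in values[target]:
        full = dict(evidence)
        full[target] = t
        w = 1.0
        for node, pa in parents.items():
            w *= cpts[node](full[node], tuple(full[p] for p in pa))
        weights[t] = w
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def auc(scores, labels):
    """Area under the ROC curve: chance a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy records: 'bad' is the performance variable.
rows = [
    {"bad": 0, "owner": 1, "delinq": 0},
    {"bad": 0, "owner": 1, "delinq": 0},
    {"bad": 0, "owner": 0, "delinq": 0},
    {"bad": 1, "owner": 0, "delinq": 1},
    {"bad": 1, "owner": 0, "delinq": 1},
    {"bad": 1, "owner": 1, "delinq": 0},
]
# Assumed structure: 'bad' is the direct predecessor of both characteristics.
parents = {"bad": [], "owner": ["bad"], "delinq": ["bad"]}

cpts, values = fit_cpts(rows, parents)
scores = [
    posterior({"owner": r["owner"], "delinq": r["delinq"]}, "bad", parents, cpts, values)[1]
    for r in rows
]
labels = [r["bad"] for r in rows]
print(auc(scores, labels))
```

Because every other node is observed for each applicant, the posterior reduces to normalising the product of the conditional probabilities over the values of the performance variable; a network with unobserved nodes would need a general inference algorithm instead. The area under the curve is computed on the same records used for fitting, which corresponds only to the development-sample comparison, not to validation.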