Article ID: | iaor20073174 |
Country: | United States |
Volume: | 51 |
Issue: | 6 |
Start Page Number: | 894 |
End Page Number: | 907 |
Publication Date: | Nov 2003 |
Journal: | Operations Research |
Authors: | Wasil Edward, Golden Bruce, Raghavan S., Fu Zhiwei, Lele Shreevardhan |
Keywords: | datamining, heuristics: genetic algorithms |
When considering a decision tree for the purpose of classification, accuracy is usually the sole performance measure used in the construction process. In this paper, we introduce the idea of combining a decision tree's expected value and variance in a new probabilistic measure for assessing the performance of a tree. We develop a genetic algorithm for constructing a tree using our new measure and conduct computational experiments that show the advantages of our approach. Further, we investigate the effect of introducing diversity into the population used by our genetic algorithm. We allow the genetic algorithm to simultaneously focus on two distinct probabilistic measures – one that is risk averse and one that is risk seeking. Our bivariate genetic algorithm for constructing a decision tree performs very well, scales up quite nicely to handle data sets with hundreds of thousands of points, and requires only a small percent of the data to generate a high-quality decision tree. We demonstrate the effectiveness of our algorithm on three large data sets.