Article ID: | iaor2001575 |
Volume: | 10 |
Issue: | 2 |
Start Page Number: | 111 |
End Page Number: | 126 |
Publication Date: | May 1998 |
Journal: | Computational Optimization and Applications |
Authors: | Bredensteiner E.J., Bennett K.P. |
Keywords: | programming: parametric |
Decision trees for classification can be constructed using mathematical programming. Within decision tree algorithms, the feature minimization problem is to construct accurate decisions using as few features or attributes within each decision as possible. Feature minimization is an important aspect of data mining since it helps identify what attributes are important and helps produce accurate and interpretable decision trees. In feature minimization with bounded accuracy, we minimize the number of features using a given misclassification error tolerance. This problem can be formulated as a parametric bilinear program and is shown to be NP-complete. A parametric Frank–Wolfe method is used to solve the bilinear subproblems. The resulting minimization algorithm produces more compact, accurate, and interpretable trees. This procedure can be applied to many different error functions. Formulations and results for two error functions are given. One method, FM_RLP-P, dramatically reduced the number of features of one dataset from 147 to 2 while maintaining an 83.6% testing accuracy. Computational results compare favorably with the standard univariate decision tree method, C4.5, as well as with linear programming methods of tree construction.