Article ID: | iaor20103436 |
Volume: | 61 |
Issue: | 5 |
Start Page Number: | 804 |
End Page Number: | 812 |
Publication Date: | May 2010 |
Journal: | Journal of the Operational Research Society |
Authors: | Glen J J, Falangis K |
Keywords: | heuristics, programming: integer |
When developing a classification model that assigns observations of unknown class to one of several specified classes on the basis of the values of a set of features associated with each observation, it is often desirable to base the classifier on a limited number of features. Mathematical programming discriminant analysis methods for developing classification models can be extended to perform feature selection. Classification accuracy can serve as the feature selection criterion in a mixed integer programming (MIP) model that associates a binary variable with each observation in the training sample, but the number of binary variables this requires limits the size of problem to which the approach can be applied. This paper develops heuristic feature selection methods for problems with large numbers of observations. These heuristic procedures, which are based on the MIP model for maximizing classification accuracy, are then applied to three credit scoring data sets.
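To show why the per-observation binary variables limit problem size, a generic big-M formulation for the two-class case is sketched here. It is an illustrative assumption and not necessarily the model used in the paper. All of the following notation is introduced for this sketch: the linear discriminant $(\mathbf{w}, w_0)$; the training groups $G_1$ and $G_2$, holding $n$ observations $\mathbf{x}_i \in \mathbb{R}^p$ in total; the separation gap $\varepsilon > 0$; a sufficiently large constant $M$; a coefficient bound $U$; and a limit $k$ on the number of selected features.

```latex
\begin{aligned}
\max_{\mathbf{w},\,w_0,\,\mathbf{z},\,\mathbf{y}} \quad & \sum_{i=1}^{n} (1 - z_i) \\
\text{s.t.} \quad & \mathbf{w}^\top \mathbf{x}_i + w_0 \le -\varepsilon + M z_i, && i \in G_1 \\
& \mathbf{w}^\top \mathbf{x}_i + w_0 \ge \varepsilon - M z_i, && i \in G_2 \\
& -U y_j \le w_j \le U y_j, && j = 1, \dots, p \\
& -U \le w_0 \le U \\
& \sum_{j=1}^{p} y_j \le k \\
& z_i \in \{0, 1\}, \quad i = 1, \dots, n; \qquad y_j \in \{0, 1\}, \quad j = 1, \dots, p
\end{aligned}
```

In this sketch, observation $i$ is counted as correctly classified when $z_i = 0$, and feature $j$ can enter the discriminant only when $y_j = 1$. The model has $n + p$ binary variables, so its size grows with the number of training observations. This growth is the limitation that motivates heuristic approaches for large data sets.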