Article ID: | iaor20162339 |
Volume: | 64 |
Issue: | 3 |
Start Page Number: | 865 |
End Page Number: | 880 |
Publication Date: | Jul 2016 |
Journal: | Computational Optimization and Applications |
Authors: | Yoshise Akiko, Miyashiro Ryuhei, Takano Yuichi, Sato Toshiki |
Keywords: | statistics: regression, information, programming: integer, heuristics |
This paper concerns a method of selecting a subset of features for a logistic regression model. Information criteria, such as the Akaike information criterion and the Bayesian information criterion, are employed as goodness‐of‐fit measures. The purpose of our work is to establish a computational framework for selecting a subset of features with an optimality guarantee. To this end, we devise mixed integer optimization formulations for feature subset selection in logistic regression. Specifically, by making a piecewise linear approximation of the logistic loss function, we pose the problem as a mixed integer linear optimization problem, which can be solved with standard mixed integer optimization software. The computational results demonstrate that when there were fewer than 40 candidate features, our method provided a feature subset that was sufficiently close to an optimal one in a reasonable amount of time. Furthermore, even when there were more candidate features, our method often found a better subset of features than the stepwise methods did in terms of information criteria.
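To illustrate the piecewise linear approximation mentioned in the abstract, here is a minimal Python sketch. It assumes the logistic loss is written as f(v) = log(1 + exp(-v)) for labels in {-1, +1} and that the approximation is the pointwise maximum of tangent lines at a few chosen points. That tangent-line construction, the function names, and the tangent points are all assumptions made for illustration; the abstract does not specify the paper's exact construction. The last function computes AIC and BIC from the standard definitions.

```python
import math


def logistic_loss(v):
    """Logistic loss f(v) = log(1 + exp(-v)), evaluated in a numerically stable way."""
    if v >= 0:
        return math.log1p(math.exp(-v))
    return -v + math.log1p(math.exp(v))


def tangent(v_k):
    """Slope and intercept of the tangent line to f at v_k (assumed construction)."""
    slope = -1.0 / (1.0 + math.exp(v_k))  # f'(v) = -1 / (1 + exp(v))
    intercept = logistic_loss(v_k) - slope * v_k
    return slope, intercept


def pwl_loss(v, points):
    """Piecewise linear underestimator of f: the maximum over tangent lines.

    Because f is convex, every tangent line lies on or below f, so the
    maximum is a lower bound that is tight at the tangent points.
    """
    return max(s * v + b for s, b in (tangent(p) for p in points))


def information_criteria(margins, n_params):
    """AIC and BIC from margins y_i * (w . x_i + b) and the parameter count.

    The negative log-likelihood is the sum of logistic losses, so
    AIC = 2 * NLL + 2 * k and BIC = 2 * NLL + k * log(n).
    """
    nll = sum(logistic_loss(m) for m in margins)
    n = len(margins)
    return 2.0 * nll + 2.0 * n_params, 2.0 * nll + n_params * math.log(n)
```

In a mixed integer linear model, each line of such an approximation would typically become one linear constraint of the form t_i >= slope * v_i + intercept, with the sum of the t_i minimized. Binary variables for feature selection are omitted from this sketch.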