Optimization-based feature selection with adaptive instance sampling

0.00 Avg rating—0 Votes

Article ID:	iaor20072633
Country:	United Kingdom
Volume:	33
Issue:	11
Start Page Number:	3088
End Page Number:	3106
Publication Date:	Nov 2006
Journal:	Computers and Operations Research
Authors:	Yang Jaekyung, Olafsson Sigurdur
Keywords:	heuristics, datamining

Abstract:

Preprocessing the data to filter out redundant and irrelevant features is one of the most important steps in the data mining process. Careful feature selection may improve both the computational time of inducing subsequent models and the quality of those models. Using fewer features often leads to simpler and easier to interpret models, and selecting important feature can lead to important insights into the application. The feature selection problem is inherently a combinatorial optimization problem. This paper builds on a metaheuristic called the nested partitions method that has been shown to be particularly effective for the feature selection problem. Specifically, we focus on the scalability of the method and show that its performance is vastly improved by incorporating random sampling of instances. Furthermore, we develop an adaptive variant of the algorithm that dynamically determines the required sample rate. The adaptive algorithm is shown to perform very well when applied to a set of standard test problems.

Reviews

Required fields are marked *. Your email address will not be published.