Article ID: | iaor20072357 |
Country: | Netherlands |
Volume: | 148 |
Issue: | 1 |
Start Page Number: | 189 |
End Page Number: | 201 |
Publication Date: | Nov 2006 |
Journal: | Annals of Operations Research |
Authors: | Hammer Peter L., Alexe Sorin, Alexe Gabriela, Vizvari Bela |
A major difficulty in bioinformatics stems from the size of the datasets, which frequently contain large numbers of variables. In this study, we present a two-step procedure for feature selection. In the first ‘filtering’ stage, a relatively small subset of features is identified on the basis of several criteria. In the second stage, the importance of the selected variables is evaluated based on the frequency of their participation in relevant patterns, and low-impact variables are eliminated. This step is applied iteratively until arriving at a Pareto-optimal ‘support set’, which balances the conflicting criteria of simplicity and accuracy.
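
The two-stage procedure described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the abstract does not specify the filtering criteria, how patterns are generated, or how the Pareto-optimal stopping point is found. The sketch therefore assumes a single precomputed score per feature for the filter, takes the pattern generator (`find_patterns`) and the accuracy estimate (`accuracy`) as caller-supplied hypothetical functions, and replaces the Pareto criterion with a simple accuracy floor (`min_accuracy`).

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Iterable, Sequence


def filter_features(scores: dict[str, float], keep: int) -> list[str]:
    """Stage 1 (assumed form): keep the `keep` highest-scoring features.

    `scores` stands in for the "several criteria" of the abstract, which
    are not specified there. Ties are broken by feature name.
    """
    return sorted(scores, key=lambda f: (-scores[f], f))[:keep]


def participation(patterns: Iterable[set[str]],
                  features: Sequence[str]) -> dict[str, float]:
    """Fraction of patterns in which each feature takes part."""
    patterns = list(patterns)
    counts = Counter(f for p in patterns for f in p)
    total = max(len(patterns), 1)  # avoid division by zero
    return {f: counts[f] / total for f in features}


def prune_features(features: Sequence[str],
                   find_patterns: Callable[[Sequence[str]], list[set[str]]],
                   accuracy: Callable[[Sequence[str]], float],
                   min_accuracy: float) -> list[str]:
    """Stage 2 (assumed form): repeatedly drop the least-used feature.

    Stops when removing the weakest feature would push the estimated
    accuracy below `min_accuracy`. This threshold is a stand-in for the
    simplicity/accuracy trade-off, not the paper's Pareto criterion.
    """
    current = list(features)
    while len(current) > 1:
        freq = participation(find_patterns(current), current)
        weakest = min(current, key=lambda f: (freq[f], f))
        candidate = [f for f in current if f != weakest]
        if accuracy(candidate) < min_accuracy:
            break  # further simplification costs too much accuracy
        current = candidate
    return current
```

A caller would first run `filter_features` on the full variable set, then pass the survivors to `prune_features` together with whatever pattern generator and validation scheme are in use. Keeping those two pieces as parameters keeps the sketch independent of any particular pattern-mining method.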