Article ID: iaor20105164
Volume: 27
Issue: 3
Start Page Number: 201
End Page Number: 218
Publication Date: Jul 2010
Journal: Expert Systems
Authors: Wu Tai-Hsi, Yeh Jinn-Yi
Keywords: heuristics: genetic algorithms
Cancer classification through gene expression data analysis has produced remarkable results and has indicated that gene expression assays could significantly aid the development of efficient cancer diagnosis and classification platforms. However, cancer classification based on DNA array data remains a difficult problem. The main challenge is the overwhelming number of genes relative to the number of training samples, which implies that a large number of irrelevant genes must be dealt with. Another challenge is the noise inherent in the data set, which makes accurate classification more difficult when the sample size is small. We apply genetic algorithms (GAs) with an initial solution provided by t statistics, called t-GA, to select a group of relevant genes from cancer microarray data. A decision-tree-based cancer classifier is then built on these selected genes. The performance of this approach is evaluated by comparing it with other gene selection methods on publicly available gene expression data sets. Experimental results indicate that t-GA performs best among the gene selection methods compared. The Z-score figure also shows that some genes are consistently and preferentially chosen by t-GA in each data set.
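
The idea of seeding a GA with t statistics can be illustrated with a minimal sketch. The abstract does not give the exact statistic, the chromosome encoding, or the number of genes kept, so everything below is an assumption made for illustration: the Welch form of the two-sample t statistic, a binary gene mask as the chromosome, and the hypothetical helper names `welch_t`, `top_genes_by_t`, and `seed_chromosome`. It is not the authors' implementation, and the data are toy values.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic with unequal variances (Welch form, assumed here)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes_by_t(samples, labels, k):
    """Rank genes by |t| between the two classes; return the k best gene indices."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, y in zip(samples, labels) if y == 1]
        neg = [row[g] for row, y in zip(samples, labels) if y == 0]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def seed_chromosome(n_genes, selected):
    """Binary mask (assumed encoding): 1 means the gene is in the subset."""
    chosen = set(selected)
    return [1 if g in chosen else 0 for g in range(n_genes)]


# Toy expression matrix: 6 samples x 4 genes (illustrative values only).
# Genes 1 and 2 differ strongly between the classes; genes 0 and 3 do not.
samples = [
    [5.0, 9.1, 6.0, 4.0],
    [5.2, 8.9, 6.4, 4.1],
    [4.9, 9.3, 5.9, 3.8],
    [5.1, 1.0, 3.1, 4.2],
    [5.0, 1.2, 3.3, 3.9],
    [4.8, 0.8, 3.0, 4.0],
]
labels = [1, 1, 1, 0, 0, 0]

ranked = top_genes_by_t(samples, labels, k=2)
initial_solution = seed_chromosome(len(samples[0]), ranked)
print(ranked, initial_solution)
```

In a full pipeline of this kind, a mask such as `initial_solution` would be placed in the GA's starting population, and each candidate mask would be scored by the accuracy of a classifier trained on the selected genes; those steps are omitted here because the abstract does not specify them.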