Article ID: iaor20161493
Volume: 32
Issue: 2
Start Page Number: 167
End Page Number: 195
Publication Date: May 2016
Journal: Computational Intelligence
Authors: Smith Michael R, Martinez Tony
Keywords: data mining, learning, artificial intelligence
Not all instances in a data set are equally beneficial for inferring a model of the data, and some instances (such as outliers) can be detrimental. Several machine learning techniques, such as curriculum learning, filtering, and boosting, treat the instances in a data set differently during training. However, it is difficult to determine how beneficial an instance is for inferring a model of the data. In this article, we present an automated method for supervised classification problems that orders the instances in a data set by complexity, based on their likelihood of being misclassified (instance hardness), to produce a hardness ordering. The underlying assumption of this method is that instances with a high likelihood of being misclassified represent more complex concepts in a data set. Using a hardness ordering allows a learning algorithm to focus on the most beneficial instances. We integrate a hardness ordering into the learning process using curriculum learning, filtering, and boosting. We find that focusing on the simpler instances during training significantly increases generalization accuracy, and that the effects of curriculum learning depend on the learning algorithm that is used. In general, filtering and boosting outperform curriculum learning, and filtering has the most significant effect on accuracy.
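As an illustration of the idea described in the abstract, the following is a minimal sketch (not the authors' implementation) of estimating instance hardness as the fraction of learners that misclassify each instance under cross-validation, sorting the instances by that score to obtain a hardness ordering, and then filtering the hardest instances before training. The choice of learners, the dataset, and the 10% filtering threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Estimate hardness: fraction of diverse learners whose out-of-fold
# prediction for an instance disagrees with its label.
learners = [GaussianNB(), KNeighborsClassifier(),
            DecisionTreeClassifier(random_state=0)]
miss = np.zeros(len(y))
for clf in learners:
    preds = cross_val_predict(clf, X, y, cv=5)
    miss += (preds != y)
hardness = miss / len(learners)  # in [0, 1]; 1 = always misclassified

# Hardness ordering: simplest (least often misclassified) instances first.
order = np.argsort(hardness)

# Filtering: drop the hardest 10% of instances, then train on the rest.
keep = order[: int(0.9 * len(order))]
model = RandomForestClassifier(random_state=0).fit(X[keep], y[keep])
```

The same ordering could plausibly drive the other two strategies mentioned in the abstract: curriculum learning by presenting instances in order of increasing hardness, and boosting by initializing or capping instance weights according to hardness; the specifics of how the article integrates the ordering are not reproduced here.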