Article ID: | iaor200969059 |
Country: | United Kingdom |
Volume: | 60 |
Issue: | 8 |
Start Page Number: | 1069 |
End Page Number: | 1084 |
Publication Date: | Aug 2009 |
Journal: | Journal of the Operational Research Society |
Authors: | Yang J, Ólafsson S, Kim J |
Keywords: | statistics: multivariate |
Scalability of clustering algorithms is a critical issue facing the data mining community. One way to address this issue is to use only a subset of all instances. This paper develops an optimization-based approach to the partitional clustering problem using an algorithm specifically designed to cope with noisy performance estimates, a problem that arises when only a subset of instances is used. Numerical results show that computation time can be dramatically reduced by using a partial set of instances without sacrificing solution quality. In addition, these results become more persuasive as the size of the problem increases.
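The idea of clustering on a subset of instances can be illustrated with a minimal sketch. It is not the optimization algorithm developed in the paper: it swaps in plain Lloyd-style k-means, fits the centers on a random sample, and then scores them on the full data set. All function names and parameters (`lloyd`, `cluster_on_subset`, `fraction`, `seed`) are hypothetical and chosen only for illustration. Because the centers are fitted to a sample, the resulting cost is a noisy estimate that changes with the sample drawn.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_cluster_ss(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means (an assumed stand-in, not the paper's method)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k distinct instances
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group;
        # an empty group keeps its previous center.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def cluster_on_subset(points, k, fraction, seed=0):
    """Fit centers on a random subset, then score them on all instances."""
    rng = random.Random(seed)
    n = max(k, int(len(points) * fraction))  # need at least k instances
    subset = rng.sample(points, n)
    centers = lloyd(subset, k, seed=seed)
    return centers, within_cluster_ss(points, centers)
```

Calling `cluster_on_subset(data, k=3, fraction=0.1)` runs the assignment loop on roughly a tenth of the instances, which is where the time saving comes from. Repeating the call with different `seed` values gives a rough picture of how much the sampled solution varies.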