| Article ID: | iaor200969059 |
| Country: | United Kingdom |
| Volume: | 60 |
| Issue: | 8 |
| Start Page Number: | 1069 |
| End Page Number: | 1084 |
| Publication Date: | Aug 2009 |
| Journal: | Journal of the Operational Research Society |
| Authors: | Yang J, Ólafsson S, Kim J |
| Keywords: | statistics: multivariate |
Scalability of clustering algorithms is a critical issue facing the data mining community. One way to handle this issue is to use only a subset of all instances. This paper develops an optimization-based approach to the partitional clustering problem, using an algorithm specifically designed for noisy performance estimates, which arise when only a subset of instances is used. Numerical results show that computation time can be dramatically reduced by using a partial set of instances without sacrificing solution quality. In addition, these results become more pronounced as the size of the problem increases.
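
To illustrate why evaluating a clustering on a subset of instances yields a noisy performance measure, here is a minimal Python sketch. It is not the algorithm from the paper: the cost function (mean squared distance to the nearest center), the synthetic two-blob data, and the names `mean_cost` and `sampled_cost` are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_cost(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)

def sampled_cost(points, centers, sample_size, rng):
    """Noisy estimate of mean_cost from a random subset of the instances."""
    subset = rng.sample(points, sample_size)
    return mean_cost(subset, centers)

rng = random.Random(0)

# Synthetic data (an assumption for this sketch): two Gaussian blobs in 2-D.
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
points += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(2000)]

# A fixed candidate solution: one center per blob.
centers = [(0.0, 0.0), (5.0, 5.0)]

full = mean_cost(points, centers)  # uses all 4000 instances
estimates = [sampled_cost(points, centers, 200, rng) for _ in range(5)]  # 200 each

print("full-data cost:", round(full, 3))
print("subset estimates:", [round(e, 3) for e in estimates])
```

Each subset estimate touches only 5% of the instances, so it is far cheaper to compute, but the estimates differ from run to run. An optimizer that compares candidate solutions using such estimates therefore has to tolerate noise in the objective, which is the situation the abstract describes.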