Article ID: | iaor201112180 |
Volume: | 65 |
Issue: | 4 |
Start Page Number: | 371 |
End Page Number: | 386 |
Publication Date: | Nov 2011 |
Journal: | Statistica Neerlandica |
Authors: | Swartz Tim B |
Keywords: | statistics: distributions, stochastic processes, simulation, probability |
Traditional clustering algorithms are deterministic in the sense that a given dataset always leads to the same output partition. This article modifies traditional clustering algorithms so that the data are associated with a probability model and clustering is carried out on the stochastic model parameters rather than on the data. This is done in a principled way using a Bayesian approach, which allows posterior probabilities to be assigned to output partitions. In addition, the approach incorporates prior knowledge of the output partitions using Bayesian melding. The methodology is applied to two substantive problems: (i) a question of stylometry involving a simulated dataset and (ii) the assessment of potential champions of the 2010 FIFA World Cup.
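The core idea of clustering model parameters instead of data can be illustrated with a minimal sketch. Each posterior draw of the parameters is passed through a deterministic clustering rule, and the relative frequency of each resulting partition estimates its posterior probability. Several assumptions are made here that are not taken from the article. The posterior is stood in for by independent normal distributions (the hypothetical `post_means` and `post_sds`); in practice the draws would come from the fitted Bayesian model, for example via MCMC. The "traditional" algorithm is one-dimensional single-linkage clustering, which splits the sorted values at their k-1 largest gaps. Bayesian melding of prior knowledge about partitions is not shown. The function names are illustrative only.

```python
import random
from collections import Counter


def gap_partition(values, labels, k):
    """Deterministic 1-D single-linkage clustering (a stand-in for a
    'traditional' algorithm): split the sorted values at their k-1 largest
    gaps and return the partition as a frozenset of frozensets of labels."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Gaps between consecutive sorted values, tagged by position in `order`.
    gaps = [(values[order[j + 1]] - values[order[j]], j) for j in range(len(order) - 1)]
    cut_after = sorted(j for _, j in sorted(gaps, reverse=True)[: k - 1])
    groups, start = [], 0
    for j in cut_after + [len(order) - 1]:
        groups.append(frozenset(labels[i] for i in order[start : j + 1]))
        start = j + 1
    return frozenset(groups)


def partition_posterior(post_means, post_sds, labels, k, n_draws=5000, seed=0):
    """Estimate posterior probabilities of output partitions by clustering
    posterior draws of the parameters rather than the data.
    Assumption: independent normal posteriors stand in for real MCMC output."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_draws):
        # One joint draw of the model parameters for all items.
        theta = [rng.gauss(m, s) for m, s in zip(post_means, post_sds)]
        counts[gap_partition(theta, labels, k)] += 1
    # Most probable partition first.
    return {p: c / n_draws for p, c in counts.most_common()}


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    probs = partition_posterior([0.0, 0.2, 3.0, 3.1], [0.3] * 4, labels, k=2)
    for part, p in list(probs.items())[:3]:
        print(sorted(sorted(g) for g in part), round(p, 3))
```

Because the parameters are random, the same data can yield different partitions across draws, and the output is a probability distribution over partitions rather than a single answer. Any deterministic clustering routine could replace `gap_partition` without changing the surrounding loop.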