Article ID: | iaor2009730 |
Country: | Cuba |
Volume: | 28 |
Issue: | 2 |
Start Page Number: | 132 |
End Page Number: | 143 |
Publication Date: | May 2007 |
Journal: | Revista de Investigación Operacional |
Authors: | Vigneron Vincent |
In this paper we study the notion of entropy for a set of attributes of a table and propose a novel method to measure the dissimilarity of categorical data. Experiments show that, compared with the Euclidean or Mahalanobis distance, our estimation method improves the accuracy of the popular unsupervised Self-Organizing Map (SOM). The comparison is carried out on the clustering of multidimensional contingency tables. Two features make our distance function attractive: first, its general framework, which can be extended to other classes of problems; second, the measure can be normalized to obtain a coefficient similar to, for instance, Pearson's coefficient of contingency.
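The abstract does not give the paper's actual distance formula, so the following is only an illustrative sketch of the ingredients it names, not Vigneron's method. It computes the joint Shannon entropy of a set of categorical attributes. It then uses the variation of information, VI(A, B) = 2H(A, B) - H(A) - H(B), a standard entropy-based dissimilarity swapped in here as a stand-in, together with its normalization to [0, 1]. For comparison it also computes Pearson's coefficient of contingency, C = sqrt(chi2 / (chi2 + n)). The table layout (a list of dicts) and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(rows, attrs):
    """Shannon entropy (bits) of the joint distribution of the columns in `attrs`."""
    n = len(rows)
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def variation_of_information(rows, a, b):
    """Entropy-based dissimilarity (stand-in, not the paper's measure):
    VI(A, B) = 2 H(A, B) - H(A) - H(B); it is 0 when A and B determine each other."""
    return 2 * entropy(rows, [a, b]) - entropy(rows, [a]) - entropy(rows, [b])


def normalized_vi(rows, a, b):
    """VI divided by the joint entropy H(A, B), giving a value in [0, 1]."""
    h_ab = entropy(rows, [a, b])
    return 0.0 if h_ab == 0 else variation_of_information(rows, a, b) / h_ab


def pearson_contingency(rows, a, b):
    """Pearson's coefficient of contingency C = sqrt(chi2 / (chi2 + n))."""
    n = len(rows)
    joint = Counter((row[a], row[b]) for row in rows)
    count_a = Counter(row[a] for row in rows)
    count_b = Counter(row[b] for row in rows)
    chi2 = 0.0
    for x, nx in count_a.items():
        for y, ny in count_b.items():
            expected = nx * ny / n  # expected cell count under independence
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


# Usage: two perfectly associated attributes and two independent ones.
linked = [{"color": "red", "size": "S"}, {"color": "red", "size": "S"},
          {"color": "blue", "size": "L"}, {"color": "blue", "size": "L"}]
indep = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
print(normalized_vi(linked, "color", "size"), pearson_contingency(linked, "color", "size"))
print(normalized_vi(indep, "a", "b"), pearson_contingency(indep, "a", "b"))
```

For the perfectly associated pair the normalized dissimilarity is 0 and C reaches its 2x2 maximum of sqrt(1/2); for the independent pair the dissimilarity is 1 and C is 0. This is the sense in which a normalized entropy-based measure can behave like a contingency coefficient.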