Article ID: | iaor20081468 |
Country: | United Kingdom |
Volume: | 5 |
Issue: | 4 |
Start Page Number: | 447 |
End Page Number: | 473 |
Publication Date: | Dec 2006 |
Journal: | Journal of Mathematical Modelling and Algorithms |
Authors: | Xu Lei, An Yujia, Hu Xuelei |
With uncorrelated Gaussian factors extended to mutually independent factors beyond Gaussian, the conventional factor analysis is extended to what is recently called independent factor analysis. Typically, it is called binary factor analysis (BFA) when the factors are binary and called non-Gaussian factor analysis (NFA) when the factors are from real non-Gaussian distributions. A crucial issue in both BFA and NFA is the determination of the number of factors. In the literature of statistics, there are a number of model selection criteria that can be used for this purpose. Also, the Bayesian Ying–Yang (BYY) harmony learning provides a new principle for this purpose. This paper further investigates BYY harmony learning in comparison with existing typical criteria, including Akaike's information criterion (AIC), the consistent Akaike's information criterion (CAIC), the Bayesian inference criterion (BIC), and the cross-validation (CV) criterion on selection of the number of factors. This comparative study is made via experiments on the data sets with different sample sizes, data space dimensions, noise variances, and hidden factors numbers. Experiments have shown that for both BFA and NFA, in most cases BIC outperforms AIC, CAIC, and CV while the BYY criterion is either comparable with or better than BIC. In consideration of the fact that the selection by these criteria has to be implemented at the second stage based on a set of candidate models which have to be obtained at the first stage of parameter learning, while BYY harmony learning can provide not only a new class of criteria implemented in a similar way but also a new family of algorithms that perform parameter learning at the first stage with automated model selection, BYY harmony learning is more preferred since computing costs can be saved significantly.