Article ID: | iaor201112557 |
Volume: | 38 |
Issue: | 2 |
Start Page Number: | 359 |
End Page Number: | 376 |
Publication Date: | Jun 2011 |
Journal: | Scandinavian Journal of Statistics |
Authors: | Favaro Stefano, Prünster Igor, Walker Stephen G |
Keywords: | statistics: general, statistics: distributions, statistics: inference, statistics: sampling |
In this study, we investigate a recently introduced class of non-parametric priors, termed generalized Dirichlet process priors. Such priors induce (exchangeable random) partitions that are characterized by a more elaborate clustering structure than those arising from other widely used priors. A natural area of application for these random probability measures is species sampling problems and, in particular, prediction problems in genomics. To this end, we study both the distribution of the number of distinct species present in a sample and the distribution of the number of new species conditional on an observed sample. We also provide the Bayesian non-parametric estimator for the number of new species in an additional sample of given size and for the discovery probability as a function of the size of the additional sample. Finally, we complete the study of the conditional structure of these priors by determining the posterior distribution.
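
As a rough illustration of the quantities named in the abstract, the sketch below computes the number of distinct species and the discovery probability under the ordinary Dirichlet process, one of the widely used priors the abstract compares against. It is not the generalized Dirichlet process of the paper, whose formulas are not given in this record. The function names and the parameter `theta` (the total mass of the Dirichlet process) are assumptions introduced here for illustration.

```python
import random


def expected_distinct_species(theta: float, n: int) -> float:
    """Expected number of distinct species among n observations.

    Assumes an ordinary Dirichlet process with total mass theta
    (a stand-in, not the generalized prior studied in the paper).
    """
    return sum(theta / (theta + i) for i in range(n))


def discovery_probability(theta: float, n: int) -> float:
    """Probability that observation n + 1 is a new species.

    Under the ordinary Dirichlet process this depends only on the
    sample size n, not on how many species have been seen so far.
    """
    return theta / (theta + n)


def simulate_distinct_species(theta: float, n: int, rng: random.Random) -> int:
    """Draw the number of distinct species in a sample of size n."""
    k = 0
    for i in range(n):
        # Observation i + 1 is a new species with probability theta / (theta + i);
        # for i = 0 this probability is 1, so the first draw is always new.
        if rng.random() < theta / (theta + i):
            k += 1
    return k
```

Averaging `simulate_distinct_species` over many draws should approach `expected_distinct_species` for the same `theta` and `n`, which gives a simple sanity check before moving to a prior with a richer predictive structure.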