Article ID: | iaor20172285 |
Volume: | 59 |
Issue: | 2 |
Start Page Number: | 187 |
End Page Number: | 194 |
Publication Date: | Jun 2017 |
Journal: | Australian & New Zealand Journal of Statistics |
Authors: | Lumley Thomas |
Keywords: | statistics: sampling, simulation |
Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar R² coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox–Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design‐consistent estimator of the population model summary. It is also shown that for logistic regression models under case–control sampling the usual Cox–Snell and Nagelkerke R² are not design‐consistent, but are systematically larger than would be obtained with a cross‐sectional or cohort sample from the same population, even in settings where the weighted and unweighted logistic regression estimators are similar or identical. Implementation of the new estimators is straightforward and code is provided in R.
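The likelihood-ratio summaries mentioned in the abstract can be illustrated with a minimal sketch. This is not the R code provided with the paper. It assumes that the design-based versions replace the sample log-likelihoods with sampling-weighted (pseudo) log-likelihoods and the sample size with the estimated population size (the sum of the weights). The function names `weighted_loglik` and `pseudo_r2`, the toy data, and the weights are all hypothetical.

```python
import math


def weighted_loglik(y, p, w):
    """Weighted Bernoulli log-likelihood: sum of w_i * log f(y_i; p_i)."""
    return sum(
        wi * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
        for yi, pi, wi in zip(y, p, w)
    )


def pseudo_r2(y, p_fit, w):
    """Return (Cox-Snell, Nagelkerke) summaries for a binary-outcome model.

    Assumption: the weights w are sampling weights (inverse inclusion
    probabilities), so sum(w) estimates the population size N.
    With all weights equal to 1 this is the usual unweighted formula.
    """
    n_hat = sum(w)
    # Null model: intercept only, i.e. the weighted mean of the outcome.
    p0 = sum(wi * yi for yi, wi in zip(y, w)) / n_hat
    ll0 = weighted_loglik(y, [p0] * len(y), w)
    ll1 = weighted_loglik(y, p_fit, w)
    # Cox-Snell: 1 - (L0 / L1)^(2/N), written on the log scale.
    cox_snell = 1.0 - math.exp(2.0 * (ll0 - ll1) / n_hat)
    # Nagelkerke: rescale by the maximum attainable value, 1 - L0^(2/N).
    max_cox_snell = 1.0 - math.exp(2.0 * ll0 / n_hat)
    return cox_snell, cox_snell / max_cox_snell


# Toy case-control-style data: cases get weight 1, controls weight 10.
# Fitted probabilities are made up for illustration, not estimated.
y = [1, 1, 1, 0, 0, 0]
p_fit = [0.30, 0.25, 0.20, 0.05, 0.04, 0.06]
w = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]

cs, nk = pseudo_r2(y, p_fit, w)
print(f"Cox-Snell: {cs:.3f}, Nagelkerke: {nk:.3f}")
```

Under these assumptions the summaries depend only on the weighted log-likelihoods divided by the sum of the weights, so multiplying every weight by a constant leaves both values unchanged.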