Article ID: | iaor201523793 |
Volume: | 30 |
Issue: | 1 |
Start Page Number: | 97 |
End Page Number: | 109 |
Publication Date: | Feb 2014 |
Journal: | Quality and Reliability Engineering International |
Authors: | Birch Jeffrey B, Lawrence David E, Chen Yajuan |
Keywords: | case studies, cluster analysis, Monte Carlo method |
A regression methodology is introduced that obtains competitive, robust, efficient, high-breakdown regression parameter estimates while also providing an informative summary of possible multiple-outlier structure. The proposed method blends a cluster analysis phase with a controlled bounded influence (BI) regression phase and is hence referred to as cluster-based bounded influence regression, or CBI. With the data space represented by a special set of anchor points, a collection of point-addition OLS regression estimators forms the basis of a metric that defines the similarity between any two observations. Cluster analysis then yields a main cluster ‘half-set’ of observations, with the remaining observations comprising one or more minor clusters. An initial regression estimator arises from the main cluster, and a group-additive DFFITS argument is used to carefully activate the minor clusters through a BI regression framework. CBI achieves a 50% breakdown point, is regression, scale and affine equivariant, and is asymptotically normally distributed. Case studies and Monte Carlo results demonstrate the performance advantage of CBI over other popular robust regression procedures with regard to coefficient stability, scale estimation and standard errors. The dendrogram of the clustering process and the weight plot are graphical displays available for multivariate outlier detection. Overall, the proposed methodology represents an advancement in the field of robust regression, offering a distinct philosophical viewpoint on data analysis and the marriage of estimation with diagnostic summary.
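For orientation, the classical single-case DFFITS diagnostic, on which the abstract's group-additive argument presumably builds, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' CBI procedure: the function name `dffits` is hypothetical, an intercept is assumed, and neither the group-additive extension nor the clustering and BI phases are reproduced here.

```python
import numpy as np


def dffits(X, y):
    """Classical single-case DFFITS for an OLS fit with intercept.

    Illustrative only: this is the textbook one-observation diagnostic,
    not the group-additive variant described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = A.shape[1]

    # Leverages h_ii: diagonal of the hat matrix, computed from a thin QR.
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q ** 2, axis=1)

    # OLS residuals and residual sum of squares.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ beta
    sse = e @ e

    # Leave-one-out variance estimate s_(i)^2 without refitting.
    s2_loo = (sse - e ** 2 / (1.0 - h)) / (n - p - 1)

    # Externally studentized residuals, then DFFITS.
    t = e / np.sqrt(s2_loo * (1.0 - h))
    return t * np.sqrt(h / (1.0 - h))
```

A large |DFFITS| value flags an observation whose deletion would shift its own fitted value by many standard errors; a commonly quoted rule-of-thumb cutoff is 2*sqrt(p/n), where p counts the fitted coefficients.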