Article ID: | iaor201526790 |
Volume: | 66 |
Issue: | 8 |
Start Page Number: | 1385 |
End Page Number: | 1398 |
Publication Date: | Aug 2015 |
Journal: | Journal of the Operational Research Society |
Authors: | Yum Bong-Jin, Jeong Myong K, Hwang Sangheum, Kim Dohyun |
Keywords: | heuristics, matrices |
Kernel‐based regression (KBR) methods, such as support vector regression (SVR), are well‐established methodologies for estimating the nonlinear functional relationship between the response variable and predictor variables. KBR methods can be very sensitive to influential observations, which in turn have a noticeable impact on the model coefficients. The robustness of KBR methods has recently been the subject of wide‐scale investigation, with the aim of obtaining a regression estimator insensitive to outlying observations. However, existing robust KBR (RKBR) methods consider only Y‐space outliers and, consequently, remain sensitive to X‐space outliers; even a single outlying observation in X‐space may greatly affect the estimator. To resolve this issue, we propose a new RKBR method that gives reliable results even when the training data set is contaminated with both Y‐space and X‐space outliers. The proposed method utilizes a weighting scheme based on the hat matrix that resembles the generalized M‐estimator (GM‐estimator) of conventional robust linear regression. The diagonal elements of the hat matrix in the kernel‐induced feature space are used as leverage measures to downweight the effects of potential X‐space outliers. We show that the kernelized hat diagonal elements can be obtained via eigendecomposition of the kernel matrix. A regularized version of the kernelized hat diagonal elements is also proposed to handle the case of a full‐rank kernel matrix, where the unregularized hat diagonal elements are not suitable leverage measures. We show that the two kernelized leverage measures, the kernel hat diagonal element and its regularized counterpart, are related to statistical distance measures in the feature space. We also develop an efficient kernelized training algorithm for parameter estimation based on the iteratively reweighted least squares (IRLS) method. 
Experimental results on simulated examples and real data sets demonstrate that the proposed method is more robust than conventional approaches.
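To illustrate the leverage idea described in the abstract, the following is a minimal sketch of how regularized kernelized hat diagonals might be computed via eigendecomposition of the kernel matrix. It assumes an RBF kernel and a hat matrix of the ridge form K(K + λI)⁻¹; the paper's exact formulation, kernel choice, and regularization may differ, and the names `rbf_kernel`, `regularized_hat_diagonals`, `gamma`, and `lam` are illustrative, not from the paper.

```python
import numpy as np

def rbf_kernel(X, gamma=0.05):
    # Pairwise squared Euclidean distances, then the RBF transform
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * d2)

def regularized_hat_diagonals(K, lam=0.1):
    # H = K (K + lam*I)^{-1} = U diag(l_j / (l_j + lam)) U^T,
    # using the eigendecomposition K = U diag(l) U^T.
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.clip(eigvals, 0.0, None)        # guard tiny negative eigenvalues
    shrink = eigvals / (eigvals + lam)
    # diag(H)_i = sum_j U_ij^2 * shrink_j; with lam = 0 and a full-rank K
    # every diagonal would equal 1, which is why regularization is needed.
    return np.einsum('ij,j,ij->i', eigvecs, shrink, eigvecs)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X[0] = [8.0, 8.0]                                # an artificial X-space outlier
h = regularized_hat_diagonals(rbf_kernel(X), lam=0.1)
print(h[0] > np.median(h))                       # the outlier tends to get large leverage
```

Because the outlier's kernel row is nearly orthogonal to the rest of the data, its hat diagonal stays close to 1/(1 + λ), while points inside the cluster share leverage and receive smaller values.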
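The IRLS-based training mentioned in the abstract can be sketched generically as a Mallows-type GM-estimation loop for weighted kernel ridge regression: a Huber weight downweights Y-space residual outliers, and the supplied `x_weights` (e.g., derived from leverage measures such as the regularized hat diagonals) downweight X-space outliers. This is a plausible sketch under those assumptions, not the paper's actual algorithm; the helper name, the ridge-style dual update, and the tuning constant `c = 1.345` are all illustrative.

```python
import numpy as np

def irls_robust_kernel_ridge(K, y, x_weights, lam=0.1, c=1.345, n_iter=20):
    # Mallows-type GM-estimation via IRLS: total weight = Huber residual
    # weight * leverage-based X-space weight.
    n = len(y)
    alpha = np.linalg.solve(K + lam * np.eye(n), y)   # plain kernel ridge start
    for _ in range(n_iter):
        r = y - K @ alpha
        # Robust residual scale via the MAD (small constant avoids divide-by-zero)
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12
        u = np.abs(r) / s
        w_res = np.where(u <= c, 1.0, c / u)          # Huber psi(u)/u
        w = w_res * x_weights                         # joint Y- and X-downweighting
        # Weighted regularized dual system: (diag(w) K + lam*I) alpha = diag(w) y
        alpha = np.linalg.solve(w[:, None] * K + lam * np.eye(n), w * y)
    return alpha

x = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * x)
y[10] += 5.0                                          # a Y-space outlier
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.02)
alpha = irls_robust_kernel_ridge(K, y, np.ones(40), lam=0.05)
fit = K @ alpha                                       # fit near sin(2*pi*x) despite the outlier
```

With uniform `x_weights` this reduces to a robust-residual fit; plugging in leverage-based weights is what extends the scheme to X-space outliers.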