Classifying highly imbalanced ICU data

Article ID:	iaor20132783
Volume:	16
Issue:	2
Start Page Number:	119
End Page Number:	128
Publication Date:	Jun 2013
Journal:	Health Care Management Science
Authors:	Roumani Yazan, May Jerrold, Strum David, Vargas Luis
Keywords:	data mining

Abstract:

Highly imbalanced data sets are those where the class of interest is rare. In this paper, we compare the performance of several common data mining methods (logistic regression, discriminant analysis, Classification and Regression Tree (CART) models, C5, and Support Vector Machines (SVM)) in predicting the discharge status (alive or deceased, with ‘deceased’ being the class of interest) of patients from an Intensive Care Unit (ICU). Across a variety of misclassification cost ratio (MCR) values, and using specificity, recall, precision, the F-measure, and confusion entropy (CEN) as criteria for evaluating each method’s performance, C5 and SVM performed better than the other methods. At an MCR of 100, C5 had the highest recall, and SVM had the highest specificity and the lowest CEN. We also used Hand’s measure to compare the five methods; by that measure, logistic regression performed best. This article makes several contributions. We show how using MCR when analyzing imbalanced medical data significantly improves a method’s classification performance. We also found that the F-measure and precision did not improve as the MCR was increased.
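As an illustration of the evaluation criteria named in the abstract, the following minimal Python sketch computes recall, specificity, precision, and the F-measure from a confusion matrix, and shows one common way a misclassification cost ratio can be applied. The abstract does not say how the authors applied MCR, so the threshold rule used here (predict 'deceased' when the estimated probability is at least 1 / (1 + MCR)) is an assumption taken from standard decision theory with calibrated probabilities and zero cost for correct decisions. The function names and the toy labels and probabilities are hypothetical and are not the paper's data. CEN and Hand's measure are not covered.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the class of interest."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Recall, specificity, precision, and F-measure (0.0 when undefined)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


def mcr_threshold(mcr):
    """Assumed rule: cost-minimizing probability cut-off when a false
    negative costs `mcr` times as much as a false positive."""
    return 1.0 / (1.0 + mcr)


def classify(probs, mcr):
    """Label a case 1 ('deceased') when its probability reaches the cut-off."""
    threshold = mcr_threshold(mcr)
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical toy data: 1 = deceased (rare class), 0 = alive.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
probs = [0.30, 0.05, 0.02, 0.20, 0.01, 0.03, 0.04, 0.02, 0.08, 0.60]

for mcr in (1, 9):
    y_pred = classify(probs, mcr)
    result = metrics(*confusion_counts(y_true, y_pred))
    print(mcr, {name: round(value, 3) for name, value in result.items()})
```

On this toy data, raising the MCR from 1 to 9 lowers the cut-off from 0.5 to 0.1, which raises recall while lowering specificity and precision. This only illustrates the mechanism; it does not reproduce the paper's results.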
