Article ID: | iaor2016402 |
Volume: | 32 |
Issue: | 1 |
Start Page Number: | 102 |
End Page Number: | 126 |
Publication Date: | Feb 2016 |
Journal: | Computational Intelligence |
Authors: | Duan Lei, Dong Guozhu, Wang Xianming, Tang Changjie |
Keywords: | data mining, statistics: regression |
Contrast patterns describe differences between two or more data sets or data classes; they have proven useful for solving many kinds of problems, such as building accurate classifiers, defining clustering quality measures, and analyzing disease subtypes. This article investigates the mining of a new kind of contrast pattern, namely discriminating inter‐attribute functions (DIFs), which represent arithmetic‐expression‐based inter‐attribute relationships that distinguish classes of data. DIFs are an expressive and practical alternative to item‐based contrast patterns and can express discriminating relationships such as ‘weight/(height)² is more likely to be ≤ 25 in one class than in another class.’ Besides introducing the DIF mining problem, this article makes theoretical and algorithmic contributions to it. We prove that DIF mining is MAX SNP‐hard. To mine DIFs efficiently, we present a set of rules that prune the search space of arithmetic expressions by eliminating redundant expressions (those equivalent to others). We give two algorithms: one for finding all DIFs satisfying given thresholds and another for finding certain optimal DIFs using genetic computation techniques. The former is useful when the number of attributes is small, whereas the latter is useful when that number is large; both use the redundant arithmetic‐expression pruning rules. A performance study shows that our techniques are effective and efficient for finding DIFs.
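
As a rough illustration of the example relationship quoted in the abstract, the following Python sketch evaluates one candidate DIF, weight/(height)² ≤ 25, on two small classes and compares how often it holds in each. The records, the `support` helper, and the use of a simple support difference as the measure of discrimination are assumptions made for illustration; the abstract does not give the article's own definitions, thresholds, or algorithms.

```python
from typing import Callable, Dict, Sequence

Record = Dict[str, float]


def support(records: Sequence[Record],
            expr: Callable[[Record], float],
            threshold: float) -> float:
    """Fraction of records for which expr(record) <= threshold."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if expr(r) <= threshold)
    return hits / len(records)


def weight_over_height_squared(r: Record) -> float:
    """Arithmetic expression over two attributes: weight / height^2."""
    return r["weight"] / r["height"] ** 2


# Hypothetical data (weight in kg, height in m), invented for this sketch.
class_a = [
    {"weight": 60.0, "height": 1.70},
    {"weight": 70.0, "height": 1.80},
    {"weight": 95.0, "height": 1.75},
    {"weight": 55.0, "height": 1.60},
]
class_b = [
    {"weight": 90.0, "height": 1.70},
    {"weight": 85.0, "height": 1.65},
    {"weight": 68.0, "height": 1.75},
    {"weight": 100.0, "height": 1.80},
]

support_a = support(class_a, weight_over_height_squared, 25.0)
support_b = support(class_b, weight_over_height_squared, 25.0)

# Assumed measure: a larger gap means the expression separates the classes better.
support_gap = support_a - support_b
print(f"class A: {support_a:.2f}, class B: {support_b:.2f}, gap: {support_gap:.2f}")
```

On this invented data the condition holds for three of four records in class A and one of four in class B, so the expression would count as discriminating under any gap threshold up to 0.5. A full miner would additionally have to search over candidate expressions and thresholds, which is where the pruning rules and the two algorithms described in the abstract come in.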