Rules for comparing predictive data mining algorithms by error rate


Article ID:	iaor20052316
Country:	India
Volume:	41
Issue:	3
Start Page Number:	178
End Page Number:	187
Publication Date:	Sep 2004
Journal:	OPSEARCH
Authors:	Despotis D.K., Koliastasis D.
Keywords:	datamining

Abstract:

In knowledge discovery from databases and data warehouses, a growing number of algorithms and practices can be applied to the very same task, for example predicting the value of a specific field. These algorithms are trained on one portion of the original data set, and their accuracy is tested on the remaining portion. In addition, the estimated value of the target field is often checked against a second data set for evaluation purposes. This paper examines the factors that affect the performance, measured by the resulting error rate, of several popular predictive data mining algorithms, such as decision trees, neural nets and regression, on many data sets from different sources. These factors include the number of attributes, the type of each field and the number of missing values, among others. Finally, the paper tests whether it is possible to judge a priori which algorithm(s) will produce the lowest error rate for a specific data set. As a result, a set of heuristic rules is listed to help the decision maker select the best possible technique.
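The holdout procedure the abstract describes (train on one portion of the data, measure the error rate on the rest, prefer the algorithm with the lowest error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experimental setup: the 70/30 split, the two stand-in learners (a majority-class baseline and a 1-nearest-neighbour classifier, used in place of the decision trees, neural nets and regression the paper studies), the toy data and the helper names `holdout_split`, `train_majority`, `train_1nn`, `error_rate` and `compare` are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical row format: (feature vector, class label).
Row = Tuple[List[float], str]
Model = Callable[[List[float]], str]


def holdout_split(data: Sequence[Row], train_fraction: float = 0.7, seed: int = 0):
    """Shuffle the rows, then split them into a training part and a held-out test part."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def train_majority(train: Sequence[Row]) -> Model:
    """Baseline learner: always predict the most frequent training label (ties -> alphabetical first)."""
    labels = [y for _, y in train]
    majority = max(sorted(set(labels)), key=labels.count)
    return lambda x: majority


def train_1nn(train: Sequence[Row]) -> Model:
    """1-nearest-neighbour learner using squared Euclidean distance."""
    def predict(x: List[float]) -> str:
        nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
        return nearest[1]
    return predict


def error_rate(model: Model, test: Sequence[Row]) -> float:
    """Fraction of held-out rows whose label the model predicts wrongly (test must be non-empty)."""
    wrong = sum(1 for x, y in test if model(x) != y)
    return wrong / len(test)


def compare(data: Sequence[Row], trainers: Dict[str, Callable[[Sequence[Row]], Model]],
            train_fraction: float = 0.7, seed: int = 0) -> Dict[str, float]:
    """Fit every learner on the same training part and report its holdout error rate."""
    train, test = holdout_split(data, train_fraction, seed)
    return {name: error_rate(fit(train), test) for name, fit in trainers.items()}


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters of 10 points each.
    toy = [([i * 0.1, i * 0.1], "a") for i in range(10)] + \
          [([10 + i * 0.1, 10 + i * 0.1], "b") for i in range(10)]
    rates = compare(toy, {"majority": train_majority, "1-NN": train_1nn})
    best = min(rates, key=rates.get)
    print(rates, "lowest error:", best)
```

Using the same seed for every learner keeps the comparison fair, since each algorithm sees exactly the same training and test rows; the paper's question is whether properties of the data set (attribute count, field types, missing values) let one predict the winner without running such a comparison.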
