Article ID: | iaor1999833 |
Country: | Netherlands |
Volume: | 94 |
Issue: | 2 |
Start Page Number: | 405 |
End Page Number: | 418 |
Publication Date: | Oct 1996 |
Journal: | European Journal of Operational Research |
Authors: | Koehler Gary J., Kim Hyunsoo |
Empirical studies have shown that the performance of decision tree induction usually improves when the trees are pruned. Whether these results hold in general and to what extent pruning improves the accuracy of a concept have not been investigated theoretically. This paper provides a theoretical study of pruning. We focus on a particular type of pruning and determine a bound on the error due to pruning. This is combined with PAC (probably approximately correct) learning theory to determine a sample size sufficient to guarantee a probabilistic bound on the concept error. We also discuss additional pruning rules and give an analysis for the pruning error.