Article ID: | iaor20052650 |
Country: | Netherlands |
Volume: | 7 |
Issue: | 3 |
Start Page Number: | 163 |
End Page Number: | 171 |
Publication Date: | Jul 2004 |
Journal: | Health Care Management Science |
Authors: | Sibbritt David, Gibberd Robert |
Keywords: | decision theory |
Very large datasets typically consist of millions of records with many variables. Organizations store and maintain such datasets because of the information they are perceived to contain. However, traditional data mining methods are often not capable of retrieving this information, because the software may be overwhelmed by the memory or computing requirements. In this article we outline a method that can analyze very large datasets. The method first performs a data reduction step by building a summary table, which then serves as a reference dataset that is recursively partitioned to grow a decision tree.
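The two steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the splitting criterion, so Gini impurity and multiway splits on categorical variables are assumptions, and all function and field names (`summarize`, `grow_tree`, `predict`, `age`, `sex`, `outcome`) are hypothetical. The point it shows is that once records are collapsed into counts per distinct combination of values, the tree is grown from the counts alone and never revisits the raw records.

```python
from collections import Counter

def summarize(records, features, target):
    """Data reduction: count records per distinct (feature values, outcome) cell."""
    table = Counter()
    for rec in records:
        key = tuple(rec[f] for f in features)
        table[(key, rec[target])] += 1
    return table

def class_counts(table):
    """Total count per outcome class across all cells of a summary table."""
    counts = Counter()
    for (_, label), n in table.items():
        counts[label] += n
    return counts

def gini(counts):
    """Gini impurity of a class-count distribution (assumed criterion)."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def grow_tree(table, features, remaining=None):
    """Recursively partition the summary table, using cell counts as weights."""
    if remaining is None:
        remaining = list(range(len(features)))
    counts = class_counts(table)
    total = sum(counts.values())
    node = {"n": total, "predict": counts.most_common(1)[0][0]}
    if len(counts) == 1 or not remaining:
        return node  # pure node, or no variables left to split on

    best = None
    for i in remaining:
        # Group summary cells by the value of candidate variable i.
        parts = {}
        for (key, label), n in table.items():
            parts.setdefault(key[i], Counter())[(key, label)] += n
        # Weighted impurity of the children this split would create.
        score = sum(sum(p.values()) / total * gini(class_counts(p))
                    for p in parts.values())
        if best is None or score < best[0]:
            best = (score, i, parts)

    score, i, parts = best
    if gini(counts) - score <= 1e-12:
        return node  # no impurity reduction, stop here
    rest = [j for j in remaining if j != i]
    node["split"] = features[i]
    node["children"] = {v: grow_tree(p, features, rest) for v, p in parts.items()}
    return node

def predict(tree, rec):
    """Follow splits down the tree; fall back to the node majority if a value is unseen."""
    while "children" in tree:
        child = tree["children"].get(rec[tree["split"]])
        if child is None:
            break
        tree = child
    return tree["predict"]
```

In this sketch the summary table has at most one entry per distinct combination of variable values and outcome, so its size depends on the number of categories rather than the number of records. That is what would keep memory use bounded when the raw data runs to millions of rows; continuous variables would need to be binned first, which is not shown.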