Article ID: | iaor201110889 |
Volume: | 52 |
Issue: | 1 |
Start Page Number: | 82 |
End Page Number: | 94 |
Publication Date: | Dec 2011 |
Journal: | Decision Support Systems |
Authors: | Kim Myoung Ho, Son Jin Hyun, Roh Yohan J, Kim Jae Ho |
Keywords: | datamining, statistics: general, decision: applications, statistics: decision |
Histograms can be useful in estimating the selectivity of queries in areas such as database query optimization and data exploration. In this paper, we propose a new histogram method for multidimensional data, called the Q‐Histogram, based on the use of the quad‐tree, which is a popular index structure for multidimensional data sets. The use of the compact representation of the target data obtainable from the quad‐tree allows a fast construction of a histogram with the minimum number of scanning, i.e., only one scanning, of the underlying data. In addition to the advantage of computation time, the proposed method also provides a better performance than other existing methods with respect to the quality of selectivity estimation. We present a new measure of data skew for a histogram bucket, called the weighted bucket skew. Then, we provide an effective technique for skew‐tolerant organization of histograms. Finally, we compare the accuracy and efficiency of the proposed method with other existing methods using both real‐life data sets and synthetic data sets. The results of experiments show that the proposed method generally provides a better performance than other existing methods in terms of accuracy as well as computational efficiency.