Cache-Oblivious Hashing

0.00 Avg rating—0 Votes

Article ID:	iaor2014935
Volume:	69
Issue:	4
Start Page Number:	864
End Page Number:	883
Publication Date:	Aug 2014
Journal:	Algorithmica
Authors:	Zhang Qin, Yi Ke, Pagh Rasmus, Wei Zhewei
Keywords:	datamining, matrices

Abstract:

The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key only takes expected average t q =1+1/2^Ω(b) disk accesses for any load factor α bounded away from 1. However, such near‐perfect performance is achieved only when b is known and the hash table is particularly tuned for working with such a blocking. In this paper we study if it is possible to build a cache‐oblivious hash table that works well with any blocking. Such a hash table will automatically perform well across all levels of the memory hierarchy and does not need any hardware‐specific tuning, an important feature in autonomous databases. We first show that linear probing, a classical collision resolution strategy for hash tables, can be easily made cache‐oblivious but it only achieves t q =1+Θ(α/b) even if a truly random hash function is used. Then we demonstrate that the block probing algorithm (Pagh et al. in SIAM Rev. 53(3):547–558, 2011) achieves t q =1+1/2^Ω(b), thus matching the cache‐aware bound, if the following two conditions hold: (a) b is a power of 2; and (b) every block starts at a memory address divisible by b. Note that the two conditions hold on a real machine, although they are not stated in the cache‐oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is t q =1+O(α/b), which is exactly what linear probing achieves.

Reviews

Required fields are marked *. Your email address will not be published.