Article ID: | iaor1998521 |
Country: | Netherlands |
Volume: | 71 |
Issue: | 1 |
Start Page Number: | 199 |
End Page Number: | 228 |
Publication Date: | Aug 1997 |
Journal: | Annals of Operations Research |
Authors: | Rho Sangkyu, March Salvatore T. |
Keywords: | heuristics, optimization |
Optimizing join queries is a major problem in distributed database systems, particularly when files are replicated and copies are stored at different nodes in the network. A distributed query optimization algorithm must select file copies and determine how and where those files will be processed. Processing decisions include which files, if any, to reduce via semijoins, the sites at which to perform join operations, and the order in which to perform those join operations. We extend the scope of distributed query optimization research by developing a model that, for the first time, includes all of these design decisions and considers both communication and local processing costs. We develop a genetic algorithm-based solution procedure for this model which quickly determines efficient query processing plans. We demonstrate that ignoring local processing costs or restricting join processing to the result site, as is commonly done in prior research, can result in inefficient query execution plans.
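The kind of search described in the abstract can be illustrated with a minimal sketch. It is not the authors' model or algorithm: it covers only copy selection and the choice of a single join site, and it leaves out semijoin reduction and join ordering. Every name and number in it (`FILE_SIZE`, `COPIES`, `COMM_COST`, `LOCAL_COST`, `plan_cost`, `genetic_search`, and the GA settings) is a hypothetical choice made for this example.

```python
import random

# Hypothetical toy instance: all values are invented for illustration.
FILE_SIZE = {"A": 100, "B": 400, "C": 50}          # size units per file
COPIES = {"A": [0, 1], "B": [1, 2], "C": [0, 2]}   # sites holding a copy of each file
N_SITES = 3
RESULT_SITE = 0          # site where the query answer is needed
RESULT_SIZE = 30         # size units of the join result
COMM_COST = 10           # cost per unit shipped between two different sites
LOCAL_COST = [2, 1, 3]   # cost per unit processed, indexed by site

FILES = sorted(FILE_SIZE)


def plan_cost(plan):
    """Cost of a plan: (copy site for each file in FILES order..., join site)."""
    *copy_sites, join_site = plan
    cost = 0
    for name, site in zip(FILES, copy_sites):
        if site != join_site:
            cost += COMM_COST * FILE_SIZE[name]       # ship the file to the join site
        cost += LOCAL_COST[join_site] * FILE_SIZE[name]  # process it there
    if join_site != RESULT_SITE:
        cost += COMM_COST * RESULT_SIZE               # ship the result back
    return cost


def random_plan(rng):
    return tuple(rng.choice(COPIES[f]) for f in FILES) + (rng.randrange(N_SITES),)


def mutate(plan, rng):
    """Re-draw one gene, keeping the plan valid."""
    genes = list(plan)
    i = rng.randrange(len(genes))
    if i < len(FILES):
        genes[i] = rng.choice(COPIES[FILES[i]])
    else:
        genes[i] = rng.randrange(N_SITES)
    return tuple(genes)


def crossover(a, b, rng):
    """One-point crossover; gene positions align, so the child stays valid."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_search(generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [random_plan(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=plan_cost)
        parents = pop[: pop_size // 2]   # keep the cheaper half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = parents + children
    return min(pop, key=plan_cost)


if __name__ == "__main__":
    best = genetic_search()
    print(best, plan_cost(best))
```

In this invented instance, joining at the result site with the plan `(0, 1, 0, 0)` costs 5100, while joining at site 1 with the plan `(1, 1, 0, 1)` costs 1350 even after shipping the result back. This mirrors, on a toy scale only, the abstract's point that restricting joins to the result site or ignoring local processing costs can yield inefficient plans.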