Optimal robot scheduling for Web search engines

Article ID: iaor2000821
Country: United Kingdom
Volume: 1
Issue: 1
Start Page Number: 15
End Page Number: 29
Publication Date: Jun 1998
Journal: Journal of Scheduling
Authors: , ,
Keywords: search, communications, computers: information
Abstract:

A robot is deployed by a Web search engine in order to maintain the currency of its database of Web pages. This paper studies robot scheduling policies that minimize the fractions $r_i$ of time pages spend out of date, assuming independent Poisson page-change processes and a general distribution for the page access time $X$. We show that, if $X$ is decreased in the increasing convex ordering sense, then $r_i$ is decreased for all $i$ under any scheduling policy, and that, in order to minimize the expected total obsolescence time of any page, the accesses to that page should be as evenly spaced in time as possible. We then investigate the problem of scheduling to minimize the cost function $\sum_i c_i r_i$, where the $c_i$ are given weights proportional to the page-change rates $\mu_i$. We give a tight bound on the performance of such a policy and prove that the optimal frequency at which the robot should access page $i$ is proportional to $\ln(h_i^{-1})$, where $h_i := E\,e^{-\mu_i X}$. Note that this reduces to a quantity proportional to $\mu_i$ when $X$ is a constant, but not, as one might expect, when $X$ has a general distribution. Next, we evaluate randomized accessing policies whereby the choices of page access are determined by independent random samples from the distribution $\{f_i\}$. We show that when the weights $c_i$ in the cost function are proportional to $\mu_i$, the minimum cost is achieved when $f_i$ is proportional to $h_i^{-1} - 1$. Finally, we present and analyze a heuristic policy that is especially suited to the asymptotic regime of large databases.
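
The two proportionality results in the abstract can be illustrated numerically. The following is a minimal Python sketch, not code from the paper. It assumes the quantity h_i is the Laplace transform E[exp(-mu_i X)] of the access time X, and it evaluates that transform in closed form for two illustrative choices of X (a constant, and an exponential distribution). The function names and the example change rates are hypothetical.

```python
import math


def h_constant(mu, x):
    """h = E[exp(-mu * X)] when X is the constant x."""
    return math.exp(-mu * x)


def h_exponential(mu, mean_x):
    """h = E[exp(-mu * X)] when X is exponential with mean mean_x."""
    return 1.0 / (1.0 + mu * mean_x)


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def deterministic_frequencies(h):
    """Relative access frequencies proportional to ln(1 / h_i)."""
    return normalize([-math.log(h_i) for h_i in h])


def randomized_probabilities(h):
    """Sampling probabilities f_i proportional to 1 / h_i - 1."""
    return normalize([1.0 / h_i - 1.0 for h_i in h])


# Hypothetical page-change rates, chosen only for illustration.
mu = [0.5, 1.0, 2.0]

h_const = [h_constant(m, 1.0) for m in mu]
h_exp = [h_exponential(m, 1.0) for m in mu]

print("constant X, deterministic:   ", deterministic_frequencies(h_const))
print("exponential X, deterministic:", deterministic_frequencies(h_exp))
print("exponential X, randomized:   ", randomized_probabilities(h_exp))
```

With a constant access time, ln(1/h_i) equals mu_i times that constant, so the first line printed is proportional to the change rates, as the abstract notes. With the exponential access time assumed here, the deterministic frequencies are proportional to ln(1 + mu_i) and are no longer proportional to mu_i, while 1/h_i - 1 happens to equal mu_i times the mean access time.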
