Article ID: iaor20165048
Volume: 33
Issue: 6
Start Page Number: 592
End Page Number: 606
Publication Date: Dec 2016
Journal: Expert Systems
Authors: Gokhale Swapna S, Doran Derek
Keywords: robots, traffic management
Recent academic and industry reports confirm that web robots dominate the traffic seen by web servers across the Internet. Because web robots crawl in an unregulated fashion, they may threaten the privacy, function, performance, and security of web servers. There is therefore a growing need to identify robot visitors automatically, both offline and in real time, in order to assess their impact and potentially protect web servers from abusive bots. Yet contemporary detection approaches, which rely on syntactic log analysis, statistical differences between robot and human traffic, analytical learning techniques, or complex software modifications, may be impractical to implement or may lose effectiveness as robot behavior evolves over time. Instead, this paper presents a novel detection approach based on the differences between the resource request patterns of web robots and humans, and it explains why these differences are expected to remain intrinsic to robots and humans despite the continuous evolution of their traffic. The approach has a simple implementation and can be used in both offline and real-time settings; its performance is demonstrated by playing back streams of actual web traffic with varying session lengths and proportions of robot requests.
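The abstract does not specify how resource request patterns are modeled. As a purely illustrative sketch, assume each session is reduced to a sequence of coarse resource types inferred from file extensions, that one first-order Markov chain is trained on known robot sessions and another on known human sessions, and that a session is labeled by comparing its log-likelihood under the two chains. The extension mapping and the names `MarkovModel`, `SessionClassifier`, and `classify_session` are hypothetical, and this is not necessarily the authors' method. Because the score is updated one request at a time, the same code covers both an offline pass over a complete session and a real-time decision after each request.

```python
import math
from collections import defaultdict

# ASSUMPTION: a hypothetical mapping from file extension to coarse resource type.
EXT_TO_TYPE = {
    "html": "web", "htm": "web", "php": "web",
    "jpg": "img", "png": "img", "gif": "img",
    "css": "style", "js": "script", "txt": "text", "pdf": "doc",
}
TYPES = sorted(set(EXT_TO_TYPE.values()) | {"other"})
START = "<start>"


def resource_type(path):
    """Map a requested path to a coarse resource type via its file extension."""
    name = path.rsplit("/", 1)[-1]
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return EXT_TO_TYPE.get(ext, "other")


class MarkovModel:
    """First-order Markov chain over resource types with add-one smoothing."""

    def __init__(self, sessions):
        self.counts = defaultdict(lambda: defaultdict(int))
        for session in sessions:
            seq = [START] + [resource_type(p) for p in session]
            for prev, cur in zip(seq, seq[1:]):
                self.counts[prev][cur] += 1

    def log_prob(self, prev, cur):
        """Smoothed log P(cur | prev); unseen transitions never yield log(0)."""
        row = self.counts.get(prev, {})
        total = sum(row.values())
        return math.log((row.get(cur, 0) + 1) / (total + len(TYPES)))


class SessionClassifier:
    """Accumulates a log-likelihood ratio; a positive score favours 'robot'."""

    def __init__(self, robot_model, human_model):
        self.robot, self.human = robot_model, human_model
        self.prev, self.score = START, 0.0

    def observe(self, path):
        """Real-time use: update the score with one request, return the label."""
        cur = resource_type(path)
        self.score += self.robot.log_prob(self.prev, cur)
        self.score -= self.human.log_prob(self.prev, cur)
        self.prev = cur
        return "robot" if self.score > 0 else "human"


def classify_session(robot_model, human_model, session):
    """Offline use: feed a complete session and return the final label."""
    clf = SessionClassifier(robot_model, human_model)
    label = "human"  # ASSUMPTION: empty sessions and ties default to 'human'
    for path in session:
        label = clf.observe(path)
    return label


# Toy training data (invented for illustration, not from the paper).
robots = MarkovModel([["/robots.txt", "/a.html", "/b.html", "/c.pdf"],
                      ["/robots.txt", "/x.html", "/y.html"]])
humans = MarkovModel([["/index.html", "/style.css", "/app.js", "/logo.png"],
                      ["/page.html", "/style.css", "/pic.jpg"]])
print(classify_session(robots, humans, ["/robots.txt", "/d.html", "/e.html"]))  # robot
print(classify_session(robots, humans, ["/home.html", "/main.css", "/banner.png"]))  # human
```

Add-one smoothing keeps transitions never seen in training from collapsing the likelihood to zero, which matters when short or novel sessions are scored in real time. In a realistic setting the type categories and both chains would be learned from labeled server logs rather than the toy sessions above.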