An integrated method for real time and offline web robot detection

Article ID: iaor20165048
Volume: 33
Issue: 6
Start Page Number: 592
End Page Number: 606
Publication Date: Dec 2016
Journal: Expert Systems
Authors: ,
Keywords: robots, traffic management
Abstract:

Recent academic and industry reports confirm that web robots dominate the traffic seen by web servers across the Internet. Because web robots crawl in an unregulated fashion, they may threaten the privacy, function, performance, and security of web servers. There is therefore a growing need to identify robot visitors automatically, both offline and in real time, to assess their impact and potentially to protect web servers from abusive bots. Yet contemporary detection approaches, which rely on syntactic log analysis, statistical variations between robot and human traffic, analytical learning techniques, or complex software modifications, may not be realistic to implement or may not remain effective as the behavior of robots evolves over time. Instead, this paper presents a novel detection approach that relies on the differences in the resource request patterns of web robots and humans. It rationalizes why these differences are expected to remain intrinsic to robots and humans despite the continuous evolution of their traffic. The performance of the approach, which is simple to implement and adoptable in both offline and real-time settings, is demonstrated by playing back streams of actual web traffic with varying session lengths and proportions of robot requests.
