For years, web bots have been blamed for consuming large amounts of Internet traffic and undermining the interests of the sites they scrape. Traditional bot detection studies focus mainly on signature-based solutions, but advanced bots usually forge their identities to bypass such detection. As more services migrate to the cloud, cloud providers gain new opportunities to address this issue with effective bot detection based on big data. In this paper, we present a behavior-based bot detection scheme called BotGraph that combines a sitemap with a convolutional neural network (CNN) to detect the inner behavior of bots. Experimental results show that BotGraph achieves approximately 95% recall and precision on 35-day production data traces from different customers, including the Bing search engine and several other sites.
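To make the idea of combining a sitemap with a CNN more concrete, the following is a minimal sketch of one way a session's requests could be mapped onto sitemap nodes and turned into a two-dimensional array that an image-style classifier could consume. It is an illustration under stated assumptions, not the BotGraph implementation: the function name `session_to_transition_matrix`, the example URLs, and the choice of a transition-count matrix as the representation are all hypothetical.

```python
from typing import Dict, List, Sequence


def session_to_transition_matrix(
    sitemap: Sequence[str], session: Sequence[str]
) -> List[List[int]]:
    """Count page-to-page transitions of one session over a fixed sitemap order.

    Hypothetical representation: cell [i][j] holds how often the session
    moved from sitemap page i to sitemap page j.
    """
    index: Dict[str, int] = {url: i for i, url in enumerate(sitemap)}
    n = len(sitemap)
    matrix = [[0] * n for _ in range(n)]

    # Keep only requests that map onto a known sitemap node.
    visited = [index[url] for url in session if url in index]

    # Each consecutive pair of retained requests is one transition.
    for src, dst in zip(visited, visited[1:]):
        matrix[src][dst] += 1
    return matrix


# Illustrative data only; these URLs do not come from the paper.
sitemap = ["/", "/products", "/products/1", "/about"]
session = ["/", "/products", "/products/1", "/products", "/products/1", "/unknown"]
matrix = session_to_transition_matrix(sitemap, session)
```

A matrix of this kind has a fixed shape for a given sitemap, which is what would let sessions of different lengths be fed to the same CNN. How the paper actually encodes behavior is not specified in the abstract.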