The rise of online aggression on social media is evolving into a major point of concern. Several machine learning and deep learning approaches have recently been proposed for detecting various types of aggressive behavior. However, social media are fast-paced and generate an ever-increasing amount of content, while aggressive behavior itself evolves over time. In this work, we introduce the first practical, real-time framework for detecting aggression on Twitter by embracing the streaming machine learning paradigm. Our method updates its ML classifiers incrementally as new annotated examples arrive, and it achieves the same (or even higher) performance as batch-based ML models, with over 90% accuracy, precision, and recall. At the same time, our experimental analysis on real Twitter data shows that our framework can easily scale to accommodate the entire Twitter Firehose (778 million tweets per day) with only 3 commodity machines. Finally, we show that our framework is general enough to detect other related behaviors, such as sarcasm, racism, and sexism, in real time.
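The core idea of the streaming paradigm is that the classifier is updated one annotated example at a time rather than retrained on a fixed batch. The sketch below illustrates this idea under stated assumptions. It is a hypothetical illustration, not the authors' model or pipeline. It uses an online logistic regression over hashed bag-of-words features, and it evaluates with test-then-train (prequential) scoring: each tweet is predicted before the model learns from it. The names `featurize`, `OnlineLogReg`, `learn_one`, `prequential`, and `N_FEATURES`, as well as the toy data, are all invented for illustration.

```python
import math
import re
import zlib
from typing import Dict, Iterable, Tuple

# Hypothetical sketch of incremental (streaming) classification.
# NOT the paper's model: an online logistic regression over hashed
# bag-of-words features, updated one labeled tweet at a time.

N_FEATURES = 2 ** 18  # assumed size of the hashed feature space


def featurize(text: str) -> Dict[int, float]:
    """Map a tweet to sparse hashed token counts (the hashing trick)."""
    feats: Dict[int, float] = {}
    for tok in re.findall(r"[a-z0-9#@']+", text.lower()):
        idx = zlib.crc32(tok.encode("utf-8")) % N_FEATURES
        feats[idx] = feats.get(idx, 0.0) + 1.0
    return feats


class OnlineLogReg:
    """Logistic regression trained by one SGD step per incoming example."""

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
        self.w: Dict[int, float] = {}  # sparse weights, grow as new features appear
        self.b = 0.0

    def predict_proba(self, x: Dict[int, float]) -> float:
        z = self.b + sum(self.w.get(i, 0.0) * v for i, v in x.items())
        z = max(-35.0, min(35.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x: Dict[int, float], y: int) -> None:
        # Gradient of the log loss w.r.t. the logit is (p - y).
        g = self.predict_proba(x) - y
        for i, v in x.items():
            self.w[i] = self.w.get(i, 0.0) - self.lr * g * v
        self.b -= self.lr * g


def prequential(stream: Iterable[Tuple[str, int]], model: OnlineLogReg) -> float:
    """Test-then-train: predict each example first, then learn from it."""
    correct = total = 0
    for text, label in stream:
        x = featurize(text)
        pred = int(model.predict_proba(x) >= 0.5)
        correct += int(pred == label)
        total += 1
        model.learn_one(x, label)  # incremental update; no retraining from scratch
    return correct / total if total else 0.0


# Usage on a toy, made-up labeled stream (1 = aggressive, 0 = normal).
toy_stream = [("you are an idiot", 1), ("have a great day", 0)] * 50
model = OnlineLogReg(lr=0.1)
acc = prequential(toy_stream, model)
print(f"prequential accuracy: {acc:.2f}")
```

Prequential evaluation suits this setting because every example is used first for testing and then for training, so the accuracy estimate tracks the model as it adapts. For scale, 778 million tweets per day averages about 9,000 tweets per second (778,000,000 / 86,400 ≈ 9,005), or roughly 3,000 tweets per second per machine across 3 machines. Peak rates will exceed this daily average.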