Harassment by cyberbullies is a significant phenomenon on social media. Existing work on cyberbullying detection suffers from at least one of the following three bottlenecks. First, it targets only one particular social media platform (SMP). Second, it addresses just one topic of cyberbullying. Third, it relies on carefully handcrafted features of the data. We show that deep-learning-based models can overcome all three bottlenecks, and that knowledge learned by these models on one dataset can be transferred to other datasets. We performed extensive experiments using three real-world datasets: Formspring (12k posts), Twitter (16k posts), and Wikipedia (100k posts). Our experiments provide several useful insights about cyberbullying detection. To the best of our knowledge, this is the first work that systematically analyzes cyberbullying detection on various topics across multiple SMPs using deep-learning-based models and transfer learning.