The spread of hate speech on social media is a serious problem. Easy access to the enormous amount of information generated on these platforms has led people to post and react with toxic content that incites violence. Although efforts have been made to detect and restrain such content online, identifying it accurately remains challenging. Deep learning based solutions have been at the forefront of identifying hateful content. However, factors such as the context-dependent nature of hate speech, the intention of the user, and undesired biases make this process especially difficult. In this work, we explore a wide range of challenges in automatic hate speech detection and present a hierarchical organization of these problems. We focus on challenges faced by machine learning and deep learning based solutions to hate speech identification. At the top level, we distinguish between data-level, model-level, and human-level challenges. We then analyze each level of the hierarchy in detail, with examples. This survey is intended to help researchers design their solutions for hate speech detection more efficiently.