Hate speech is a type of harmful online content that directly attacks or promotes hatred towards a group or an individual based on actual or perceived aspects of identity, such as ethnicity, religion, or sexual orientation. With online hate speech on the rise, its automatic detection as a natural language processing task has gained increasing interest. However, it has only recently been shown that existing models generalise poorly to unseen data. This survey paper summarises how well existing hate speech detection models generalise, examines why they struggle to do so, reviews existing attempts at addressing the main obstacles, and proposes directions for future research to improve generalisation in hate speech detection.