The proliferation of online hate speech has necessitated algorithms that can detect toxicity. Most prior research frames this detection as a classification task, but assigning an absolute toxicity label is often difficult; hence, a few recent works instead cast the same task as regression. This paper presents a comparative evaluation of transformer-based and traditional machine learning models on a recently released toxicity severity measurement dataset from Jigsaw. We further demonstrate issues with the model predictions using explainability analysis.