This paper investigates machine learning models for classifying unhealthy online conversations that contain one or more forms of subtle abuse, such as hostility, sarcasm, and generalization. We leveraged a public dataset of 44K online comments, each labeled as healthy or unhealthy and annotated with seven forms of subtle toxicity. Our best models distinguished these comments with a micro F1-score of 88.76%, a macro F1-score of 67.98%, and a ROC-AUC of 0.71. Hostile comments were easier to detect than other types of unhealthy comments. A complementary sentiment analysis revealed that most types of unhealthy comments carried a slightly negative sentiment, with hostile comments being the most negative.
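As a minimal sketch (not the paper's code), the two F1 figures reported above correspond to micro- and macro-averaging over the per-label confusion counts of a multi-label classifier. The label names and toy data below are hypothetical, for illustration only:

```python
# Hedged sketch: micro- vs macro-averaged F1 over multi-label predictions.
# Micro-F1 pools true/false positives and negatives across all labels;
# macro-F1 averages per-label F1 scores, weighting rare labels equally.

def f1(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when the denominator is empty."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    """y_true, y_pred: lists of per-comment binary label vectors (0/1)."""
    n_labels = len(y_true[0])
    tps, fps, fns = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for t_vec, p_vec in zip(y_true, y_pred):
        for i, (t, p) in enumerate(zip(t_vec, p_vec)):
            tps[i] += t and p            # true positive for label i
            fps[i] += (not t) and p      # false positive for label i
            fns[i] += t and (not p)      # false negative for label i
    micro = f1(sum(tps), sum(fps), sum(fns))
    macro = sum(f1(tps[i], fps[i], fns[i]) for i in range(n_labels)) / n_labels
    return micro, macro

# Toy example with 3 hypothetical labels (e.g. hostility, sarcasm, generalization):
y_true = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 0, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
```

On this toy data micro-F1 exceeds macro-F1, mirroring the gap in the abstract: the frequent, well-predicted label dominates the pooled counts, while the macro average is pulled down by the labels the classifier misses entirely.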