With the surge in online platforms, user engagement on these platforms via comments and reactions has risen sharply. A large portion of such textual comments are abusive, rude, and offensive to the audience. With machine learning systems in place to check comments coming onto a platform, biases present in the training data get passed on to the classifier, leading to discrimination against certain classes, religions, and genders. In this work, we evaluate different classifiers and features to estimate the bias in these classifiers along with their performance on the downstream task of toxicity classification. Results show that improvements in the performance of automatic toxic comment detection models are positively correlated with mitigating biases in these models. In our work, an LSTM with an attention mechanism proved to be a better modelling strategy than a CNN model. Further analysis shows that fastText embeddings are marginally preferable to GloVe embeddings for training toxic comment detection models. Deeper analysis reveals that such automatic models are particularly biased toward specific identity groups even when the model has a high AUC score. Finally, in an effort to mitigate bias in toxicity detection models, a multi-task setup trained with an auxiliary task of predicting toxicity sub-types proved useful, leading to up to a 0.26% (6% relative) gain in AUC scores.
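For concreteness, the sketch below illustrates the kind of architecture the abstract describes: a bidirectional LSTM with additive attention pooling and two output heads, one for the main toxicity label and one auxiliary head for toxicity sub-types. This is a minimal illustration assuming a PyTorch implementation; the layer sizes, attention formulation, number of sub-types, and loss weighting are assumptions for illustration, not the exact configuration used in this work.

```python
import torch
import torch.nn as nn

class MultiTaskToxicityModel(nn.Module):
    """BiLSTM with additive attention pooling and two heads:
    a main toxicity head and an auxiliary toxicity-sub-type head.
    All sizes below are illustrative assumptions."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128, num_subtypes=5):
        super().__init__()
        # In the described setup, this embedding would be initialised with
        # pre-trained fastText (or GloVe) vectors of matching dimension.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)            # additive attention scores
        self.toxicity_head = nn.Linear(2 * hidden_dim, 1)   # main task: toxic / not toxic
        self.subtype_head = nn.Linear(2 * hidden_dim, num_subtypes)  # auxiliary task

    def forward(self, token_ids):
        h, _ = self.lstm(self.embedding(token_ids))         # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)        # (B, T, 1) attention weights
        context = (weights * h).sum(dim=1)                  # attention-pooled (B, 2H)
        return self.toxicity_head(context), self.subtype_head(context)
```

Training such a model would typically minimise a weighted sum of binary cross-entropy losses over the two heads, e.g. `loss = bce(main_logits, y) + lam * bce(subtype_logits, y_subtypes)`; the weight `lam` is a hypothetical hyperparameter here, not a value reported in this work.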