This paper presents a deep learning-based pipeline for categorizing Bengali toxic comments: a binary classification model first determines whether a comment is toxic, and a multi-label classifier then identifies the toxicity types to which the comment belongs. For this purpose, we prepared a manually labeled dataset of 16,073 instances, of which 8,488 are toxic; each toxic comment may belong to one or more of six categories simultaneously: vulgar, hate, religious, threat, troll, and insult. A Long Short-Term Memory (LSTM) network with BERT embeddings achieved 89.42% accuracy on the binary classification task, while for multi-label classification, a Convolutional Neural Network combined with a Bi-directional LSTM (CNN-BiLSTM) with an attention mechanism achieved 78.92% accuracy and a weighted F1-score of 0.86. To explain the predictions of the proposed models and interpret word-feature importance during classification, we utilized the Local Interpretable Model-Agnostic Explanations (LIME) framework. We have made our dataset public; it can be accessed at https://github.com/deepu099cse/Multi-Labeled-Bengali-Toxic-Comments-Classification