This paper presents a deep learning-based pipeline for categorizing Bengali toxic comments: a binary classification model first determines whether a comment is toxic, and a multi-label classifier then identifies the toxicity types to which the comment belongs. For this purpose, we prepared a manually labeled dataset of 16,073 instances, of which 8,488 are toxic; each toxic comment may belong to one or more of six categories simultaneously: vulgar, hate, religious, threat, troll, and insult. A Long Short-Term Memory (LSTM) network with BERT embeddings achieved 89.42% accuracy on the binary classification task, while for multi-label classification, a Convolutional Neural Network combined with a Bi-directional LSTM (CNN-BiLSTM) with an attention mechanism achieved 78.92% accuracy and a weighted F1-score of 0.86. To explain the predictions of the proposed models and interpret word-feature importance during classification, we utilized the Local Interpretable Model-Agnostic Explanations (LIME) framework. We have made our dataset public; it can be accessed at https://github.com/deepu099cse/Multi-Labeled-Bengali-Toxic-Comments-Classification