Constructive and Toxic Speech Detection for Open-domain Social Media Comments in Vietnamese

From arXiv. Accepted as a full paper at the 34th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2021).

The rise of social media has led to a growing number of comments on online forums. However, many of these comments are uninformative for users, and some are toxic and harmful to people. In this paper, we create a dataset for constructive and toxic speech detection, named UIT-ViCTSD (Vietnamese Constructive and Toxic Speech Detection dataset), with 10,000 human-annotated comments. For these tasks, we propose a system for constructive and toxic speech detection built on PhoBERT, a state-of-the-art transfer learning model for Vietnamese NLP. With this system, we obtain F1-scores of 78.59% and 59.40% for classifying constructive and toxic comments, respectively. In addition, we implement a range of baselines, including traditional machine learning models and deep neural network-based models, to evaluate the dataset. These results support several tasks on online discussions and the development of a framework for automatically identifying the constructiveness and toxicity of Vietnamese social media comments.
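To make the reported metric concrete, the following is a minimal sketch of how an F1-score is computed for one class from gold and predicted labels. The abstract does not state which averaging scheme the authors use, so this shows only the plain single-class F1; the function name `f1_for_class` and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
def f1_for_class(y_true, y_pred, positive=1):
    """Return the F1-score of the `positive` class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up for illustration): 1 = toxic, 0 = non-toxic.
toy_gold = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
toy_f1 = f1_for_class(toy_gold, toy_pred)
```

Because F1 combines precision and recall, it is less forgiving than accuracy on a class with few positive examples, which is one common reason it is reported for tasks such as toxicity detection.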

