Evaluation of Deep Learning Models for Hostility Detection in Hindi Text

Social media platforms are a convenient medium for expressing personal thoughts and sharing useful information. They are fast, concise, and able to reach millions of people. They are an effective place to archive thoughts, share artistic content, receive feedback, promote products, and more. Despite these numerous advantages, such platforms have also given a boost to hostile posts. Hate speech and derogatory remarks are posted for personal satisfaction or political gain. Hostile posts can have a bullying effect, rendering the entire platform experience hostile. Detecting hostile posts is therefore important for maintaining social media hygiene. The problem is more pronounced in low-resource languages such as Hindi. In this work, we present approaches for hostile text detection in the Hindi language. The proposed approaches are evaluated on the Constraint@AAAI 2021 Hindi hostility detection dataset. The dataset consists of hostile and non-hostile texts collected from social media platforms. The hostile posts are further divided into the overlapping classes fake, offensive, hate, and defamation. We evaluate a range of deep learning approaches based on CNN, LSTM, and BERT for this multi-label classification problem. Pre-trained Hindi FastText word embeddings from IndicNLP and Facebook are used in conjunction with the CNN and LSTM models. Two pre-trained multilingual transformer language models, mBERT and IndicBERT, are also used. We show that the BERT-based models perform best. Moreover, the CNN and LSTM models perform competitively with the BERT-based models.
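To make the multi-label setup concrete, here is a minimal sketch of how the four overlapping hostile classes could be encoded as targets and how per-class scores could be turned back into labels. It is an illustration only, not the authors' implementation: the comma-separated label format, the "non-hostile" tag, the 0.5 threshold, and the helper names `encode_labels` and `decode_scores` are all assumptions made for this example.

```python
import math

# The four overlapping hostile classes named in the abstract.
HOSTILE_CLASSES = ["fake", "offensive", "hate", "defamation"]


def encode_labels(label_string):
    """Turn a comma-separated label string into a multi-hot vector.

    Assumption: labels arrive as text such as "hate,offensive", and a
    non-hostile post carries none of the four class names.
    """
    tags = {tag.strip().lower() for tag in label_string.split(",")}
    return [1 if cls in tags else 0 for cls in HOSTILE_CLASSES]


def decode_scores(logits, threshold=0.5):
    """Apply an independent sigmoid to each class score and threshold it.

    Unlike softmax, this lets a post receive several classes at once.
    Assumption: a post with no class above the threshold is non-hostile.
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    predicted = [cls for cls, p in zip(HOSTILE_CLASSES, probs) if p >= threshold]
    return predicted or ["non-hostile"]
```

For instance, `encode_labels("hate,offensive")` yields a vector with two active positions. Any of the model families mentioned above (CNN, LSTM, or BERT) could in principle feed four raw scores into a function like `decode_scores`.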

