Evaluation of Deep Learning Models for Hostility Detection in Hindi Text

Social media platforms are a convenient medium for expressing personal thoughts and sharing useful information. They are fast, concise, and able to reach millions. They are an effective place to archive thoughts, share artistic content, receive feedback, promote products, etc. Despite these numerous advantages, such platforms have also given a boost to hostile posts. Hate speech and derogatory remarks are posted for personal satisfaction or political gain. Hostile posts can have a bullying effect, rendering the entire platform experience hostile. Detecting hostile posts is therefore important for maintaining social media hygiene. The problem is more pronounced in low-resource languages like Hindi. In this work, we present approaches for hostile text detection in the Hindi language. The proposed approaches are evaluated on the Constraint@AAAI 2021 Hindi hostility detection dataset. The dataset consists of hostile and non-hostile texts collected from social media platforms. The hostile posts are further segregated into overlapping classes of fake, offensive, hate, and defamation. We evaluate a host of deep learning approaches based on CNN and LSTM for this multi-label classification problem. The pre-trained Hindi FastText word embeddings by IndicNLP and Facebook are used in conjunction with these models to evaluate their effectiveness. We show that the multi-CNN model, when combined with IndicNLP FastText word embeddings, gives the best results.
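Because the four hostile classes overlap, a single post can carry several labels at once, which is what makes the task multi-label and not multi-class. The following is a minimal sketch of how such targets might be encoded and decoded. It is not the authors' code: the comma-separated label format, the class order, the `non-hostile` marker string, and the 0.5 threshold are all assumptions made for illustration.

```python
# Illustrative only: the four class names come from the abstract; their
# order and the "non-hostile" marker string are assumptions.
HOSTILE_CLASSES = ["fake", "offensive", "hate", "defamation"]


def encode_labels(label_string):
    """Turn a comma-separated label string into a multi-hot vector.

    A post with no hostile class (e.g. "non-hostile") maps to all zeros.
    """
    labels = {part.strip().lower() for part in label_string.split(",")}
    return [1 if name in labels else 0 for name in HOSTILE_CLASSES]


def decode_scores(scores, threshold=0.5):
    """Map per-class scores to labels, deciding each class independently.

    The threshold is a hypothetical default, not a value from the paper.
    """
    predicted = [
        name for name, score in zip(HOSTILE_CLASSES, scores) if score >= threshold
    ]
    return predicted or ["non-hostile"]


if __name__ == "__main__":
    print(encode_labels("fake,hate"))
    print(decode_scores([0.9, 0.1, 0.7, 0.2]))
```

Each class gets its own independent decision, so a model trained on these targets would typically end in one sigmoid output per class instead of a single softmax over all classes.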

