Social media platforms are a convenient medium for expressing personal thoughts and sharing useful information. They are fast, concise, and able to reach millions. They are an effective place to archive thoughts, share artistic content, receive feedback, promote products, etc. Despite these numerous advantages, the platforms have also given a boost to hostile posts. Hate speech and derogatory remarks are posted for personal satisfaction or political gain. Hostile posts can have a bullying effect, rendering the entire platform experience hostile. Detecting hostile posts is therefore important for maintaining social media hygiene. The problem is more pronounced in low-resource languages like Hindi. In this work, we present approaches for hostile text detection in the Hindi language. The proposed approaches are evaluated on the Constraint@AAAI 2021 Hindi hostility detection dataset. The dataset consists of hostile and non-hostile texts collected from social media platforms. The hostile posts are further segregated into the overlapping classes of fake, offensive, hate, and defamation. We evaluate a host of deep learning approaches based on CNN and LSTM for this multi-label classification problem. The pre-trained Hindi FastText word embeddings from IndicNLP and Facebook are used in conjunction with these models to evaluate their effectiveness. We show that the multi-CNN model combined with IndicNLP FastText word embeddings gives the best results.
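Because the hostile classes overlap, a single post can carry more than one label, which is what makes the task multi-label rather than multi-class. The following is a minimal sketch of that formulation only, not the models evaluated in this work: the names `encode_labels` and `decode_logits`, the all-zero encoding for non-hostile posts, and the 0.5 decision threshold are illustrative assumptions.

```python
import math

# Fine-grained hostile classes named in the abstract; a post may carry several.
HOSTILE_CLASSES = ["fake", "offensive", "hate", "defamation"]


def encode_labels(tags):
    """Turn a collection of tags into a multi-hot vector over HOSTILE_CLASSES.

    Assumption: a non-hostile post maps to the all-zero vector.
    """
    tags = set(tags)
    return [1 if c in tags else 0 for c in HOSTILE_CLASSES]


def sigmoid(x):
    """Logistic function, applied independently to each class score."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_logits(logits, threshold=0.5):
    """Keep every class whose sigmoid score reaches the threshold.

    Assumption: if no class passes, the post is treated as non-hostile.
    """
    picked = [c for c, z in zip(HOSTILE_CLASSES, logits) if sigmoid(z) >= threshold]
    return picked or ["non-hostile"]


# A post tagged both offensive and hate sets two positions in the target vector.
target = encode_labels(["offensive", "hate"])

# Hypothetical raw scores, such as a classifier's final layer might emit.
predicted = decode_logits([2.0, -1.0, 0.3, -3.0])
```

Scoring each class with its own sigmoid, instead of a softmax over all classes, lets several labels be active at once, which matches the overlapping class structure described above.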