Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media

Social networking platforms provide a conduit for disseminating ideas, views, and thoughts and for spreading information. This has led to the amalgamation of English with natively spoken languages. Hindi-English code-mixed data (Hinglish) is increasingly prevalent among most of the urban population all over the world. The hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages. Thus, the worldwide hate speech detection rate of around 44% drops even further when content in Indian colloquial languages and slang is considered. In this paper, we propose a methodology for efficient detection of hate speech in unstructured code-mixed Hinglish text. We employ fine-tuning-based approaches for Hindi-English code-mixed language, using contextual embeddings such as ELMo (Embeddings from Language Models), FLAIR, and transformer-based BERT (Bidirectional Encoder Representations from Transformers). Our proposed approach is compared against pre-existing methods, and results are reported on various datasets. Our model outperforms the other methods and frameworks.
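The abstract gives no implementation details for the fine-tuned ELMo, FLAIR, and BERT models, so the sketch below does not attempt to reproduce them. As a rough, dependency-free illustration of the underlying task (binary classification of romanized code-mixed text), it swaps in a much simpler technique: logistic regression over character trigrams, which tolerate some of the spelling variation typical of transliterated Hindi. The example sentences, labels, function names, and hyperparameters are all invented for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, feats):
    """Linear score: bias plus weighted n-gram counts."""
    return bias + sum(weights.get(g, 0.0) * c for g, c in feats.items())


def train(examples, epochs=100, lr=0.1):
    """Fit logistic-regression weights with plain per-example gradient steps."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = char_ngrams(text)
            err = label - sigmoid(score(weights, bias, feats))
            bias += lr * err
            for g, c in feats.items():
                weights[g] = weights.get(g, 0.0) + lr * err * c
    return weights, bias


def predict_proba(model, text):
    """Probability that `text` belongs to the positive (abusive) class."""
    weights, bias = model
    return sigmoid(score(weights, bias, char_ngrams(text)))


# Hypothetical toy data (1 = abusive, 0 = benign); NOT from the paper.
toy_data = [
    ("tum bahut bekaar insaan ho, get lost", 1),
    ("yeh log sab chor hain, hate them all", 1),
    ("yeh movie bahut achhi thi, loved it", 0),
    ("kal office jaana hai, see you there", 0),
]

model = train(toy_data)
for text, label in toy_data:
    print(label, round(predict_proba(model, text), 3), text)
```

A baseline like this only sees surface character patterns. The contextual embeddings named in the abstract are meant to capture meaning that depends on surrounding words, which is presumably why the authors fine-tune them instead.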

