Hope Speech detection in under-resourced Kannada language

Numerous methods have been developed in recent years to monitor the spread of negativity by removing vulgar, offensive, and aggressive comments from social media platforms. However, comparatively few studies focus on embracing positivity and reinforcing supportive, reassuring content in online forums. Consequently, we propose an English-Kannada hope speech dataset, KanHope, and compare several experiments to benchmark it. The dataset consists of 6,176 user-generated comments in code-mixed Kannada, scraped from YouTube and manually annotated as hope speech or not-hope speech. In addition, we introduce DC-BERT4HOPE, a dual-channel model that uses the English translation of KanHope for additional training to promote hope speech detection. The approach achieves a weighted F1-score of 0.756, outperforming the other models. KanHope thus aims to stimulate research in Kannada while more broadly encouraging researchers to take a pragmatic approach towards online content that is encouraging, positive, and supportive.
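The weighted F1-score reported above is conventionally the mean of the per-class F1 scores, each weighted by the number of gold examples in that class. The abstract does not state how the score was computed, so the following is only a minimal sketch of that standard definition; the function name `weighted_f1` and the toy labels are illustrative assumptions, not KanHope data or the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (standard definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")

    support = Counter(y_true)  # number of gold examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Toy labels invented for illustration only (hypothetical, not from KanHope).
gold = ["hope", "hope", "not_hope", "not_hope", "not_hope"]
pred = ["hope", "not_hope", "not_hope", "not_hope", "hope"]
print(round(weighted_f1(gold, pred), 3))  # 0.6
```

Weighting by support keeps the majority class from being underrepresented in the average, which is one reason this variant is often reported for imbalanced two-class tasks such as hope versus not-hope.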


