One to rule them all: Towards Joint Indic Language Hate Speech Detection - Zhuanzhi Papers


September 28, 2021

One to rule them all: Towards Joint Indic Language Hate Speech Detection


Mehar Bhatia, Tenzin Singhay Bhotia, Akshat Agarwal, Prakash Ramesh, Shubham Gupta, Kumar Shridhar, Felix Laumann, Ayushman Dash

from arxiv, submitted to FIRE 2021 in the HASOC-FIRE shared task on hate speech and offensive language detection

This paper is a contribution to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) 2021 shared task. Social media today is a hotbed of toxic and hateful conversations in various languages. Recent news reports have shown that current models struggle to automatically identify hate posted in minority languages. Therefore, efficiently curbing hate speech is a critical challenge and problem of interest. We present a multilingual architecture using state-of-the-art transformer language models to jointly learn hate and offensive speech detection across three languages, namely English, Hindi, and Marathi. On the provided testing corpora, we achieve Macro F1 scores of 0.7996, 0.7748, and 0.8651 for sub-task 1A, and 0.6268 and 0.5603 for the fine-grained classification of sub-task 1B. These results show the efficacy of exploiting a multilingual training scheme.
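The abstract reports its results as Macro F1 scores. As a reminder of what that metric measures, here is a minimal sketch in plain Python: F1 is computed separately for each class and the per-class values are averaged without weighting, so a rare class counts as much as a frequent one. The function name `macro_f1`, the label strings, and the toy data are illustrative assumptions; none of them come from the paper or from the shared task's evaluation code.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # predicted p, but it was not p
            fn[g] += 1          # missed an instance of g

    labels = set(gold) | set(pred)
    per_class = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Invented toy data for a binary task; the label strings are placeholders.
gold = ["HOF", "HOF", "NOT", "NOT", "NOT", "HOF"]
pred = ["HOF", "NOT", "NOT", "NOT", "HOF", "HOF"]
score = macro_f1(gold, pred)
print(f"Macro F1: {score:.4f}")
```

Because every class contributes equally, a model that ignores a minority class is penalized more heavily than it would be under accuracy or micro-averaged F1.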

