《2020年印度-欧洲语言中的仇恨言论和攻击性内容识别》 (Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Identification in Indo-European Languages) - 专知论文

会员服务 ·

0

Performer · binary · 二分类 · 矩 · Machine Learning ·

2021 年 8 月 12 日

Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Identification in Indo-European Languages

翻译：《2020年印度-欧洲语言中的仇恨言论和攻击性内容识别》

Thomas Mandla,Sandip Modha,Gautam Kishore Shahi,Amit Kumar Jaiswal,Durgesh Nandini,Daksh Patel,Prasenjit Majumder,Johannes Schäfer

from arxiv, 25 pages

With the growth of social media, the spread of hate speech is also increasing rapidly. Social media are widely used in many countries. Also Hate Speech is spreading in these countries. This brings a need for multilingual Hate Speech detection algorithms. Much research in this area is dedicated to English at the moment. The HASOC track intends to provide a platform to develop and optimize Hate Speech detection algorithms for Hindi, German and English. The dataset is collected from a Twitter archive and pre-classified by a machine learning system. HASOC has two sub-task for all three languages: task A is a binary classification problem (Hate and Not Offensive) while task B is a fine-grained classification problem for three classes (HATE) Hate speech, OFFENSIVE and PROFANITY. Overall, 252 runs were submitted by 40 teams. The performance of the best classification algorithms for task A are F1 measures of 0.51, 0.53 and 0.52 for English, Hindi, and German, respectively. For task B, the best classification algorithms achieved F1 measures of 0.26, 0.33 and 0.29 for English, Hindi, and German, respectively. This article presents the tasks and the data development as well as the results. The best performing algorithms were mainly variants of the transformer architecture BERT. However, also other systems were applied with good success

翻译：随着社交媒体的增长,仇恨言论的传播也在迅速增加。在许多国家,社交媒体被广泛使用。仇恨言论也在这些国家蔓延。这就需要多语种仇恨言论检测算法。这一领域的大量研究目前专门针对英语。HASOC轨道打算提供一个平台,以开发和优化印地语、德语和英语的仇恨言论检测算法。数据集来自Twitter档案,由机器学习系统预先分类。HASOC对所有三种语言都有两种子任务:任务A是一个二进制分类问题(Hate和不进攻性),任务B是三种(HATE)仇恨言论(Ompensive 和 Profanity)的精细分类问题。总体而言,40个团队提交了252批。任务A的最佳分类算法表现为:英语、印地语和德语的0.51、0.53和0.52个最佳分类算法分别为0.51、0.53和0.52。关于任务B,最佳分类算法为0.26、0.33和0.29,而任务B是英语、印地语和德语的精细分类方法。这一文章主要展示了B结构的改进结果。

0

相关内容

Performer

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

31+阅读 · 2021年9月29日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

专知会员服务

21+阅读 · 2020年6月4日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【斯坦福课程：从语言到信息】《CS 124: From Languages to Information (Winter 2020)》by Dan Jurafsky

【斯坦福课程：从语言到信息】《CS 124: From Languages to Information (Winter 2020)》by Dan Jurafsky

专知会员服务

17+阅读 · 2019年12月2日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

CCF推荐 | 国际会议信息6条

CCF推荐 | 国际会议信息6条

Call4Papers

9+阅读 · 2019年8月13日

已删除

将门创投

6+阅读 · 2019年6月10日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

计算机类 | 11月截稿会议信息9条

计算机类 | 11月截稿会议信息9条

Call4Papers

6+阅读 · 2018年10月14日

CCF B类期刊IPM专刊截稿信息1条

CCF B类期刊IPM专刊截稿信息1条

Call4Papers

3+阅读 · 2018年10月11日

【论文推荐】最新六篇机器翻译相关论文—综述、卷积Encoder-Decoder神经网络、字翻译、自编码器、神经短语、RNNs

【论文推荐】最新六篇机器翻译相关论文—综述、卷积Encoder-Decoder神经网络、字翻译、自编码器、神经短语、RNNs

专知

6+阅读 · 2018年2月19日

【音乐】Attention

【音乐】Attention

英语演讲视频每日一推

3+阅读 · 2017年8月22日

【今日新增】IEEE Trans.专刊截稿信息8条

【今日新增】IEEE Trans.专刊截稿信息8条

Call4Papers

7+阅读 · 2017年6月29日

Model Bias in NLP -- Application to Hate Speech Classification using transfer learning techniques

Model Bias in NLP -- Application to Hate Speech Classification using transfer learning techniques

Arxiv

0+阅读 · 2021年10月11日

TEET! Tunisian Dataset for Toxic Speech Detection

Arxiv

1+阅读 · 2021年10月11日

Contextual Lexicon-Based Approach for Hate Speech and Offensive Language Detection

Arxiv

0+阅读 · 2021年10月10日

Applying Phonological Features in Multilingual Text-To-Speech

Applying Phonological Features in Multilingual Text-To-Speech

Arxiv

0+阅读 · 2021年10月10日

Process Extraction from Text: state of the art and challenges for the future

Arxiv

0+阅读 · 2021年10月7日

Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Arxiv

17+阅读 · 2019年10月9日

Advances in Natural Language Question Answering: A Review

Advances in Natural Language Question Answering: A Review

Arxiv

5+阅读 · 2019年4月10日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

The Users' Perspective on the Privacy-Utility Trade-offs in Health Recommender Systems

Arxiv

5+阅读 · 2018年4月13日

Understanding Chatbot-mediated Task Management

Arxiv

10+阅读 · 2018年2月9日

VIP会员

文章信息

相关主题

Machine Learning

相关VIP内容

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

31+阅读 · 2021年9月29日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

神经网络与形式语言综述，12页pdf，A Survey of Neural Networks and Formal Languages

专知会员服务

21+阅读 · 2020年6月4日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【斯坦福课程：从语言到信息】《CS 124: From Languages to Information (Winter 2020)》by Dan Jurafsky

【斯坦福课程：从语言到信息】《CS 124: From Languages to Information (Winter 2020)》by Dan Jurafsky

专知会员服务

17+阅读 · 2019年12月2日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

人机协同作战规划：来自美海军陆战队的大语言模型（LLM）使用教训

对北约军事总部战略规划制定与实施的研究 | 140页

美联参会指南-联合规划与执行概述及政策框架 | 32页

俄罗斯军事规划差异性凸显其思维的重要性 | 2025最新文献

相关资讯

CCF推荐 | 国际会议信息6条

CCF推荐 | 国际会议信息6条

Call4Papers

9+阅读 · 2019年8月13日

已删除

将门创投

6+阅读 · 2019年6月10日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

计算机类 | 11月截稿会议信息9条

计算机类 | 11月截稿会议信息9条

Call4Papers

6+阅读 · 2018年10月14日

CCF B类期刊IPM专刊截稿信息1条

CCF B类期刊IPM专刊截稿信息1条

Call4Papers

3+阅读 · 2018年10月11日

【论文推荐】最新六篇机器翻译相关论文—综述、卷积Encoder-Decoder神经网络、字翻译、自编码器、神经短语、RNNs

【论文推荐】最新六篇机器翻译相关论文—综述、卷积Encoder-Decoder神经网络、字翻译、自编码器、神经短语、RNNs

专知

6+阅读 · 2018年2月19日

【音乐】Attention

【音乐】Attention

英语演讲视频每日一推

3+阅读 · 2017年8月22日

【今日新增】IEEE Trans.专刊截稿信息8条

【今日新增】IEEE Trans.专刊截稿信息8条

Call4Papers

7+阅读 · 2017年6月29日

相关论文

Model Bias in NLP -- Application to Hate Speech Classification using transfer learning techniques

Model Bias in NLP -- Application to Hate Speech Classification using transfer learning techniques

Arxiv

0+阅读 · 2021年10月11日

TEET! Tunisian Dataset for Toxic Speech Detection

Arxiv

1+阅读 · 2021年10月11日

Contextual Lexicon-Based Approach for Hate Speech and Offensive Language Detection

Arxiv

0+阅读 · 2021年10月10日

Applying Phonological Features in Multilingual Text-To-Speech

Applying Phonological Features in Multilingual Text-To-Speech

Arxiv

0+阅读 · 2021年10月10日

Process Extraction from Text: state of the art and challenges for the future

Arxiv

0+阅读 · 2021年10月7日

Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Arxiv

17+阅读 · 2019年10月9日

Advances in Natural Language Question Answering: A Review

Advances in Natural Language Question Answering: A Review

Arxiv

5+阅读 · 2019年4月10日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

The Users' Perspective on the Privacy-Utility Trade-offs in Health Recommender Systems

Arxiv

5+阅读 · 2018年4月13日

Understanding Chatbot-mediated Task Management

Arxiv

10+阅读 · 2018年2月9日

微信扫码咨询专知VIP会员