Contextual Hate Speech Detection in Code Mixed Text using Transformer Based Approaches - Zhuanzhi Papers


October 18, 2021


Ravindra Nayak, Raviraj Joshi

From arXiv. Accepted at HASOC @ Forum for Information Retrieval Evaluation (FIRE) 2021

In the recent past, social media platforms have helped people connect and communicate with a wider audience. But this has also led to a drastic increase in cyberbullying. It is essential to detect and curb hate speech to keep social media platforms healthy. Code mixed text, which contains more than one language, is also frequently used on these platforms. We therefore propose automated techniques for hate speech detection in code mixed text scraped from Twitter. We specifically focus on code mixed English-Hindi text and transformer-based approaches. While regular approaches analyze the text independently, we also make use of context text in the form of parent tweets. We evaluate the performance of multilingual BERT and Indic-BERT in single-encoder and dual-encoder settings. The first approach concatenates the target text and context text using a separator token and obtains a single representation from the BERT model. The second approach encodes the two texts independently using a dual BERT encoder and averages the corresponding representations. We show that the dual-encoder approach using independent representations yields better performance. We also employ simple ensemble methods to further improve the performance. Using these methods, we achieved a best F1 score of 73.07% on the HASOC 2021 ICHCL code mixed data set.
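The two encoder settings described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_encode` is a hypothetical stand-in for a BERT model (it only averages hash-derived token vectors, so it has no attention and no learned weights), the example tweets are invented, and `majority_vote` assumes that the unspecified "simple ensemble methods" could be a plain majority vote. The sketch only shows how the inputs and representations are combined in each setting.

```python
import hashlib
from typing import List

DIM = 8  # toy embedding size; real BERT models use hundreds of dimensions


def toy_encode(tokens: List[str]) -> List[float]:
    """Hypothetical stand-in for a BERT encoder: mean of per-token hash vectors."""
    if not tokens:
        return [0.0] * DIM
    total = [0.0] * DIM
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        for i in range(DIM):
            total[i] += digest[i] / 255.0  # each component lies in [0, 1]
    return [v / len(tokens) for v in total]


def single_encoder(target: str, context: str) -> List[float]:
    """Single-encoder setting: join target and context with a separator, encode once."""
    tokens = ["[CLS]"] + target.split() + ["[SEP]"] + context.split() + ["[SEP]"]
    return toy_encode(tokens)


def dual_encoder(target: str, context: str) -> List[float]:
    """Dual-encoder setting: encode each text independently, then average the vectors."""
    t_vec = toy_encode(["[CLS]"] + target.split() + ["[SEP]"])
    c_vec = toy_encode(["[CLS]"] + context.split() + ["[SEP]"])
    return [(a + b) / 2.0 for a, b in zip(t_vec, c_vec)]


def majority_vote(predictions: List[int]) -> int:
    """Assumed ensemble rule: return the label predicted by the most models."""
    return max(set(predictions), key=predictions.count)


# Invented example: a reply (target) and its parent tweet (context).
target = "yeh reply bilkul galat hai"
context = "new policy announced today"
single_vec = single_encoder(target, context)
dual_vec = dual_encoder(target, context)
label = majority_vote([1, 0, 1])  # three hypothetical model predictions
```

In a real pipeline, each representation would be passed to a classification layer; the toy vectors here carry no meaning beyond showing where concatenation and averaging happen.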

