Recent technological advancements have boosted social media usage, producing large volumes of user-generated data that include hateful and offensive speech. The language used on social media is often a mix of English and the region's native language. In India, Hindi is used predominantly and is often code-switched with English, giving rise to the Hinglish (Hindi + English) language. Various approaches have been proposed in the past to classify code-mixed Hinglish hate speech using machine learning and deep learning techniques. However, these techniques rely on recurrence or convolution mechanisms, which are computationally expensive and have high memory requirements. They also involve complex data preprocessing, which makes them brittle when the data changes. We propose a much simpler approach that is not only on par with these complex networks but exceeds their performance, combining subword tokenization algorithms such as BPE and Unigram with a multi-head attention-based technique to achieve an accuracy of 87.41% and an F1 score of 0.851 on standard datasets. Efficient use of the BPE and Unigram algorithms helps handle the non-conventional Hinglish vocabulary, making our technique simple, efficient, and sustainable for real-world use.
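To illustrate why subword tokenization suits Hinglish's non-conventional vocabulary, the sketch below implements the core BPE merge loop in plain Python. This is a minimal, self-contained toy (not the paper's actual tokenizer): starting from character-level symbols, it repeatedly merges the most frequent adjacent pair, so frequent Hinglish fragments (here the invented example words "bahut" and "bakwaas" sharing the prefix "ba") become single subword units that generalize across spelling variants.

```python
from collections import Counter

def bpe_merges(corpus, num_merges):
    """Learn BPE merge rules from a list of words.

    Each word starts as a tuple of characters; every iteration merges
    the most frequent adjacent symbol pair across the whole corpus.
    Returns the learned merge rules and the final segmented vocabulary.
    """
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Re-segment every word, fusing occurrences of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab

# Toy Hinglish-style corpus: the shared prefix "ba" is merged first.
merges, vocab = bpe_merges(["bahut", "bahut", "bakwaas", "bakwaas", "bakwaas"], 1)
```

In practice a library such as SentencePiece would be used to train BPE or Unigram models at scale; the point here is only that frequent character sequences in romanized Hindi are captured as units without any language-specific preprocessing.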