Cross-Cultural Transfer Learning for Chinese Offensive Language Detection - Zhuanzhi Papers

Attack · Transfer learning · Language models · Sensitivity · Transferability

March 31, 2023

Cross-Cultural Transfer Learning for Chinese Offensive Language Detection

Li Zhou, Laura Cabello, Yong Cao, Daniel Hershcovich

From arXiv, C3NLP@EACL

Detecting offensive language is a challenging task. Generalizing across different cultures and languages becomes even more challenging: besides lexical, syntactic and semantic differences, pragmatic aspects such as cultural norms and sensitivities, which are particularly relevant in this context, vary greatly. In this paper, we target Chinese offensive language detection and aim to investigate the impact of transfer learning using offensive language detection data from different cultural backgrounds, specifically Korean and English. We find that culture-specific biases in what is considered offensive negatively impact the transferability of language models (LMs) and that LMs trained on diverse cultural data are sensitive to different features in Chinese offensive language detection. In a few-shot learning scenario, however, our study shows promising prospects for non-English offensive language detection with limited resources. Our findings highlight the importance of cross-cultural transfer learning in improving offensive language detection and promoting inclusive digital spaces.

