Multilingual Content Moderation: A Case Study on Reddit - 专知 (Zhuanzhi) Papers


February 19, 2023

Multilingual Content Moderation: A Case Study on Reddit


Meng Ye, Karan Sikka, Katherine Atwell, Sabit Hassan, Ajay Divakaran, Malihe Alikhani

Content moderation is the process of flagging content based on pre-defined platform rules. There is a growing need for AI moderators, both to safeguard users and to protect the mental health of human moderators from traumatic content. While prior work has focused on identifying hateful or offensive language, it is not adequate for meeting the challenges of content moderation, since 1) moderation decisions are based on violations of rules, which subsumes detection of offensive speech, and 2) such rules often differ across communities, which calls for an adaptive solution. We propose to study the challenges of content moderation by introducing a multilingual dataset of 1.8 million Reddit comments spanning 56 subreddits in English, German, Spanish, and French. We perform extensive experimental analysis to highlight the underlying challenges and suggest related research problems such as cross-lingual transfer, learning under label noise (human biases), transfer of moderation models, and prediction of the violated rule. Our dataset and analysis can help better prepare for the challenges and opportunities of auto moderation.


