在NLP研究中处理和提交有害文本 (Handling and Presenting Harmful Text in NLP Research) - 专知论文

会员服务 ·

0

讲稿 · NLP · 词性标注 · 设计 · 自然语言处理 ·

2022 年 10 月 25 日

Handling and Presenting Harmful Text in NLP Research

翻译：在NLP研究中处理和提交有害文本

Hannah Rose Kirk,Abeba Birhane,Bertie Vidgen,Leon Derczynski

from arxiv, in Findings of EMNLP 2022

Text data can pose a risk of harm. However, the risks are not fully understood, and how to handle, present, and discuss harmful text in a safe way remains an unresolved issue in the NLP community. We provide an analytical framework categorising harms on three axes: (1) the harm type (e.g., misinformation, hate speech or racial stereotypes); (2) whether a harm is \textit{sought} as a feature of the research design if explicitly studying harmful content (e.g., training a hate speech classifier), versus \textit{unsought} if harmful content is encountered when working on unrelated problems (e.g., language generation or part-of-speech tagging); and (3) who it affects, from people (mis)represented in the data to those handling the data and those publishing on the data. We provide advice for practitioners, with concrete steps for mitigating harm in research and in publication. To assist implementation we introduce \textsc{HarmCheck} -- a documentation standard for handling and presenting harmful text in research.

翻译：文本数据可能构成伤害的风险。然而,风险没有得到完全理解,如何以安全的方式处理、提出和讨论有害文本仍然是国家语言方案社区尚未解决的问题。我们提供了一个分析框架,将伤害分为三个轴:(1) 伤害类型(如错误信息、仇恨言论或种族陈规定型观念);(2) 如果明确研究有害内容(如培训仇恨言论分类员),那么作为研究设计的一个特征的伤害是否为\textit{sawet},如果在研究无关的问题(如语言生成或部分发言标记)时遇到有害内容,那么这种危险将是一个未决问题。我们提供了一个分析框架,分析框架将伤害分为以下三个轴:(1) 伤害类型(如错误)、伤害类型(如错误)、伤害类型(如错误)、伤害是否构成研究设计的一个特征。我们向从业者提供咨询,包括减少研究和出版中的伤害的具体步骤。为了协助执行,我们引入了\ textsc{Harmcuck} -- 处理和在研究中介绍有害文本的文件标准。

0

相关内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

2019年自然语言处理NLP亮点总结，29页pdf，NLP Year in Review — 2019 NLP highlights for the year 2019.

2019年自然语言处理NLP亮点总结，29页pdf，NLP Year in Review — 2019 NLP highlights for the year 2019.

专知会员服务

69+阅读 · 2020年1月2日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

自然语言处理 (NLP)资源大全

自然语言处理 (NLP)资源大全

机械鸡

35+阅读 · 2017年9月17日

两类带导数的非线性Schrodinger方程拟周期解的存在性

国家自然科学基金

0+阅读 · 2015年12月31日

农村居民食品安全消费行为形成机理及引导机制研究----以江西为例

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

畜粪蚯蚓反应器中腐殖质形成及重金属迁移转化的关联特征研究

国家自然科学基金

0+阅读 · 2012年12月31日

状态受限最优控制问题的有限元方法

国家自然科学基金

0+阅读 · 2012年12月31日

CuInGaSe2太阳能电池界面结构、界面态及其钝化

国家自然科学基金

0+阅读 · 2012年12月31日

RI与Angiogenin相互作用调控PI3K/AKT/mTOR信号通路和ANG的核转位在膀胱癌发生发展中的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

准晶材料非线性断裂理论研究

国家自然科学基金

0+阅读 · 2012年12月31日

Cross-Modal Similarity-Based Curriculum Learning for Image Captioning

Arxiv

0+阅读 · 2022年12月14日

Multi-armed Bandit Learning on a Graph

Multi-armed Bandit Learning on a Graph

Arxiv

0+阅读 · 2022年12月13日

Prompting Is Programming: A Query Language For Large Language Models

Arxiv

0+阅读 · 2022年12月12日

A Benchmark for Understanding and Generating Dialogue between Characters in Stories

Arxiv

0+阅读 · 2022年12月12日

GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using Macro Data Sources

Arxiv

0+阅读 · 2022年12月8日

Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

Arxiv

48+阅读 · 2022年9月7日

Survey on Evolutionary Deep Learning: Principles, Algorithms, Applications and Open Issues

Arxiv

20+阅读 · 2022年8月23日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Learning from Very Few Samples: A Survey

Arxiv

126+阅读 · 2020年9月6日

Multimodal Machine Learning: A Survey and Taxonomy

Arxiv

151+阅读 · 2017年8月1日

VIP会员

文章信息

相关主题

自然语言处理

相关VIP内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

2019年自然语言处理NLP亮点总结，29页pdf，NLP Year in Review — 2019 NLP highlights for the year 2019.

2019年自然语言处理NLP亮点总结，29页pdf，NLP Year in Review — 2019 NLP highlights for the year 2019.

专知会员服务

69+阅读 · 2020年1月2日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

基于大型语言模型的网络威胁情报：利用LLM提取MITRE ATT&CK技术 | 最新文献

无人机（UAV）战略：区域大国与暴力非国家行为体在中东冲突中对无人机的运用 | 130页

神经技术与未来无人机战争的交汇点 | 最新报告

美国从“蛛网行动”中汲取轰炸机舰队保护教训

相关资讯

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

自然语言处理 (NLP)资源大全

自然语言处理 (NLP)资源大全

机械鸡

35+阅读 · 2017年9月17日

相关论文

Cross-Modal Similarity-Based Curriculum Learning for Image Captioning

Arxiv

0+阅读 · 2022年12月14日

Multi-armed Bandit Learning on a Graph

Multi-armed Bandit Learning on a Graph

Arxiv

0+阅读 · 2022年12月13日

Prompting Is Programming: A Query Language For Large Language Models

Arxiv

0+阅读 · 2022年12月12日

A Benchmark for Understanding and Generating Dialogue between Characters in Stories

Arxiv

0+阅读 · 2022年12月12日

GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using Macro Data Sources

Arxiv

0+阅读 · 2022年12月8日

Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

Arxiv

48+阅读 · 2022年9月7日

Survey on Evolutionary Deep Learning: Principles, Algorithms, Applications and Open Issues

Arxiv

20+阅读 · 2022年8月23日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Learning from Very Few Samples: A Survey

Arxiv

126+阅读 · 2020年9月6日

Multimodal Machine Learning: A Survey and Taxonomy

Arxiv

151+阅读 · 2017年8月1日

相关基金

两类带导数的非线性Schrodinger方程拟周期解的存在性

国家自然科学基金

0+阅读 · 2015年12月31日

农村居民食品安全消费行为形成机理及引导机制研究----以江西为例

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

畜粪蚯蚓反应器中腐殖质形成及重金属迁移转化的关联特征研究

国家自然科学基金

0+阅读 · 2012年12月31日

状态受限最优控制问题的有限元方法

国家自然科学基金

0+阅读 · 2012年12月31日

CuInGaSe2太阳能电池界面结构、界面态及其钝化

国家自然科学基金

0+阅读 · 2012年12月31日

RI与Angiogenin相互作用调控PI3K/AKT/mTOR信号通路和ANG的核转位在膀胱癌发生发展中的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

准晶材料非线性断裂理论研究

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员