CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation - Zhuanzhi (专知) Papers



January 1, 2023

CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation


Ge Zhang, Yizhi Li, Yaoyao Wu, Linyuan Zhang, Chenghua Lin, Jiayi Geng, Shi Wang, Jie Fu

As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, prevalent data-driven techniques such as large-scale language models suffer from inadequate data and biased corpora, especially for languages with insufficient resources such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation (CORGI-PM), which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which require models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To the best of our knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.

