Recently, scores of high-performing code generation systems have surfaced. As in many other domains, code generation is typically approached with a large language model at its core, trained under a masked or causal language modeling objective. This work shows that current code generation systems inherit biases from their large language model backbones, and that these biases can leak into generated code under specific circumstances. To investigate this effect, we propose a framework that automatically removes hints from prompts and exposes the biases these models rely on. We apply our framework to three coding challenges and test it across top-performing code generation models. Our experiments reveal biases toward specific prompt structures and the exploitation of keywords during code generation. Finally, we demonstrate how to use our framework as a data transformation technique, which we find to be a promising direction toward more robust code generation.
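The hint-removal idea described above can be sketched as a simple prompt transformation. The snippet below is a minimal illustrative sketch, not the paper's actual framework: it assumes prompts are Python function signatures with docstrings, and the keyword list and function names (`HINT_KEYWORDS`, `remove_hints`) are hypothetical choices for illustration.

```python
import re

# Assumed set of words that hint at the intended algorithm (illustrative only).
HINT_KEYWORDS = {"sort", "reverse", "sum", "maximum", "minimum"}

def remove_hints(prompt: str, anonymized_name: str = "f") -> str:
    """Rename the target function and mask hint keywords in the prompt."""
    # Replace the original function name with a neutral identifier,
    # so the model cannot exploit the name as a hint.
    prompt = re.sub(r"def\s+\w+\(", f"def {anonymized_name}(", prompt, count=1)
    # Mask docstring keywords that reveal the intended algorithm.
    for kw in HINT_KEYWORDS:
        prompt = re.sub(rf"\b{kw}\b", "<mask>", prompt, flags=re.IGNORECASE)
    return prompt

prompt = '''def sort_numbers(nums):
    """Sort the list of numbers and return it."""
'''
print(remove_hints(prompt))
```

Comparing model completions on the original and transformed prompts would then surface how much the model leans on surface cues such as names and keywords rather than the task description itself.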