AI modeling for source code understanding tasks has been making significant progress and is being adopted in production development pipelines. However, reliability concerns are being raised, especially regarding whether the models are actually learning task-relevant aspects of source code. While recent model-probing approaches have observed a lack of signal awareness in many AI-for-code models, i.e., models not capturing task-relevant signals, they do not offer solutions to rectify this problem. In this paper, we explore data-driven approaches to enhance models' signal awareness: 1) we combine the SE concept of code complexity with the AI technique of curriculum learning; 2) we incorporate SE assistance into AI models by customizing Delta Debugging to generate simplified, signal-preserving programs and augmenting the training dataset with them. With our techniques, we achieve up to a 4.8x improvement in model signal awareness. Using the notion of code complexity, we further present a novel, dataset-perspective approach to introspecting model learning.
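For illustration, below is a minimal Python sketch of the complexity-ordered curriculum idea described above; it is not the paper's implementation. The `complexity_proxy` heuristic, the staged `train_with_curriculum` loop, and the `model.train_step` hook are all hypothetical placeholders standing in for whatever complexity metric and training routine a real pipeline would use.

```python
# Sketch (under assumptions): order training samples by a rough code-complexity
# proxy and expose the model to progressively harder subsets, as in curriculum
# learning keyed on SE code-complexity measures.

import re
from typing import List, Tuple

BRANCH_KEYWORDS = re.compile(r"\b(if|for|while|case|catch)\b")

def complexity_proxy(code: str) -> int:
    """Crude stand-in for cyclomatic complexity: 1 + number of branch keywords."""
    return 1 + len(BRANCH_KEYWORDS.findall(code))

def curriculum_order(samples: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Order (code, label) pairs from simplest to most complex."""
    return sorted(samples, key=lambda s: complexity_proxy(s[0]))

def train_with_curriculum(model, samples, epochs: int = 3, stages: int = 3):
    """Grow the visible slice of the complexity-sorted data each epoch."""
    ordered = curriculum_order(samples)
    for epoch in range(epochs):
        cutoff = int(len(ordered) * min(1.0, (epoch + 1) / stages))
        for code, label in ordered[:cutoff]:
            model.train_step(code, label)  # hypothetical training hook
```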