WheaCha: A Method for Explaining the Predictions of Models of Code

Attribution methods have emerged as a popular approach to interpreting model predictions based on the relevance of input features. Although a feature importance ranking can provide insight into how a model arrives at a prediction from a raw input, it does not give a clear-cut definition of the key features the model uses for that prediction. In this paper, we present a new method, called WheaCha, for explaining the predictions of code models. While WheaCha employs the same mechanism of tracing model predictions back to input features, it differs from all existing attribution methods in crucial ways. Specifically, for any prediction of a learned code model, WheaCha divides the input program into "wheat" (i.e., the defining features that are the reason the model predicts the label it predicts) and the remaining "chaff". We realize WheaCha in a tool, HuoYan, and use it to explain four prominent code models: code2vec, seq-GNN, GGNN, and CodeBERT. Results show that (1) HuoYan is efficient, taking on average under twenty seconds to compute the wheat for an input program end to end (i.e., including model prediction time); and (2) the wheat that all models use to predict input programs is made of simple syntactic or even lexical properties (i.e., identifier names). Furthermore, (3) based on wheat, we present a novel approach to explaining the predictions of code models through the lens of training data.
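
To make the wheat/chaff split concrete, here is a minimal sketch under stated assumptions. It is not the HuoYan algorithm, which the abstract does not describe. It assumes the model can be queried on an arbitrary subsequence of program tokens, and it substitutes a generic greedy reduction in the spirit of delta debugging: tokens whose deletion leaves the predicted label unchanged are treated as chaff, and the survivors as wheat. The names `extract_wheat` and `toy_predict` are hypothetical. `toy_predict` is a made-up stand-in for a real code model such as code2vec or CodeBERT.

```python
from collections.abc import Callable, Hashable, Sequence


def extract_wheat(
    tokens: Sequence[str],
    predict: Callable[[Sequence[str]], Hashable],
) -> list[str]:
    """Shrink `tokens` to a subset on which `predict` returns the same label.

    Greedy reduction (delta-debugging style, an assumption, NOT HuoYan):
    try deleting one token at a time and keep the deletion whenever the
    prediction is unchanged. Passes repeat until no single deletion
    preserves the label, so every surviving token is necessary (1-minimal).
    """
    target = predict(tokens)
    kept = list(tokens)
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(kept):
            candidate = kept[:i] + kept[i + 1:]
            if predict(candidate) == target:
                kept = candidate  # token i is chaff: the label survives without it
                changed = True
            else:
                i += 1  # token i is needed for the label; keep it
    return kept


def toy_predict(tokens: Sequence[str]) -> str:
    """Hypothetical stand-in for a code model that predicts a method name.

    It inspects only identifier names, a deliberately lexical shortcut.
    """
    if "swap" in tokens and "arr" in tokens:
        return "sort"
    return "unknown"


if __name__ == "__main__":
    program = "def f ( arr ) : for i in range ( len ( arr ) ) : swap ( arr , i )".split()
    print(toy_predict(program))                 # prints sort
    print(extract_wheat(program, toy_predict))  # prints ['swap', 'arr']
```

Repeating passes until nothing changes guarantees that removing any single surviving token flips the prediction. A single pass does not guarantee this when the model is non-monotone. The cost is up to one model query per token per pass, which this sketch ignores. In this toy run, the "wheat" is just two identifier names. That loosely echoes finding (2) above, but only because `toy_predict` was written to behave that way.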
