Previous studies have demonstrated that neural code comprehension models are vulnerable to identifier renaming: by renaming as few as one identifier in the source code, the models can be misled into producing completely irrelevant results, indicating that identifiers can be misleading for model prediction. However, identifiers are not entirely detrimental to code comprehension, since the semantics of identifier names can be related to the program semantics. Properly exploiting these two opposite impacts of identifiers is essential for enhancing the robustness and accuracy of neural code comprehension, yet remains under-explored. In this work, we model the impact of identifiers from a novel causal perspective and propose a counterfactual reasoning-based framework named CREAM. CREAM explicitly captures the misleading information of identifiers through multi-task learning in the training stage, and reduces this misleading impact via counterfactual inference in the inference stage. We evaluate CREAM on three popular neural code comprehension tasks: function naming, defect detection, and code classification. Experiment results show that CREAM not only significantly outperforms baselines in terms of robustness (e.g., +37.9% F1 score on the function naming task), but also achieves improved results on the original datasets (e.g., +0.5% F1 score on the function naming task).
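To illustrate the counterfactual-inference step described above, the following is a minimal sketch of the general debiasing pattern it alludes to: a branch trained to predict from identifiers alone estimates their (potentially misleading) direct effect, which is then subtracted from the full model's logits at inference time. All names, the subtraction form, and the weighting factor `alpha` are illustrative assumptions, not CREAM's exact formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def counterfactual_debias(full_logits, identifier_only_logits, alpha=1.0):
    """Sketch of counterfactual inference (hypothetical formulation):
    remove the direct effect attributed to identifiers by subtracting
    the identifier-only branch's logits from the full model's logits.
    `alpha` controls how much of the identifier effect is removed."""
    return full_logits - alpha * identifier_only_logits

# Toy example: the full model leans toward class 1, but largely because
# the identifier-only branch (spurious signal) strongly favors class 1.
full_logits = np.array([2.0, 3.0])
identifier_only_logits = np.array([0.0, 2.5])

debiased = counterfactual_debias(full_logits, identifier_only_logits)
print(np.argmax(full_logits))   # biased prediction: class 1
print(np.argmax(debiased))      # debiased prediction: class 0
print(softmax(debiased))        # calibrated probabilities after debiasing
```

In this toy setup the debiased prediction flips from class 1 to class 0 once the identifier-only effect is removed, mirroring how counterfactual inference is meant to suppress predictions driven by misleading identifier names rather than program semantics.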