This work explores the signal awareness of AI models for source code understanding. Using a software vulnerability detection use case, we evaluate the models' ability to capture the correct vulnerability signals when producing their predictions. Our prediction-preserving input minimization (P2IM) approach systematically reduces the original source code to the minimal snippet a model needs to maintain its prediction. A model's reliance on incorrect signals is then uncovered when the vulnerability in the original code is missing from the minimal snippet, yet the model predicts both as vulnerable. We apply P2IM to three state-of-the-art neural network models across multiple datasets, and measure their signal awareness using a new metric we propose: Signal-aware Recall (SAR). The results show a sharp drop in the models' Recall, from the high 90s to the sub-60s under the new metric, suggesting that the models are presumably picking up substantial noise or dataset nuances while learning their vulnerability detection logic.
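To make the minimization step concrete, below is a minimal sketch of a P2IM-style reduction loop. It assumes a hypothetical model.predict(code) interface returning "vulnerable" or "safe" and line-level reduction granularity; the paper's actual implementation may use a different reduction strategy (e.g., delta debugging) or granularity.

    def p2im(code, model, target="vulnerable"):
        """Greedily drop source lines while the model's prediction is preserved."""
        lines = code.splitlines()
        changed = True
        while changed:
            changed = False
            for i in range(len(lines)):
                candidate = lines[:i] + lines[i + 1:]
                # Keep the deletion only if the target prediction survives.
                if model.predict("\n".join(candidate)) == target:
                    lines = candidate
                    changed = True
                    break
        # The result is 1-minimal with respect to single-line removal:
        # deleting any one remaining line would flip the model's prediction.
        return "\n".join(lines)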
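SAR can then be computed by crediting, among the vulnerable samples, only those true positives whose minimal snippet still contains the known vulnerable statements. The sketch below assumes a hypothetical sample format (code, is_vulnerable, vuln_lines) and reuses the p2im function above; the paper's exact bookkeeping may differ.

    def signal_aware_recall(samples, model):
        """Recall that credits only 'signal-aware' true positives."""
        tp_sa = 0      # true positives whose minimal snippet retains the vulnerability
        positives = 0  # all vulnerable samples (denominator: TP + FN)
        for code, is_vulnerable, vuln_lines in samples:
            if not is_vulnerable:
                continue
            positives += 1
            if model.predict(code) != "vulnerable":
                continue  # false negative
            snippet = p2im(code, model)
            if any(line in snippet for line in vuln_lines):
                tp_sa += 1  # the prediction rests on the actual vulnerability
        return tp_sa / positives if positives else 0.0

Under this accounting, a true positive whose minimization discards the vulnerable statements counts against SAR exactly like a miss, which is what drives Recall from the high 90s down to the sub-60s.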