The recent breakthroughs in deep learning have sparked a wave of interest in learning-based bug detectors. Compared to traditional static analysis tools, these bug detectors are learned directly from data and are thus easier to create. On the other hand, they are difficult to train, requiring a large amount of data that is not readily available. In this paper, we propose a new approach, called meta bug detection, which offers three crucial advantages over existing learning-based bug detectors: it is bug-type generic (i.e., capable of catching types of bugs that are entirely unobserved during training), self-explainable (i.e., capable of explaining its own predictions without any external interpretability methods), and sample efficient (i.e., requiring substantially less training data than standard bug detectors). Our extensive evaluation shows that our meta bug detector (MBD) is effective in catching a variety of bugs, including null pointer dereferences, array index out-of-bounds accesses, file handle leaks, and even data races in concurrent programs; in the process, MBD also significantly outperforms several noteworthy baselines, including Facebook Infer, a prominent static analysis tool, and FICS, the latest anomaly detection method.