Flaw-finding static analysis tools typically generate large volumes of code flaw alerts, including many false positives. To reduce the human effort required to triage these alerts, a significant body of work attempts to use machine learning to classify and prioritize alerts. Identifying a useful set of training data, however, remains a fundamental challenge in developing such classifiers in many contexts. We propose using static analysis test suites (i.e., repositories of "benchmark" programs that are purpose-built to test the coverage and precision of static analysis tools) as a novel source of training data. In a case study, we generated a large quantity of alerts by executing various static analyzers on the Juliet C/C++ test suite, and we automatically derived ground-truth labels for these alerts by referencing the Juliet test suite metadata. Finally, we used this data to train classifiers to predict whether an alert is a false positive. Our classifiers obtained high precision (90.2%) and recall (88.2%) for a large number of code flaw types on a hold-out test set. This preliminary result suggests that pre-training classifiers on test suite data could help jumpstart static analysis alert classification in data-limited contexts.
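The labeling step described above can be illustrated with a minimal sketch. This is not the paper's actual pipeline: the `Alert` and `JulietRegion` types and the matching rule are hypothetical, assuming only the Juliet convention that flawed code lives in "bad" functions while fixed variants live in "good" functions, so an alert of the matching CWE that falls inside a "bad" region is labeled a true positive and everything else a false positive.

```python
# Hypothetical sketch of deriving ground-truth labels for static
# analysis alerts from Juliet test suite metadata. All type names and
# the matching rule are illustrative assumptions, not the paper's code.
from dataclasses import dataclass

@dataclass
class Alert:
    file: str   # source file the analyzer flagged
    line: int   # line number of the alert
    cwe: int    # CWE id reported by the analyzer

@dataclass
class JulietRegion:
    """A function body in a Juliet test case, per its metadata."""
    file: str
    start: int
    end: int
    cwe: int
    is_bad: bool  # True for a flawed ("bad") function, False for a "good" one

def label_alert(alert: Alert, regions: list[JulietRegion]) -> bool:
    """Return True (true positive) iff the alert lands in a flawed
    region of the matching CWE; otherwise label it a false positive."""
    for r in regions:
        if (r.file == alert.file
                and r.start <= alert.line <= r.end
                and r.cwe == alert.cwe):
            return r.is_bad
    # Alerts outside any known flawed region count as false positives.
    return False
```

Labels produced this way could then serve directly as the target variable when training a false-positive classifier on features of each alert.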