Manual code reviews and static code analyzers are the traditional mechanisms for verifying whether source code complies with coding policies. However, these mechanisms are hard to scale. We formulate code compliance assessment as a machine learning (ML) problem: given a natural language policy and a code snippet as input, predict whether the code is compliant, non-compliant, or irrelevant to the policy. This can help scale compliance classification and search to policies not covered by traditional mechanisms. We explore key research questions on ML model formulation, training data, and evaluation setup. The core idea is to learn a joint code-text embedding space that preserves compliance relationships via the vector distance between code and policy embeddings. As no task-specific data exists, we re-interpret and filter commonly available software datasets, adding pre-training and pre-finetuning tasks that reduce the semantic gap. We benchmarked our approach on two listings of coding policies (CWE and CBP). This is a zero-shot evaluation, as none of the policies occur in the training set. On CWE and CBP respectively, our tool Policy2Code achieves classification accuracies of (59%, 71%) and search MRR of (0.05, 0.21), compared to CodeBERT's classification accuracies of (37%, 54%) and MRR of (0.02, 0.02). In a user study, 24% of Policy2Code detections were accepted, compared to 7% for CodeBERT.
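One simplistic reading of the joint-embedding idea can be sketched as follows. The abstract does not specify how distances map to the three labels, so the thresholds, the `classify` helper, and the toy vectors below are all hypothetical illustrations, not the authors' actual model: we assume some encoder has already mapped a policy and a code snippet into a shared vector space, and classify by cosine similarity alone.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify(policy_vec, code_vec, t_rel=0.3, t_comp=0.7):
    """Map policy/code similarity to a compliance label.

    t_rel and t_comp are hypothetical thresholds: below t_rel the code
    is treated as unrelated to the policy; at or above t_comp it is
    close enough to count as compliant; in between, it is relevant
    but non-compliant.
    """
    sim = cosine(policy_vec, code_vec)
    if sim < t_rel:
        return "irrelevant"
    return "compliant" if sim >= t_comp else "non-compliant"

# Toy embeddings (purely illustrative):
policy = [1.0, 0.0, 0.0]
print(classify(policy, [1.0, 0.0, 0.0]))  # identical direction -> "compliant"
print(classify(policy, [0.0, 1.0, 0.0]))  # orthogonal -> "irrelevant"
print(classify(policy, [0.6, 0.8, 0.0]))  # similarity 0.6 -> "non-compliant"
```

In a real pipeline the vectors would come from the learned code and text encoders, and the decision rule would be trained rather than hand-thresholded; this sketch only shows why preserving compliance relationships in the embedding distances makes nearest-neighbor search over policies possible.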