A Survey on Machine Learning Techniques for Source Code Analysis

Context: Advances in machine learning techniques have encouraged researchers to apply them to a myriad of software engineering tasks that rely on source code analysis, such as testing and vulnerability detection. The sheer number of such studies makes it difficult for the community to understand the current landscape. Objective: We aim to summarize the current knowledge in the area of applied machine learning for source code analysis. Method: We investigate studies belonging to twelve categories of software engineering tasks, together with the machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021. We summarize our observations and findings on the basis of the identified studies. Results: Our findings suggest that the use of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize the commonly used steps and the overall workflow for each task, and summarize the machine learning techniques employed. Additionally, we collate a comprehensive list of available datasets and tools usable in this context. Finally, we summarize the perceived challenges in this area, which include the availability of standard datasets, reproducibility and replicability, and hardware resources.

