Machine learning-based Programming Language Processing (PLP) has improved substantially in the past few years, and a growing number of people are interested in exploring this promising field. However, new researchers and developers find it challenging to identify the right components for constructing their own machine learning pipelines, given the diversity of PLP tasks to be solved, the large number of datasets and models being released, and the complex compilers and tools involved. To improve the findability, accessibility, interoperability, and reusability (FAIRness) of machine learning components, we collect and analyze a set of representative papers in the domain of machine learning-based PLP. We then identify and characterize key concepts, including PLP tasks, model architectures, and supportive tools. Finally, we present example use cases that leverage these reusable components to construct machine learning pipelines for a set of PLP tasks.