Block-based programming languages like Scratch are increasingly popular for programming education and end-user programming. Recent program analyses build on the insight that source code can be modelled using techniques from natural language processing. Many of the regularities of source code that support this approach are due to the syntactic overhead imposed by textual programming languages. This syntactic overhead, however, is precisely what block-based languages remove in order to simplify programming. Consequently, it is unclear how well this modelling approach performs on block-based programming languages. In this paper, we investigate the applicability of language models for the popular block-based programming language Scratch. We model Scratch programs using n-gram models, the most fundamental type of language model, and transformers, a popular deep learning model. Evaluation on the example tasks of code completion and bug finding confirms that blocks inhibit predictability, but that the use of language models is nevertheless feasible. Our findings serve as a foundation for improving tooling and analyses for block-based languages.
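To make the n-gram approach mentioned above concrete, the following is a minimal sketch of a bigram model over sequences of Scratch blocks, used for next-block prediction as in code completion. The block names and the tiny corpus are hypothetical illustrations, not data from the paper.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each block follows each other block."""
    bigrams = defaultdict(Counter)
    for seq in corpus:
        for prev, nxt in zip(seq, seq[1:]):
            bigrams[prev][nxt] += 1
    return bigrams

def predict_next(bigrams, prev):
    """Suggest the most frequent successor block (code completion)."""
    if prev not in bigrams or not bigrams[prev]:
        return None
    return bigrams[prev].most_common(1)[0][0]

# Hypothetical corpus of Scratch-like block sequences
corpus = [
    ["whenflagclicked", "forever", "movesteps"],
    ["whenflagclicked", "forever", "turnright"],
    ["whenflagclicked", "forever", "movesteps"],
]
model = train_bigram(corpus)
print(predict_next(model, "forever"))  # "movesteps" follows "forever" twice, "turnright" once
```

A real evaluation would use higher-order n-grams with smoothing and measure predictability via cross-entropy, but the counting-and-ranking core is the same.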