The importance of programming education has led to dedicated educational programming environments, in which users visually arrange block-based programming constructs that typically control graphical, interactive, game-like programs. The Scratch programming environment is particularly popular, with more than 70 million registered users at the time of this writing. While the block-based nature of Scratch helps learners by preventing syntax errors, learners still need feedback and support to implement the desired functionality. To support both individual learning and classroom settings, this feedback and support should ideally be provided automatically, which requires tests to enable dynamic program analysis. The Whisker framework enables automated testing of Scratch programs, but creating such tests is challenging. In this paper, we therefore investigate how to generate Whisker tests automatically. This raises important challenges: First, game-like programs are typically randomised, leading to flaky tests. Second, Scratch programs usually consist of animations and interactions with long delays, inhibiting the application of classical test generation approaches. Evaluation on common programming exercises, a random sample of 1000 Scratch user programs, and the 1000 most popular Scratch programs demonstrates that our approach enables Whisker to reliably accelerate test executions. It also shows that, even though many Scratch programs are small and easy to cover, there are many unique challenges for which advanced search-based test generation using many-objective algorithms is needed in order to achieve high coverage.