Genetic programming is an evolutionary approach known for its performance in program synthesis. However, it is not yet mature enough for practical use in real-world software development, because many training cases are usually required to generate programs that generalize to unseen test cases. Since in practice the training cases must be hand-labeled by the user at considerable cost, we need an approach that checks program behavior with fewer training cases. Metamorphic testing needs no labeled input/output examples. Instead, the program is executed multiple times, first on a given (randomly generated) input and then on related inputs, to check whether certain user-defined relations between the observed outputs hold. In this work, we propose MTGP, which combines metamorphic testing and genetic programming, and study its performance and the generalizability of the generated programs. Further, we analyze how generalizability depends on the number of labeled training cases provided. We find that combining metamorphic testing with labeled training cases leads to a higher generalization rate than using labeled training cases alone in almost all studied configurations. Consequently, we recommend that researchers use metamorphic testing in their systems if labeling the training data is expensive.
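To illustrate how metamorphic relations can supplement a small labeled training set, the following is a minimal sketch and not the MTGP implementation. It assumes a hypothetical target task (summing a list of integers), two hypothetical user-defined relations (`permutation_relation`, `append_relation`), and a simple additive fitness in which lower is better. All function names, the input ranges, and the way the two error counts are combined are assumptions made for illustration only.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A candidate program maps a list of integers to an integer.
Program = Callable[[List[int]], int]
LabeledCase = Tuple[Sequence[int], int]


def permutation_relation(program: Program, xs: List[int], rng: random.Random) -> bool:
    """Hypothetical relation: shuffling the input must not change the output."""
    shuffled = xs[:]
    rng.shuffle(shuffled)
    return program(xs[:]) == program(shuffled)


def append_relation(program: Program, xs: List[int], rng: random.Random) -> bool:
    """Hypothetical relation: appending a value v must raise the output by v."""
    v = rng.randint(-10, 10)
    return program(xs + [v]) == program(xs[:]) + v


RELATIONS = (permutation_relation, append_relation)


def metamorphic_violations(program: Program, n_inputs: int = 50, seed: int = 0) -> int:
    """Count violated relations on randomly generated, unlabeled inputs."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n_inputs):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 6))]
        for relation in RELATIONS:
            if not relation(program, xs, rng):
                violations += 1
    return violations


def labeled_errors(program: Program, cases: Sequence[LabeledCase]) -> int:
    """Count labeled training cases on which the program's output is wrong."""
    return sum(1 for xs, expected in cases if program(list(xs)) != expected)


def combined_fitness(program: Program, cases: Sequence[LabeledCase]) -> int:
    """Assumed combination: labeled errors plus metamorphic violations."""
    return labeled_errors(program, cases) + metamorphic_violations(program)


# A single labeled case cannot separate these two candidates.
training_cases = [([3], 3)]
correct_candidate: Program = lambda xs: sum(xs)
overfit_candidate: Program = lambda xs: xs[0]

print(labeled_errors(correct_candidate, training_cases),
      labeled_errors(overfit_candidate, training_cases))
print(combined_fitness(correct_candidate, training_cases),
      combined_fitness(overfit_candidate, training_cases))
```

In this sketch both candidates pass the single labeled case, so the labeled error alone cannot tell them apart. The overfit candidate, however, breaks the append relation on unlabeled random inputs, which gives the search a signal without any additional hand-labeled examples.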