Automatic code synthesis from natural language descriptions is a challenging task. Recent years have witnessed substantial progress in building code generation systems for domain-specific languages (DSLs) using sequence-to-sequence deep learning techniques. In this paper, we experiment with generative models for the \textsc{AlgoLisp} DSL and demonstrate significant dataset bias through several classes of adversarial examples. We also introduce two Transformer-based model variants that outperform all existing \textsc{AlgoLisp} code generation baselines. Like the current state-of-the-art systems, however, our proposed models perform poorly under adversarial settings. We therefore propose several dataset augmentation techniques to reduce this bias and demonstrate their efficacy through extensive experiments.